Wikipedia is not a reliable source for citations elsewhere on Wikipedia. As a user-generated source, it can be edited by anyone at any time, so any information it contains at a particular moment could be vandalism.
Cleaning data by hand is challenging, and that makes it prone to mistakes.
Depends, really. Have you got prior coding experience? The semantics of certain languages are much easier to pick up if you've had prior experience, but they might seem impossible to someone who has minimal experience.
Assuming that you're no older than 15 or 16, I'm going to suggest Python. It's a simple, high-level language that's easier to understand than most languages. If you think you've got enough experience to pick things up quickly, then I'd probably say R is a good language to start with. It's very well-suited for AI and tends to be a favourite for AI researchers.