Having worked on scraping millions of news articles and Twitter data, I feel confident about the first part of the task.
I have done extensive semantic analysis on Amazon reviews for another startup, so this seems a doable task given the kind of complexity I have dealt with.
I have used NLP, machine learning, AI, and computer vision techniques to build powerful engines.
Feel free to drop me a message for further discussion. Thanks.