Hello,
We want to calculate the similarity of several thousand texts; the number of texts can go up to 100,000. Each text is stored in its own .txt file, and each file is named with a number.
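The post does not say which similarity measure to use. The sketch below assumes one common choice: the Jaccard similarity of word 3-gram "shingle" sets, expressed as a percentage. The function names and the `k=3` shingle size are illustrative assumptions, not part of the request. Texts are passed in as a dict mapping file number to content, so the example needs no files on disk.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word k-grams) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: fall back to a single shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_percent(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| scaled to 0..100.
    Two empty shingle sets are treated as identical (100 %)."""
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


def pairwise_similarity(texts: dict[str, str], k: int = 3) -> dict[tuple[str, str], float]:
    """Similarity for every unordered pair of texts (n * (n - 1) / 2 pairs)."""
    sh = {name: shingles(t, k) for name, t in texts.items()}
    return {(x, y): jaccard_percent(sh[x], sh[y])
            for x, y in combinations(sorted(sh), 2)}
```

Comparing every pair is simple but grows quadratically: 100,000 texts give about 5 × 10^9 pairs. At that scale an approximate method such as MinHash with locality-sensitive hashing (a different technique from the exact sketch above) is a common way to avoid scoring every pair.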
After that, we want to extract the least similar texts, with two options: extract the x least similar texts, or extract all texts with a maximum similarity of n %.
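One possible reading of "maximum similarity" is a text's highest similarity to any other text in the collection. Under that assumption, both extraction options reduce to ranking texts by that value. The input format (a dict of pair to percentage) and the function names below are hypothetical.

```python
def max_similarity(names: list[str],
                   pair_sims: dict[tuple[str, str], float]) -> dict[str, float]:
    """Highest similarity (0..100) of each text to any other text."""
    best = {name: 0.0 for name in names}
    for (a, b), s in pair_sims.items():
        best[a] = max(best[a], s)
        best[b] = max(best[b], s)
    return best


def least_similar(best: dict[str, float], x: int) -> list[str]:
    """Option 1: the x texts whose maximum similarity is lowest (ties by name)."""
    return sorted(best, key=lambda name: (best[name], name))[:x]


def below_threshold(best: dict[str, float], n: float) -> list[str]:
    """Option 2: every text whose maximum similarity is at most n %."""
    return sorted(name for name, s in best.items() if s <= n)
```

Note that under this reading two extracted texts can still be up to n % similar to each other. If the goal is instead a set in which no pair exceeds n %, that is a different (maximum independent set) problem, usually approximated greedily.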
A table must be generated showing how many texts we can extract with a maximum similarity of x %, for x from 0 to 100 in increments of 1.
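Assuming the same per-text "maximum similarity" reading as above, the table is a cumulative count: for each x, the number of texts whose maximum similarity is at most x %. Sorting once and using binary search keeps this cheap even for 100,000 texts. `threshold_table` and the sample values are illustrative.

```python
from bisect import bisect_right


def threshold_table(best: dict[str, float]) -> list[tuple[int, int]]:
    """Rows (x, count) for x = 0..100: number of texts with max similarity <= x %."""
    values = sorted(best.values())
    # bisect_right gives the number of sorted values that are <= x.
    return [(x, bisect_right(values, x)) for x in range(101)]


if __name__ == "__main__":
    # Hypothetical per-text maximum similarities for three texts.
    best = {"1": 60.0, "2": 60.0, "3": 10.0}
    print("max_similarity_%\ttexts")
    for x, count in threshold_table(best):
        print(f"{x}\t{count}")
```

The counts never decrease as x grows, and the row for x = 100 always equals the total number of texts.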
The tool must run on demand on an HPC system.
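The post does not name the scheduler or OS. As a sketch only, assuming a Slurm job array submitted with `--array=0-(N-1)`, the all-pairs work can be split so that each task compares a contiguous block of rows with all later rows. Row i owns n - 1 - i pairs, so blocks are sized by pair count rather than row count. `row_blocks` is a hypothetical helper.

```python
import os


def row_blocks(n: int, num_tasks: int) -> list[tuple[int, int]]:
    """Split rows 0..n-1 into num_tasks contiguous [start, end) blocks holding
    roughly equal numbers of pairs (i, j) with j > i."""
    total = n * (n - 1) // 2
    blocks, start, done = [], 0, 0
    for t in range(1, num_tasks + 1):
        target = total * t // num_tasks
        end = start
        # The last task takes every remaining row, including zero-pair rows.
        while end < n and (done < target or t == num_tasks):
            done += n - 1 - end  # pairs owned by row `end`
            end += 1
        blocks.append((start, end))
        start = end
    return blocks


if __name__ == "__main__":
    # Assumption: Slurm sets these variables for each task of a job array.
    task = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    start, end = row_blocks(100_000, num_tasks)[task]
    print(f"task {task}: compare rows {start}..{end - 1} with all later rows")
```

Each task would write its partial per-text maxima to disk, and a final step would merge them by taking the maximum per text.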
We are open to hiring several people to achieve this goal if necessary: a mathematician to write the calculation algorithm, a computational linguist, and someone experienced with HPC.
Best regards,
Fab.
This is a default bid; we'll discuss the price in the chat after I read your project. I can do this for you perfectly. I still have a few questions, so please leave a message in my chat so we can discuss this further. Thanks.
I am a preferred freelancer here. You can check sample work I have done in my portfolio and my profile reviews: https://www.freelancer.com/u/freelancerpr0?w=f
Please come over to the chat and let's discuss the topics and goals. I can certainly help you with this task. Looking forward to working with you.
What operating system does the HPC system run: Linux or Windows?
I am very interested in your project, and I thought this task could not be done in MATLAB.
I can finish the task on time and provide modifications for up to one month after delivery.