Document processing using the MapReduce paradigm
€8-30 EUR
Closed
Posted over 2 years ago
Paid on delivery
Implement a parallel program in Java that processes a set of text documents received as
input, evaluating the length of the processed words and ordering the documents according to word
length and frequency of occurrence. Each word is assigned a value depending on its number of
letters; the value of a word is determined by a formula based on the Fibonacci sequence, as
explained later. The rank of a document is calculated by summing the values of all the words it
contains. In addition, the maximum-length word must be determined for each document (or words, if
several share the same maximum length).
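The posting says the word value is Fibonacci-based but defers the exact formula. A minimal sketch, under the assumption (mine, not stated in the task) that a word's value is the Fibonacci number at its length:

```java
public class WordValue {
    // Iterative Fibonacci: fib(1) = 1, fib(2) = 1, fib(n) = fib(n-1) + fib(n-2).
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Assumed mapping: the value of a word is fib(number of letters).
    // The real formula is "explained later" in the full task description.
    static long value(String word) {
        return fib(word.length());
    }
}
```

With this assumption, a 5-letter word such as "hello" would be worth fib(5) = 5.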
After the splitting step, the number of letters of each word in a document is determined,
yielding a list of pairs {length, number of appearances}, where the number of appearances is the
count of words in the document whose length equals that length. The program must then compute a
metric for all processed documents and display the documents ordered by this metric.
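The {length, number of appearances} pairs above amount to a length histogram. A minimal sketch of building one for a text fragment (the word-splitting rule on `\W+` is my assumption; the task does not specify a tokenizer):

```java
import java.util.Map;
import java.util.TreeMap;

public class LengthHistogram {
    // Split a text fragment into words and count how many words have each
    // length, producing the {length, number of appearances} pairs.
    static Map<Integer, Integer> countLengths(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty()) {
                // merge() adds 1 to the existing count, or starts at 1.
                counts.merge(word.length(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, "to be or not to be" contains five 2-letter words and one 3-letter word, so the histogram is {2=5, 3=1}.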
To parallelize document processing, the MapReduce model will be used. Each document is
fragmented into fixed-size parts that are processed in parallel (the Map operation); each part
yields a partial dictionary (containing the word lengths and their numbers of appearances) and a
partial list of the maximum-length words in the processed fragment. The next step combines the
partial dictionaries (the Reduce operation) into a single dictionary for the whole document; the
lists of maximum-length words are combined in the same way.
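A minimal sketch of the two combination steps in the Reduce phase, merging partial dictionaries and partial maximum-word lists (method names and the decision to keep duplicates across fragments are my assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceStep {
    // Merge two partial {length -> appearances} dictionaries by summing
    // the counts for each word length.
    static Map<Integer, Integer> mergeCounts(Map<Integer, Integer> a,
                                             Map<Integer, Integer> b) {
        Map<Integer, Integer> merged = new TreeMap<>(a);
        b.forEach((len, cnt) -> merged.merge(len, cnt, Integer::sum));
        return merged;
    }

    // Merge two lists of maximum-length words, keeping only the words that
    // have the overall maximum length across both fragments.
    static List<String> mergeMaxWords(List<String> a, List<String> b) {
        List<String> all = new ArrayList<>(a);
        all.addAll(b);
        int max = all.stream().mapToInt(String::length).max().orElse(0);
        List<String> result = new ArrayList<>();
        for (String w : all) {
            if (w.length() == max) result.add(w);
        }
        return result;
    }
}
```

Because both operations are associative, partial results from any number of fragments can be folded together in any order, which is what makes the parallel Reduce step possible.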
For each document, the rank (based on the number of appearances of words of each length) and
the maximum-length words will then be determined.
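Once the merged dictionary for a document exists, its rank can be computed directly from the length counts. A sketch, again assuming (my assumption) the Fibonacci-based value of a word of length n is fib(n):

```java
import java.util.Map;

public class DocumentRank {
    // Iterative Fibonacci, as in the word-value sketch above.
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Hypothetical rank: sum over lengths of (appearances * fib(length)).
    // This matches "summing the values of all the words" without needing
    // to revisit the individual words.
    static long rank(Map<Integer, Integer> lengthCounts) {
        long rank = 0;
        for (Map.Entry<Integer, Integer> e : lengthCounts.entrySet()) {
            rank += (long) e.getValue() * fib(e.getKey());
        }
        return rank;
    }
}
```

For the histogram {2=5, 3=1}, this gives 5 * fib(2) + 1 * fib(3) = 5 + 2 = 7; the documents would then be sorted by this value.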