Mini project HDFS-HIVE

Closed, posted 1 year ago, paid on delivery

The deliverable is a report listing the commands, with screenshots of the commands, their results and the NiFi development, plus an export of the NiFi template.

Work to do:

HDFS:

Using HDFS command lines (hdfs dfs -??????), create the following tree structure in HDFS: /data/common/raw/DATABASE_M1/ETUDIANT_M1

Using HDFS command lines, create a file [login to view URL] in this directory, with 3 columns (firstName, lastName, email) filled with your own data.
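A minimal Python sketch of preparing the three-column file locally before copying it into HDFS. The local file name `etudiant_m1.csv` and the sample row are hypothetical placeholders, because the real file name is hidden in the posting. Replace them with your own data.

```python
import csv
import io
from pathlib import Path

# Column names required by the assignment.
HEADER = ["firstName", "lastName", "email"]


def student_csv_text(rows: list[list[str]], with_header: bool = True) -> str:
    """Render rows as comma-separated text with Unix line endings."""
    buf = io.StringIO()
    # lineterminator="\n": the csv module defaults to "\r\n", and a stray "\r"
    # would end up inside the last column (email) when Hive reads the file.
    writer = csv.writer(buf, lineterminator="\n")
    if with_header:
        writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical file name and placeholder data: replace with your own.
    text = student_csv_text([["Jane", "Doe", "jane.doe@example.com"]])
    # write_bytes avoids any platform newline translation.
    Path("etudiant_m1.csv").write_bytes(text.encode("utf-8"))
```

If you keep the header row, a Hive table over this directory returns it as an ordinary data row unless the table sets `TBLPROPERTIES ('skip.header.line.count'='1')`. Calling `student_csv_text(rows, with_header=False)` avoids the question entirely.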

Using HDFS command lines, display the contents of the directory.

Using HDFS command lines, display the contents of the file.
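One plausible way to fill the `hdfs dfs -??????` blanks uses the standard `-mkdir -p`, `-put`, `-ls` and `-cat` options. The sketch below wraps them in Python only so the sequence can be printed (dry run) or executed in one go. The local file name is a hypothetical placeholder. For the report, you would normally type the printed commands directly in a terminal so they appear in the screenshots.

```python
import shlex
import subprocess

TARGET_DIR = "/data/common/raw/DATABASE_M1/ETUDIANT_M1"
LOCAL_FILE = "etudiant_m1.csv"  # hypothetical local file name


def hdfs_commands(target_dir: str, local_file: str) -> list[list[str]]:
    """Return the hdfs dfs invocations for the HDFS part of the assignment."""
    file_name = local_file.rsplit("/", 1)[-1]
    return [
        # -p creates every missing parent directory of the tree.
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],
        # Copy the local file into the new directory (-f overwrites an existing copy).
        ["hdfs", "dfs", "-put", "-f", local_file, target_dir],
        # List the directory contents.
        ["hdfs", "dfs", "-ls", target_dir],
        # Print the file contents.
        ["hdfs", "dfs", "-cat", f"{target_dir}/{file_name}"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(hdfs_commands(TARGET_DIR, LOCAL_FILE))
```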

HIVE:

Create a database DATABASE_M1

With HQL, create a database DATABASE_M2

With HQL, create a Hive table ETUDIANT_M1 in the DATABASE_M1 database pointing to the /data/common/raw/DATABASE_M1/ETUDIANT_M1 directory

With HQL, display the contents of the ETUDIANT_M1 table

With HQL, create an ETUDIANT_M1_PART table in the DATABASE_M1 database, partitioned on the DateRecep field (year, month, day, hour, minute format: YYYYMMDDHHmm) and pointing to the /common/raw/DATABASE_M1/ETUDIANT_M1_PART directory

Create an external table ETUDIANT_M2 in the DATABASE_M2 database
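The HIVE steps can be written as a handful of HQL statements. The sketch below keeps them as Python strings so they can be printed and pasted into beeline or the Hive CLI. Several details are assumptions, not requirements from the posting: the tables are declared EXTERNAL so dropping them does not delete the HDFS files, ETUDIANT_M1 is read as comma-delimited text with a header row, and ETUDIANT_M1_PART is STORED AS AVRO because the NiFi flow writes Avro files into its partition directories.

```python
# Column list taken from the CSV file (firstName, lastName, email).
COLUMNS = "firstName STRING, lastName STRING, email STRING"


def hive_statements() -> list[str]:
    """Return the HQL statements for the HIVE part, in execution order."""
    return [
        "CREATE DATABASE IF NOT EXISTS DATABASE_M1",
        "CREATE DATABASE IF NOT EXISTS DATABASE_M2",
        # Text table over the directory filled from the HDFS command line.
        # skip.header.line.count assumes the file has a header row; drop it otherwise.
        f"CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1 ({COLUMNS}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE "
        "LOCATION '/data/common/raw/DATABASE_M1/ETUDIANT_M1' "
        "TBLPROPERTIES ('skip.header.line.count'='1')",
        "SELECT * FROM DATABASE_M1.ETUDIANT_M1",
        # DateRecep is a partition column, so it is NOT repeated in the column list;
        # its value comes from the DateRecep=... directory names.
        # STORED AS AVRO assumes the Avro field names written by NiFi match these columns.
        f"CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1_PART ({COLUMNS}) "
        "PARTITIONED BY (DateRecep STRING) STORED AS AVRO "
        "LOCATION '/common/raw/DATABASE_M1/ETUDIANT_M1_PART'",
        # The assignment gives no location for ETUDIANT_M2, so Hive's default is used.
        f"CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M2.ETUDIANT_M2 ({COLUMNS})",
    ]


if __name__ == "__main__":
    print(";\n".join(hive_statements()) + ";")
```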

NIFI:

Expose a NiFi API to receive data from external files (use the two processors HandleHttpRequest and HandleHttpResponse)

Send the data file [login to view URL] (attached to the course) to the NiFi API 10 times.
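On the NiFi side, HandleHttpRequest listens on the port set in its properties and passes each request into the flow as a flowfile. HandleHttpResponse then answers the client with the configured status code, and both processors must share the same HTTP context map (a StandardHttpContextMap controller service). A minimal client sketch for the "send 10 times" step, using only the Python standard library, follows. The URL, port and path are hypothetical and must match your HandleHttpRequest configuration, and the local file name is a placeholder.

```python
import urllib.request

# Hypothetical endpoint: host, port and path depend on the HandleHttpRequest settings.
NIFI_URL = "http://localhost:8011/etudiants"


def build_requests(url: str, payload: bytes, times: int = 10) -> list[urllib.request.Request]:
    """One POST request per send, each carrying the whole CSV payload."""
    return [
        urllib.request.Request(
            url, data=payload, method="POST", headers={"Content-Type": "text/csv"}
        )
        for _ in range(times)
    ]


def send_all(requests: list[urllib.request.Request], timeout: float = 10.0) -> list[int]:
    """Send the requests one after another and collect the HTTP status codes.

    urlopen raises HTTPError if HandleHttpResponse answers with a 4xx/5xx code.
    """
    statuses = []
    for req in requests:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    with open("etudiant_m1.csv", "rb") as fh:  # hypothetical local copy of the course file
        data = fh.read()
    print(send_all(build_requests(NIFI_URL, data)))
```

From a shell, running `curl --data-binary @<file> <url>` ten times in a loop does the same job.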

Convert the data received in CSV format to Avro format
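In NiFi, this conversion is typically a ConvertRecord processor configured with a CSVReader and an AvroRecordSetWriter. Both need a record schema, supplied for example through the "Schema Text" property or an AvroSchemaRegistry. Depending on the NiFi version, the CSVReader can also infer the schema from the header line. The sketch below only builds such an Avro schema for the three columns. The record name `etudiant` is a hypothetical choice, and all fields are assumed to be plain strings.

```python
import json


def avro_record_schema(name: str, fields: list[str]) -> str:
    """Return an Avro record schema (JSON text) with one string field per CSV column."""
    schema = {
        "type": "record",
        "name": name,
        "fields": [{"name": field, "type": "string"} for field in fields],
    }
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    # Text that could be pasted into the reader/writer schema configuration.
    print(avro_record_schema("etudiant", ["firstName", "lastName", "email"]))
```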

Drop the data into the HDFS directory /common/raw/DATABASE_M1/ETUDIANT_M1_PART/DateRecep=202210ddHHmm (use the PutHDFS processor). This value must be generated dynamically by NiFi: use a flowfile attribute holding a date value in the requested format, e.g. Variable_DateRecep with the value DateRecep=${now():format('yyyyMMddHHmm')}
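One way to wire this in NiFi is an UpdateAttribute processor that sets `Variable_DateRecep` to `DateRecep=${now():format('yyyyMMddHHmm')}`. PutHDFS would then use a Directory value such as `/common/raw/DATABASE_M1/ETUDIANT_M1_PART/${Variable_DateRecep}`; this wiring is a suggestion, not something the assignment prescribes. The Python sketch below reproduces the same value so the expected directory name can be checked. In the Java pattern `MM` is the month, `mm` the minute and `HH` the 24-hour clock, which correspond to `%m`, `%M` and `%H` in `strftime`.

```python
from datetime import datetime

# Partition root from the assignment.
PART_ROOT = "/common/raw/DATABASE_M1/ETUDIANT_M1_PART"


def date_recep(ts: datetime) -> str:
    """Python equivalent of NiFi's format('yyyyMMddHHmm')."""
    return ts.strftime("%Y%m%d%H%M")


def variable_date_recep(ts: datetime) -> str:
    """Value of the Variable_DateRecep attribute, e.g. DateRecep=202210051407."""
    return f"DateRecep={date_recep(ts)}"


def partition_dir(ts: datetime, root: str = PART_ROOT) -> str:
    """HDFS directory PutHDFS should write to for a flowfile received at ts."""
    return f"{root}/{variable_date_recep(ts)}"


if __name__ == "__main__":
    print(partition_dir(datetime.now()))
```

Because the value has minute precision, all requests sent within the same minute share one DateRecep value and therefore land in the same partition directory.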

Run a SELECT on the table. What do you notice?

Run the following SQL command: MSCK REPAIR TABLE DATABASE_M1.ETUDIANT_M1_PART;
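Before the repair, the SELECT on the partitioned table typically returns no rows. PutHDFS wrote the DateRecep=... directories straight to HDFS, so the metastore has no partitions registered for them. MSCK REPAIR TABLE scans the table location and registers the missing partition directories. The sketch below only illustrates that idea and is not how Hive implements MSCK: given directory paths (for example copied from `hdfs dfs -ls` output), it builds the equivalent explicit `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statements.

```python
import re

# Partition directories look like .../DateRecep=yyyyMMddHHmm (12 digits).
PART_DIR = re.compile(r"DateRecep=(\d{12})$")


def add_partition_statements(table: str, root: str, dirs: list[str]) -> list[str]:
    """Build one ADD PARTITION statement per DateRecep=... directory found in dirs."""
    statements = []
    for d in dirs:
        match = PART_DIR.search(d.rstrip("/"))
        if match is None:
            continue  # not a partition directory: ignored
        value = match.group(1)
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (DateRecep='{value}') LOCATION '{root}/DateRecep={value}'"
        )
    return statements


if __name__ == "__main__":
    root = "/common/raw/DATABASE_M1/ETUDIANT_M1_PART"
    # Hypothetical listing of the table directory.
    listing = [f"{root}/DateRecep=202210051407", f"{root}/DateRecep=202210051412"]
    for stmt in add_partition_statements("DATABASE_M1.ETUDIANT_M1_PART", root, listing):
        print(stmt + ";")
```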

Copy the data (via an HQL query executed by NiFi) from the ETUDIANT_M1_PART table to the ETUDIANT_M2 table so as to keep only the latest version of the file sent (use the OVERWRITE keyword, and in the WHERE clause of the SELECT use the value of the last score).
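A sketch of the final copy, assuming that "the last score" means the most recent DateRecep partition value. The statement could be sent to Hive by a NiFi processor such as PutHiveQL or PutHive3QL, depending on the NiFi and Hive versions. The helper names are hypothetical, and the partition values in the usage block are made up.

```python
def latest_date_recep(values: list[str]) -> str:
    """Pick the most recent DateRecep value.

    Fixed-width yyyyMMddHHmm strings sort chronologically, so max() on the strings is enough.
    """
    if not values:
        raise ValueError("no DateRecep partitions found")
    return max(values)


def overwrite_statement(latest: str) -> str:
    """INSERT OVERWRITE that replaces ETUDIANT_M2 with the rows of one partition only."""
    return (
        "INSERT OVERWRITE TABLE DATABASE_M2.ETUDIANT_M2 "
        "SELECT firstName, lastName, email "
        "FROM DATABASE_M1.ETUDIANT_M1_PART "
        f"WHERE DateRecep = '{latest}'"
    )


if __name__ == "__main__":
    # Hypothetical partition values, e.g. taken from SHOW PARTITIONS output.
    seen = ["202210051407", "202210051412", "202210041530"]
    print(overwrite_statement(latest_date_recep(seen)))
```

OVERWRITE replaces the current contents of ETUDIANT_M2 on every run, so after each batch the table holds only the rows of the newest partition. Since all sends within one minute share a DateRecep value, they all count together as the "latest version".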

Skills: Hadoop, Big Data Sales, Spark, Apache Kafka, Hive

Project ID: #36234284

About the project

7 proposals, remote project, active 11 months ago

7 freelancers are bidding on average €171 for this job

MounirHoul

Greetings, I'm a data engineer with extensive experience in Hadoop HDFS, Hive and big data solutions. I'm confident that I can deliver high-quality work within your budget and timeframe. Let's discuss further. Mounir

€200 EUR in 4 days
(1 review)
1.4
umairkaramat24

Hi, how are you? I went through the description and read it carefully; I know exactly what you are looking for. I have 5+ years' experience in these skills: Big Data Sales, Apache Kafka, Hadoop, Spark and Hive. I have so…

€250 EUR in 5 days
(0 reviews)
0.0
Ibayoussef231

Hi, I have already worked on a project very similar to yours, and I believe I can complete this in 7 days maximum thanks to my knowledge of the big data ecosystem. We can talk in detail if this interests you.

€120 EUR in 7 days
(0 reviews)
0.0
kaish1

I am a data engineer with 6+ years of experience. I can do the development for you in 1 week with professionalism.

€100 EUR in 7 days
(0 reviews)
0.0
anydataflow

Hi, I can do this effectively as I have expertise in Hadoop, Hive, NiFi... Please visit my profile for more info. Thanks

€140 EUR in 7 days
(1 review)
0.1
Iamerum

Hi there, I have been working on big data Hadoop projects. I excel at Hadoop and Hive. Let me know if I can help you with this. Thanks

€140 EUR in 9 days
(0 reviews)
0.0