Find Jobs
Hire Freelancers

Stream Event Processing Project

$10-30 USD

Mbyllur
Postuar about 3 years ago

$10-30 USD

Paguhet në dorëzim
your company runs a news portal, and collects clickstream data of its readers. As a data engineer you are tasked with building a data pipeline that collects clickstream data, enriches with articles metadata and persists for later user characteristics and behaviour analysis. Data pipeline elements: - Stream for click events - Real-time data processing - Reading the Stream - Base64 Decoding - Enriching clicks data with articles metadata - Converting into Parquet format - Persisting into the Storage - Batch processing for analytics: - Provide insights about: - Distribution of clicks on each of the environment, device group and operating system - The most and least popular articles - The most and least popular categories - The average word count per article and category - Avg time to click on article (difference between session start ts and article click ts) per category - Every hour query stored data and create aggregated report in form of CSV with required insights Producer - Script faking a real system where clicks would be tracked - Use provided clicks dataset sample as reference - Before publishing into Stream data format should be converted from CSV to JSON and Base64 encoded Dataset [login to view URL] Data structure Click data: - user_id - session_id - session_start - session_size - click_article_id - click_timestamp - click_environment - Id of the Environment: 1 - Facebook Instant Article, 2 - Mobile App, 3 - AMP (Accelerated Mobile Pages), 4 - Web - click_deviceGroup - Id of the Device Type: 1 - Tablet, 2 - TV, 3 - Empty, 4 - Mobile, 5 - Desktop - click_os - Id of the Operational System: 1 - Other, 2 - iOS, 3 - Android, 4 - Windows Phone, 5 - Windows Mobile, 6 - Windows, 7 - Mac OS X, 8 - Mac OS, 9 - Samsung, 10 - FireHbbTV, 11 - ATV OS X, 12 - tvOS, 13 - Chrome OS, 14 - Debian, 15 - Symbian OS, 16 - BlackBerry OS, 17 - Firefox OS, 18 - Android, 19 - Brew MP, 20 - Chromecast, 21 - webOS, 22 - Gentoo, 23 - Solaris - click_country Articles metadata: - article_id - category_id - created_at_ts - publisher_id - words_count Notes: - Suggested language Python - Suggested processing framework Spark - SparkStreaming - SparkSQL / Hive - Suggested streaming framework: Kinesis or Kafka - Present reasoning for topics and partitions setup - Please include diagram or description of your solution - Why did you choose this particular solution architecture? What were some of the trade-offs? How would it handle events schema evolution?
ID e Projektit: 29638706

Rreth projektit

2 propozime
Projekt në distancë
Aktive 3 yrs ago

Po kërkoni të fitoni para?

Përfitimet e ofertës për Freelancer

Vendosni buxhetin dhe afatin tuaj
Paguhuni për punën tuaj
Përshkruani propozimin tuaj
Është falas të regjistrohesh dhe të bësh oferta për punë
2 freelancers are bidding on average $675 USD for this job
Avatari i Përdoruesit
I am a big data engineer, I can build the pipeline as mentioned. Please discuss further if you are interested.
$350 USD në 10 ditë
5,0 (5 përshtypje)
2,3
2,3
Avatari i Përdoruesit
Possible. but the efforts required seems to be more. We have done similar projects in the past. Let us discuss if you are interested.
$1 000 USD në 15 ditë
0,0 (0 përshtypje)
0,0
0,0

Rreth klientit

Flamuri i TURKEY
Istanbul, Turkey
5,0
11
Mënyra e pagesës u verifikua
Anëtar që nga pri 3, 2016

Verifikimi i klientit

Faleminderit! Ne ju kemi dërguar me email një lidhje për të kërkuar kredinë tuaj falas.
Ndodhi një gabim gjatë dërgimit të email-it tuaj. Ju lutemi provoni përsëri.
Përdorues të regjistruar Punë të postuara
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)
Po ngarkohet shikimi paraprak
Leja u dha për Geolocation.
Seanca e hyrjes ka skaduar dhe ke dalë. Hyr sërish.