Existing Flow:
IoT devices (send JSON payload) -> Kafka (receive & produce) -> Spark Streaming (consume JSON using Scala) -> Cassandra (storage)
Project Description:
JSON payloads from IoT devices are ultimately stored in Cassandra.
A simulator sends device information to Kafka on behalf of the IoT devices.
A Kafka producer publishes the JSON data as a stream.
The JSON data is consumed from Kafka in Scala 2.11.11 with Spark 2.4.7.
The data consumed by Spark is inserted into Cassandra.
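The post doesn't show the payload schema. Purely to illustrate the simulator step, a device message could be assembled like this (the field names deviceId, temperature, and ts are hypothetical, not taken from the project):

```scala
// Hypothetical IoT payload builder for the simulator step.
// Field names (deviceId, temperature, ts) are assumptions, not from the post.
object PayloadSim {
  def payload(deviceId: String, temperature: Double, ts: Long): String =
    s"""{"deviceId":"$deviceId","temperature":$temperature,"ts":$ts}"""

  def main(args: Array[String]): Unit =
    println(payload("sensor-01", 23.5, 1700000000L))
}
```

In the real pipeline a string like this would be the value of each Kafka producer record.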
Requirement:
1. Performance tuning
-> Need to persist JSON payloads into Cassandra at 100/sec using Spark Streaming.
2. The project should support a higher version of the Spark Cassandra Connector with the latest Spark and Scala.
(The code needs to be validated for compatibility with the higher versions.)
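The post doesn't pin target versions for the upgrade. As a sketch, assuming Spark 3.5.x on Scala 2.12 with Spark Cassandra Connector 3.5.x (all version choices here are assumptions, not requirements from the post), the dependency bump might look like:

```scala
// Hypothetical build.sbt fragment for the upgrade path.
// Versions are illustrative; validate against your actual cluster.
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"                 % "3.5.1" % "provided",
  "org.apache.spark"   %% "spark-sql-kafka-0-10"      % "3.5.1",
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.5.0"
)
```

Note that the connector's 3.x line no longer publishes Scala 2.11 artifacts, so moving the code off Scala 2.11.11 is an implicit part of this requirement.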
Note: -> We've already achieved ~40/sec; we'll share that code.
-> The project is deployed on Linux (16 GB RAM, 8 CPU cores).
-> We tested on Spark 2.4.7, Scala 2.11.11, and Spark Cassandra Connector 2.4.0.
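On the tuning side, much of the Cassandra write throughput is governed by connector settings rather than application code. A hedged starting point, with flag values meant as benchmarking baselines rather than measured recommendations:

```
spark-submit \
  --conf spark.cassandra.connection.host=<cassandra-host> \
  --conf spark.cassandra.output.concurrent.writes=16 \
  --conf spark.cassandra.output.batch.size.rows=64 \
  --conf spark.cassandra.output.batch.grouping.buffer.size=1000 \
  your-streaming-job.jar
```

Raising spark.cassandra.output.concurrent.writes above its default of 5 is usually the first lever for small-row workloads like 100 JSON events/sec; measure each change against the existing 40/sec baseline.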
We'll share the code with candidates who have the required skill set and interest.
Hello,
Cassandra performs best with a full server installation, so it's better to run it on an independent cluster for this purpose. Secondly, if you are keen on JSON persistence, consider MongoDB or DynamoDB instead of Cassandra; they handle JSON documents better.
I have 20+ years of experience and work as a consultant in big data/IoT.