Case # 1: Spark Streaming -> Source (File) -> Kafka -> Spark -> Sink (File)
Story :- As an e-commerce website analyst, I want to verify the list of products that have been shipped.
Algorithm:
a.) Read data from multiple files (possibly in different formats) or from a single file (.csv, .txt, or .json), then publish the data to a Kafka topic.
b.) Spark Structured Streaming acts as the consumer: it connects to this Kafka topic at the configured interval (10 secs) and feeds the data to the Spark job.
c.) The Spark job then transforms the data and persists the output (sink) into another file (of type csv).
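The records flowing through the topic use the bracketed `key : value` layout shown in the requirement examples below. A minimal Python sketch of parsing one such record into a dict; the parser itself is an assumption (the source does not specify the serialization format, which could just as well be JSON), and the field names are taken from the sample records:

```python
def parse_record(line: str) -> dict:
    """Parse one bracketed record such as
    '[productId : P1001, productName : Mobile, ...]' into a dict.
    Assumed format based on the sample records in this README;
    adjust if the real producer serializes differently (e.g. JSON).
    """
    body = line.strip().strip("[]")
    fields = {}
    for part in body.split(","):
        if ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
        else:
            # bare field, e.g. the trailing 'timestamp' placeholder
            fields[part.strip()] = None
    return fields

record = parse_record(
    "[productId : P1001, productName : Mobile, "
    "productPrice : 1000.00, deliveryStatus : Shipped, timestamp]"
)
# record["deliveryStatus"] == "Shipped"
```

In the real pipeline, this parsing would happen inside the Spark job after reading the Kafka message value as a string.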
Requirement :- Product Status Report - Real-Time Processing
1.) The source file should contain product information along with its delivery status
For Eg:-
[productId : P1001, productName : Mobile, productPrice : 1000.00, deliveryStatus : Purchased, timestamp]
[productId : P1001, productName : Mobile, productPrice : 1000.00, deliveryStatus : Shipped, timestamp]
[productId : P1001, productName : Mobile, productPrice : 1000.00, deliveryStatus : Pending, timestamp]
2.) Then, the Spark job should filter the product data based on the status Shipped [within a time range in hours, e.g. from 10 am to 12 pm, keeping the products with the status "Shipped"]
3.) Once the Spark job has filtered the product data within the time range, persist the output,
[productId : P1001, productName : Mobile, productPrice : 1000.00, deliveryStatus : Shipped, timestamp] to another file (.csv), where this file contains only
the products with the status Shipped within the time range (based on the timestamp).
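The filtering in steps 2 and 3 can be sketched independently of Spark. A minimal Python version of the predicate, assuming the timestamp is an ISO-8601 string (the actual timestamp format is not given here) and using the field names from the examples:

```python
from datetime import datetime, time

def is_shipped_in_window(record: dict,
                         start: time = time(10, 0),
                         end: time = time(12, 0)) -> bool:
    """Keep only records with deliveryStatus == 'Shipped' whose
    timestamp (assumed ISO-8601, e.g. '2023-05-01T10:30:00')
    falls in the [start, end) clock window."""
    if record.get("deliveryStatus") != "Shipped":
        return False
    ts = datetime.fromisoformat(record["timestamp"])
    return start <= ts.time() < end

# Hypothetical sample data matching the record layout above.
records = [
    {"productId": "P1001", "deliveryStatus": "Shipped",
     "timestamp": "2023-05-01T10:30:00"},
    {"productId": "P1002", "deliveryStatus": "Pending",
     "timestamp": "2023-05-01T10:45:00"},
    {"productId": "P1003", "deliveryStatus": "Shipped",
     "timestamp": "2023-05-01T13:00:00"},
]
shipped = [r for r in records if is_shipped_in_window(r)]
# only P1001 passes: Shipped and within 10:00-12:00
```

In the Spark job itself, the same predicate would be expressed as a `filter` on the streaming DataFrame, followed by a CSV file sink for the output.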