Story :- As an e-commerce website analyst, I want to verify the list of products that have been shipped.
Algorithm:
a.) Read data from multiple files in different formats, or from a single file (.csv, .txt, or .json), and publish the data to a Kafka topic (see the producer sketch after this list).
b.) A Spark Structured Streaming consumer connects to this Kafka topic at the configured trigger interval (10 seconds) and feeds the data into the Spark job.
c.) The Spark job then transforms the data and persists the output (sink) to another file (.csv), as in the consumer sketch below.
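
A minimal producer sketch for step a.), assuming a local broker at localhost:9092; the topic name "product-events" and the file path "data/products.csv" are illustrative, not fixed by this spec:

```scala
import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object ProductFileProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed broker
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val source   = Source.fromFile("data/products.csv") // assumed source file
    try {
      // Skip the header line and publish each product record as one message value.
      source.getLines().drop(1).foreach { line =>
        producer.send(new ProducerRecord[String, String]("product-events", line))
      }
    } finally {
      source.close()
      producer.close()
    }
  }
}
```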
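And a consumer sketch for steps b.) and c.), assuming the same broker/topic names as above and a comma-separated message value of productId,productName,productPrice,deliveryStatus,timestamp; the output paths are placeholders. The "Shipped" filter from the requirement below can be applied to `products` before `writeStream` (see the filter sketch at the end):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

object ProductStreamConsumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ProductStatusReport")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Expected shape of each CSV record published by the producer.
    val schema = StructType(Seq(
      StructField("productId", StringType),
      StructField("productName", StringType),
      StructField("productPrice", DoubleType),
      StructField("deliveryStatus", StringType),
      StructField("timestamp", TimestampType)
    ))

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "product-events")
      .load()

    // Kafka delivers the payload as bytes; cast to string and parse the CSV columns.
    val products = raw
      .selectExpr("CAST(value AS STRING) AS csv")
      .select(from_csv($"csv", schema, Map.empty[String, String]).as("p"))
      .select("p.*")

    val query = products.writeStream
      .format("csv")
      .option("path", "output/products")             // assumed sink directory
      .option("checkpointLocation", "output/_chk")   // required by file sinks
      .trigger(Trigger.ProcessingTime("10 seconds")) // the configured interval
      .start()

    query.awaitTermination()
  }
}
```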
Requirement :- Product Status Report - Real-Time Processing
1.) The source file should contain product information together with a delivery status.
2.) The Spark job should then filter the product data by the status "Shipped" within a time range given in hours [e.g. from 10 am to 12 pm, keep only the products with the status "Shipped"].
3.) Once the Spark job has filtered the product data within the time range, persist the output,
[productId : P1001, productName : Mobile, productPrice : 1000.00, deliveryStatus : Shipped, timestamp], to another file (.csv), so that this file contains only
the products with the status "Shipped" within the time range, based on the timestamp (see the filter sketch after this list).
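
A sketch of the filtering step, written as a helper that could be applied to the `products` DataFrame in the consumer above before `writeStream`; the object and method names are hypothetical, and the hour bounds (10 to 12 for the 10 am to 12 pm example) are taken from the requirement:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object ShippedFilter {
  // Keeps only rows whose deliveryStatus is "Shipped" and whose timestamp's
  // hour falls in [fromHour, toHour) - e.g. 10 and 12 for 10 am to 12 pm.
  def shippedWithin(products: DataFrame, fromHour: Int, toHour: Int): DataFrame =
    products
      .filter(col("deliveryStatus") === "Shipped")
      .filter(hour(col("timestamp")) >= fromHour && hour(col("timestamp")) < toHour)
}
```

Usage, matching the example range: `ShippedFilter.shippedWithin(products, 10, 12)`; the resulting stream is then written to the .csv sink exactly as in the consumer sketch.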