GSPANN’s advanced analytics team examined the problem, migrated the platform from on-premises infrastructure to Google Cloud Platform (GCP), and built a deployment pipeline. We also developed the scripts and tables needed to produce data reports in the format the leadership team required. As a result of our implemented solution, well-segmented sales data is now available in real time, in the expected formats.
The key tasks undertaken:
- Applied a Change Data Capture (CDC) approach, implementing the logic needed to stream data from the MySQL BinLog into Apache Kafka.
- Developed a statistics report on the online portal that provides real-time information on order frequency and product sales.
- Eliminated errors in logic and business transformations through a new approach to data cleansing, denormalization of JSON data, and more.
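To illustrate the CDC setup listed above, the following is a minimal sketch of a Maxwell configuration that streams BinLog changes to Kafka. The configuration keys are standard Maxwell options; all host names, credentials, and topic names are placeholder assumptions, not values from the actual deployment.

```properties
# config.properties — minimal Maxwell-to-Kafka sketch (placeholder values)

# Source MySQL instance whose BinLog Maxwell will read
host=mysql.internal.example
port=3306
user=maxwell
password=change-me

# Send change events to Kafka rather than stdout
producer=kafka
kafka.bootstrap.servers=kafka-broker-1:9092,kafka-broker-2:9092
kafka_topic=mysql-cdc-events
```

Maxwell also maintains its own schema-tracking tables in a `maxwell` database on the source instance, which is why the daemon's MySQL user needs write access there.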
To elaborate, we configured Maxwell Daemon to ingest data in near real time and stream it from the MySQL BinLog to a Kafka broker, from where it is stored into BigQuery stage dataset tables. The MySQL BinLog records every operation performed on the MySQL database, while Maxwell Daemon captures information about those operations from the BinLog into the Maxwell database tables. Based on the information available in the Maxwell database tables, the Kafka streaming jobs initiate the data pull from the BinLog; the pulled data is segregated because each operation is performed on a different table.
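The segregation step described above can be sketched as follows. This is a simplified stand-in, not the production code: it parses Maxwell-style JSON change events (Maxwell emits `database`, `table`, `type`, and `data` fields) from an in-memory list rather than a live Kafka topic, and groups them per source table, mimicking how each table's operations would be routed to its own BigQuery stage table. The sample event values are invented for illustration.

```python
import json
from collections import defaultdict

# Hypothetical Maxwell-style change events, as they would arrive on the
# Kafka topic; the field names follow Maxwell's JSON output format,
# but the databases, tables, and values are made up for this sketch.
raw_events = [
    '{"database": "shop", "table": "orders", "type": "insert", "data": {"id": 1, "total": 42.5}}',
    '{"database": "shop", "table": "products", "type": "update", "data": {"id": 7, "price": 9.99}}',
    '{"database": "shop", "table": "orders", "type": "insert", "data": {"id": 2, "total": 13.0}}',
]

def segregate_by_table(events):
    """Group change events per source table — a stand-in for routing
    each table's operations to its own BigQuery stage dataset table."""
    batches = defaultdict(list)
    for raw in events:
        event = json.loads(raw)
        batches[event["table"]].append(event)
    return dict(batches)

batches = segregate_by_table(raw_events)
for table, ops in batches.items():
    # e.g. "orders" collects both inserts, "products" the single update
    print(table, [op["type"] for op in ops])
```

In the real pipeline the events would be consumed from the Kafka broker and each per-table batch loaded into its corresponding stage table, but the grouping logic is the same.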