About the Client
A US-based global Fortune 500 department store chain with over $20B in annual sales and more than 1,000 stores.
The Challenge
The client sells millions of lifestyle products to customers through its e-commerce website. Sales-related website data, such as product name and payment method, is stored in a MySQL database, from which the client’s analytics team accesses it to measure and analyze sales performance and understand customer preferences. These insights help the client plan new business initiatives, such as discounts and promotions, to drive sales.
However, the challenge was that reports were generated only once a day by an internal Hadoop-based system that fetched data in batches rather than in real time. This led to discrepancies between the data in MySQL (the source of truth) and the on-premises system. The lack of real-time analytics, combined with these data mismatches, slowed and weakened decision-making, hampering the client’s progress.
Our Solution
GSPANN’s advanced analytics team examined the problem, migrated the workload from the on-premises system to Google Cloud Platform (GCP), and built a deployment pipeline. We also developed scripts and tables to produce the data reports in the format the leadership team wanted. As a result, well-segmented sales data is now available in real time, in the expected formats.
The key tasks undertaken:
- Applied a Change Data Capture (CDC) approach, implemented with custom streaming logic, to move data from the MySQL BinLog into Apache Kafka.
- Developed a statistics report, served through the online portal, that provides real-time information on order frequency and product sales (a query sketch follows this list).
- Removed errors in logic and business transformations with a new approach to data cleansing, denormalization of JSON data, and more.
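To illustrate the kind of statistics report described above, the sketch below runs a BigQuery query over a staged orders table to count orders placed and sum sales per hour. It is a minimal example only; the project, dataset, table, and column names are assumptions for illustration, not the client’s actual schema.

```python
# Hypothetical sketch: hourly order-frequency and sales report from a
# BigQuery staging table. All project, dataset, table, and column names
# below are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="retail-analytics")  # assumed project ID

SQL = """
SELECT
  TIMESTAMP_TRUNC(order_ts, HOUR) AS order_hour,
  COUNT(*)                        AS orders_placed,
  SUM(order_total)                AS sales_amount
FROM `retail-analytics.sales_stage.orders`
WHERE order_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY order_hour
ORDER BY order_hour
"""

for row in client.query(SQL).result():
    print(row["order_hour"], row["orders_placed"], row["sales_amount"])
```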
To elaborate, we configured the Maxwell daemon to capture data in near real time and stream it from the MySQL BinLog to the Kafka broker, from where it is stored in BigQuery stage dataset tables. The MySQL BinLog records every operation performed on the MySQL database, and the Maxwell daemon writes information about those operations to its own Maxwell database tables. Based on the information in the Maxwell tables, the Kafka streaming jobs initiate the data pull from the BinLog; the data is segregated because each operation targets a different table.
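A minimal sketch of the consumer side of such a pipeline is shown below: it reads Maxwell-formatted JSON change events from a Kafka topic, keeps only inserts and updates on an orders table, flattens the nested data payload, and appends the rows to a BigQuery stage dataset table. The topic, table, and dataset names are assumptions for illustration, and the production jobs used Spark Streaming with additional cleansing and transformation logic rather than this simplified loop.

```python
# Simplified CDC consumer sketch. Maxwell publishes one JSON event per MySQL
# BinLog row change; we keep inserts/updates on the orders table, flatten the
# nested "data" payload, and append the rows to a BigQuery staging table.
# Topic, table, and dataset names are illustrative assumptions.
import json

from google.cloud import bigquery
from kafka import KafkaConsumer

STAGE_TABLE = "retail-analytics.sales_stage.orders_raw"  # assumed stage table

consumer = KafkaConsumer(
    "maxwell",                                   # assumed Maxwell output topic
    bootstrap_servers=["kafka-broker:9092"],
    group_id="sales-cdc-loader",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
bq_client = bigquery.Client()

batch = []
for message in consumer:
    event = message.value
    # Maxwell events carry the database, table, operation type
    # (insert/update/delete), a timestamp, and the changed row under "data".
    if event.get("table") != "orders" or event.get("type") not in ("insert", "update"):
        continue

    row = dict(event["data"])            # lift (denormalize) the nested row payload
    row["cdc_operation"] = event["type"]
    row["cdc_event_ts"] = event["ts"]
    batch.append(row)

    if len(batch) >= 500:                # micro-batch the streaming inserts
        errors = bq_client.insert_rows_json(STAGE_TABLE, batch)
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")
        batch.clear()
```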
Business Impact
- Accurate hourly reports are generated with no mismatches against the MySQL source.
- Access to properly formatted real-time data analytics enabled the leadership to make quick and impactful business decisions.
- Timely business decisions resulted in a massive increase in online sales.
Related Capabilities
Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales
Unlock the power of the data insights buried deep within your diverse systems across the organization. We empower businesses to effectively collect, beautifully visualize, critically analyze, and intelligently interpret data to support organizational goals. Our team ensures good returns on big data technology investments through the effective use of the latest data and analytics tools.
Technologies Used
- Python
- Hive
- BigQuery
- Apache Pig
- Spark Streaming
- Apache Kafka