The company’s business is based on a multi-level marketing model. It needed accurate and timely information on all new sales partner enrollments, which are processed online through web and mobile apps across four geographically dispersed regions.
Upstream systems, implemented in all four regions, use SAP Commerce Cloud (formerly Hybris), an omnichannel e-commerce solution that facilitates customer engagement, product content, and order management, among other features.
Downstream systems use an Apache Solr search implementation and Google Cloud Platform (GCP), including a cloud data warehouse with BigQuery tables, a serverless document database built on Google Firestore, and Cloud Storage buckets. The downstream side also uses Salesforce Marketing Cloud for customer support and engagement, along with a MuleSoft Secure API Gateway.
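As a rough illustration of where that data ultimately lands, the sketch below writes a single enrollment record to two of those downstream stores, BigQuery and Firestore, using the standard GCP Python clients. The project, dataset, table, collection, and field names are placeholders, not the company’s actual identifiers.

```python
# Hypothetical sketch: persisting one enrollment record to BigQuery and Firestore.
# All identifiers (project, dataset, table, collection, fields) are placeholders.
from google.cloud import bigquery, firestore


def store_enrollment(record: dict) -> None:
    """Write an enrollment record to a warehouse table and a document store."""
    bq = bigquery.Client()
    # Stream the row into a BigQuery table (placeholder table ID).
    errors = bq.insert_rows_json("my-project.sales.enrollments", [record])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")

    fs = firestore.Client()
    # Keep a document-oriented copy keyed by the enrollment ID.
    fs.collection("enrollments").document(record["enrollment_id"]).set(record)


store_enrollment({
    "enrollment_id": "E-1001",
    "partner_id": "P-42",
    "region": "EMEA",
    "enrolled_at": "2021-03-15T10:22:00",
})
```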
Prior to our involvement, the company’s management and administrative staff had to wait more than 45 minutes to get information on new enrollments. The delay was mainly caused by a maze of cron jobs and schedulers that processed and forwarded the data. A related challenge was that the existing infrastructure was simply not up to the task of accurately handling 300K to 400K streaming records every minute. This massive volume, generated by the company’s enormous global roster of sales partners, includes updates, orders, and subscriptions.
Before Kafka was introduced, all data-carrying API calls were routed through the MuleSoft API gateway, often causing massive data gridlock. Compounding the problem, the existing architecture did not lend itself to automated testing and validation of data. The company needed a way to test the streaming data promptly and accurately in all possible combinations.
The company required:
- Fast and accurate access to enrollment data: New partner enrollments were the company’s lifeblood, and it desperately needed quick access to accurate enrollment data.
- The ability to validate and verify a massive amount of data: There was no infrastructure in place to confirm data integrity, and no way to test for data loss, missing mandatory contract fields, environment health, flow validation, schemas, or formats (a minimal sketch of this kind of record-level check follows this list). The company needed to handle a massive amount of data accurately.
- A way to test for possible data flow gridlock: In its original state, the company had no way to test its streaming data infrastructure for obstructions.
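To make the validation requirement concrete, here is a minimal sketch of the kind of record-level check that was missing: testing a single enrollment event for mandatory fields and basic formats. The field names, region codes, and rules are hypothetical, not taken from the company’s contracts or schemas.

```python
# Minimal sketch of record-level validation for an enrollment event.
# Field names, region codes, and rules below are hypothetical examples.
from datetime import datetime
from typing import Any

MANDATORY_FIELDS = ("enrollment_id", "partner_id", "region", "enrolled_at")
VALID_REGIONS = {"AMER", "EMEA", "APAC", "LATAM"}  # placeholder region codes


def validate_enrollment(event: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the event passes."""
    errors: list[str] = []

    # Check for missing mandatory fields.
    for field in MANDATORY_FIELDS:
        if not event.get(field):
            errors.append(f"missing mandatory field: {field}")

    # Basic format checks on fields that are present.
    if event.get("region") and event["region"] not in VALID_REGIONS:
        errors.append(f"unknown region: {event['region']}")

    if event.get("enrolled_at"):
        try:
            datetime.fromisoformat(event["enrolled_at"])
        except (TypeError, ValueError):
            errors.append("enrolled_at is not an ISO-8601 timestamp")

    return errors


if __name__ == "__main__":
    sample = {
        "enrollment_id": "E-1001",
        "partner_id": "",
        "region": "EMEA",
        "enrolled_at": "2021-03-15T10:22:00",
    }
    print(validate_enrollment(sample))  # -> ['missing mandatory field: partner_id']
```

In a streaming setup, a check like this would typically run against every incoming message, with failures routed to an error channel for investigation rather than printed.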