Our team designed and executed a migration plan that moved ten years of historical data along with the latest daily feeds from different upstream systems to an Azure data lake. We configured and scheduled extract, transform, and load (ETL) jobs to summarize data and send reports to business users through Power BI and Azure SQL Reporting.
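As an illustration only, the sketch below shows what one such summarization job could look like in PySpark on Databricks. The table names (raw.sales_daily_feed, reporting.sales_summary) and columns are hypothetical stand-ins, not the actual pipeline objects.

```python
# Hypothetical daily summarization job; table and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-summary").getOrCreate()

# Read the latest daily feed that has landed in the data lake.
daily = spark.read.table("raw.sales_daily_feed")

# Summarize per store and business date; reports read from this table.
summary = (
    daily.groupBy("store_id", "business_date")
         .agg(
             F.sum("net_amount").alias("total_net_amount"),
             F.count("*").alias("transaction_count"),
         )
)

# Overwrite the reporting table consumed by Power BI.
summary.write.mode("overwrite").saveAsTable("reporting.sales_summary")
```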
Our quality engineering team supported the company's validation effort during data migration and conversion, and we also performed business validation on revenue accounting system data. The team verified the business rules applied to the data and ran data sampling tests on migrated and converted records in Azure Databricks using Structured Query Language (SQL). SQL was the right fit at this stage because there was no user interface (UI) or API involved.
Image Credit: Tutorial: Extract, transform, and load data by using Azure Databricks
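The sketch below, again with hypothetical table names, illustrates the style of SQL checks we ran in Databricks: a row-count reconciliation followed by a sample-based record comparison between the legacy source and the migrated target.

```python
# Illustrative only; the real checks targeted the migrated revenue accounting data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Row-count reconciliation between the legacy extract and the migrated table.
source_rows = spark.sql("SELECT COUNT(*) AS c FROM legacy.revenue_transactions").first().c
target_rows = spark.sql("SELECT COUNT(*) AS c FROM migrated.revenue_transactions").first().c
assert source_rows == target_rows, "Row counts do not match after migration"

# Sample-based comparison: sampled source records that are missing or altered in the target.
mismatches = spark.sql("""
    SELECT * FROM legacy.revenue_transactions TABLESAMPLE (1 PERCENT)
    EXCEPT
    SELECT * FROM migrated.revenue_transactions
""")
missing = mismatches.count()
assert missing == 0, f"{missing} sampled records differ after migration"
```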
During our preliminary analysis, we explored and proposed two proofs of concept: one used Great Expectations, and the other leveraged Tricentis Tosca Data Integrity (Tosca DI).
Great Expectations is an open-source Python library for declaring assertions, called expectations, against data, including PySpark DataFrames. It would have been a natural fit for a solution built on Azure Data Lake, but subsequent investigation uncovered some shortcomings.
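For context, this is a minimal sketch of how such assertions look with the classic Great Expectations dataset API (newer releases have since replaced this interface); the table and column names are hypothetical.

```python
# Sketch of the kind of expectations we prototyped; classic (pre-1.0) API.
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("migrated.revenue_transactions")

# Wrap the Spark DataFrame so expectations can be declared against it.
gdf = SparkDFDataset(df)
gdf.expect_column_values_to_not_be_null("transaction_id")
gdf.expect_column_values_to_be_unique("transaction_id")
gdf.expect_column_values_to_be_between("net_amount", min_value=0)

# Evaluate every expectation declared above and report overall success.
results = gdf.validate()
print(results.success)
```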
Tosca DI fulfilled all of our automation needs, enabling us to compare very large volumes of data in a reasonable timeframe. The company was already using Tosca for other applications, so it decided to standardize on Tosca, and we recommended the Tosca DI module, which is purpose-built for data integrity testing.
Our team developed automation scripts using SQL in Tosca DI to address the challenges we encountered. The scripts identified numerous functional defects that functional testing had previously missed because of the overwhelming data scale, and they became indispensable for regression testing, which would have been impractical to perform manually at that volume.
Image Credit: Tricentis Tosca
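Because the Tosca DI checks are defined inside the tool itself, the sketch below only illustrates the kind of source-to-target reconciliation those scripts automate, expressed here as Spark SQL run from PySpark with hypothetical table names.

```python
# Illustration of an aggregate reconciliation of the sort used in regression runs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compare per-store, per-day totals between source and target; any surviving
# rows are groups whose totals or row counts no longer reconcile.
diff = spark.sql("""
    SELECT store_id, business_date,
           SUM(net_amount) AS total_net_amount,
           COUNT(*)        AS row_count
    FROM   legacy.revenue_transactions
    GROUP  BY store_id, business_date
    EXCEPT
    SELECT store_id, business_date,
           SUM(net_amount) AS total_net_amount,
           COUNT(*)        AS row_count
    FROM   migrated.revenue_transactions
    GROUP  BY store_id, business_date
""")

unreconciled = diff.count()
assert unreconciled == 0, f"{unreconciled} store/day groups do not reconcile"
```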
Below are some key results we achieved through automation:
- Metadata testing: This allows the company to obtain information about the data within the schema, for example, column names, column data types, and the constraints imposed on the data. The new system incorporates a rich set of additional validations, including checks on data types, lengths, constraints, and naming conventions.
- Data completeness testing: All expected data is extracted from the source and loaded into the target without loss or corruption. The company can confirm row counts, aggregate column values, validate primary keys, and perform full-value comparison validation (see the sketch after this list). In addition, they can now monitor for missing setups.
- Data quality testing: The new system allows us to check that business rules and logic have been applied within the ETL pipeline. Quality tests detect duplicate data, confirm that business rules have been applied, and perform data integrity validation.
- Draft run: Our solution lets the company view journal data before the final posting to its Oracle accounting systems.
- Fully automated auto-accept process: The new system validates all the daily store data per business requirements.
- Scalability and future-proofing: Testing and validation automation using Tosca DI helped validate millions of upstream records in minutes.
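To make the first three categories concrete, here is a minimal PySpark sketch of one metadata check, one completeness check, and one quality check. The table, columns, and expected schema are hypothetical; the production checks ran as SQL in Tosca DI.

```python
# Illustrative checks only; the real table, column, and schema definitions differed.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (DateType, DecimalType, StringType,
                               StructField, StructType)

spark = SparkSession.builder.getOrCreate()
target = spark.read.table("migrated.revenue_transactions")

# Metadata testing: column names and data types match the agreed schema.
expected = StructType([
    StructField("transaction_id", StringType()),
    StructField("store_id", StringType()),
    StructField("business_date", DateType()),
    StructField("net_amount", DecimalType(18, 2)),
])
assert (
    [(f.name, f.dataType) for f in target.schema.fields]
    == [(f.name, f.dataType) for f in expected.fields]
), "Schema drift detected"

# Data completeness testing: the primary key is populated on every row.
assert target.filter(F.col("transaction_id").isNull()).count() == 0, \
    "Null primary keys found"

# Data quality testing: no duplicate primary keys after migration.
duplicates = (
    target.groupBy("transaction_id").count().filter(F.col("count") > 1).count()
)
assert duplicates == 0, f"{duplicates} duplicate transaction_id values found"
```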