About the Client
A global supplier of semiconductor equipment
The Challenge
The company ran several Databricks platforms for data science work, each serving different business groups or units. The owner of one platform wanted to track how resources such as Databricks notebooks and clusters were allocated across teams, and to share the results with stakeholders through a dashboard. The REST APIs provided by Databricks did not expose all of the information these requirements called for. The metrics also had to be grouped by business unit and reported over daily, weekly, monthly, and quarterly periods. In short, the company wanted to go beyond the Databricks APIs and build a dashboard covering the following areas:
- User monitoring information
- Notebook tracking information
- Infrastructure metrics
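To make the reporting requirement concrete, here is a minimal sketch of rolling a usage metric up by business unit and by daily, weekly, monthly, or quarterly period. It is an illustration only, not the delivered implementation: the record fields (`business_unit`, `day`, `cluster_hours`), the unit names, and the sample values are all assumptions.

```python
from collections import defaultdict
from datetime import date


def period_key(day: date, granularity: str) -> str:
    """Return a label for the reporting period that contains `day`."""
    if granularity == "daily":
        return day.isoformat()
    if granularity == "weekly":
        # ISO weeks start on Monday; the ISO year can differ from day.year
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "monthly":
        return f"{day.year}-{day.month:02d}"
    if granularity == "quarterly":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    raise ValueError(f"unknown granularity: {granularity}")


def rollup(records, granularity):
    """Sum cluster hours per (business unit, period) pair."""
    totals = defaultdict(float)
    for rec in records:
        key = (rec["business_unit"], period_key(rec["day"], granularity))
        totals[key] += rec["cluster_hours"]
    return dict(totals)


# Hypothetical sample records; field names and values are assumptions.
SAMPLE_USAGE = [
    {"business_unit": "bu_a", "day": date(2023, 1, 2), "cluster_hours": 4.0},
    {"business_unit": "bu_a", "day": date(2023, 1, 3), "cluster_hours": 2.5},
    {"business_unit": "bu_a", "day": date(2023, 4, 10), "cluster_hours": 1.0},
    {"business_unit": "bu_b", "day": date(2023, 1, 2), "cluster_hours": 3.0},
]

quarterly = rollup(SAMPLE_USAGE, "quarterly")
```

Keeping the period label in one function means the same records can feed all four dashboard views without storing separate copies of the data.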
Our Solution
Our IA team studied the problem and recommended implementing Overwatch, a tool developed by Databricks, to analyze log data from Databricks workspaces.
Databricks Overwatch is a real-time analytics, monitoring, and alerting solution that provides insight into the performance, cost, and usage of Databricks workspaces and clusters. It offers granular detail such as pipeline performance, cost, and data ingress and egress. By capturing workspace activity as structured datasets, it supports the company's data-driven decision-making.
The team collaborated closely with company stakeholders to ensure Overwatch's successful deployment and integration across all relevant platforms. Additionally, our engineers designed the solution to be extensible, making it suitable for use with multiple workspaces.
The new system logs user activity through the Azure Event Hubs integration for Databricks and uses Overwatch to extract the logged data on clusters, notebooks, account logins, and jobs. The extracted data is stored as Delta tables, ready for dashboard creation and further analysis.
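The general idea of turning raw log events into structured, per-service tables can be sketched in plain Python. This is not how Overwatch works internally, and the event shape is an assumption: a simplified JSON record with `serviceName`, `actionName`, `timestamp`, and `userIdentity.email` fields. The sample events are hypothetical.

```python
import json
from collections import defaultdict

# Services of interest, matching the areas named above (assumed labels).
TRACKED_SERVICES = {"clusters", "notebook", "accounts", "jobs"}


def structure_events(raw_lines):
    """Group JSON log lines into one list of flat rows per tracked service."""
    tables = defaultdict(list)
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        service = event.get("serviceName")
        if service not in TRACKED_SERVICES:
            continue  # ignore services the dashboard does not report on
        tables[service].append({
            "timestamp_ms": event.get("timestamp"),
            "user": (event.get("userIdentity") or {}).get("email"),
            "action": event.get("actionName"),
        })
    return dict(tables)


# Hypothetical sample events for illustration only.
SAMPLE_LINES = [
    '{"serviceName": "accounts", "actionName": "login", '
    '"timestamp": 1700000000000, "userIdentity": {"email": "a@example.com"}}',
    '{"serviceName": "clusters", "actionName": "start", '
    '"timestamp": 1700000060000, "userIdentity": {"email": "b@example.com"}}',
    '{"serviceName": "other", "actionName": "noop", '
    '"timestamp": 1700000120000, "userIdentity": {"email": "a@example.com"}}',
]

tables = structure_events(SAMPLE_LINES)
```

In the deployed system this structuring is handled by Overwatch and the results land in Delta tables; the sketch only shows why flat, per-service rows are convenient for dashboard queries.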
Here are a few highlights of the expanded system:
- Effective user activity monitoring: The company can now get an accurate count of active users and the number of unique logins.
- Comprehensive cloud usage metrics: The new system gives the company real-time information on its Databricks component usage. The insights drawn from usage metrics allow the company to allocate its resources more efficiently, saving time and money.
- Fully automated extensible solution: GSPANN engineers designed the implementation to be fully automated, extensible, and reusable. The same real-time analytics solution can easily be extended to multiple workspaces, saving the company significant future development costs.
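As a rough illustration of the user activity highlight, the sketch below counts active users and unique logins from a list of structured events. The event fields (`user`, `action`) and the `login` action label are assumptions, and "unique logins" is interpreted here as the number of distinct users who logged in at least once.

```python
def summarize_user_activity(events):
    """Count distinct active users and distinct users who logged in."""
    # Any event with a known user counts as activity.
    active_users = {e["user"] for e in events if e.get("user")}
    # Only events with the (assumed) "login" action count as logins.
    login_users = {
        e["user"]
        for e in events
        if e.get("user") and e.get("action") == "login"
    }
    return {
        "active_users": len(active_users),
        "unique_logins": len(login_users),
    }


# Hypothetical sample events for illustration only.
SAMPLE_EVENTS = [
    {"user": "a@example.com", "action": "login"},
    {"user": "a@example.com", "action": "runCommand"},
    {"user": "b@example.com", "action": "runCommand"},
    {"user": "a@example.com", "action": "login"},
]

summary = summarize_user_activity(SAMPLE_EVENTS)
```

Using sets keeps repeated logins by the same user from inflating either count.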
Business Impact
- Improved cost management: The integration of this solution significantly improved the mapping of business units and teams to the consumption of various Databricks services at a more granular level. Additionally, it provided valuable information on unnecessary resource utilization and costs, resulting in tangible cost savings for the company.
- Increased efficiency and productivity: Dashboard analysis helps the company manage resources efficiently. Insights from the new dashboard can also guide the company toward better-suited cloud service plans, further improving efficiency and productivity.
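The cost mapping described above can be pictured with a small sketch that attributes cluster consumption to business units. Everything here is hypothetical: the cluster IDs, the cluster-to-unit lookup, the consumption figures, and the flat rate are placeholders, and the actual solution's cost model is not described in this case study.

```python
from collections import defaultdict


def cost_by_business_unit(cluster_usage, cluster_owner, rate_per_unit):
    """Attribute consumption cost to the business unit owning each cluster."""
    costs = defaultdict(float)
    for cluster_id, consumed in cluster_usage:
        # Clusters without a known owner are surfaced, not silently dropped.
        unit = cluster_owner.get(cluster_id, "unmapped")
        costs[unit] += consumed * rate_per_unit
    return dict(costs)


# Hypothetical inputs for illustration only.
CLUSTER_OWNER = {"cluster-1": "bu_a", "cluster-2": "bu_b"}
CLUSTER_USAGE = [("cluster-1", 10.0), ("cluster-2", 4.0), ("cluster-3", 2.0)]

costs = cost_by_business_unit(CLUSTER_USAGE, CLUSTER_OWNER, rate_per_unit=0.5)
```

Reporting an explicit "unmapped" bucket is one way to make untagged or orphaned resources visible, which is where unnecessary spend tends to hide.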
Technologies Used
- Azure Event Hubs
- Databricks Overwatch
- Databricks
- Databricks Notebooks






