GSPANN found that a large number of Control-M agents were failing, which affected the client's critical e-commerce applications. The client's e-commerce system was generating 200 to 300 alerts for failed scripts. We identified all 120 servers on which Control-M agents were failing and automated the self-healing of the failed scripts.
We created Shell scripts that automatically restart the Control-M agents whenever their services go down. Failed scripts now heal automatically without any manual intervention, and the more stable Control-M environment has increased the availability of the client's e-commerce applications.
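The restart logic could look roughly like the following minimal sketch, written in Python for illustration (the actual solution used Shell scripts, which are not shown here). The case study does not say which commands were used to check or start an agent, so the health check and restart action are passed in as callables. The names `command_succeeds`, `heal_agent`, `max_attempts`, and `wait_seconds` are hypothetical; real Control-M agent status and start commands should be taken from the BMC documentation for the installed version.

```python
import subprocess
import time
from typing import Callable, Sequence


def command_succeeds(cmd: Sequence[str]) -> bool:
    """Run a health-check command and report whether it exited with status 0."""
    try:
        return subprocess.run(list(cmd), capture_output=True).returncode == 0
    except FileNotFoundError:
        # A missing check command is treated as "agent not healthy".
        return False


def heal_agent(
    is_up: Callable[[], bool],
    restart: Callable[[], object],
    max_attempts: int = 3,
    wait_seconds: float = 30.0,
) -> bool:
    """Restart a stopped agent until its health check passes.

    Returns True if the agent is (or becomes) healthy, and False if it is
    still down after max_attempts restarts, so the caller can raise an alert.
    """
    if is_up():
        return True  # Agent is healthy; nothing to heal.
    for _ in range(max_attempts):
        restart()
        time.sleep(wait_seconds)  # Give the agent time to come back up.
        if is_up():
            return True
    return False
```

Such a check could be scheduled periodically (for example, from cron) on each affected server, with `is_up=lambda: command_succeeds(CHECK_CMD)`, where the hypothetical `CHECK_CMD` is the site's own agent status command. Returning `False` after a bounded number of attempts, rather than retrying forever, leaves room to escalate agents that cannot heal themselves.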
The Control-M agents execute batch jobs that are intended to increase the efficiency of the e-commerce applications. A Control-M agent submits jobs on behalf of the Control-M server, tracks their processing, and reports status updates back to the Control-M server. However, whenever the Control-M environment went down, the Control-M agents became unavailable. We analyzed the scripts running through the Control-M agents and developed an automatic recovery mechanism that re-executes failed scripts through self-healing Shell scripts.
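The re-execution of failed scripts can be sketched in the same hedged way: each failed script is rerun up to a fixed number of times, with a pause between attempts, and the outcome for each script is reported. This is an illustrative Python sketch, not the actual Shell implementation. The names `rerun_failed`, `run`, `max_retries`, and `delay_seconds` are assumptions rather than Control-M features, and `run` stands in for whatever actually executes a script and returns its exit code (for example, a wrapper around `subprocess.run`).

```python
import time
from typing import Callable, Dict, List


def rerun_failed(
    failed_scripts: List[str],
    run: Callable[[str], int],
    max_retries: int = 3,
    delay_seconds: float = 60.0,
) -> Dict[str, bool]:
    """Re-execute each failed script until it succeeds or retries run out.

    `run` executes one script and returns its exit code (0 means success).
    Returns a map of script -> whether it eventually succeeded; scripts
    mapped to False still need manual attention.
    """
    results: Dict[str, bool] = {}
    for script in failed_scripts:
        succeeded = False
        for attempt in range(max_retries):
            if run(script) == 0:
                succeeded = True
                break
            if attempt < max_retries - 1:
                time.sleep(delay_seconds)  # Back off before the next try.
        results[script] = succeeded
    return results
```

Only scripts that still fail after every retry come back as `False`, which is one way an approach like this could cut down the volume of failed-script alerts that need manual follow-up.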