GSPANN’s DevOps team identified the challenges in the company’s existing approach to backup and restore and proposed a new solution. We implemented Velero to back up Kubernetes objects and restore them in case of a disaster. Velero is an open-source tool for backing up, restoring, and migrating Kubernetes cluster resources and persistent volumes.
We connected Velero to an Azure storage account, and Velero uploads backup files to a blob container in that account. We installed and configured the Velero server component on each Kubernetes cluster. We also installed the Velero client on the company’s local system to interact with the Velero server through the command-line interface (CLI).
Once this initial setup was complete, we used the Velero client CLI to schedule automatic backup jobs. Each job backs up the relevant Kubernetes objects from a given cluster once a week, and the resulting backups are stored in the Azure blob storage container. To minimize Azure Storage costs, we defined the Velero backup jobs with a TTL (time to live) of 60 days, so Velero deletes any backup older than 60 days, along with its files in the Azure storage container.
When disaster recovery is required, we restore Kubernetes objects from the backup files in the Azure Storage container, again using the Velero client CLI.
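The per-cluster server installation against Azure blob storage can be sketched in Python as follows. This is a minimal sketch that only builds the argument list for `velero install`, assuming the Velero plugin for Microsoft Azure. The container, resource group, storage account, credentials file path, and plugin image tag are hypothetical placeholders. The plugin image version must match the Velero release you run.

```python
import shlex
from typing import List


def velero_install_args(
    container: str,
    resource_group: str,
    storage_account: str,
    plugin_image: str,
    secret_file: str = "./credentials-velero",  # hypothetical credentials file path
) -> List[str]:
    """Build a `velero install` command that points Velero at an Azure blob container.

    All values passed in are placeholders. Run the resulting command once per
    cluster, with kubectl's current context set to that cluster.
    """
    return [
        "velero", "install",
        "--provider", "azure",
        "--plugins", plugin_image,
        # Blob container that receives the backup files
        "--bucket", container,
        # Credentials Velero uses to authenticate to Azure
        "--secret-file", secret_file,
        # Where the backup storage location lives inside Azure
        "--backup-location-config",
        f"resourceGroup={resource_group},storageAccount={storage_account}",
    ]


if __name__ == "__main__":
    args = velero_install_args(
        container="velero-backups",            # hypothetical
        resource_group="rg-velero-backups",    # hypothetical
        storage_account="stvelerobackups",     # hypothetical
        plugin_image="velero/velero-plugin-for-microsoft-azure:vX.Y.Z",  # placeholder tag
    )
    print(shlex.join(args))
```

Building an argument list rather than a single string lets you pass it directly to `subprocess.run(args, check=True)` without shell-quoting issues.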
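A weekly schedule with a 60-day retention period can be sketched as a `velero schedule create` command. Velero's `--ttl` flag takes a Go-style duration, so 60 days becomes 1440 hours. The weekly cadence and 60-day TTL come from the setup described above. The cron expression (Sundays at 01:00), schedule name, and namespace are hypothetical.

```python
import shlex
from datetime import timedelta
from typing import List, Sequence


def ttl_flag(days: int) -> str:
    """Convert a retention period in days to the Go-style duration that --ttl accepts."""
    total_seconds = int(timedelta(days=days).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h{minutes}m{seconds}s"


def weekly_schedule_args(
    name: str,
    namespaces: Sequence[str] = (),
    retention_days: int = 60,
    cron: str = "0 1 * * 0",  # hypothetical: every Sunday at 01:00
) -> List[str]:
    """Build a `velero schedule create` command for a weekly backup that expires after a TTL."""
    args = [
        "velero", "schedule", "create", name,
        f"--schedule={cron}",
        "--ttl", ttl_flag(retention_days),
    ]
    if namespaces:
        # Without this flag, Velero backs up every namespace in the cluster
        args += ["--include-namespaces", ",".join(namespaces)]
    return args


if __name__ == "__main__":
    print(shlex.join(weekly_schedule_args("weekly-prod", ["orders"])))
```

Computing the TTL from a day count keeps the retention policy readable. For comparison, Velero's own default TTL is `720h0m0s` (30 days).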
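A restore can be sketched the same way with `velero restore create`. You can either name a specific backup with `--from-backup`, or use `--from-schedule`, which restores from the latest successful backup of that schedule. This is a minimal sketch. The backup, schedule, and namespace names are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def restore_args(
    backup: Optional[str] = None,
    schedule: Optional[str] = None,
    namespaces: Sequence[str] = (),
) -> List[str]:
    """Build a `velero restore create` command from exactly one backup source."""
    if (backup is None) == (schedule is None):
        raise ValueError("specify exactly one of backup or schedule")
    args = ["velero", "restore", "create"]
    if backup is not None:
        # Restore from a specific, named backup
        args += ["--from-backup", backup]
    else:
        # Restore from the latest successful backup produced by this schedule
        args += ["--from-schedule", schedule]
    if namespaces:
        # Optionally restore only some namespaces from the backup
        args += ["--include-namespaces", ",".join(namespaces)]
    return args


if __name__ == "__main__":
    print(shlex.join(restore_args(schedule="weekly-prod", namespaces=["orders"])))
```

Raising an error when both or neither source is given mirrors the fact that a restore needs one unambiguous starting point.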
Key benefits provided by our solution include:
- Eliminated the need for manual script modification: Backup and restore operations are now straightforward and no longer require editing Bash scripts.
- Automated the backup and restore process: Scheduled Velero backup jobs eliminate manual intervention and increase operational efficiency.
- Provided precise control over backup and restore: The company now has complete control over which Kubernetes objects are backed up. Teams can back up an entire cluster, a specific namespace, or an individual Kubernetes object.
- Production cluster replication promotes development and testing: We replicated the production cluster to create LLC environments that support development and testing efforts. Developers and QA engineers can now work in an environment that mirrors the live production environment.
- Massively improved cluster restoration time: Our solution improved cluster restoration speed by more than 80%.
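The fine-grained backup control mentioned in the benefits above can be sketched as a `velero backup create` command with optional filters. With no filters, Velero backs up the whole cluster. `--include-namespaces` narrows the backup to namespaces, `--include-resources` narrows it to resource types, and `--selector` narrows it to labeled objects. This sketch assumes that the individual object you want to target carries a distinguishing label. All names and labels are hypothetical.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def scoped_backup_args(
    name: str,
    namespaces: Sequence[str] = (),
    resources: Sequence[str] = (),
    labels: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build a `velero backup create` command scoped to a cluster, namespace, or labeled object."""
    args = ["velero", "backup", "create", name]
    if namespaces:
        # Limit the backup to these namespaces
        args += ["--include-namespaces", ",".join(namespaces)]
    if resources:
        # Limit the backup to these resource types, for example deployments
        args += ["--include-resources", ",".join(resources)]
    if labels:
        # Label selector; sorted so the generated command is deterministic
        selector = ",".join(f"{key}={value}" for key, value in sorted(labels.items()))
        args += ["--selector", selector]
    return args


if __name__ == "__main__":
    # Hypothetical: back up only the deployment labeled app=web in the "shop" namespace
    print(shlex.join(scoped_backup_args("web-only", ["shop"], ["deployments"], {"app": "web"})))
```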