In this blog, we review two customer use cases to understand how Autopilot saves time on risk assessment and approvals and assists developers and Ops teams in diagnosing issues in the software delivery process.

Case 1: Reduce the Risk of Errors in Production

Challenges: A destination web property with more than 200M average monthly users needed to roll out new features to remain competitive. The primary bottleneck in software delivery was a slow manual approval process required to promote changes from dev to test and from test to production.

A large and complex environment made risk assessment difficult: Their website consists of many monoliths as well as microservices based on Kubernetes, and they deploy roughly 1000 changes every month. Each change was reviewed manually to reduce errors in production. This process became too slow as the number and complexity of changes grew. Further, it was difficult for less-experienced engineers to understand the service dependencies, test results, performance metrics, and logs. This meant that only the most senior engineers could effectively promote a change, causing delays in their most important projects.

Cost of assessing risk was more than $1M per year: The risk assessment process required the equivalent of six full-time expert engineers to gather and analyze vast amounts of data from various tools. The company estimated the direct cost of this expert time at more than $1M per year. Because the process was difficult to master and perform consistently, too many errors made their way into production applications, which had a direct impact on revenue and customer satisfaction.

Solution: The customer implemented OpsMx Autopilot and integrated it into their CI/CD toolchain in less than a day. Autopilot gathers logs from Elasticsearch, metrics from Datadog and Prometheus, test results from JMeter, and information from other software delivery tools.
Using machine learning algorithms, Autopilot provides a confidence score at each decision stage of the delivery pipeline. When the confidence score is high enough, the pipeline automatically promotes the deployment to the next stage, including automatic promotion into production. When the score is low, the deployment is automatically rejected and returned to the development team for correction. In cases where the decision is unclear, the deployment is flagged for the DevOps team to resolve. Autopilot highlights potential issues to the DevOps engineers, enables collaboration between teams, and learns from the DevOps team's promotion decision, which improves its ability to make decisions in the future.

Results: Accuracy in the promotion decision was a key success criterion for the company. Autopilot catches deployment errors more consistently than the expert engineers could, because of the volume of data that needs to be analyzed each time. This has resulted in a significant improvement in the performance and stability of the company's website. Autopilot also reduced the number of deployments that were mistakenly rejected, so more deployments move to production without any human review.

Because Autopilot helped the company automate many deployment decisions, they were able to free up the expensive experts previously tied up in cumbersome risk assessment activities. This saved more than $1M per year in direct costs of the verification process.

“We automatically verify more than 1000 releases each month.”

Further, through automation, Autopilot has freed experts to spend more time on their core activities. This improves velocity as well as job satisfaction, as key engineers are able to concentrate on innovation rather than deployment validation.
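
To make the three-way promotion decision from the Solution section concrete, here is a minimal sketch of a confidence-score gate. It is an illustration only, not Autopilot's implementation: the names `Decision` and `gate`, the 0 to 100 score scale, and both threshold values are assumptions, since the post does not say how scores are scaled or where the cutoffs sit.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"  # move automatically to the next pipeline stage
    REJECT = "reject"    # send back to the development team
    REVIEW = "review"    # flag for the DevOps team to resolve


# Hypothetical thresholds on an assumed 0-100 scale; the post does not
# state the values Autopilot actually uses.
PROMOTE_THRESHOLD = 90.0
REJECT_THRESHOLD = 60.0


def gate(score: float) -> Decision:
    """Map a confidence score to a pipeline decision."""
    if score >= PROMOTE_THRESHOLD:
        return Decision.PROMOTE
    if score < REJECT_THRESHOLD:
        return Decision.REJECT
    # Scores between the two thresholds are treated as unclear.
    return Decision.REVIEW
```

With these assumed thresholds, `gate(95.0)` promotes, `gate(40.0)` rejects, and `gate(75.0)` is flagged for review. In the workflow the post describes, the human decision on a flagged deployment would then be fed back so that future scores improve.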