The 10 Essential DevOps Metrics That Really Matter
Mar 17, 2021
DevOps is an evolving computing philosophy. What's clear about the idea is that it brings developers together with the operational side of software delivery. It's tough to benchmark DevOps efforts and results, especially given how differently companies approach it. Even so, there are some performance indicators (metrics) every DevOps team can measure and use to gauge the effectiveness of their approach.
This article serves up 10 DevOps metrics every software company should be actively observing. Each metric on this list matters enough to affect the success of the developer-operations synergy that DevOps implies. We'll also show how you can measure each of the metrics by providing real-life examples.
With that, here's a list of the 10 DevOps metrics that really matter:
Application availability
Traffic and application usage
Number of tickets
Commit count
Number of tests conducted
Rate of deployment
Speed of deployment
Deployment rollbacks / fails frequency
Version lead time
Rate of response to tickets (MTTR)
Let's expand and add some weight to each of the metrics above.
Application Availability
Unless you've alerted users to planned maintenance downtime, they should have access to your application around the clock. Availability is a simple metric that helps rally resources toward keeping an application running. Even a bug-riddled application that stays online is an easier starting point for finding solutions than an app that's completely offline.
Sometimes after deployment, an application's availability takes a slump. The DevOps team should be able to follow the logs and roll the application back to its last stable version. Easy as that sounds, companies like Netflix and Amazon, which deploy their applications thousands of times a day, will argue differently.
To measure availability, an orchestration tool such as Kubernetes, or an automation server such as Jenkins, comes in handy. However, you may also want to look into application performance management (APM) tools to properly handle the logs and rollback points.
You may have seen service-level agreements (SLAs) claim upwards of 90% app availability. Just because you're hosting your application with a cloud services provider that offers 99.99% availability doesn't mean your app inherits the same figure. Close as the two may be, it's worth calculating your own app's availability before promising any percentages to customers. You do this by measuring how long your application is down over an observation period, expressing that downtime as a percentage of the period, and subtracting it from the perfect case of 100%.
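To make the arithmetic concrete, here's a minimal Python sketch of that calculation. The function name, the 30-day window, and the outage durations are all made up for illustration; in practice the downtime figures would come from your own monitoring or log data.

```python
from datetime import timedelta


def availability_percent(period: timedelta, outages: list[timedelta]) -> float:
    """Return uptime as a percentage of the observation period."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    # Total downtime, capped so the result never drops below 0%.
    downtime = min(sum(outages, timedelta(0)), period)
    return 100.0 * (1 - downtime / period)


# Hypothetical 30-day window with two outages (43.2 minutes in total).
outages = [timedelta(minutes=30), timedelta(minutes=13, seconds=12)]
print(round(availability_percent(timedelta(days=30), outages), 2))  # prints 99.9
```

Under these assumed numbers, well under an hour of downtime in a month already caps you at 99.9%, which is worth knowing before an SLA promises more.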
Traffic and Application Usage
Let's say your application's availability is in check. The next concern worth measuring is its traffic and usage numbers. Sometimes there's a relationship between availability and traffic. If your application gets too much attention, it could buckle under the pressure. An intuitive log analyzer can notify developers whenever that breaking point is near.
It's also important to keep track of usage statistics as your app version number increases. A dip in usage statistics is in itself actionable feedback. Chances are, you implemented a change that end users don't like. This brings acceptability into the mix. A competent DevOps team will put measures and controls in place to quickly correct dips in usage caused by new features.
One option is to introduce new changes using feature flags. These allow you to gradually release features to sections of your user base. Facebook did this with dark mode for their mobile apps. It saved them from being called a dictator by users who just wanted the app the way it was. All the while, the feature rolled out as an option slowly enough to gain some acceptance without dips in usage.
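A gradual release like this can be sketched as a percentage rollout. The Python below is only an illustration of the idea, not how Facebook or any particular feature-flag product does it; the function name, the feature name, and the user IDs are hypothetical.

```python
import hashlib


def is_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in a bucket from 0 to 99 and
    enable the feature when the bucket falls below the rollout percentage."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


# Hypothetical rollout: the same user always gets the same answer,
# and raising the percentage only ever adds users.
for percent in (0, 10, 50, 100):
    print(percent, is_enabled("dark-mode", "user-42", percent))
```

Hashing the feature name together with the user ID keeps each user's experience stable between sessions, so raising the percentage widens the audience without flipping the feature off for anyone who already has it.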
Let's not forget that a spike in traffic can have a cause other than the ones we've discussed so far. While we all love high traffic, too much of it causes resource allocation issues, and your application can crash when overrun. Knowing what normal traffic for your app looks like makes it easy to single out DDoS attacks.
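One simple way to express "normal traffic" is a baseline with a tolerance band around it. The sketch below flags a reading that sits far above the historical mean. The threshold of three standard deviations and the request counts are assumptions chosen for illustration, and a real setup would likely account for daily and weekly patterns too.

```python
from statistics import mean, stdev


def is_traffic_anomaly(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` when it is more than `threshold` standard
    deviations above the mean of the historical samples."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: anything above the baseline stands out.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical requests-per-minute samples with a mean of 1000.
requests_per_minute = [980, 1010, 1005, 995, 1020, 990]
print(is_traffic_anomaly(requests_per_minute, 1030))  # prints False
print(is_traffic_anomaly(requests_per_minute, 5000))  # prints True
```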
Number of Tickets
You know your users still have interest in your app when they ask for help using it better. The usual way for them to do this is by submitting tickets to your support team. This kind of feedback slots right between coding and tests in a DevOps pipeline. Tickets essentially determine what gets coded into the app and the results to expect from the tests that follow.
Keeping the logistics around this DevOps metric as simple as possible is a best practice we urge you to adopt. Firstly, any tickets that arise because of the ticketing system itself create a paradox, so the third-party tools commonly used to track tickets and their life cycles will do here. Secondly, while building something in-house proves your team's competence and keeps it busy, it could well be more cost-efficient to buy. Cost isn't only about money; time and effort also come into play.
Once you've managed to sustain low ticket volumes, your DevOps team can focus on keeping the system fresh. Technically this involves internal tickets. However, they're naturally of lower priority than externally generated ones. Measuring this DevOps metric is done by simply reading out the count for any set period of observation. Tip: A graphical representation of tickets vs. time is a good way to focus on the trend and not just the tickets themselves.
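As a rough illustration of reading out the count per period, the Python sketch below groups ticket dates by ISO week and prints a text-only trend. The dates are invented; in practice they'd come from an export of whichever ticketing tool you use.

```python
from collections import Counter
from datetime import date


def tickets_per_week(opened_on: list[date]) -> dict[tuple[int, int], int]:
    """Count tickets per ISO (year, week), in chronological order."""
    counts = Counter(tuple(d.isocalendar())[:2] for d in opened_on)
    return dict(sorted(counts.items()))


# Hypothetical ticket creation dates.
opened = [
    date(2021, 3, 1), date(2021, 3, 3),
    date(2021, 3, 8),
    date(2021, 3, 17), date(2021, 3, 18),
]

# A crude "tickets vs. time" chart: one '#' per ticket.
for (year, week), count in tickets_per_week(opened).items():
    print(f"{year}-W{week:02d} {'#' * count}")
```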
Commit Count
By now, you should see the flow in our presentation of the DevOps metrics that matter. Next in line, right after your code crew acts on tickets, is the commit count. Commits are changes recorded in the project's source code repository using a version control system (VCS). Git is one such VCS, and hosting platforms built on it, such as GitHub, are commonly used by DevOps teams because of their (relative) ease of access and use.
The more commits your team is recording, the easier it is to consider them "productive." We're going to keep the quotes around the productivity claim because a commit is only useful when senior developers have reviewed and approved it. This way, the approved commits matter way more.
Any VCS you pick should allow you to count the total commits done for any period of interest. Some companies gamify this metric. But generally speaking, developers with more commits inspire the rest of the team to pick up on their speed. There's a correlation between the commit count and rate of deployment, as you shall see in the next DevOps metric.
Low commit counts, in general, can reveal why your application versioning is slow. Understanding why this is happening on your team allows you to target and help the developers with the fewest commits.
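If your VCS is Git, one way to get commit counts for a period is a command along the lines of git shortlog -s -n --since="1 week ago" HEAD, which prints one line per author with a count and a name. The Python sketch below parses that kind of output; the sample text is made up, and the counts include every commit, not only the reviewed and approved ones that matter most.

```python
def parse_shortlog(output: str) -> dict[str, int]:
    """Turn 'count<whitespace>author' lines into an author -> commits mapping."""
    counts: dict[str, int] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        count, author = line.strip().split(None, 1)
        counts[author] = int(count)
    return counts


# Hypothetical output in the count-then-name layout described above.
sample = "    12\tAda\n     3\tGrace Hopper\n"
per_author = parse_shortlog(sample)
print(per_author)
print("total commits:", sum(per_author.values()))
```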
Number of Tests Conducted
More than the overall commit count, you should be concerned about the number of tests conducted on a single commit. If you're having to run more than one test session on a single change, there could be something wrong in your process. To start with, you could be conducting tests manually: first scripting them, then running them on each possible device (or emulator). While thorough, manual testing depends on the tester's state of mind. A tired tester is prone to mistakes when running manual checks on your commits and builds before deployment.
On the other hand, you could start using automated tests spread across as many containers as possible in production-like environments. This lessens the time you'll have to wait for a session. It also increases the number of tests run concurrently (which count as one session). Even when you're testing many changes, this approach is by far the most efficient.
Rate of Deployment
Deployments are what you get when a fresh version of an application is "published." How quickly you're creating these is an overall indicator of your team's competitiveness. However, you don't suddenly start getting more deployments without a good system managing every step that comes before them. To make sense of this, consider how you can have a thousand commits awaiting tests. If you're not using the best testing methods (automated, AI-infused, and fast), then you can have the best developers and still not deploy much, or at least not often enough for users on the other end to notice the improvements.
A common question asked from the outside is how Amazon deploys thousands of times daily while the site looks basically the same all year round. The answer is that the Amazon (and even Netflix) site is anything but a simple website. Behind the simple interface is an intricate network of microservices. DevOps teams working on these have to make improvements daily, if only to keep the experience the same.
Let's come back to smaller enterprises, the ones that don't need to deploy hundreds of times a day just to hold up their availability metric. Even a few deployments, when done daily, are a sign that work is going on behind closed doors. While developers themselves can feel good about a rise in the rate of deployments, it matters more for management. Dips in this metric are clues for the ops side of DevOps to improve the processes that lead to deployments.
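Measuring the rate of deployment can be as simple as dividing the number of deployments by the number of days observed. Here's a minimal Python sketch under that assumption; the function name and the dates are hypothetical, and the real dates would come from your CI/CD tool's deployment history.

```python
from datetime import date


def deployments_per_day(deploy_dates: list[date], start: date, end: date) -> float:
    """Average deployments per calendar day in the inclusive window [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    in_window = sum(1 for d in deploy_dates if start <= d <= end)
    return in_window / days


# Hypothetical deployment dates; the last one falls outside the window.
deploys = (
    [date(2021, 3, 1)] * 3
    + [date(2021, 3, 3)] * 2
    + [date(2021, 3, 5)] * 2
    + [date(2021, 3, 20)]
)
print(deployments_per_day(deploys, date(2021, 3, 1), date(2021, 3, 7)))  # prints 1.0
```

Comparing the same window week over week is what turns this number into the trend that reveals dips.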
Speed of Deployment
When you look closely at the process of deployment, you'll undoubtedly start thinking about optimizing each instance. The speed of deployment DevOps metric measures the time it takes for a single deployment to complete. Some apps take a few minutes to build and package their executables. Then you'll come across an app that never takes less than an hour to do the same. Clearly, this is a cause for concern.
Now, it's safe to assume each application's deployment time will vary based on the final size of the app. The more lines of code, assets, and dependencies an app has, the longer it should take to deploy. Even so, there are always clever ways to hasten the process without compromising on the quality of the final product.
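To track this metric, you only need the start and finish time of each deployment. The Python sketch below computes durations and summarizes them; the timestamps are invented, and using the median as the "typical" value is our choice so that a single hour-long run doesn't skew the picture.

```python
from datetime import datetime
from statistics import median


def deployment_minutes(runs: list[tuple[datetime, datetime]]) -> list[float]:
    """Duration of each deployment in minutes, from (started, finished) pairs."""
    return [(finished - started).total_seconds() / 60 for started, finished in runs]


# Hypothetical deployments: two quick ones and one that takes over an hour.
runs = [
    (datetime(2021, 3, 17, 9, 0), datetime(2021, 3, 17, 9, 4)),
    (datetime(2021, 3, 17, 13, 0), datetime(2021, 3, 17, 13, 6)),
    (datetime(2021, 3, 17, 16, 0), datetime(2021, 3, 17, 17, 5)),
]
durations = deployment_minutes(runs)
print(f"median: {median(durations)} min, slowest: {max(durations)} min")
```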
The main reason companies adopt the DevOps philosophy and ensure a continuous deployment strategy is in place is to stay competitive. This can only occur when deployments happen quickly, safely, and cost-effectively. Hosting applications in the cloud covers two of these attributes from the get-go: safely and cost-effectively. The speed part is left completely in the hands of the devs manning the workstations.
Here's a line of thought worth sitting on as a developer: Would it be better to commit and build once a day, in one large batch? Or would it make more sense to apply tiny changes incrementally over the course of the day? Looking at it objectively, only the second option allows quicker deployments, reduces the compute resources required at any given time (i.e., costs), and does so safely enough that you can roll back to the last state where the changes were correct.
Deployment Rollbacks / Fails Frequency
Even when deployments are taking place at a satisfactory pace, they're sometimes undone via rollbacks. Reasons for undoing changes include missing the scope of an app's requirements, executive directions, and bugs that escaped testing. It's worth your time monitoring this DevOps metric, mostly because it provides qualitative information about the steps that lead to deployment.
Most continuous integration / continuous deployment (CI/CD) tools keep logs of successfully completed deployments as well as those that hit a snag and stop before completion. Both are relevant to this metric. If you realize that a lot of rollbacks are happening because features go forward before proper testing, you'd be wise to place more manual checks before builds.
Ideally, you want as few rollbacks as possible. Such a situation is a testament to a smooth and well-managed pipeline. The fails frequency of your cycle is also worth keeping an eye on. It looks at not only the final step (deployment) but every step along the way. A well-defined feedback process leads to better coding and commits, which then leads to fewer rollbacks after deployment.
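Once you've pulled deployment outcomes from your CI/CD logs, the rollback and failure frequencies are just shares of the total. The Python sketch below assumes each attempt has already been labeled; the labels "succeeded", "failed", and "rolled_back" are hypothetical names, not something a specific tool emits.

```python
from collections import Counter


def outcome_rates(outcomes: list[str]) -> dict[str, float]:
    """Share of each outcome label among all deployment attempts."""
    total = len(outcomes)
    if total == 0:
        return {}
    return {name: count / total for name, count in Counter(outcomes).items()}


# Hypothetical history of 20 deployment attempts.
history = ["succeeded"] * 17 + ["failed"] * 2 + ["rolled_back"]
for name, rate in outcome_rates(history).items():
    print(f"{name}: {rate:.0%}")
```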
Version Lead Time
Say you started the clock at the very moment you received feedback (a ticket) from a user and kept it running until their concern was ironed out in the system. What you get is a DevOps metric commonly called the lead time. Perhaps one of the most accurate depictions of your DevOps processes' productivity, version lead time is the same as taking a step back to watch everything in action.
A shortcoming you may have noticed right away is that it doesn't measure each step closely enough. If it did, you'd recognize rate-determining steps quickly. Project managers typically use this one more than the other DevOps metrics to estimate task deadlines when apportioning resources to projects.
Also worth noting when measuring your pipeline's lead time is how tasks vary in difficulty. You're best off including the relative weight of a change in the reports gathered to present lead times. Also, include how many developers were involved in the change. In principle, the more hands you put to work on a change, the less time it should take to complete.
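If you record a timestamp at each step, you can compute the overall lead time and also see which step takes longest. The Python sketch below assumes four stages with hypothetical names and invented timestamps; your pipeline's stages will differ.

```python
from datetime import datetime

# Hypothetical stage names, in pipeline order.
STAGES = ["ticket_opened", "first_commit", "tests_passed", "deployed"]


def lead_time_hours(events: dict[str, datetime]) -> float:
    """Hours from the ticket being opened to the change being deployed."""
    return (events["deployed"] - events["ticket_opened"]).total_seconds() / 3600


def stage_durations_hours(events: dict[str, datetime]) -> dict[str, float]:
    """Hours spent between each pair of consecutive stages."""
    return {
        f"{a} -> {b}": (events[b] - events[a]).total_seconds() / 3600
        for a, b in zip(STAGES, STAGES[1:])
    }


# Hypothetical timestamps for a single ticket.
events = {
    "ticket_opened": datetime(2021, 3, 1, 9, 0),
    "first_commit": datetime(2021, 3, 1, 15, 0),
    "tests_passed": datetime(2021, 3, 2, 9, 0),
    "deployed": datetime(2021, 3, 2, 11, 0),
}
print("lead time (h):", lead_time_hours(events))
print(stage_durations_hours(events))
```

In this made-up case, the wait between the first commit and passing tests dominates the 26 hours, which is the kind of rate-determining step the overall figure hides.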
Rate of Response to Tickets (MTTR)
If you want to improve your pipeline, you've got to start right where all activities begin: the ticketing step. When optimizing for speed, you get a lot of insight monitoring the rate of response to new tickets. We can safely claim that the other DevOps metrics in this post gain momentum from the results you get from this step.
Regardless of how hard a change suggested by a ticket seems, a good ticketing process can get enough developers assigned to it. This immediately cuts the expected lead time while increasing the number of commits possible.
The key to a great response rate to tickets lies in having a capable value stream management tool. It essentially connects information across all the applications you use in the departments along your pipeline. The best outcome from having one is increased visibility. Every department involved in a ticket gets wind of it early. This increases the background information your developers have when creating and mending features. Down the line, ticket issues reach the deploy stage quicker and with fewer chances of being rolled back.
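To put a number on the rate of response, you can average the wait between a ticket being opened and its first response. The Python sketch below reads MTTR as "mean time to respond," which is our assumption since the acronym has several expansions; the timestamps are invented.

```python
from datetime import datetime, timedelta


def mean_time_to_respond(tickets: list[tuple[datetime, datetime]]) -> timedelta:
    """Average wait between (opened, first_response) across all tickets."""
    if not tickets:
        raise ValueError("need at least one ticket")
    waits = [responded - opened for opened, responded in tickets]
    return sum(waits, timedelta(0)) / len(waits)


# Hypothetical tickets answered after 10, 20, and 60 minutes.
tickets = [
    (datetime(2021, 3, 17, 9, 0), datetime(2021, 3, 17, 9, 10)),
    (datetime(2021, 3, 17, 10, 0), datetime(2021, 3, 17, 10, 20)),
    (datetime(2021, 3, 17, 11, 0), datetime(2021, 3, 17, 12, 0)),
]
print(mean_time_to_respond(tickets))  # prints 0:30:00
```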
Getting Started Measuring DevOps Metrics
These DevOps metrics are just a few worth measuring when delivering software. The key objective after knowing the figures is to tweak specific areas so that each metric gets better with time. Some of them are best when at their lowest, while others you should make as high as possible. In the end, having a clear objective known by all team members for each of the DevOps metrics sets you in the right direction toward an optimized DevOps architecture. As far as solutions are concerned, cut out the complexity from the process with value stream management platforms like Plutora. Plutora gives you real-time insights into your progress and captures end-to-end metrics to drive data-driven decisions on how to improve your performance on your DevOps metrics.