Plutora Blog - Release Management
12 Essential Release Management Metrics
Previously on this blog we’ve talked about how release-reporting metrics is an art rather than a science, and we’ve also outlined some core release metrics our customers use to track progress. In this first follow-up post, I’ll provide more detail on three of the 10 key metrics previously discussed:
• Total Release Downtime in Hours
• Releases by Priority and Type
• On-time Release Delivery
Total Release Downtime Metrics
Deploying software is a risky business, and one of the most visible forms of risk is downtime associated with a release. For any number of reasons a given release can mean different kinds of downtime. A quick emergency patch to production might require a simple server restart and result in only a handful of minutes of downtime, while a major version release of a large, interconnected system might require days of downtime over a weekend. Other releases that need testing over many weeks might require a tiered set of releases over time to transition traffic between a control and a variable system.
One thing is true of all software releases: they introduce the risk of downtime. Before you rush to release software you should be able to provide the business with a measure of this risk. When a stakeholder asks what the impact of a given release is going to be on revenue or production stability you should have an answer that is informed by real data.
With Plutora you can measure these metrics and associate them with releases, creating a history that can be mined and audited to provide the business with a reliable way to assess risk.
For each release, we recommend measuring several release management metrics associated with Total Release Downtime.
Estimated Release Downtime (Hours)
This is downtime that is built into the release plan. If you need to rebuild a database during a release or migrate large data sets, your release is going to have release downtime captured in a timeline. Capture the amount of downtime your teams expect to experience during a release and record this value before the release starts.
Actual Release Downtime (Hours)
After you’ve completed your release, record the actual amount of downtime your system experiences as a result of the release. Discrepancies between the estimated release downtime and the actual release downtime over several releases can make future release plans more accurate.
If your teams are consistently over- or underestimating release downtime, you can track these estimates so that you can provide the business with more accurate assessments of the risk associated with a particular release.
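As a minimal sketch of the comparison described above, the following Python computes the gap between estimated and actual downtime per release and the average bias across a history. The `ReleaseDowntime` record, its field names, and the sample figures are hypothetical and are not part of Plutora's product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReleaseDowntime:
    """One release's downtime record (hypothetical structure)."""
    name: str
    estimated_hours: float
    actual_hours: float


def downtime_variance(release: ReleaseDowntime) -> float:
    """Positive means downtime was underestimated, negative means overestimated."""
    return release.actual_hours - release.estimated_hours


def mean_estimation_bias(releases: list[ReleaseDowntime]) -> float:
    """Average variance in hours across a release history."""
    return mean(downtime_variance(r) for r in releases)


# Illustrative data only.
history = [
    ReleaseDowntime("2024.1", estimated_hours=2.0, actual_hours=3.5),
    ReleaseDowntime("2024.2", estimated_hours=4.0, actual_hours=4.5),
    ReleaseDowntime("2024.3", estimated_hours=1.0, actual_hours=2.0),
]

bias = mean_estimation_bias(history)
print(f"Average underestimate: {bias:.2f} hours per release")
```

A consistently positive bias suggests that future estimates should be padded by roughly that amount until the underlying causes are addressed.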
Root Causes for Unplanned Release Downtime
As a release manager it is your job to perform root cause analysis in the case of unanticipated downtime. If a release goes awry (and they often do) you must create a record of root causes that contributed to unplanned downtime. This record should be used to rectify unanticipated risks in your release process.
Unnecessary Release Downtime (Hours)
Software projects that are released repeatedly over many months and years should have an unplanned release downtime that approaches zero. After a few releases of a software system there should be very few surprises during a release sequence. Unfortunately, some projects are more stable than others.
This metric tracks unplanned downtime during a release process that is directly attributable to known root causes that have remained unaddressed between releases. If your releases are inconsistent and buggy due to previously identified root causes, this is a sign of problems in your project’s organization or process. When a release process fails to improve over time, it is a sign that something needs to change.
If you’ve really perfected your release process you’ll be able to measure this in minutes or seconds, and some organizations have achieved zero-downtime deployments where a push to production has no impact on end-user experience. While zero-downtime deployments are a requirement for critical, web-scale systems that run high-profile e-commerce systems, many organizations can plan for regular deployments that require some level of downtime.
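One possible way to approximate Unnecessary Release Downtime is to count the unplanned hours in each release whose root cause was already recorded in an earlier release. The sketch below assumes that a recurring root cause means the cause was left unaddressed, which is a simplification. The incident tuples and cause names are hypothetical.

```python
from collections import defaultdict


def unnecessary_downtime(incidents):
    """Return {release_number: hours} of unplanned downtime whose root cause
    was already recorded in an earlier release.

    incidents: iterable of (release_number, root_cause, unplanned_hours).
    Assumption: a cause seen in a previous release counts as "known and
    unaddressed" when it recurs.
    """
    by_release = defaultdict(list)
    for release, cause, hours in incidents:
        by_release[release].append((cause, hours))

    known_causes = set()
    result = {}
    for release in sorted(by_release):
        entries = by_release[release]
        # Only causes recorded before this release count as avoidable.
        result[release] = sum((h for c, h in entries if c in known_causes), 0.0)
        known_causes.update(c for c, _ in entries)
    return result


# Illustrative data only.
incidents = [
    (1, "missing index", 2.0),
    (1, "config drift", 1.0),
    (2, "config drift", 1.5),
    (2, "expired certificate", 0.5),
    (3, "config drift", 1.0),
    (3, "expired certificate", 0.25),
]

per_release = unnecessary_downtime(incidents)
print(per_release)
```

A value that fails to shrink from release to release is the signal the metric is meant to surface.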
Releases by Priority and Type Metrics
Software releases come in several types:
Epoch Releases – These are rare releases, but when they happen they mark a sea change for an entire organization or department. An example of an epoch release would be a large e-commerce system switching from an old to a new platform, which requires a global retooling of every aspect of the IT organization. Epoch releases mark point-in-time transitions between entirely different architectures and systems.
These releases often require an all-hands-on-deck commitment throughout the operations and development functions of an organization for many months, and they involve a high degree of planning and risk. They require hundreds or thousands of actors coordinating across multiple departments, and organizations can often sustain only one such release in a given year without causing widespread burnout.
Major Coordinated Releases – These are major releases across multiple platforms and applications that require a higher level of ceremony because of the risk associated with orchestration across departments and organizations. An example of a major coordinated release would be a large e-commerce site adopting a new approach to order fulfillment or inventory management. Major coordinated releases often require weeks of preparation and dedicated environments to test integration between different applications. These releases are often scheduled as quarterly events.
Minor Coordinated Releases – These are minor changes to systems across application and departmental boundaries. An example of a minor coordinated release would be multiple applications in an organization upgrading to a new database driver or a minor change being made to a shared service or schema that requires a coordinated software deployment. Minor coordinated releases require weeks or days of preparation with the involvement of a smaller staff of QA and on-call specialists than a major coordinated release. These releases often happen with a monthly cadence.
Major Isolated Releases – When a single application team needs to perform a major application release, you may need the involvement of other teams’ QA groups to test that an interface still delivers on an application’s SLA, but these release procedures can be isolated to a single application development team. While these releases can require other teams to be on alert for possible problems, they introduce less risk and fewer unknowns. These releases can often be completed with ten or fewer actors and are a regular, monthly fixture for most organizations.
Minor Isolated Releases – If an application needs a minor release for a bug fix or a minor feature, these are the lowest level of planned release in this list. Often handled by a staff of six or fewer, these releases don’t require custom environments and they occur on a regular, weekly cadence.
Emergency Production Patches – Emergency production patches can be unpredictable in terms of risk. A patch can be for everything from a production system that is completely broken to a critical typo fix on a home page. Often an emergency production patch has the same level of risk as a minor coordinated release, as these releases often require expedited quality assurance across multiple teams.
Each of these release types is also associated with a level of priority. While epoch releases are often urgent and minor isolated releases are often lower priority, priority is also independent of the scope of a release. For example, a minor isolated release could involve a small but critical change that is preventing the organization from realizing millions in revenue. While scope is a function of technical risk and the complexity of coordination involved, priority is usually driven by business objectives.
Every application should be measuring the following release management metrics:
- Number of Releases by Release Type & Number of Releases by Release Priority
These are straightforward metrics to measure, but measuring them is important. You should have a healthy mix of release types from major to minor with a few emergency patches for production.
If every release is a major release it could be a signal that your applications are not properly isolated or easily maintained in a distributed system. If every release is an emergency bug fix it may suggest that your release process isn’t allowing for enough time to validate quality.
One of the risks of making every release high priority is that your personnel may stop taking your release process seriously – when everything is an emergency, nothing is an emergency.
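Counting releases by type and by priority can be sketched in a few lines of Python. The release log below, its field names, and the priority labels ("Low", "High", "Urgent") are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical release log; field names and values are illustrative only.
releases = [
    {"application": "storefront", "type": "Minor Isolated", "priority": "Low"},
    {"application": "storefront", "type": "Emergency Production Patch", "priority": "Urgent"},
    {"application": "payments", "type": "Major Coordinated", "priority": "High"},
    {"application": "payments", "type": "Minor Isolated", "priority": "Urgent"},
    {"application": "search", "type": "Minor Coordinated", "priority": "Low"},
]


def count_by(records, field):
    """Count records grouped by the given field."""
    return Counter(record[field] for record in records)


def share(counts, key):
    """Fraction of all counted records that fall under `key`."""
    total = sum(counts.values())
    return counts[key] / total if total else 0.0


by_type = count_by(releases, "type")
by_priority = count_by(releases, "priority")
urgent_share = share(by_priority, "Urgent")

print(by_type.most_common())
print(f"Urgent releases: {urgent_share:.0%}")
```

Tracking the urgent share over time gives an early warning when too many releases are being labeled as emergencies.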
- Time spent by Release Type (Days)
Are certain applications’ teams more prone to emergency bug fixes, and if so what does this say about that team’s commitment to quality? You need to measure the types of releases a team is engaged in over time to assess the risk associated with each application. You can also use this metric to measure which teams are associated with major coordinated releases and which teams are well isolated. If a single team is always in the middle of a major coordinated release it might suggest that this team needs to build greater isolation between systems to avoid affecting other applications on a regular basis. On the other hand, some teams are designed to be at the center of integration challenges.
- Time spent by Release Priority (Days)
If your teams are spending 100% of their time at an urgent level of priority, it is time to rethink your approach to software releases. Teams can operate efficiently in an emergency level of priority only for a few days or weeks at a time. From a project management perspective you need to manage the mix of priority levels for a team or risk team exhaustion. You should be constantly assessing the level of priority each team’s releases have been assigned to make sure that your teams are not “Stuck in Emergency Mode.” A team that is constantly working on an urgent release is a team that will suffer attrition as job satisfaction suffers. Alternatively, if a team is rarely assigned an urgent release you may be able to shift resources from teams working on low priority releases.
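To check whether a team is stuck in emergency mode, one option is to compute the share of each team's release time spent at each priority level. This is a sketch under assumptions: the work-log tuples, the "Urgent" label, and the 50% threshold are hypothetical and should be tuned to your own organization.

```python
from collections import defaultdict


def priority_time_share(work_log):
    """Return {team: {priority: fraction_of_days}}.

    work_log: iterable of (team, priority, days_spent).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for team, priority, days in work_log:
        totals[team][priority] += days

    shares = {}
    for team, per_priority in totals.items():
        team_total = sum(per_priority.values())
        if team_total == 0:
            continue  # no recorded time, nothing to report
        shares[team] = {p: d / team_total for p, d in per_priority.items()}
    return shares


def teams_in_emergency_mode(shares, threshold=0.5):
    """Teams whose urgent share exceeds the (assumed) threshold."""
    return sorted(t for t, s in shares.items() if s.get("Urgent", 0.0) > threshold)


# Illustrative data only.
work_log = [
    ("payments", "Urgent", 45),
    ("payments", "Normal", 5),
    ("search", "Urgent", 5),
    ("search", "Normal", 35),
    ("search", "Low", 10),
]

shares = priority_time_share(work_log)
stuck = teams_in_emergency_mode(shares)
print(stuck)
```

The same function works for Time spent by Release Type if the second element of each tuple is a release type instead of a priority.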
On-time Release Delivery Metrics
This is a tough metric for many organizations to measure because on-time delivery is a very rare beast given our industry’s broken approach to release management. We have yet to see a large organization that doesn’t experience some level of schedule slippage over time. Measuring schedule slippage is very similar to measuring downtime but at a different level. This is a metric for managers and business stakeholders that should be used to constantly refine the “correction factor” that the business applies to IT-generated timelines.
As a release manager you sit at the intersection between operations and development, but you also sit at the intersection between IT and the business, and it is your job to provide the business with solid dates for planning along with some indication of schedule certainty. If a business is planning a large release to facilitate sales three or four quarters in advance, the release process should be able to predict with certainty when software will be delivered.
If a release manager can provide predictability, then the business can plan for a release: it can buy inventory to meet demand, hire support personnel, and execute according to a plan. In most organizations struggling with release management, the business has learned to treat IT-driven schedules with a high level of skepticism. To most executives IT is a problem child, as something is always going wrong. As a result, the business often waits until software is fully delivered to start incorporating it into operations.
From a release management perspective, the first step toward recovery is to admit that your releases have this problem. Using Plutora you can track release management metrics that cover schedule slippage both before a release, to mitigate risk and report accurate information to the business, and after a release, to incorporate unanticipated uncertainty into your future planning.
Here are metrics you should measure when it comes to on-time delivery:
- Number of releases delivered on schedule by Application
Are certain teams constantly behind schedule? Do they need more resources and support to get back on track? This is the most obvious metric to measure, as it helps address problems in the organization.
- Number of releases delivered on schedule by Priority
Does making a project urgent make on-time delivery more likely? It should. If your metrics show no correlation between priority and on-time delivery, this suggests that priority isn’t being factored into the day-to-day decision-making process of your teams.
- Cumulative Number of Days Late by Application
Exactly how late is a given application team? This is an important measure of capacity for a team. If a team is constantly delivering late, either they need more assistance and support from release management, or a serious organizational change needs to be made to address this source of risk.
- Root Causes for Late Delivery
If software is delivered late it should be assessed in a similar fashion to production downtime. A root cause should be identified and recorded for future reference. This is an important part of continuous improvement.
- Cumulative Number of Avoidable Days Late by Application
Related to the root causes identified above, because software is an iterative process you need to keep track of teams that deliver late for the same reasons over and over again.
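The on-time delivery metrics above can be sketched together in Python. The delivery tuples, application names, and root-cause labels are hypothetical. The sketch also assumes that a late delivery is "avoidable" when its root cause was already recorded for the same application in an earlier delivery.

```python
from collections import defaultdict
from datetime import date


def days_late(planned: date, actual: date) -> int:
    """Whole days past the planned date; early or on-time deliveries count as 0."""
    return max((actual - planned).days, 0)


def on_time_summary(deliveries):
    """Summarize on-time delivery per application.

    deliveries: iterable of (application, planned_date, actual_date, root_cause),
    where root_cause is None for on-time deliveries.
    """
    summary = defaultdict(
        lambda: {"on_time": 0, "late": 0, "days_late": 0, "avoidable_days_late": 0}
    )
    seen_causes = defaultdict(set)

    # Process in delivery order so "previously seen" causes are well defined.
    for app, planned, actual, cause in sorted(deliveries, key=lambda d: d[2]):
        late = days_late(planned, actual)
        stats = summary[app]
        if late == 0:
            stats["on_time"] += 1
            continue
        stats["late"] += 1
        stats["days_late"] += late
        if cause is not None:
            if cause in seen_causes[app]:
                # Assumption: a repeated cause means the delay was avoidable.
                stats["avoidable_days_late"] += late
            seen_causes[app].add(cause)
    return dict(summary)


# Illustrative data only.
deliveries = [
    ("billing", date(2024, 1, 15), date(2024, 1, 15), None),
    ("billing", date(2024, 2, 15), date(2024, 2, 20), "late QA environment"),
    ("billing", date(2024, 3, 15), date(2024, 3, 18), "late QA environment"),
    ("search", date(2024, 2, 1), date(2024, 1, 31), None),
]

report = on_time_summary(deliveries)
for app, stats in report.items():
    print(app, stats)
```

Grouping by priority instead of application only requires replacing the first tuple element with the release priority.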
When it comes to on-time software delivery, it is no longer adequate to just expect software to be delivered late because “it’s complex.” Plutora gives you the tools and templates to measure risk and use a single source of metrics to drive continuous improvement for both development and operations. Also refer to 10 Key Release Management Metrics and Test Environment Management Metrics.