Continuous Delivery and Deployment Pipeline: Metrics, Myths, and Milestones

Video Script

GARY GRUVER: If you could improve the productivity of your software development processes by 2-3x, would it matter?

What if you could start saying ‘yes’ to all those business requests that you’ve created elaborate processes to say ‘no’ to?

Would that make a difference?

What if you decided all that was too hard, but your competitors went all in? Can you see how that would have an impact on your company, on your co-workers, and eventually on yourself?

I start every presentation I give with those three questions because I think they truly represent the opportunities associated with embracing these new ways of working, and the risks associated with ignoring these trends in the industry.

I’ve been fortunate enough to spend the last several years going around the world and talking to people in different places about the transformation I was fortunate enough to lead at HP, where a great group of people came together and completely transformed how we did software development.

And we started this before there was DevOps, and we captured this in my book ‘A Practical Approach to Large-Scale Agile Development’.

The Evolution of DevOps

What we realized was that most of what we were doing was less about how the teams worked and the practices at the team level, and more about how the teams worked together to deliver value, which has evolved into what we now call DevOps practices.

We started creating that a long time ago, and what’s great is that, as I go around the world more and more, people have shifted from thinking “there are some good things about Agile and releasing more often” to asking “how do you do DevOps?” Everybody’s excited to do DevOps, and I’m hearing more and more enthusiasm about it.

The Challenges of Doing DevOps in Large Organizations

The challenge I see when I go into most large organizations to do consulting is that when I ask them about DevOps, it’s like five blind men describing an elephant. I’ll ask developers, “Are you excited about doing DevOps?” and they say, “Yeah, it’s gonna be great, I can hardly wait.”

OK, really? So you’re gonna make sure that all your code has automated testing? Are you gonna make sure that when you check in, those tests keep passing on an ongoing basis? Whenever you make a change to a configuration on the server, apply a patch, or update the firewall, are you going to do that with infrastructure as code, where everybody knows what’s going on and it’s under version control? And are you going to make sure your code is released on an ongoing basis and you’re getting feedback from the customer?

Well, no, wait. I can’t do that, that sounds like a lot of work. No, I just want to make sure that I can release my code easily. I hear that at Netflix you get to release code into production on your first day on the job, and you have no idea how difficult that is in my organization.

I have to start by going to the Change Resistance Board, then I have to get on a launch call, and then it’s launched. Then there’s stuff that people have to figure out; it’s so painful. I just want to push stuff into production and not have to worry about all that complexity. I’m not sure DevOps is going to work here; our operations people aren’t that good. For example, I knew I had to open a firewall, and I knew I had to patch my system four months ago when I was writing this code, but these ops guys hadn’t even thought to do that. We need better operations people, or maybe operations people who can read my mind?

Then I go talk to the release engineers and ask them if they are excited about doing DevOps.  And they say, yeah, we’re really excited.  And I say, really?  So, you are going to make sure that you automate your release processes, so you can have smaller batch sizes and release more frequently, get quick feedback on smaller iterations for new features and capabilities, and know exactly what’s going on?

Wait, you don’t know how hard it is to do this stuff in my organization. Those launch calls are painful. If I had to do more than one launch call a month, I’d never do it. And my Change Resistance Board only meets once a month... no, I can’t do that. I want to do DevOps where everything is under version control: anything about the environment, anything about the firewalls. All the testing is going to be automated, so I don’t have to pick and choose which tests to run for a last-minute showstopper; I just run them all.

Anytime anyone changes anything, I know immediately what it is. Right now, I’m just the babysitter. I’ve got people in dev who change things and never tell anyone in operations. I’ve got to go around, chase that down, and figure it out.

I want all that in version control so it’s easy to figure out.  Then I talk to the operations people, and I ask if they are excited about doing DevOps.  They say yeah, I’m excited.  So, everything is going to be under version control, and you are going to manage changes through the pipeline?

Well, that sounds like a lot of work. I’m the only one who knows how all this works. I’ve got each server configured specifically, and I’m the only one who knows how. I’ve actually started naming them and treating them more like pets. If I spend all my time working with developers, who would do all the firefighting to keep this place running? I don’t have time for that.

I want to do DevOps so that everything has a feature toggle, so if a developer sneaks something in on a Friday night and I get a call at 2am, I can just go in and turn that feature off, and we can all deal with it on Monday morning. So everybody’s excited about DevOps, and everybody has a different view of what it’s going to be.

What could possibly go wrong? When everybody decides to do DevOps their own way, you end up with a DevOps elephant that has all the right parts and pieces, but it doesn’t quite work like an elephant, it doesn’t act like an elephant, and it doesn’t deliver the results we were hoping for. In fact, from the business perspective it’s more of a face plant.

What DevOps Really Is

One of the challenges I see when I go into different organizations is getting everybody on the same page about what DevOps is and how to approach it. One of the first places I like to start is Gene Kim’s definition: it’s not about the technology, it’s not about the culture, it’s not about the organization… it’s all the things that allow you to release code on a more frequent basis while enabling all aspects of quality.

I like that definition because you would be doing it that way if there weren’t inefficiencies in the system. But you can’t optimize this quite like a manufacturing process, because there are different types of work in an organization. One is the work of creating new features and new capabilities. If you apply Six Sigma there, the problem is that you are changing the product and the process at the same time, so you can’t really do statistical analysis.

What you are trying to do is remove waste by keeping developers from spending time on code that won’t work with everybody else’s code, won’t work in a production-like environment, or doesn’t meet the business needs. So, to take that waste out of the system, we want to reduce the cycle time and improve the quality of the feedback loop for the developers so they can learn and adjust.

Another thing large organizations do is spend more time setting up these complex enterprise release environments than actually writing the code. What DevOps tries to do is get that work into smaller batch sizes so you can triage more quickly: instead of looking for code that was checked in three weeks ago by someone, you know it was code checked in this morning by one of three people. That lets you triage much more efficiently.
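A minimal Python sketch of that triage point, assuming a hypothetical `Commit` record and that each build is stamped with the time it ran (the 63 commits below are made-up data, not from any real team). The suspects for a broken build are the commits that landed after the last green build and up to the failing one, so the smaller the batch, the shorter the list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    sha: str      # hypothetical record; a real one would come from version control
    author: str
    time: datetime


def suspects(commits, last_green: datetime, first_red: datetime):
    """Commits that landed after the last passing build and up to the first failing one."""
    return [c for c in commits if last_green < c.time <= first_red]


def suspect_authors(commits, last_green: datetime, first_red: datetime):
    """The people who need to look at the failure."""
    return sorted({c.author for c in suspects(commits, last_green, first_red)})


# Made-up history: three commits a day for three weeks from ten developers.
start = datetime(2024, 1, 1)
history = [Commit(f"c{i}", f"dev{i % 10}", start + timedelta(hours=8 * i)) for i in range(63)]

first_red = start + timedelta(days=20, hours=23)         # the build that broke
weeks_ago_green = start - timedelta(hours=1)             # last green build, three weeks earlier
yesterday_green = start + timedelta(days=19, hours=23)   # last green build with daily integration

big_batch = suspects(history, weeks_ago_green, first_red)
small_batch = suspects(history, yesterday_green, first_red)
```

With a three-week batch, all 63 commits from all ten developers are suspects; with daily integration, only three commits by three developers are.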

The third type of work in the system is repetitive tasks. In a lot of cases we automate them, for three reasons. One, automating takes work out of the system that you have been doing manually. Two, once it’s automated it runs the same way every time, so you spend less time triaging manual errors. And three, once it’s automated you can run it more frequently, which is what enables you to get to small batch sizes.

I like DevOps because, as you force the increase in frequency, it forces you to deal with issues that have been in your organization for years, issues you have previously been able to muscle and brute-force your way through, and it requires you to fix them once and for all.

Implementing DevOps in the Enterprise

Doing DevOps in a large, complex organization can be hard to reason about. So I tend to start with the deployment pipeline described by Jez Humble and David Farley, and use it as a framework to think about how code flows through your organization and where the issues are. The deployment pipeline is nothing more complex than this: how do you start with a business idea, move it to a developer, get the developer’s code into an environment, move the code over for testing, decide that it’s ready to release, move it into production, and then monitor it?

It’s a pretty straightforward process; how hard can it be? Why did we have over 900 people sign up for this webinar to hear about it? Because as you add more and more people and complexity, things that look easy start to become hard.

The first thing a lot of organizations do wrong is build up too much inventory in terms of requirements sitting in front of developers. If we learned anything from lean manufacturing, it’s that anytime we have too much inventory in the system, there’s a risk of rework and the waste associated with it. So, with DevOps we move to more of a just-in-time approach to managing requirements.

The next step is the developer committing code, and for that you are going to need an environment that’s properly configured. For some organizations, DevOps is all about creating environments on demand. I know one large financial organization that started their DevOps journey just by trying to figure out how long it would take their standard processes to get ‘Hello World’ up and running in a production environment. They turned their experiment off after 250 days, not because they had ‘Hello World’ up and running, but because they understood all the constraints. Then they did the same thing in AWS and got it up and going in 3 hours. So they understood the opportunity and the constraints, and for them this was the biggest obstacle to releasing more frequently. From there they moved to infrastructure as code and environments on demand through internal cloud-like capabilities.

Next, you move the code over to testing. I think automated testing is one of the most important things you can do to transform your software development processes, and it’s one of the things most frequently done wrong. I’ve got several chapters on this in my different books; ‘Leading the Transformation’ probably has one of the better ones. What I find is that a lot of people have way too much manual testing in the system, or they’ve created automated testing that is not maintainable, triageable, or debuggable. So, before I have anybody start to write more test automation, I try to take the automation they have and use it to gate code, which means that if these tests don’t pass, the new code doesn’t make it in.

And the developers’ number one job is fixing that. What I find is that most organizations need to go back and rework their test automation so it’s usable. So, don’t write a bunch more test automation until you are using what you have to gate code.
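Gating code can be sketched in a few lines of Python. This is a minimal illustration, not a CI server: `run_gate` and `try_promote` are hypothetical names, and each test command stands for whatever your existing automated suite is invoked with (for example `["pytest", "tests/smoke"]`). The change is promoted only if every command exits with status 0.

```python
import subprocess
import sys


def run_gate(test_commands) -> bool:
    """Run each test command in order; the gate passes only if every command exits with 0."""
    for cmd in test_commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed on: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True


def try_promote(change_id: str, test_commands, promote) -> bool:
    """Only let the change in if the existing automated tests pass."""
    if run_gate(test_commands):
        promote(change_id)
        return True
    return False
```

In practice this rule would live in your CI server's configuration; the sketch only shows the decision itself: a failing test blocks the change rather than producing a report someone may or may not read.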

The next thing you need to do is make a decision to move code into production. When you move it into production, you want to be using the exact same tools, the exact same processes, and the exact same approach that you have repeated hundreds of times going into the test environments, so that deploying to production eventually becomes boring. And eventually you go into monitoring. A lot of organizations moving to DevOps realize they are not going to find everything up front, and they plan for that. It’s not permission to skip testing well before you go into production; it’s a realization that you are not going to find everything, so you may move to canary releases or feature toggles that enable you to get going.
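Here is a minimal sketch of a feature toggle that doubles as a canary rollout, assuming a hypothetical in-memory `FeatureToggles` store. Users are assigned to one of 100 buckets by hashing the feature name and user ID (a common technique, though not the only one), so the same user keeps the same answer as the percentage grows, and setting the rollout to 0 is the 2am kill switch.

```python
import hashlib


def rollout_bucket(feature: str, user_id: str) -> int:
    """Deterministically map a user to a bucket 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


class FeatureToggles:
    """Hypothetical in-memory toggle store; a real one would be shared config, not process memory."""

    def __init__(self):
        self._rollout = {}  # feature name -> percent of users who get it (0..100)

    def set_rollout(self, feature: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def turn_off(self, feature: str) -> None:
        """The 2am kill switch: disable the feature without redeploying code."""
        self._rollout[feature] = 0

    def is_enabled(self, feature: str, user_id: str) -> bool:
        # Unknown features default to off; at 10% only buckets 0..9 see the feature.
        return rollout_bucket(feature, user_id) < self._rollout.get(feature, 0)
```

Because a user's bucket never changes, everyone in a 10% canary is still included when the rollout widens to 50%.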

And that’s just for one developer, and I’ve just described a lot of work that took many organizations several years to do. When I go in and work with organizations, I start by asking where the largest source of waste is in this deployment pipeline, so I start with metrics. How much of your capacity is going into planning? How much requirements inventory do you have? What percentage of it gets reworked, and what percentage delivers the expected results?

How long does it take to get an environment? What kinds of issues do you have? How often do you need environments? How long is your testing taking? What’s your branch time? What’s your repeatability? What’s your approval time? Are you really seeing new issues when you go into production? What’s the source? What new issues are you seeing when you get into monitoring?

Metrics-based Analysis of Your Deployment Pipeline

Once you’ve mapped your deployment pipeline out like this, you can put metrics on top of it, if you can figure out where and when to collect them. Then you can start prioritizing the work. So far, this is for one developer, and these are the waste and inefficiencies that can come into the system. So what happens when you have more than one developer? The first thing you need to do is make sure the business ideas are spread across a group of developers. Then make sure the developers are coordinating and bringing their code together to make sure it works in a production-like environment.

This is a lot of what the Agile and Scrum processes are designed to do: integrate with the business people and get the team talking with each other. Then you have the continuous integration process, with what’s gated and what automated tests you need, and you need to make sure it’s working. To really do this, developers need to embrace different ways of doing things. They need to make sure they’re checking in code without breaking other things, and versioning their services. They need to move to evolutionary databases. They need to prioritize keeping these builds green.

When I go into large organizations, one of the first things I do is look at all the applications and start segmenting them. I wouldn’t recommend changing processes for applications that are not business-critical. Once you have the business-critical applications, you can segment them into loosely coupled and tightly coupled. You have probably heard many organizations talk about DevOps; usually they have small, loosely coupled teams. But the things you do to coordinate five people are not the same as what you need to do to coordinate 500, so you need to deal with them differently. I go into more detail in my book. For loosely coupled architectures, what you’re trying to do is just set up CI environments and let the teams run independently. Empower the teams. Get them to own what’s in production and respond to any issues there.
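The segmentation step can be written down as a simple rule. This sketch assumes a hypothetical `Application` record with two yes/no attributes; the application names are invented, and the returned labels are just shorthand for the approaches described in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    business_critical: bool
    tightly_coupled: bool


def approach_for(app: Application) -> str:
    """Rough triage of where an application fits in the segmentation."""
    if not app.business_critical:
        return "leave for now"
    if app.tightly_coupled:
        return "subsystem CI + integrated deployment pipeline"
    return "independent team CI, team owns production"


def segment(apps):
    """Group application names by the approach they need."""
    groups = {}
    for app in apps:
        groups.setdefault(approach_for(app), []).append(app.name)
    return groups
```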

Remove barriers. Put in guardrails. In tightly coupled architectures, ideally you would build and test the whole system as many times a day as you can, so you can localize the triage process. But in large organizations you can’t build and test the whole system frequently enough. In most of these situations you need to break the system into subsystems that you can manage and keep stable. You can break these apart with service virtualization. Look for natural boundaries within your organization and your architecture. The trend is toward microservices, but re-architecting for that is very time-consuming. Use service virtualization to break applications into subsystems, then use a similar CI process for each subsystem and gate it with automated tests. Then build that up into a stable subsystem. What you’re doing there is forcing the organization to debug, triage, get rapid feedback, and ensure each component is stable.
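A minimal in-process sketch of service virtualization, assuming a hypothetical inventory subsystem that an order subsystem depends on. Dedicated service virtualization tools often work at the network level, for example by recording and replaying traffic; here a stand-in object that honors the same interface plays that role, so the order subsystem can be built, tested, and gated on its own.

```python
from typing import Protocol


class InventoryService(Protocol):
    """The interface the order subsystem depends on (hypothetical)."""

    def stock_level(self, sku: str) -> int: ...


class VirtualInventoryService:
    """Stands in for the real inventory subsystem with canned responses."""

    def __init__(self, canned: dict[str, int]):
        self._canned = dict(canned)
        self.calls: list[str] = []  # recorded so tests can check how the dependency was used

    def stock_level(self, sku: str) -> int:
        self.calls.append(sku)
        return self._canned.get(sku, 0)  # unknown SKUs are treated as out of stock


def can_fulfil(order: dict[str, int], inventory: InventoryService) -> bool:
    """Code in the subsystem under test; it only knows about the interface."""
    return all(inventory.stock_level(sku) >= qty for sku, qty in order.items())
```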

The next thing you look at is building these subsystems up into a more complex environment. This is what I mean when I refer to a deployment pipeline in a large, tightly coupled architecture. Use the right metrics here. One of the first metrics I look at is cycle time: how long does it take to go from a developer checking in to production? In some places it’s driven by predetermined build times and deploy times, and it’s usually fully automated. Many organizations don’t move forward through the cycle until they’ve met a product lifecycle milestone. You should write down and map out each stage along the way to figure out how much time you spend in each stage.
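Mapping cycle time by stage only needs the time a change entered each stage. The sketch below assumes a hypothetical list of `(stage, timestamp)` events for one change, with made-up timestamps and stage names, each stage appearing once.

```python
from datetime import datetime


def stage_durations(events):
    """events: (stage entered, timestamp) pairs in order; returns hours spent in each stage.

    Assumes each stage name appears once; a repeated name would overwrite the earlier entry.
    """
    return {
        stage: (next_time - entered).total_seconds() / 3600
        for (stage, entered), (_, next_time) in zip(events, events[1:])
    }


def cycle_time_hours(events):
    """Check-in to production, in hours."""
    return (events[-1][1] - events[0][1]).total_seconds() / 3600


# Illustrative timestamps for one change; these are made up, not measurements.
change_events = [
    ("commit", datetime(2024, 3, 1, 9, 0)),
    ("build", datetime(2024, 3, 1, 9, 30)),
    ("subsystem test", datetime(2024, 3, 1, 11, 30)),
    ("integration test", datetime(2024, 3, 4, 9, 0)),
    ("release approval", datetime(2024, 3, 8, 9, 0)),
    ("production", datetime(2024, 3, 8, 15, 0)),
]

durations = stage_durations(change_events)
slowest = max(durations, key=durations.get)
```

In this made-up example the change spends 96 of its 174 hours of cycle time in integration test, so that is where the map says to look first.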

Next, go out and collect some metrics on it and figure out what types of issues you’re seeing. If you notice a stage that’s taking longer than it should, look into it and try to fix it. The map will tell you which things to prioritize. Next, get to a point where you can gate code and move it through the different stages with automated testing, and collect metrics there too. For example, if a stage has many environment and test issues, that means you’re really not ready to gate code. You also can’t make the developers responsible for keeping those builds green, because those issues are outside their control. You need to redesign and re-architect your testing. Look at things like Docker and infrastructure as code for more consistency in your environments. This ability to create these maps and figure out what’s going on is what gives everybody in the organization a common view of the elephant. In the example here, 60 percent of issues are coming from subsystem 1, so I would probably start the continuous improvement process there. So it’s a metrics-based analysis: watch how work flows through the organization and get everybody aligned on it. The problems you’ve been struggling with for years become obvious.
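The issue analysis can be sketched the same way. Assuming each failure in a stage has been triaged with the subsystem it came from and a cause (`code`, `environment`, or `test`; these categories and the data are made up for illustration), the percentages show where to start and whether the stage is stable enough to gate on. The 20% cutoff in `ready_to_gate` is an arbitrary placeholder.

```python
from collections import Counter


def share_by(failures, key):
    """Percentage of failures per value of `key`, largest first."""
    counts = Counter(f[key] for f in failures)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.most_common()}


def ready_to_gate(failures, max_non_code_pct=20.0):
    """Gate on a stage only when most of its failures are real code defects.

    The 20% threshold is an arbitrary placeholder, not a recommendation from the talk.
    """
    non_code = sum(pct for cause, pct in share_by(failures, "cause").items() if cause != "code")
    return non_code <= max_non_code_pct


# Made-up triage log for one stage: which subsystem failed, and why.
failures = (
    [{"subsystem": "subsystem 1", "cause": "environment"}] * 3
    + [{"subsystem": "subsystem 1", "cause": "code"}] * 2
    + [{"subsystem": "subsystem 1", "cause": "test"}]
    + [{"subsystem": "subsystem 2", "cause": "code"}] * 3
    + [{"subsystem": "subsystem 3", "cause": "environment"}]
)
```

On this made-up log, subsystem 1 accounts for 60 percent of failures, and half of all failures are not code defects, so `ready_to_gate` reports that the stage is not ready to gate.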

The Benefits of this Approach

With this approach, where you see how value flows through the organization, put the right metrics on it, and get people together, you can get a common view of the elephant, and that can be a very high-performing DevOps elephant.

However, you may never be able to deploy as quickly as some small Agile teams. But I would argue that applying DevOps in large, tightly coupled organizations is valuable, and even more important than it is for small, loosely coupled organizations, because the inefficiencies of coordinating hundreds of people are more pronounced than they are for small teams.

I believe these types of results are possible and realistic. I’ve tried to capture everything that I do in workshops in my third book, ‘Leading the Transformation: Applying Agile and DevOps Principles at Scale.’ This book will help you deliver positive business results.

With that, Lenore, I’m going to hand it back to you…

Our goal is to show how Plutora helps release teams implement CD pipelines within the kind of tightly coupled architecture that Gary has just described.

The Plutora Story

First, we wanted to share the story of Plutora.

  • Plutora was started over 5 years ago by our two founders who were working in IT at an investment capital bank in Australia.

  • There they had to deal with the IT challenges of releasing software, and gained firsthand experience of the challenges of managing complex application delivery.

  • They felt the enormous pressure to deliver value into the market.

  • They knew that everything that delivers growth to the business has some IT element in it, as software runs our daily lives.

What happens when you get it wrong?

But in a large company like theirs, how do they do it, and what happens if they get it wrong?

  • The enterprise faces challenges of scale, complex architectures with many dependencies between applications, and they may need to deal with regulatory compliance and geographically dispersed teams.

  • As IT managers then, they started looking at how to enable new growth.

Enterprise software delivery ecosystem

Looking at the technology stack typically in place in the enterprise…

  • There is Project and Portfolio Management that spins up projects.

  • There’s dev tools where agile processes live.

  • There’s backend ITSM tools that defend the integrity of the production system.

So the question was…how efficient can they be so they can deliver software faster?

  • In the middle of these 3 disciplines, they saw a whole bunch of people doing manual work in spreadsheets, Word documents, emails, SharePoint, what have you.

  • Their view for solving that challenge was to create a platform that gets rid of all those manual pieces.

The Plutora mission is to enable predictable, high quality enterprise software delivery across the entire release portfolio.

  • We do this by creating a layer of abstraction between the tools currently used in the enterprise.

  • Plutora is a SaaS-based platform that integrates with engineering tools, PPM tools, and ITSM change management tools to create one version of truth and eliminate manual tracking.

  • We establish a bi-directional sync with these tools to aggregate release, test, and quality data.

IT business operating system

We have four modules:

Plutora Release

  • Plutora Release integrates with the disparate delivery team tools to create a common repository that collects and correlates data for a holistic view of multiple release pipelines.

  • This helps delivery teams align with a centralized release calendar to track delivery timelines, events and block-out periods, and track system dependencies across projects.

  • Management and control encompasses planning, scheduling, and establishing release criteria to manage complex releases.

  • Release managers can establish governance and controls for the release process, then centrally view activity and deployment status for any project in the portfolio.

  • Plutora Release auto-syncs with existing delivery team tools to gather metrics, insights, and analytics without disrupting these specialized teams.

  • And automated report publishing helps keep everyone informed of health and status of a release.

  • For regulatory compliance, Plutora establishes traceability of all release artifacts to maintain an audit history of any project changes.

Plutora Environments

  • Then, once you know how the releases are running, you need to know what test environments the releases need to go through.

  • Plutora Environments is a single resource to schedule, manage, and maintain test environments, with a consolidated calendar that shows where and when environments have been assigned.

  • Cross-project dependencies are tracked, and centralized booking avoids conflicts.

  • Test environments are defined here, with full visibility and tracking of environment configurations and change history.

  • By tracking demand and supply, we can also facilitate chargebacks and future capacity planning.

These are the 2 modules that we’ll mainly be focusing on later in the presentation, but we also have…

Plutora Deploy

  • The Deploy module helps release teams plan, track, and rehearse deployment plans and production cutover activities.

  • A centralized cutover plan results in faster deployment and a higher level of confidence in successful RunBook executions.

  • Release teams create their own cutover plans, so each team can refine and perfect their tasks in preparation for go-live deployments.

Customer slide

Over the past 5 years we’ve helped many enterprise customers greatly improve their release processes.  We do have many case studies available that highlight the great results we’ve been able to achieve together.

Transition

With that context, let’s look at Plutora in practice, specifically around managing CD pipelines with the tightly coupled architecture that Gary described in his presentation.

First, we’ll talk about milestones…

To set up the discussion, we thought we’d highlight a couple of relevant quotes from his book ‘Starting and Scaling DevOps’:

“The biggest inefficiencies in most large organizations exists in the large, complex, tightly coupled systems that require coordination across large numbers of people from the business all the way out to Operations.”

“For these orgs, it is much more important to push the majority of testing and defect fixing into smaller, less complex test environments with quality gates to keep defects out of the bigger more complex environments.  This helps to reduce the cost and complexity of the testing and also helps with the triage process because the issues are localized to the subsystem or application that created them.”

Defined phases, gated code

With Continuous Delivery, the pipeline will consist of a series of test environments, from less complex to increasingly complex the closer code gets to production release.

  • In tightly coupled architectures, release pipelines have multiple application dependencies which adds to the complexity of maintaining quality while managing the increased velocity of change.

With Plutora, release managers can define phases and quality gates to create a well-structured release process for promoting code along the deployment pipeline.

  • Environments & related configurations are assigned to each phase of the release so that each environment is fit for purpose.

  • Then every time you do an environment handover you can specify the criteria for progressing that build on to the next phase.

  • Typically, we see our customers run on average 4-5 test environments per release pipeline.
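A sketch of phase-by-phase gating, in Python. This is not Plutora's data model or API; `Phase`, `BuildResult`, the environment names, and the pass-rate thresholds are all hypothetical. A build moves to the next phase only when it meets the current phase's exit criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    environment: str        # test environment assigned to this phase
    min_pass_rate: float    # exit criterion: fraction of tests that must pass (0..1)
    max_open_sev1: int = 0  # exit criterion: open showstoppers allowed


@dataclass(frozen=True)
class BuildResult:
    build: str
    passed: int
    failed: int
    open_sev1: int


def meets_exit_criteria(phase: Phase, result: BuildResult) -> bool:
    total = result.passed + result.failed
    if total == 0:
        return False  # no test evidence, no promotion
    return (result.passed / total >= phase.min_pass_rate
            and result.open_sev1 <= phase.max_open_sev1)


def next_phase(phases, current: str, result: BuildResult) -> str:
    """Return the phase the build should be in after the handover check."""
    index = [p.name for p in phases].index(current)
    if meets_exit_criteria(phases[index], result) and index + 1 < len(phases):
        return phases[index + 1].name
    return current  # failed the gate, or already in the last test phase
```

With phases that tighten the pass-rate bar as they get closer to production, a build with a 96% pass rate clears a 90% or 95% gate but stays put at a 98% gate.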

In his book, Gary outlines the importance of creating quality gates to keep defects as close to developers as possible.

  • Shift-left testing emphasizes test execution early and often, to provide that rapid feedback to dev teams for each new code commit.

  • Defining phases and quality gates keeps issues from progressing along the pipeline, especially important if they turn out to be sev1 or showstopper issues.

In these tightly coupled architectures, as the build rate increases with Continuous Integration, the probability that any given change will impact multiple systems also increases.

  • With interdependent release trains, a defect can impact a majority of your delivery pipeline yet not be associated with the code at all.

  • QA teams end up with failing tests that have nothing to do with the code and are instead the result of, for example, a test environment configuration or deployment issue.

  • Establishing quality gates helps reduce these multifaceted triage scenarios.

  • Gary’s quote from his book regarding gating code is that… ‘The reduction in cycle time minimizes the time different groups across the deployment pipeline invest in things that won’t work together.’

Next, we’ll look at metrics when managing CD pipelines…

Here’s another great quote from Gary that references the complexity of test environments in a tightly coupled system.  Managing environments and the release process requires close collaboration from dev to ops, with the end result to not just deliver faster, but with a better quality product.

Monitor product quality as code progresses along the pipeline

Release managers are tasked with assessing schedule risk and monitoring application quality.

  • But many activities in the CD pipeline (code commits, builds and their version numbers, test environment deployments, test plans created for business requirements or issues, test cases run against each build, status and results of these tests) are not visible, as they are hidden within the specialized tools of dev and test teams.

  • Automation can limit visibility to the broader team.

  • This results in a large amount of release activity with limited visibility and coordination.

With Plutora, everyone can see test status and results in real time to continually monitor product quality and evaluate schedule risk at each phase of the CD pipeline.

  • Centralized dashboards provide actionable reporting and analytics as multiple release trains converge.

Increased release cadence brings increased pressure and a greater chance of production issues.

  • With full visibility, Plutora helps delivery teams manage proactively rather than reactively, to maintain quality while under pressure of increased release velocity.

It’s important to associate metrics with version-controlled test environments

  • Build versions are rapidly changing in a Continuous Delivery model, and evaluating the wrong set of metrics can quickly throw delivery teams off track.

  • To ensure quality, release teams need to be confident that tests are being executed on correct configurations.

  • It’s not uncommon for an organization to spend more time getting environments spun up and configured correctly than it does writing the code in the first place.
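One way to make sure metrics describe the right build on the right configuration is to fingerprint the environment configuration and count only test results that carry both the expected build number and the expected fingerprint. The sketch assumes hypothetical result records with `build`, `config`, and `passed` fields; hashing canonical JSON is a choice made here, not something the source prescribes.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Short, stable hash of an environment configuration; key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def pass_rate_for(results, build: str, expected_config: dict):
    """Pass rate over test runs for this build on the expected configuration, or None if there are none."""
    fingerprint = config_fingerprint(expected_config)
    relevant = [r for r in results if r["build"] == build and r["config"] == fingerprint]
    if not relevant:
        return None
    return sum(1 for r in relevant if r["passed"]) / len(relevant)
```

Results recorded against a stale configuration or an older build are simply ignored, so they cannot inflate or deflate the pass rate for the build under evaluation.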

Whether test environments are on-prem, cloud, or a hybrid, Plutora integrates with Jenkins to automatically fetch version-controlled development artifacts from the Jenkins build server, creating a single source to track and update configuration settings for test environments.

  • Gary also states in his book: ‘The Devops journey towards efficiency must involve getting test environment configurations under version control, that way everyone can see exactly who changed what and when.’
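In a real setup the “who changed what and when” history comes from version control itself (for example `git log -p` on the repository holding the environment configuration). The sketch below shows the same idea on plain Python dictionaries, assuming a hypothetical chronological list of `(author, timestamp, snapshot)` entries.

```python
def diff_config(old: dict, new: dict) -> dict:
    """Keys added, removed, or changed between two configuration snapshots.

    A missing key is reported as None, so this sketch cannot tell "absent" from an explicit None.
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def change_log(history):
    """history: chronological (author, timestamp, snapshot) entries; returns one row per changed key."""
    rows = []
    for (_, _, previous), (author, when, current) in zip(history, history[1:]):
        for key, (before, after) in diff_config(previous, current).items():
            rows.append((when, author, key, before, after))
    return rows
```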

With Plutora, teams can see all components of that environment and underlying versions that are relevant to that environment.

  • Then there’s no need to hold big meetings because everyone can see code progress down the pipeline to understand status.

We also track workflows and change requests to ensure compliance, establishing traceability of where each change has been introduced, when versions were deployed, and what tests have been run.

  • Release audit history is automatically updated to include Jenkins build numbers deployed to each test environment over the course of a release.

Next metric slide…

  • On the topic of metrics, we couldn’t resist creating a report that Gary shows in one of his books.  Because Plutora pulls and centralizes data from the various delivery tools, we’re able to create this consolidated view of work left in the system.

  • Metrics such as stories that are branch ready, percent passing rate of tests, and number of defects all contribute to an analysis of release readiness.
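A consolidated readiness view is essentially an aggregation over counts pulled from different tools. This sketch is not Plutora's report: `ReleaseSnapshot`, its fields, and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReleaseSnapshot:
    stories_total: int
    stories_branch_ready: int
    tests_run: int
    tests_passed: int
    open_defects: dict  # severity label -> count, e.g. {"sev1": 1, "sev2": 4}


def readiness(s: ReleaseSnapshot) -> dict:
    """Summarize the work left in the system for one release."""
    pass_rate_pct = round(100 * s.tests_passed / s.tests_run, 1) if s.tests_run else 0.0
    return {
        "stories_remaining": s.stories_total - s.stories_branch_ready,
        "pass_rate_pct": pass_rate_pct,
        "open_defects": sum(s.open_defects.values()),
        "blocking": s.open_defects.get("sev1", 0) > 0,
    }
```

For example, a snapshot with 40 stories (31 branch ready), 1,140 of 1,200 tests passing, and one open sev1 among 14 defects reports 9 stories remaining, a 95.0% pass rate, and a blocking defect.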

For the last part of the discussion today we’ll end on a myth…

The book ‘The Goal’ is on a lot of required reading lists for understanding lean principles.

  • This is an interesting read, and helps us understand why “Any improvement before or after the bottleneck is a waste of time”.

So, the myth asserted here is that automation eliminates all bottlenecks.

Other phrases used in agile training highlight the same concept…such as:

  • Optimizing a component doesn’t optimize the system.

Or…

  • A system can’t evolve faster than its slowest integration point.
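The bottleneck argument can be shown with a deliberately simplified model: a serial pipeline where each stage has a fixed capacity in changes per day, ignoring variability, queues, and rework. The stage names and numbers are made up.

```python
def throughput(capacity_per_day):
    """A serial pipeline can only deliver as fast as its slowest stage."""
    return min(capacity_per_day.values())


def bottleneck(capacity_per_day):
    """The stage that limits the whole pipeline."""
    return min(capacity_per_day, key=capacity_per_day.get)


# Made-up capacities, in changes per day each stage can handle.
pipeline = {"build": 40, "automated tests": 30, "environment provisioning": 5, "deploy": 20}
```

Making the automated tests ten times faster leaves throughput at 5 changes a day; tripling environment provisioning capacity raises it to 15.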

Improving the partnership between dev and test

This systems thinking in agile development points a big red arrow at process wait states as a major inefficiency.

  • That perpetual feedback loop of the Continuous Delivery pipeline results in a lot of handoffs from dev to test.

  • The handoff process can often add to process wait states if test teams are waiting for information or test environments aren’t available.

Plutora helps expedite this handoff and reduces wait states in a couple ways.

  • First, test teams are automatically alerted when new builds are ready for testing and where they will be deployed.

  • Plutora links Jenkins jobs to test environments, where a build can be triggered on-demand to efficiently move binaries along the pipeline.

  • And as dev team velocity increases, test teams may have a hard time tracking the change requests associated with a new build, making it difficult to accurately assign the test cases necessary to ensure accurate test coverage.

  • Plutora automatically links change IDs with new builds, so test teams can quickly associate and assign appropriate test cases to ensure accurate and complete test coverage.

CDP screenshot

  • This screenshot shows the new Jenkins build version in the middle column, with associated change IDs in the right-hand columns that have been automatically pulled from the code commits in GitHub.

And with that we’ll move onto the Q&A.