Plutora Blog - Agile Release Management, Deployment Management, DevOps, Test Environment Management
Expedite the Handoff from Dev to Test: An Interview with Gary Gruver (Part 2)
In the second part of our conversation with Gary Gruver (read part 1 here), we continue to discuss delivery pipeline inefficiencies of enterprise releases. As Gary describes in his book Starting and Scaling DevOps:
“The biggest inefficiencies in most large organizations exist in the large, complex, tightly coupled systems that require coordination across large numbers of people from the business all the way out to Operations.”
Quite a broad topic, so for our discussion we are drilling down on the dev-to-test handoff as well as test environments. We left off part 1 of the interview with this awesome soundbite from Gary: “Lots of orgs spend more time and effort getting the big, complex enterprise environments up and ready for testing than they do actually writing the code.”
In your book Starting and Scaling DevOps you state:
“For these orgs, it is much more important to push the majority of testing and defect fixing into smaller, less complex test environments with quality gates to keep defects out of the bigger more complex environments. This helps to reduce the cost and complexity of the testing and also helps with the triage process because the issues are localized to the subsystem or application that created them.”
Can you provide some additional insight on gating code?
GG: “For organizations with tightly coupled architectures, it’s important to build up stable enterprise systems using a well-structured deployment pipeline with good quality gates. Large test environments are incredibly expensive and hard to manage. As a result, they are not a very efficient approach for finding defects. You have to establish quality gates, where release teams are not allowed to move code further along the pipeline until they pass specific tests.”
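The gate Gary describes can be reduced to a simple rule: a build is promoted to the next, more complex environment only when its required test suites pass. Here is a minimal sketch of that check; the suite names, data shapes, and the `gate_passed` function are illustrative assumptions, not part of any specific pipeline tool.

```python
# Minimal sketch of a quality gate (hypothetical names and data shapes):
# a build moves along the pipeline only if every required suite passed.

def gate_passed(test_results, required_suites):
    """Return True only if every required suite passed for this build.

    test_results: dict mapping suite name -> bool (pass/fail)
    required_suites: suites that must pass before promotion
    """
    return all(test_results.get(suite, False) for suite in required_suites)

# Example: a build may leave the subsystem environment only after
# unit and subsystem tests pass; integration tests gate the next stage.
results = {"unit": True, "subsystem": True, "integration": False}
print(gate_passed(results, ["unit", "subsystem"]))    # promote
print(gate_passed(results, ["unit", "integration"]))  # hold the build
```

The point of keeping the gate this explicit is that release teams can see exactly which suite blocked promotion, rather than debating status in meetings.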
What’s the process for introducing gated code into a delivery pipeline?
GG: “Find out which apps or subsystems are breaking most frequently. You’ll want to start by gating those apps to find and fix defects before moving code on to the next, more complex test environment.
Create a Pareto chart of why tests are failing. When you have those metrics for each environment, use different tags for the various defects. Then you can see all the waste that can potentially come out of the system. This sounds easy, but it’s so rare that people step back and look at how their organization works in order to put metrics on it. There’s never time to step back and look at everything.”
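The Pareto analysis Gary suggests is just a tally of tagged failure causes sorted by frequency, with a running cumulative percentage to show which causes dominate. A quick sketch, using made-up tag names and counts (the interview doesn’t prescribe specific tags):

```python
from collections import Counter

# Hypothetical tagged failure records from one test environment.
failures = ["environment", "environment", "deployment", "test-script",
            "environment", "code", "deployment", "environment"]

counts = Counter(failures)
total = sum(counts.values())

# Pareto table: most frequent cause first, plus cumulative percentage.
cumulative = 0
print(f"{'cause':<12}{'count':>6}{'cum %':>8}")
for cause, n in counts.most_common():
    cumulative += n
    print(f"{cause:<12}{n:>6}{100 * cumulative / total:>7.0f}%")
```

Run per environment, a table like this makes the “waste that can potentially come out of the system” visible at a glance: if environment issues dominate, the fix is environment stability, not more test cases.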
How do you identify areas for improvement in the dev to test handoff?
GG: “First, I tell teams to answer this question: What percentage of defects are you finding in each of your environments? For example, if 90% of defects are caught in the initial test environment, the system is good, meaning the dev team is receiving quality feedback. If you are finding a majority of your issues on the right side of the pipeline, closer to production, then feedback to dev is not as valuable, because triage and root-cause analysis are much more complex there. By the time you are in a complex environment, you should only be testing the interfaces.
The next question to ask is: What are the different types of issues found in the different test environments? Are you finding environment issues, deployment issues, problems with automated tests, the code…? We’ll look at test results from two perspectives to understand the best approach to triage. First, we look at whether the same test has passed from build to build, then we’ll look at how well the same build is performing as the test environment complexity increases from release phase to phase. To do this, each stage of the deployment pipeline needs a stable test environment for gating code.”
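The two triage perspectives Gary describes (same test across builds, same build across environments) can be modeled as a result grid indexed by build, environment, and test. The data and helper names below are illustrative assumptions:

```python
# Hypothetical result grid: (build, environment, test) -> passed?
results = {
    ("b41", "subsystem", "login"):   True,
    ("b42", "subsystem", "login"):   False,  # broke between builds
    ("b42", "subsystem", "search"):  True,
    ("b42", "integrated", "search"): False,  # broke between environments
}

def broke_across_builds(test, env, old_build, new_build):
    """Same test, same env, pass -> fail across builds: suspect the code change."""
    return bool(results.get((old_build, env, test))) and \
        not results.get((new_build, env, test))

def broke_across_envs(test, build, simple_env, complex_env):
    """Same build, passes in the simpler env but fails in the complex one:
    suspect the environment, deployment, or interfaces."""
    return bool(results.get((build, simple_env, test))) and \
        not results.get((build, complex_env, test))

print(broke_across_builds("login", "subsystem", "b41", "b42"))
print(broke_across_envs("search", "b42", "subsystem", "integrated"))
```

The value of separating the two views is that each points triage at a different owner: a build-to-build break implicates the latest code change, while an environment-to-environment break implicates the environment itself, which only works if each stage provides a stable environment to compare against.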
Besides incorrect configurations, what other problems have you run into with test environments?
GG: “I work with many organizations that are on a journey to test automation, meaning they still have manual testing left in the system. It takes work to make sure tests are triageable, maintainable, and can adapt as the app changes. One organization I’m working with now runs 3,000 automated tests on a weekly basis. Recently a release train was held up not because there was a problem with the tests, but because a test environment didn’t have enough memory and CPU power to run the automated tests.”
In your book, you state:
“For large tightly coupled systems, developers often don’t understand the complexities of the production environments. Additionally, the people that understand the production environments don’t understand well the impact of the changes that developers are making. There are also frequently different end points in different test environments at each stage of the deployment pipeline. No one person understands what needs to happen all the way down the deployment pipeline. Therefore, managing environments for complex systems requires close collaboration from every group between dev and ops.”
What is your advice for managing test environments in tightly coupled systems?
GG: “The DevOps journey towards efficiency must involve getting test environment configurations under version control, so that everyone can see exactly who changed what and when. Then there’s no need to hold big meetings like a Scrum of Scrums, because everyone can see code progress down the pipeline and understand status. Also, developers need faster access to early-stage test environments so they can validate their changes and catch their own defects. Success really requires being able to provide environments with cloud-like efficiency, on demand.”
Stable test environments, under version control, and on demand. Sage advice.
Plutora Environments integrates with Jenkins to trigger builds and track component versions of test environments.
Watch the on-demand webinar with Gary Gruver: Continuous Delivery Pipelines: Metrics, Myths, and Milestones.