What is Configuration Drift and How to Solve It
Aug 16, 2023
Delivering new software requires contributions from multiple teams making changes across multiple applications. For example, suppose you are an insurance company and want to add two-factor authentication to your solution. That means changing the iOS app, the Android app, the website, APIs, the database, logging, monitoring, and more. These applications then need to come together with other components for integrated testing to ensure the new feature works end to end. Teams are often siloed: each understands its own application but cannot speak to the performance of the entire solution. That makes it critical to have test environments ready with all the components for this purpose.
Software often progresses through a series of increasingly complex environments. For instance, the development environment, often on the developer’s laptop, may simply exercise APIs rather than connect to the other components. Upon success, the software is promoted to an integration environment that contains most components but at reduced scale. Once integration testing is complete, the software is promoted to more complex environments with scale and redundancy for performance and penetration testing. It then moves to a staging environment that replicates production as closely as possible. Staging can serve a dual purpose: testing new features and testing patches for critical defects in production. Lastly, the software is promoted to production.
Problem - Test Environments are a Bottleneck
Test environments can be a bottleneck for many reasons. Is one available? Is someone else using it? Is it up and running? Does it have the proper components? Are the applications and microservices at the proper versions? Is the test data appropriate? Are the infrastructure, operating system, and security profile correct? Each of these potential problems takes time to resolve, slows the teams down, and affects the quality of the software delivered.
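Several of these questions can be answered mechanically once an environment's state is recorded. The sketch below is a minimal illustration, assuming a hypothetical `Environment` record (up/down status, current booking holder, and deployed component versions) and a hypothetical `readiness_problems` check; none of these names come from Plutora or any other tool, and test data or security profile checks would need more information than this record holds.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Environment:
    """Hypothetical snapshot of one test environment's current state."""
    name: str
    is_up: bool
    booked_by: Optional[str]  # None means no team currently holds a booking
    components: dict[str, str] = field(default_factory=dict)  # component -> deployed version


def readiness_problems(env: Environment, required: dict[str, str], team: str) -> list[str]:
    """Return the reasons `env` is not ready for `team`; an empty list means ready."""
    problems = []
    if not env.is_up:
        problems.append(f"{env.name} is down")
    if env.booked_by is not None and env.booked_by != team:
        problems.append(f"{env.name} is booked by {env.booked_by}")
    for component, version in required.items():
        deployed = env.components.get(component)
        if deployed is None:
            problems.append(f"{component} is missing")
        elif deployed != version:
            problems.append(f"{component} is at {deployed}, expected {version}")
    return problems


# Example: a hypothetical identity team needs these versions for a two-factor authentication test.
env = Environment("INT-2", is_up=True, booked_by="payments",
                  components={"auth-api": "2.3.0", "web": "5.1.0"})
required = {"auth-api": "2.4.0", "web": "5.1.0", "ios-app": "7.0.0"}
for problem in readiness_problems(env, required, team="identity"):
    print(problem)
```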
What is Configuration Drift?
When the versions of components vary across test environments, that variation is called Configuration Drift.
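Once each environment's component versions are recorded, drift can be found by comparing them side by side. This is a minimal sketch, assuming environments are represented as plain dictionaries mapping a component name to its version; the function `find_drift` and the sample inventory are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def find_drift(environments: dict[str, dict[str, str]]) -> dict[str, dict[str, Optional[str]]]:
    """Return, for each component whose version is not identical everywhere,
    a mapping of environment name -> version (None where the component is absent)."""
    all_components = sorted({c for comps in environments.values() for c in comps})
    drift = {}
    for component in all_components:
        versions = {env: comps.get(component) for env, comps in environments.items()}
        if len(set(versions.values())) > 1:  # more than one distinct value means drift
            drift[component] = versions
    return drift


# Hypothetical inventory of three environments.
environments = {
    "integration": {"auth-api": "2.4.0", "web": "5.1.0", "os": "8.8"},
    "staging": {"auth-api": "2.3.0", "web": "5.1.0", "os": "8.8"},
    "production": {"auth-api": "2.3.0", "web": "5.1.0", "os": "8.9"},
}
for component, versions in find_drift(environments).items():
    print(component, versions)
```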
This drift can be good when it is planned and visible, showing how features and defect fixes are progressing through the pipelines. It is bad when it is poorly planned, invisible to the teams, and testing takes place on inappropriately configured environments. In the best-case scenario, the testing must be redone, which wastes the testers' time and delays the entire team waiting on the results before moving on to other tasks. In the worst case, a defect escapes into production and causes an incident that impacts customers or the business.
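One way to make the good/bad distinction concrete is a heuristic over the promotion order: a lower environment running a newer version than a higher one usually means a change is in flight, while a lower environment running an older version than a higher one suggests a fix was never back-ported. The sketch below assumes a hypothetical `PIPELINE` order and purely numeric dotted versions; it can only flag candidates, since whether drift was actually planned is something the teams must confirm.

```python
from __future__ import annotations

# Assumed promotion order, lowest to highest.
PIPELINE = ["dev", "integration", "staging", "production"]


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '2.4.0' (assumes numeric parts only)."""
    return tuple(int(part) for part in version.split("."))


def classify_drift(component: str, versions: dict[str, str]) -> list[str]:
    """Compare every lower/higher pair of environments that run `component`.

    Lower ahead of higher: treated as a change in flight (candidate planned drift).
    Lower behind higher: flagged as a likely missing back-port.
    """
    findings = []
    envs = [e for e in PIPELINE if e in versions]
    for i, lower in enumerate(envs):
        for higher in envs[i + 1:]:
            lower_v, higher_v = parse_version(versions[lower]), parse_version(versions[higher])
            if lower_v > higher_v:
                findings.append(f"in flight: {component} {lower} ahead of {higher}")
            elif lower_v < higher_v:
                findings.append(f"missing back-port? {component} {lower} behind {higher}")
    return findings


# A patched OS in production that never reached the lower environments.
print(classify_drift("os", {"integration": "8.8", "staging": "8.8", "production": "8.9"}))
# A new feature build still making its way toward production.
print(classify_drift("auth-api", {"integration": "2.4.0", "production": "2.3.0"}))
```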
Most companies still manage their test environments with manual, ad hoc tools such as spreadsheets, email, or wiki pages, which is both time-consuming and error-prone.
I recall working with a large bank that had a robust security approach to production. A vulnerability was found in production, and a fix was quickly tested and deployed. The problem was that the security patch was not back-ported to the lower environments. All testing for the next release was done on an old version of the operating system, and when the release was promoted to production, critical customer-facing services would not come up. The bank lacked both visibility into its environment drift and a process for back-porting.
Solution - Test Environment Management for Configuration Visibility
The solution is a proper test environment management tool that captures each environment's configuration and manages and tracks changes to it. Such a tool must also handle bookings, help resolve conflicts, provide visibility into schedules, map dependencies, and provide analytics on utilization, uptime, and similar metrics.
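Booking conflicts are the simplest of these capabilities to illustrate: two bookings clash when they reserve the same environment over overlapping time ranges. This minimal sketch uses half-open intervals (start inclusive, end exclusive) so back-to-back bookings do not count as conflicts; the `Booking` type and `find_conflicts` function are hypothetical and not part of any product API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Booking:
    """Hypothetical reservation of one environment by one team."""
    environment: str
    team: str
    start: datetime  # inclusive
    end: datetime    # exclusive, so a booking ending at 12:00 does not clash with one starting at 12:00


def overlaps(a: Booking, b: Booking) -> bool:
    """True when both bookings hold the same environment at the same moment."""
    return a.environment == b.environment and a.start < b.end and b.start < a.end


def find_conflicts(bookings: list[Booking]) -> list[tuple[Booking, Booking]]:
    """Return every clashing pair by checking all pairs (quadratic, fine for a sketch)."""
    conflicts = []
    for i, a in enumerate(bookings):
        for b in bookings[i + 1:]:
            if overlaps(a, b):
                conflicts.append((a, b))
    return conflicts


day = datetime(2023, 8, 16)
bookings = [
    Booking("STAGING", "identity", day.replace(hour=9), day.replace(hour=12)),
    Booking("STAGING", "payments", day.replace(hour=11), day.replace(hour=14)),
    Booking("STAGING", "mobile", day.replace(hour=14), day.replace(hour=17)),
    Booking("PERF", "payments", day.replace(hour=9), day.replace(hour=17)),
]
for a, b in find_conflicts(bookings):
    print(f"{a.environment}: {a.team} overlaps {b.team}")
```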
Plutora Test Environment Management captures and tracks environment configuration in a multi-tier model. This information is best captured through automation, using the Plutora integration layer to avoid duplicate data entry. The key elements of the configuration, and their sources, are:
Capture the list of applications and microservices, which typically comes from a CMDB tool such as ServiceNow or Remedy.
Capture each instance of every application. Ideally, this information comes from an integration with the CI/CD pipeline, such as Jenkins or Azure, that captures every build and its associated scope as well as each deployment. This data can then be used to “enrich” the CMDB so that it properly reflects the lower environments, if desired.
Side Note: I worked with a large utility provider where we integrated with Splunk for configuration information. Splunk regularly scans all the test environments, and that information is fed to Plutora, giving the team full visibility into all the configurations.
Arrange the application instances into integrated environments.
Capture any other configuration item needed for a quality outcome, such as the infrastructure, operating system version, security profile, or test data.
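The tiers above can be sketched as a small data model: applications (as a CMDB would list them), deployed instances (kept current from CI/CD deployment events), and environments that group instances together with other configuration items. All class and method names here are hypothetical, and the deployment event is assumed to arrive as a simple method call rather than through any real Plutora, Jenkins, or Azure integration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Application:
    """Tier 1: an application or microservice, as a CMDB might list it."""
    name: str
    owner: str


@dataclass
class ApplicationInstance:
    """Tier 2: one deployed copy of an application, kept current from CI/CD events."""
    application: Application
    host: str
    version: str


@dataclass
class TestEnvironment:
    """Tier 3: instances arranged into an integrated environment, plus other config items."""
    name: str
    instances: dict[str, ApplicationInstance] = field(default_factory=dict)  # keyed by app name
    config_items: dict[str, str] = field(default_factory=dict)  # e.g. OS version, test data set

    def record_deployment(self, app: Application, host: str, version: str) -> None:
        """Apply a deployment event; a newer deployment replaces the previous instance."""
        self.instances[app.name] = ApplicationInstance(app, host, version)

    def snapshot(self) -> dict[str, str]:
        """Flatten the environment into name -> version/value for reporting or drift checks."""
        view = {name: inst.version for name, inst in self.instances.items()}
        view.update(self.config_items)
        return view


auth = Application("auth-api", owner="identity")
web = Application("web", owner="digital")
staging = TestEnvironment("staging", config_items={"os": "8.9", "test-data": "masked-2023-08"})
staging.record_deployment(auth, "stg-auth-01", "2.3.0")
staging.record_deployment(web, "stg-web-01", "5.1.0")
staging.record_deployment(auth, "stg-auth-01", "2.4.0")  # a later build replaces the earlier one
print(staging.snapshot())
```

Because `snapshot` returns a flat name-to-version mapping, the same output could feed a cross-environment drift comparison.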
All changes to the test environments should go through an appropriate workflow, which can be manual, automated, or a combination of both. The process should not be heavyweight, but it should not be so light that changes go unnoticed either. You must understand every change to the test environments, because each one needs to be considered for production.
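A change workflow can be as light as a small state machine. The sketch below assumes a hypothetical set of states (requested, approved, rejected, applied), lets automated changes take a pre-approved path, and puts every applied change on a production-review list, reflecting the point that each test-environment change needs to be considered for production. It is one possible workflow, not Plutora's.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


# Allowed transitions in this assumed workflow; anything else is refused.
TRANSITIONS = {
    State.REQUESTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.APPLIED},
    State.REJECTED: set(),
    State.APPLIED: set(),
}


@dataclass
class ChangeRequest:
    environment: str
    description: str
    automated: bool = False  # e.g. a pipeline deployment, treated here as pre-approved
    state: State = State.REQUESTED
    history: list[State] = field(default_factory=list)

    def move_to(self, new_state: State) -> None:
        """Advance the workflow, rejecting transitions the workflow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.history.append(self.state)
        self.state = new_state


def apply_change(change: ChangeRequest, production_review: list[ChangeRequest]) -> None:
    """Apply a change, then queue it so someone decides whether production needs it too."""
    if change.automated and change.state is State.REQUESTED:
        change.move_to(State.APPROVED)
    change.move_to(State.APPLIED)
    production_review.append(change)


review_queue: list[ChangeRequest] = []

patch = ChangeRequest("integration", "OS security patch")
patch.move_to(State.APPROVED)  # manual approval step
apply_change(patch, review_queue)

build = ChangeRequest("dev", "auth-api 2.4.0 build", automated=True)
apply_change(build, review_queue)

try:
    apply_change(ChangeRequest("staging", "unapproved config tweak"), review_queue)
except ValueError as err:
    print(err)  # the manual change was never approved, so it is refused

print([c.description for c in review_queue])
```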
Now you have real-time visibility into every environment, including all of its components and their versions.
Benefit - Test Environment for Speed, Quality, and Efficiency
There are three main benefits to properly managing environment configuration and providing visibility into drift.
Speed - Software delivery speeds up when environments are ready when needed, with the proper components at their proper versions. The EMA report “The ROI of Plutora Environment Management” found increased application development to be the number one benefit, in the range of $454,000 to $3,734,000 over four years.
Efficiency - Teams become more efficient, saving time by reducing troubleshooting and rework and by being able to focus on testing. EMA found that eliminating manual, time-consuming processes can save between $681,000 and $1,048,000 over four years.
Quality - The quality of software delivered will improve as there is clear visibility into the configuration of test environments as well as production. EMA found that eliminating misconfigurations can save between $190,000 and $2,226,000 over four years.