Testing Against Production? Blame Your Data
Apr 22, 2016
Managing databases and newer, more exotic NoSQL stores is an emerging challenge in an industry under constant pressure to move faster. While the industry has perfected the art of cloud-based provisioning and configuration of application servers, databases continue to be a pain point for environment managers.
There are a few factors at play. Most large companies practicing software development are working with systems that have been operating for decades. When a bank throws a fancy, web-based interface on account management, it doesn’t always mean that the IT department performed an end-to-end rewrite of the entire system. Legacy account management systems and customer service infrastructure are often adapted to work with faster-moving front-end components that serve the website.
Bare-metal systems still power most of the largest businesses, and NoSQL databases present new challenges that often back teams into a corner. Instead of building fully isolated QA and Staging environments to facilitate end-to-end testing, some test environment managers find themselves configuring QA and Staging environments to point to production databases. This anti-pattern complicates environment management, and it introduces risks when QA and production data are mixed.
Complicating Factor: Relational Databases are Still Prevalent and Difficult to Manage
At Plutora, we see this pattern almost everywhere we go. Front-end web applications use the latest tools and technologies while connecting to backend services that run on bare-metal servers hosting relational databases. Large databases in the Fortune 500 are still centralized systems running Oracle or DB2. These databases are so valuable and so full of risk that they are still maintained by a team of dedicated DBAs who are more focused on manual operation and less interested in deployment automation or self-service tools for test environment management.
If an organization doesn’t invest in data masking technology or in ways to automate the setup and teardown of relational databases, test environment managers have no choice but to configure QA systems to point to production. DBAs often fall into a pattern of only worrying about production systems. To avoid this, push your DBAs to take on more responsibility for automation and ask them to directly support test environment management activities.
Complicating Factor: Application Performance Tests Require Data at Scale
If your production system has a customer management database with 1 TB of data, you can test functionality with a reduced data set of a few GB, but you’ll never be able to test the real performance of the application or the database. A database with terabytes of data is an entirely different beast than a database with only a few GB. Query plans are going to be dramatically different depending on the size of various tables, and a DBA will have a completely different set of indexes to optimize and queries to analyze if your QA and Staging databases are several orders of magnitude smaller than your production database.
While it may be impractical to create multiple copies of a very large production database, you should strive to always create at least one performance testing copy of production that is properly masked. If management expects release managers to run performance tests or stress tests before releases, having a copy of a production database will allow you to run these tests without having to create mixed test environments.
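As a rough illustration of what "properly masked" can mean, here is a minimal sketch of deterministic masking. It assumes rows are available as plain dictionaries; the names `MASKING_KEY`, `mask_value`, and `mask_row` are hypothetical and do not come from any particular masking product. The idea is that the masked copy keeps the same row count and the same non-sensitive values, so it remains useful for performance testing, while sensitive values are replaced by stable tokens.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would live in a secrets manager.
MASKING_KEY = b"example-only-key"


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Return a stable pseudonym: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_row(row: dict, sensitive_columns: set) -> dict:
    """Copy a row, masking sensitive columns and leaving everything else untouched."""
    return {
        col: mask_value(str(val)) if col in sensitive_columns and val is not None else val
        for col, val in row.items()
    }


# Made-up sample rows standing in for a production table.
customers = [
    {"id": 1, "email": "alice@example.com", "balance": 1200},
    {"id": 2, "email": "bob@example.com", "balance": 87},
]
masked = [mask_row(r, {"email"}) for r in customers]
```

Because the masking is deterministic, a value that appears in several tables maps to the same token everywhere, which keeps joins across the masked copy working. Whether a truncated keyed hash is strong enough depends on your compliance requirements, so treat this as a starting point rather than a recommendation.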
Complicating Factor: What to do about NoSQL or caching components?
Maybe you use Cassandra or MongoDB? Maybe your site incorporates Couchbase as a caching layer? When you are trying to recreate the architecture and configuration of production, NoSQL layers can be a complicating factor. Very often the data stored in NoSQL or caching components has some relationship to the data stored in an environment’s database. While there are no direct “foreign key” relationships that can be enforced across these different technologies, a test environment manager is still expected to ensure that these components are synchronized with the relational databases that continue to run most large applications.
NoSQL and caching components are often easier to automate. It’s synchronizing data across different data storage technologies that is the problem. To avoid having to test against production, you should make sure that your DBAs and your developers agree on who has responsibility for these more “exotic” forms of data storage. Very often your developers will take ownership of NoSQL data stores and incorporate them into configuration management and deployment automation scripts. As a test environment manager, you need to make sure that someone owns the responsibility for setup and teardown.
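Since no foreign key can be enforced between a relational database and a NoSQL store, one way to check that a test environment is synchronized is to compare the keys each side holds after setup. The sketch below assumes you can export the relevant IDs from both stores as simple lists; `find_orphans` is a hypothetical helper, not part of any database client.

```python
def find_orphans(relational_ids, document_ids):
    """Compare key sets from two stores and report what is out of sync."""
    relational = set(relational_ids)
    documents = set(document_ids)
    return {
        # Rows in the relational database with no matching NoSQL document.
        "missing_in_nosql": sorted(relational - documents),
        # NoSQL documents that reference rows the relational database lacks.
        "orphaned_in_nosql": sorted(documents - relational),
    }


# Made-up IDs standing in for a customer table and a document collection.
report = find_orphans([101, 102, 103], [102, 103, 104])
```

A check like this could run at the end of an environment setup script, failing the setup when either list is non-empty. Whoever owns setup and teardown for the NoSQL store would be the natural owner of the check as well.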
Complicating Factor: No Good Answers for (Really) Big Data
With the advent of petabyte-scale storage systems holding structured or unstructured data, there is another complication: some data components of a complex architecture will never be duplicated to support QA or testing. You are not going to be able to test a QA or Staging system that interacts with petabytes of storage by copying this data. When it comes to Big Data, some of these testing exercises will need to be modeled against the realities of production. If you have petabytes (or exabytes) of data, there may only be one system to test against: production.
There’s no good answer here for big data at scale. It may be unavoidable to create test environments that hit these gargantuan repositories of data. If you find yourself creating systems that need to be tested against these production data stores, one way to ensure that a mixed test environment doesn’t cause problems is to always make sure that your testing scripts use read-only accounts.
Conclusion: Bring the DBAs into the Test Environment Conversation
As I mentioned previously, the DBA role has remained relatively untouched by the DevOps and Agile revolutions. Most DBAs still practice the same job they practiced a decade ago, and it is an important, operations-oriented job focused on managing production risks.
A DBA’s job is often focused on one thing, and one thing only: keeping a production system up and running and making sure it performs well. This singular focus can make it difficult to attract their attention to the requirements of QA and Staging. Database and data quality issues are the reason why application development teams decide to forgo QA and Staging environments and just run critical tests in production. The best way to convince DBAs to participate in providing better data solutions for QA and Staging is to make the risks of testing against production known throughout the organization.
If you use Plutora to track environment-related issues and release-related downtime, you can make a direct comparison between teams that test against production and teams that have adequate data to support QA and Staging. To start the conversation, capture the risks related to testing in production and make these risks visible to your DBA teams and to IT management. With the right focus on data and data quality, test environment managers can find a way to avoid mixed test environments that combine QA resources with production.