A DataOps Definition: Separating Hype From Value
Software development has evolved quickly in recent years, shaped by diverse methodologies—waterfall, extreme programming, scrum, and agile, just to name a few. All of them share a common set of goals: to deliver what the customer wants on time, with quality, and at the lowest possible cost.
I still remember when I started learning about them. Back then, the rule was to define a requirement first; the client had to sign off, and the requirement seemed written in stone. Not a line of code was written before that step. Now, developers only need a basic specification to start coding, as the rest is defined along the way. Furthermore, production releases now occur many times in a single day. To accomplish that, we developed automation practices and changed some management mindsets. Out of this, development operations (DevOps) was born.
Now there’s a new movement, a set of practices derived from DevOps and aimed at gaining insights from all processes. The goals, again, are the same—to increase development quality, reduce development time, and reduce costs. Since this movement targets data and analytics, it’s called data operations (DataOps).
In this post, I intend to give you a clear view of what DataOps is. Also, and most importantly, I want to give you an idea of how DataOps can benefit your software development process.
First, I’m going to share a couple of definitions of DataOps with you:
DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.—Gartner
DataOps is a series of principles and practices that promises to bring together the conflicting goals of the different data tribes in the organization, data science, BI, line of business, operations, and IT.—Datasciencecentral.com
In my opinion, both definitions express the same basic ideas: communication across the organization and data integration for everybody’s use.
As I write this, I wonder whether communication is really missing within organizations. For instance, I remember getting frequent emails about each year’s objectives, changes in management, and who was leaving or joining. Then I recall how a team changed the way employee data flowed from one database to another. They didn’t tell all the teams—perhaps just the ones they had direct contact with. Soon, errors caused by outdated data had users filing error tickets, and in the end I had to create a new software module to handle the change. Specific, day-to-day information needs to be shared, and we’re still looking for a way to share it.
As for data integration and management, it’s a bit more obvious. Organizations work better when they have only a single source of truth. From it, all questions can be asked, and most issues can be solved.
In order to achieve the goals identified in the last section, we can look to the DataOps manifesto, which summarizes what its authors value most in analytics:
- Individuals and interactions over processes and tools
- Customer collaboration over contract negotiation
- Cross-functional ownership of operations over siloed responsibilities
- Working analytics over comprehensive documentation
- Experimentation, iteration, and feedback over extensive up-front design
The first three points refer to communication within the organization. In my earlier example, had there been communication, the final user would never have reported an error; the solution would have been implemented before the change in the database. In other words, communication is key to a successful DataOps implementation—not only within teams, but among all teams in the organization.
The last two statements are directly related to data. When you answer the right questions, no additional clarification is needed—it’s all clear from the graphics or reports. Still, you don’t always spot the right questions at the beginning. So you start asking, make mistakes, fix them, and start again. Over time this process gets faster, takes fewer iterations, and gives more value to the organization. You’ll also spot when the information you have isn’t enough—there’s probably a source you didn’t consider at first. What matters is that you keep it all in a single place; that way it’ll be easier to manage the data.
DataOps in the Software Development Process
Now that you’re more familiar with the DataOps concept, we’ll look at how to apply it. DataOps is useful in any industry, but more and more companies rely on developing software to meet their clients’ particular needs. That makes the software development process itself a good example for applying DataOps.
Software development comprises many different processes. Together they form a value stream—the series of activities, from idea to production, that create value for customers. However, while some steps add value, others aren’t worth having. We’ll use the term value stream map (VSM) for the set of all development activities. Let me explain more in the next section.
Value Stream Management
The concept of value stream management is relatively new. In summary, it’s the process of optimizing the activities in a VSM to minimize waste and focus on value-producing activities. If this still seems a little unclear, take a look at this example.
Why is it worth mentioning? Because it helps you continuously improve your software development process by giving you a view of all the steps you follow and which ones are working and which are not. After you apply some analytics to every process, you can find ways to improve the whole stream and notice processes that shouldn’t exist or don’t add value to it. It doesn’t matter what methodology you follow; having a clear view of all the processes always makes it easier to make improvements.
In order to get enough data to analyze, you need a tool that has many connectors. Each connector will be for a different data source. For instance, a data source might be the issues management database—JIRA, for example. The source code version control is another source. Code analyzers will let you know beforehand if your developers adhere to coding standards. The data they produce needs to be concentrated into a single place for further processing and analysis.
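As a sketch of what that concentration step might look like, here’s a minimal Python example that normalizes records from three hypothetical sources (an issue tracker, version control, and a code analyzer) into one shared shape. All field names and record shapes here are assumptions for illustration, not any real tool’s schema.

```python
from datetime import date

# Hypothetical connector outputs: each source delivers records in its own shape.
# All field names and values below are assumptions for illustration.
jira_issues = [
    {"key": "PROJ-1", "status": "Open", "updated": date(2023, 5, 2)},
    {"key": "PROJ-2", "status": "Done", "updated": date(2023, 5, 3)},
]
git_commits = [
    {"sha": "a1b2c3", "author": "dev1", "committed": date(2023, 5, 2)},
]
lint_reports = [
    {"file": "app.py", "violations": 3, "scanned": date(2023, 5, 3)},
]

def normalize(source, records, id_field, date_field):
    """Map one source's records into a shared shape for central storage."""
    return [
        {"source": source, "id": str(r[id_field]), "event_date": r[date_field]}
        for r in records
    ]

# Concentrate everything into one collection for later processing and analysis.
warehouse = (
    normalize("jira", jira_issues, "key", "updated")
    + normalize("git", git_commits, "sha", "committed")
    + normalize("lint", lint_reports, "file", "scanned")
)
```

The point of the shared shape is that every downstream question can be asked against one collection instead of three incompatible ones.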
To apply DataOps to the development phase, you have to define some insightful metrics. What would you like to know? For example, how many issues are open during the week? How many of those are solved each week?
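The weekly opened/resolved metric is easy to compute once the issue data lives in one place. Here’s a minimal sketch assuming a simple list of issue records; the field names are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical issue records; the field names are illustrative only.
issues = [
    {"id": 1, "opened": date(2023, 5, 1), "closed": date(2023, 5, 4)},
    {"id": 2, "opened": date(2023, 5, 2), "closed": None},
    {"id": 3, "opened": date(2023, 5, 9), "closed": date(2023, 5, 10)},
]

def week_of(d):
    """ISO (year, week) key for grouping, e.g. (2023, 18)."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

# How many issues were opened, and how many resolved, in each week?
opened_per_week = Counter(week_of(i["opened"]) for i in issues)
closed_per_week = Counter(week_of(i["closed"]) for i in issues if i["closed"])
```

Comparing the two counters week over week shows whether the backlog is growing or shrinking.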
The above image is an example report. It shows how many stories are ready for a specific release, by day. You can watch your developers’ progress as they finish each story, see tests passing, and see errors disappear each day. By looking at this graph, you can be confident that your release will go out on time, with all defects addressed.
However, your data may show something different. Take the following image for example:
In the last image, you can see the status of each pull request—which are furthest along and which are most delayed. With this view, you can set priorities and take action, either by assigning more resources or by updating the upcoming release features. If that’s not your call, just show it to the appropriate person.
What matters here is having the data that will enable upper management to take action on time in order to get the best results for your company.
Over time, you’ll accumulate enough data to estimate release dates and efforts more accurately, and even to allocate resources better.
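As a rough sketch of that kind of estimation, here’s a simple velocity-based projection. All numbers and dates are made up for illustration; a real estimate would draw on your accumulated history.

```python
import math
from datetime import date, timedelta

# Illustrative history: stories completed in each of the last four weeks.
completed_per_week = [8, 10, 9, 11]
remaining_stories = 25

# Average throughput, then round up the weeks needed to finish the backlog.
velocity = sum(completed_per_week) / len(completed_per_week)
weeks_needed = math.ceil(remaining_stories / velocity)

# Project forward from a (made-up) current date.
today = date(2023, 6, 5)
estimated_release = today + timedelta(weeks=weeks_needed)
```

This is deliberately naive—it ignores scope changes and variance—but even this level of estimate improves as more historical data accumulates.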
I don’t think there’s a need to stress the importance of the testing phase. What I want to highlight instead is that you get better results when testing is automated—not only the tests themselves, but the whole environment in which they run. Manual configuration is error-prone. Furthermore, it can affect the test results, undermining confidence in the final product. Automation will also require environment scheduling and prioritization. This helps you avoid errors that were never correctly addressed in other dependencies.
In this phase, DataOps can contribute to assessing development quality by measuring the number of tests passed or how many times an issue was returned to the developer because of failed tests. Moreover, asking the proper questions in analytics can lead to discovering new errors, as well as cases not accounted for at the beginning of the process.
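A metric like test pass rate per developer takes only a few lines once CI results are collected centrally. The record shape below is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical CI results; the record shape is assumed for illustration.
test_runs = [
    {"developer": "ana", "passed": True},
    {"developer": "ana", "passed": True},
    {"developer": "ana", "passed": False},
    {"developer": "ben", "passed": True},
]

totals = defaultdict(lambda: [0, 0])  # developer -> [passed runs, total runs]
for run in test_runs:
    totals[run["developer"]][1] += 1
    if run["passed"]:
        totals[run["developer"]][0] += 1

pass_rate = {dev: passed / total for dev, (passed, total) in totals.items()}
```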
Additionally, historical data can demonstrate the presence of bugs not properly addressed. For example, if in every release there’s an error in the same function, you can arrive at some conclusions. One of them could be the lack of test cases for unforeseen edge conditions. Another reason could be a bad requirement definition. In such cases, upper management will have to intervene immediately.
On the other hand, a lack of error detection could mean a lack of testing or a lack of test cases. In either case, the data obtained through analytics is the basis for drawing conclusions.
After testing, we’re ready to release our product’s new feature. As I’ve stated before, the more process automation, the better. For this phase, we need to make sure that all feature dependencies are met. Otherwise, although testing passes successfully, releases will fail. We also need to make sure that all features are ready. Additionally, it helps to have a calendar for expected releases. This way we can trace issues that may affect releasing the product on time.
DataOps is helpful in this phase by providing traceability. Features will come from different teams, using different tools to track their progress. Concentrating all the data in a single tool gives visibility to everyone in the organization. The more views, the more opportunities to spot failure points.
In the image above, you can see in detail every pull request included in a single release. With only a quick look, you get a glimpse of all that’s happening. Most requests are overdue and still not ready. Each activity’s completion status is visible at a glance. In addition, there are suggestions for overcoming problematic activities.
Again, the more information, the better the decisions that can be made about the process. Management can see the image and already know what action to take.
Automating our product deployment ensures that we can repeat the process as many times as we need to get it right. It also gives us an estimate of the time each deployment activity takes. In addition, with the proper tool, we can handle as many deployment targets as needed without having to worry about details.
For this phase, DataOps can help us visualize which platforms are harder to deploy to than others. In similar fashion, we may discover differences between deployment environments that shouldn’t exist.
Gathering analytics from previous phases can feed a table like the one in the previous image. You can check the status for each planned deployment and consider if it’s ready to launch.
Time between deployments may also make you rethink your deployment schedule. If some features should be deployed more often, the overall cycle time may be too long, prompting adjustments across the whole stream. All decisions are backed by data generated by your process tool.
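Time between deployments is itself a simple metric to derive from release history. A minimal sketch, assuming the deployment dates are already collected:

```python
from datetime import date

# Made-up deployment dates for one service, in chronological order.
deployments = [date(2023, 5, 1), date(2023, 5, 8), date(2023, 5, 22)]

# Days between consecutive deployments, and the average gap.
gaps = [(later - earlier).days
        for earlier, later in zip(deployments, deployments[1:])]
avg_gap = sum(gaps) / len(gaps)
```

A growing average gap, or large outliers, is the kind of signal that prompts a closer look at the schedule.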
Even though you aren’t in charge of the decision, you have the tools to make better recommendations.
In the last section, I covered some ideas on how to use DataOps in the development process. Still, you also need a recap that helps you visualize this cycle, what worked about it, and what didn’t.
The above image shows a dashboard with the number of stories per release. Once you define thresholds, you can visualize and interpret the numbers: green represents a high number of stories; red, a very low one. Go back to the records you generated over those releases, and you’ll find why the number is high or low. Update your process accordingly to repeat productive actions and avoid unproductive ones.
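The threshold logic behind such a dashboard can be sketched in a few lines. The cutoff values and release names here are placeholders; you’d tune them from your own release history.

```python
def classify(story_count, low=5, high=20):
    """Map a per-release story count to a traffic-light label.
    The low/high thresholds are placeholders; tune them from your own history."""
    if story_count >= high:
        return "green"
    if story_count <= low:
        return "red"
    return "amber"

# Illustrative release names and story counts.
releases = {"R1": 23, "R2": 4, "R3": 12}
labels = {name: classify(count) for name, count in releases.items()}
```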
The following image contains a chart with environment allocations, start to end.
This kind of data visualization can enable you to forecast expenses. Highs and lows will point to data about the specific situations behind those costs. You’ll acquire more knowledge, find issues, and then adjust the chances of those issues repeating.
There exist many ways to view the same data. Each one is better for a different profile in the organization. Every visualization also answers a different question. The more questions, the better decisions you can make.
DataOps means using all the data generated by our processes and analyzing it in an automated way, with the goal of gaining insights. In this case, our goal is to gather data from all of our development phases. By analyzing it, we’ll find truths about the way our organization works—facts backed by our own data. From there, it’s time to make decisions in order to improve.
There will always be more data sources to consider. In the years to come, the information we gather will increase exponentially, and the noise will grow with it. Soon, not even a human team will be able to analyze all the available information—hence the need for automation as well as constant improvement. The world moves fast, and organizations need to move faster.
But you aren’t alone. There are already platforms enabling organizations to implement DataOps, especially in the software development arena. Plutora is one of them. It already has some predefined metrics and dashboards to get necessary insights from most development processes. (All images in this post come from their website). It also connects to many data sources to gather data. In addition, it connects to other platforms to facilitate the process itself.
You’ll find additional platforms for DataOps. What you have to do now is to define your own criteria and then make some comparisons. Then you’ll find the solution that works best for you.