Support in the Age of DevOps: 8 Ways to Revamp Your ITSM Processes

Feb 12, 2020

The origins of DevOps lie in agile system administration and the recognition that whilst software development teams were taking advantage of agile methodologies to become more responsive to change and uncertainty, the IT Operations people were not.

Sometimes they were even oblivious to what was happening on the other side of the ‘wall of confusion’, and painful tensions and misunderstandings arose between the two technology teams. IT Ops grumbled about developers wanting administrator access to production machines, developers moaned that IT Ops took too long to provision environments, and releasing an update was always a ‘hair on fire’ moment that mostly happened at weekends and frequently ended in blame games.

“One of the main problems was that we just didn’t have visibility of all of our test environments and that frequently resulted in double bookings and uncontrolled changes because there wasn’t sufficient management in place to be effective.”

-Mark Lewis, UAT Test Environment Manager at O2

The battle between change and stability seemed as if it would rage on. But DevOps principles have taught us how to balance throughput and reliability without compromise to either. DevOps is not just practiced by the ‘born on the web’ behemoths but by an ever-increasing number of traditional enterprises, those who in the past have embraced IT Service Management (ITSM) approaches to service delivery.

Whilst some organizations consciously or unconsciously drive their DevOps evolution from their development teams, there always comes a time when they seek to understand how to optimize service management activities as part of the end-to-end technology delivery value stream. And whilst development is about agile and IT Ops is about ITSM, we can use lean tools to marry the two and create lightweight, “just-enough” processes that allow both teams to work at the same cadence.

DevOps has evolved to focus on the end-to-end optimization of the value stream, accelerating flow from idea to value realization. How we handle and manage key technology delivery services changes when our primary goals are to optimize the flow of value and system integrity.

The DevOps Approach to Support

Support people are typically the lowest paid and least respected in the technology hierarchy. Strange, when they are on the frontline, dealing with our customers, our reason for being, on a daily basis. The Second Way in DevOps is to amplify and shorten feedback loops - and in Value Stream Management we are particularly interested in customer feedback. So whilst the function of a support role is to fix customer problems, it’s also to sense customer sentiment and identify value delivery opportunities.

“The Second Way is about creating the right to left feedback loops. The goal of almost any process improvement initiative is to shorten and amplify feedback loops so necessary corrections can be continually made.”

-Gene Kim

1. You Build it, You Own it

This way of working is centered around teams that are small, autonomous and multifunctional. Small, because of what we’ve learned about how humans build trust and social connections. Autonomous, because we don’t want them waiting for decisions to be made on their behalf: we hired them because they are capable of making those decisions themselves and are best placed to do so. Multifunctional, because we don’t want them waiting for other teams to do work for them. These teams both change and run their product. This isn’t about giving developers ‘pagers’; it is about end-to-end ownership of a value stream.

2. Minimize Handoffs

As with all the traditional ITSM patterns described here, there are good reasons why they have been widely implemented, and for some time they worked. But the world keeps turning and right now, digital disruption demands we all change the way that we work to optimize flow through a value stream. Having a support or service desk makes less sense when our users experience few problems or are mostly able to resolve them themselves using online documentation. If we want to shorten a feedback loop, it’s best not to have multiple handoffs between teams - delays don’t help with our flow or with delighting our customers.

3. Move to Swarming Instead of Streaming

Tiers create queues of work in progress which we seek to minimize as queuing creates delays. Whilst the tiered approach is intended to ‘protect’ the ‘best’ (read: most expensive) staff from trivial customer issues (is there such a thing?), when we seek to put the customer at the center of all we do and want them to have optimized service, why would we put our best people at the back of the process?

So instead of streaming, we move to swarming. There are several models organizations work with, but they all follow these broad principles:

  • There should be no tiered support teams or hierarchy

  • There should be no escalations from one team to another

  • Each issue should move directly to the person most likely to be able to resolve it

  • The person who takes the issue is the one who sees it through to resolution

Swarming isn’t solely for Severity 1 issues or incidents; it establishes teams whose priority is to ensure that each issue gets to the right person, and receives attention, as fast as possible.

4. Bring Ops Capabilities into Product-Centric Teams

Having small, autonomous and multi-functional teams arranged around products is the foundation of the ‘you build it, you own it’ mantra. Many agile transitions start by bringing developers and testers into the same team along with the ideation capabilities (Product Owners and business analysis roles). DevOps and value stream thinking bring Ops capabilities into the team too, and many teams start with support roles. This isn’t simply about putting the developers on 24/7 call duties but about automating the front end of support as far as possible and getting the issue in front of the right person as soon as possible.

DevOps balances throughput and stability so as organizations improve their posture, teams experience a reduction in the volume of issues and a shortening of resolution time. When teams are dedicated solely to support issue resolution, they often find Kanban a suitable way of managing the flow of work. Where teams are working in development sprints, they may find it helpful to record unplanned work and practice assigning a percentage of the sprint to it. Unplanned work is an effective proxy metric for quality and when measured is extremely useful when teams want to assign time to invest in paying down technical debt.
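As a rough illustration of tracking unplanned work, here is a minimal Python sketch that computes the share of a sprint's completed points that were unplanned. The `WorkItem` structure, the `planned` flag and the sample figures are assumptions made up for this example, not the output of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    points: int
    planned: bool  # False for work that arrived after sprint planning


def unplanned_ratio(items):
    """Return the share of completed points that were unplanned (0.0 to 1.0)."""
    total = sum(item.points for item in items)
    if total == 0:
        return 0.0
    unplanned = sum(item.points for item in items if not item.planned)
    return unplanned / total


# Hypothetical sprint: 20 points completed, 7 of them unplanned.
sprint = [
    WorkItem("Checkout redesign", 8, True),
    WorkItem("Search filters", 5, True),
    WorkItem("Hotfix: payment timeout", 3, False),
    WorkItem("Customer data correction", 4, False),
]

ratio = unplanned_ratio(sprint)
print(f"Unplanned work: {ratio:.0%} of the sprint")
```

Tracked sprint over sprint, a figure like this gives a team a starting point for deciding how much capacity to reserve for unplanned work and whether investment in technical debt is paying off.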

5. Automate Support with ChatOps and Bots

ChatOps is the use of a group messaging tool integrated with the DevOps toolchain. Chat channels can be created as needed (typically for an incident) or kept in permanent use (typically for a theme or a particular product). A swarming support use case might allow the receiver of the customer issue to access a specific backlog channel and request interaction from that product team, or the team may have its own channels for support issues relating to specific items, such as a payment gateway.

The service desk can also encourage customers/consumers of their service to interact via online chat once they have been guided through available topics and support artifacts in a knowledge base. Bots can try to resolve the issue initially and as needed the issue can be automatically routed to the team and swarmed from there.
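The flow described here (bot first, then automatic routing to a team channel) could be sketched roughly as follows. This is a simplified Python illustration using keyword matching only; the knowledge base entries, channel names and the `triage` function are all hypothetical and not tied to any real chat platform or bot framework.

```python
# Hypothetical knowledge base: phrase to look for -> self-service answer.
KNOWLEDGE_BASE = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "update billing address": "Go to Account > Billing to edit the address.",
}

# Hypothetical mapping from topic keyword to the owning team's chat channel.
TEAM_CHANNELS = {
    "payment": "#payments-support",
    "login": "#identity-support",
}
DEFAULT_CHANNEL = "#support-swarm"


def triage(message):
    """Try to answer from the knowledge base, otherwise route to a channel."""
    text = message.lower()

    # Step 1: the bot attempts to resolve the issue itself.
    for phrase, answer in KNOWLEDGE_BASE.items():
        if all(word in text for word in phrase.split()):
            return {"action": "answer", "reply": answer}

    # Step 2: route straight to the team most likely to resolve it.
    for keyword, channel in TEAM_CHANNELS.items():
        if keyword in text:
            return {"action": "route", "channel": channel}

    # Step 3: nothing matched, so hand the issue to the general swarm.
    return {"action": "route", "channel": DEFAULT_CHANNEL}


print(triage("How do I reset my password?"))
print(triage("My payment failed at checkout"))
print(triage("The dashboard is very slow today"))
```

A real bot would likely use something richer than keyword matching, but the shape stays the same: self-service first, then a single hop to the people who can fix the problem.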

6. Encourage Knowledge and Self-Service

Many people don’t enjoy committing extended periods to writing and documentation; however, to optimize a value stream, ‘just enough’ documentation is key. Underpinning this then is the ‘little and often’ principle; ensuring that small pieces are documented frequently at source and held in a repository that is easily searchable and visible. This takes the burden off the support team as people can find and resolve common issues themselves, leaving the support swarms to work with the edge cases.

7. Telemetry Everywhere 

Much of the waste in the support value stream is in the fault diagnosis (after we’ve removed delays through handoffs in a tiered model), so the team needs data to help them identify unknown and unusual issues. Support teams are frequently poorly supported by tooling, other than ticketing systems, so providing the product teams with tools that radiate telemetry means everyone in the team can benefit. Application monitoring and logging tools accelerate the identification of the root cause(s) of an issue (and these should be used in pre-production too). It’s then over to the team to fix it fast, and their CI/CD pipeline will help validate and deploy the fix at speed. And it’ll be an emergency fix or a small change, so they won’t be slowed down by CAB or the release schedule.

These types of tools also provide customer journey insights and real-time feedback on the business value of features and changes that the whole team can use in the sprint reviews to check the outcome of their hypotheses and in their sprint planning to set up their next round of experiments.

8. Intelligent Risk Management

Once a team is collaborating on a shared and visible backlog and is proficient in performing continuous delivery, it will have reduced its incidents and improved its MTTR. AI tools that help teams to assess the risk of a release help teams make decisions on when to act and who to have pre-warned. Having this data visible to central release teams provides evidence, builds trust and earns the right to autonomy.
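To make the MTTR improvement visible as evidence, a team needs to compute it consistently. Below is a minimal Python sketch, assuming MTTR is read as mean time to restore and that each incident is recorded as an (opened, resolved) pair of timestamps; the function name and the sample incidents are invented for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average the opened-to-resolved duration across incidents.

    `incidents` is a list of (opened, resolved) datetime pairs.
    Returns a timedelta, or None when there are no incidents.
    """
    if not incidents:
        return None
    total = sum(
        (resolved - opened for opened, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: durations of 90, 30 and 60 minutes.
incidents = [
    (datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 10, 30)),
    (datetime(2020, 1, 14, 22, 15), datetime(2020, 1, 14, 22, 45)),
    (datetime(2020, 2, 3, 13, 0), datetime(2020, 2, 3, 14, 0)),
]

mttr = mean_time_to_restore(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")
```

Publishing a number like this per team, alongside release risk assessments, is one way to give central release teams the evidence referred to above.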

Taking a DevOps and value stream approach to service delivery puts the priority on optimization of the flow of work from the idea to the realization of the value in the hands of the customer. Necessarily it demands a rethink of the traditional approaches and organizational practices, just as becoming agile and product-focused demands we rethink an inherent waterfall and project-centric approach.

Since improvements in value stream flow are likely to necessitate significant and far-reaching decisions about things like the roles in the organization, the organizational design, how work is funded and how investments are prioritized, it’s helpful for the people making those decisions to be as well-informed as possible and able to monitor feedback, learnings and evolutionary progress over time. Following our telemetry everywhere mantra, it’s best to support the human-driven value stream mapping efforts with data-driven value stream management evidence.

Learn how to bring support up to DevOps speed by reading our free eBook:
Service Management in a DevOps World

Deliver Better Software Faster with Plutora
