Can Agile Be Used in Operations?


I recently coached someone who had achieved great results as a Product Owner in their company. After a reorg, they were promoted to a role in charge of IT operations, and were now struggling to apply the same ways of working in the complex and ambiguous world of production support.

Their problem was that none of the recordings of agile lectures they had watched on YouTube or the DevOps books they had read on Kindle were helping. There was simply too big a gap between the reality of their organization, function, and team and the scenarios that everyone in the DevOps community talked about.

This is a more common problem than most people in the agile community care to admit. The things they do at Spotify and Google don’t always apply to the reality in the field for most agile practitioners, especially those who work in IT functions at companies outside the tech sector.

Can agile be used in operations?

Yes, the 4 values and 12 principles of agile can generally be used in IT operations. Since the agile movement began as an approach to software development, some of these values and principles will be more relevant to IT operations teams than others.

The same can’t be said for agile project management frameworks. For a number of reasons that I’m going to share with you later on in this post, some frameworks are simply less useful for an IT operations team than they are for a software development team.

Of all agile methods and frameworks, Kanban, a process improvement method, works best for IT operations. This is because Kanban doesn’t introduce new roles or events to a team. Instead, it lets the team focus on its existing roles and ways of working and helps it continuously improve them.

Scrum and Large-Scale Scrum (“LeSS”), on the other hand, are less useful. I’ve seen Scrum work for hiring an IT operations team and for designing and implementing its services. But Scrum’s iterative nature, which requires you to make a plan ahead of time, starts to break down when you try to use it to run an IT operations function.

Extreme Programming (“XP”) is significantly less relevant to IT operations teams because it is primarily a software development framework and prescribes development-centric practices. Of course, there are exceptions to the rule. If your team has Site Reliability Engineers (“SREs”), who combine operations and development know-how in equal parts, some XP practices can be highly relevant and help them build better integrations and automations for the processes and tools they use.

Using Agile Values and Principles in Operations

Agile is, above all, an approach to developing software and managing software development projects that’s based on four values:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

These four values and the mindset behind them can be very useful for people leading and working in IT operations. I’m going to spend most of this post sharing why and how I’ve come to think so.

Individuals and Interactions Over Processes and Tools

In my line of work, I’ve helped numerous IT operations teams adopt agile principles and practices. One of the biggest inefficiencies I’ve seen is when Ops Engineers focus so much on following the processes and using the tools that they forget to speak to the customer or vendor.

This happens especially when there’s a high-severity incident in play or the Ops Engineer is overloaded. Stress causes us to focus on execution (“I have so many tickets to respond to that I want to clear this queue as soon as possible.”) instead of on dialogue (“Before trying to solve this ticket, I want to make sure I’ve got it right.”).

One of the fundamental challenges of IT operations is that, more often than not, customers are unable to articulate or pinpoint their problem. When an Ops Engineer or an entire Ops Team tries to read between the lines of customer requests instead of asking, they often end up chasing red herrings. This leads to minutes, hours, days, or even weeks of waste.

Whenever there’s high pressure and something smells fishy, the agile mindset can help you stop and think: “Am I missing something? Am I trying to solve the right problem? Have we checked all the possible options?” This type of introspection (and the conversations with customers, Dev Teams, or third-party vendors that follow) can be invaluable.

The next time you’re talking about processes and tools instead of the problem you need to solve, catch yourself and course correct.

Working Software Over Comprehensive Documentation

Most IT operations teams I’ve worked with keep a ton of detailed documentation that they never use. Without doubt, an actionable and up-to-date knowledge base is essential for running and scaling operations. So why is it so frequently misused?

One of the reasons I hear the most often is “we need to comply with ISO 9001,” or ISO 27001, or any internal policy that the organization requires its IT function to follow. But I don’t really buy that…

An IT operations team can do 99% of its documentation on its own, without having to use expensive Business Process Management (“BPM”) software, and without needing to generate endless documents that no one will read.

The only three tools you need are Google Drawings, Google Docs, and Google Calendar. Or Visio, Word, and Outlook. Or Lucidchart, a wiki, and your mail server. It doesn’t really matter what tool you pick, as long as you’re using it today and you can use it effortlessly.

In my experience, the only types of documentation that an Ops Team really needs are:

  • Process diagrams to visualize, at a high level, how business processes are executed in sequential steps
  • Procedures to describe what these steps are and who gets to do them under what conditions

I’m going to go the extra step and open-source my templates. Here’s the process diagram template I start with in Google Drawings and the procedure template I use in Google Docs. Make a copy and test them out.

Recurring process? Keeping track of it is as simple as creating a recurring invite in your calendar and making sure that the right members of the team are on the distribution list. Then create a log in Google Sheets to keep track of who executed the process and when.
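If you export that log from the spreadsheet, a few lines of Python can flag scheduled runs that nobody recorded. This is a minimal sketch, assuming the process recurs at a fixed interval in days and the log is exported as (date, person) rows; the function name `missed_runs` and its `grace_days` parameter are hypothetical, chosen for illustration.

```python
from datetime import date, timedelta


def missed_runs(first_run, last_day, every_days, log, grace_days=0):
    """Return scheduled dates that have no matching entry in the log.

    Assumptions for this sketch: the process recurs every `every_days` days
    starting on `first_run`, and `log` holds (run_date, executed_by) rows
    exported from the tracking sheet. A run logged up to `grace_days` after
    its scheduled date still counts as done.
    """
    executed = [run_date for run_date, _ in log]
    missed = []
    scheduled = first_run
    while scheduled <= last_day:
        window_end = scheduled + timedelta(days=grace_days)
        if not any(scheduled <= d <= window_end for d in executed):
            missed.append(scheduled)
        scheduled += timedelta(days=every_days)
    return missed


# Hypothetical weekly process, logged as (date, who ran it).
log = [
    (date(2024, 1, 3), "alice"),
    (date(2024, 1, 10), "bob"),
    (date(2024, 1, 18), "carol"),  # one day late
    (date(2024, 1, 24), "alice"),
]
print(missed_runs(date(2024, 1, 3), date(2024, 1, 31), 7, log))
# [datetime.date(2024, 1, 17), datetime.date(2024, 1, 31)]
print(missed_runs(date(2024, 1, 3), date(2024, 1, 31), 7, log, grace_days=1))
# [datetime.date(2024, 1, 31)]
```

The grace window is a judgment call: with exact matching, a run done one day late shows up as missed, which may or may not be what your team wants to see.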

Customer Collaboration Over Contract Negotiation

To make this agile value even more relevant to IT operations, let’s rename it to “customer and vendor collaboration over contract negotiation and escalations.”

Yes, negotiation with your customers, internal or external, about the scope of services and level of support they get from your IT operations team is a must. Vendor contracts at a fair price and with reasonable Service Level Agreements (SLAs) and Service Level Objectives (SLOs) are the absolute minimum.

The problem is that when systemic problems appear and escalations become the norm, operations leaders all too often start flipping the pages of contracts and writing dramatic escalation emails instead of trying to resolve the root cause.

And the root cause is almost always that something, somewhere is going wrong in the way that your operations team is collaborating with a customer or vendor.

A surprising number of systemic problems can be solved if you create a communication channel and collaboration cadence between the right people. Simply put, set a routine, get your team and your customer or vendor on it, and foster collaboration and conversation until they solve the problem together and the routine is no longer necessary.

Here’s my personal playbook for situations like these:

  • Set a 30-minute routine between your team and the right people from the customer or vendor. Most of the time, this is best done mid-week (unless your situation requires otherwise, for example, if production deployments are done on Mondays).
  • Spend the first 15 minutes syncing up on the open issues. Make sure everyone is clear on what the status is, who owns the next step, when that next step is expected to be completed, and how everyone else will get an update once it’s done.
  • Spend the remaining 15 minutes sharing, and agreeing on how to test, ideas for continuous improvement that come from the actual people doing the work (no matter whether they are from your team, the customer’s team, or the vendor’s team).
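The first 15 minutes go faster when every open issue carries the same answers: its status, who owns the next step, when that step is due, and how others will hear about it. Below is a minimal sketch of such a list in Python; the `OpenIssue` fields and the `sync_agenda` helper are hypothetical names chosen for illustration, and a shared spreadsheet with the same columns would serve just as well.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OpenIssue:
    title: str
    status: str
    owner: str        # who owns the next step
    next_step: str
    due: date         # when the next step is expected to be completed
    update_via: str   # how everyone else will hear that it's done


def sync_agenda(issues, today):
    """List open issues earliest due date first, prefixing past-due ones."""
    lines = []
    for issue in sorted(issues, key=lambda i: i.due):
        flag = "OVERDUE " if issue.due < today else ""
        lines.append(
            f"{flag}{issue.title} [{issue.status}]: {issue.next_step} "
            f"(owner: {issue.owner}, due {issue.due.isoformat()}, "
            f"update via {issue.update_via})"
        )
    return lines


# Hypothetical issues shared between an ops team and a vendor.
issues = [
    OpenIssue("Slow VPN logins", "investigating", "Dana (vendor)",
              "Share gateway logs", date(2024, 5, 10), "email thread"),
    OpenIssue("Failed nightly backup", "fix deployed", "Sam (ops)",
              "Confirm two clean runs", date(2024, 5, 6), "team chat"),
]
for line in sync_agenda(issues, today=date(2024, 5, 8)):
    print(line)
```

Sorting by due date means overdue items naturally come up first, so the conversation starts with whatever is already slipping.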

Try this out. The first one or two occurrences of the routine are going to be awkward and painful. Stick with it, making sure that everyone has a safe space to participate and collaborate on identifying and solving the problems.

You will be surprised at just how well conversation and collaboration can work when something in a buyer/supplier relationship is going systematically wrong. And it’s humbling, in a way, to see that some of the issues are on your end.

Responding to Change Over Following a Plan

Operations is all about responding to change, isn’t it? Even if the goal is to restore things back to “normal.” When it comes to the fourth and final agile value, I think that Dev Teams actually have a harder time stomaching it than Ops Teams.

Nevertheless, keeping an agile mindset by staying open to change is key. Operations work can be highly stressful. In times of stress, humans tend to cling to what they know (or what they think they know). If you’re an Ops Lead or Ops Engineer who isn’t mindful of that tendency, you will end up clinging to old ways of working that no longer do you any good.

One of the best ways to apply this value is to seek out feedback and monitor the right metrics. Are you sending out a Net Promoter Score (NPS) survey to your customers at the end of every interaction? How do you measure and identify changes in customer satisfaction levels? Are you seeing significant ups or downs in your response and resolution times? What’s causing them? How can you identify positive and negative trends, so that you can capitalize on the positive ones before they fade and address the negative ones before they turn into pain points?
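Two of these checks come down to simple arithmetic. NPS is the percentage of promoters (scores of 9 or 10 on a 0 to 10 scale) minus the percentage of detractors (scores of 0 to 6). For response or resolution times, one rough way to spot a trend is to compare the average of the most recent window with the window before it. This is a minimal sketch; the window comparison and the function names are assumptions for illustration, not a statistical test.

```python
from statistics import mean


def net_promoter_score(scores):
    """NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def relative_change(values, window):
    """Compare the mean of the last `window` values with the window before it.

    Returns (recent - previous) / previous, so 0.75 means "up 75%".
    Assumes the previous window's mean is not zero.
    """
    if len(values) < 2 * window:
        raise ValueError("need at least two full windows of data")
    previous = mean(values[-2 * window:-window])
    recent = mean(values[-window:])
    return (recent - previous) / previous


survey = [10, 9, 8, 7, 6, 3, 10, 9]
print(net_promoter_score(survey))  # 25.0

# Hypothetical average resolution times in hours, one value per week.
resolution_hours = [4, 5, 3, 4, 6, 7, 8, 7]
print(relative_change(resolution_hours, window=4))  # 0.75
```

Read the sign in context: a rising NPS is good news, while a 75% rise in resolution time is a trend worth digging into before it becomes a pain point.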

So much for the applicability of the 4 values of agile to IT operations. As for the 12 principles of agile, let me know whether you think they apply as well. Once you’re done reading this post, share your thoughts in the comments section below.

Which Agile Frameworks Work in Operations?

Of all agile methods and agile frameworks, I think that Kanban is the most suitable for IT operations. Scrum and Extreme Programming (“XP”), on the other hand, are the least suitable, the latter for the obvious reason that it focuses on programming practices.

Here’s how I’ve come to think so. And, to explain my reasoning, I’ll introduce you to a concept from supply chain management and lean manufacturing.

Every day, manufacturers are faced with the following dilemma: “Should we make products before or after customers order them?”

If the manufacturer chooses option one — and makes the not-uncommon mistake of producing more products than customers actually buy — they get excess inventory. And they’ll need to sell this inventory at a lower price, as the more time it takes them to sell it, the lower their profit margin. This is why car makers discount their models at least a couple of times every year.

If the manufacturer goes for option two, and gets more customer orders than they can service, they will delay delivery on some orders and lose customers to the competition on other orders that they can’t fulfil. This results in missed opportunities for profit and growth. This model of work is most typical of service providers and consulting firms, but it is also prevalent in the world of 3D printing and build-to-order manufacturing.

In supply chain management and lean manufacturing, this dilemma is called push systems vs. pull systems. A push system is when you produce goods in anticipation of customer demand. A pull system is exactly the opposite: you produce goods in response to customer demand.
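The trade-off can be made concrete with a toy model of a single product. This is a minimal sketch in Python; the fixed batch size, the per-period capacity, and the demand numbers are illustrative assumptions, and the model deliberately ignores prices, discounting, and delivery delays.

```python
def push_system(orders, batch_size):
    """Produce `batch_size` units every period in anticipation of demand.

    Unsold units carry over as inventory. Returns (leftover_inventory,
    lost_orders).
    """
    inventory = lost = 0
    for ordered in orders:
        inventory += batch_size          # made before the orders arrive
        sold = min(inventory, ordered)
        inventory -= sold
        lost += ordered - sold           # only non-zero if stock ran out
    return inventory, lost


def pull_system(orders, capacity):
    """Produce only in response to each period's orders, up to `capacity`.

    Nothing is made ahead of time, so there is no inventory; orders above
    capacity are lost. Returns (leftover_inventory, lost_orders).
    """
    lost = 0
    for ordered in orders:
        made = min(ordered, capacity)
        lost += ordered - made
    return 0, lost


orders = [5, 3, 8, 2]  # hypothetical demand per period
print(push_system(orders, batch_size=6))  # (6, 0): excess inventory
print(pull_system(orders, capacity=6))    # (0, 2): unfulfilled orders
```

With the same output of 6 units per period, pushing ends with 6 unsold units, while pulling ends with no stock but 2 lost orders: the two failure modes described in the car maker and build-to-order examples.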

I am telling you this because Dev Teams usually work in push systems. They build features and ship them to production, then seek out customer feedback and monitor metrics to see if and how these features are used.

Ops Teams, on the other hand, tend to work in pull systems. They respond to support tickets and resolve incidents as they come in.

When you visualize this on a diagram, here’s what the difference looks like:

Comparing development and operations as push and pull systems

This is an important distinction to make. Push systems and pull systems require different management approaches and solve different problems.

Scrum, Extreme Programming (“XP”), and most frameworks that scale agile are push systems. Agile teams that use them make plans in anticipation of customer needs, focus their time on iteratively building features that meet them, and seek out feedback or measure metrics to determine whether their hypothesis about what would create value was correct.

Kanban, on the other hand, is a process improvement method for pull systems. Agile teams that use Kanban respond to customer needs as they come, with the goal of continuously improving the flow of work items and identifying opportunities to eliminate waste.

A push system helps you to solve the problem of what to do next. A pull system helps you do what you need to do faster, better, and cheaper.

In a similar way, a Dev Team is focused on effectiveness: “What is the best product to build? What is the highest-value feature to ship? And how do we develop them in a better way?”

An Ops Team, by contrast, is focused on efficiency: “What is the highest-priority ticket to resolve? How do we reduce the number of tickets of this caliber? And how do we resolve them in a smarter way?”

It doesn’t make sense for IT operations teams to adopt agile frameworks aimed at push systems, as they don’t really solve their problems.

Kanban: The Best Agile Method for Operations

By now, you know my reasoning for why Kanban is the agile method best suited to IT operations.

Contrary to what most people think, Kanban isn’t even an agile framework. It’s simply a method that you can use to visualize how you do your work today, so that you can become better at managing and doing it tomorrow.

Kanban starts with 4 principles:

  1. Start with what you do now
  2. Agree to pursue incremental, evolutionary change
  3. Respect the current process, roles, and responsibilities
  4. Encourage acts of leadership at all levels

As you read the 4 principles of Kanban, some of you are probably thinking… “Yes!!! Finally, an agile method that doesn’t tell me to completely change my way of working and research roles I’ve never even heard of.”

And you’re right. Scrum is more like a car. If you drive it, it will get you from Point A to Point B. Kanban is more like a style of driving. To use Kanban, you need to have a car in the first place. You can even use Kanban with Scrum, a combination that bears the lovely name of Scrumban.

This is good news for Ops Leaders, as most IT operations teams already have an established way of working before they start looking into agile.

As I wrote in “Can Agile Be Used for Production Support?”, Kanban is all about two things:

  • Improving the flow of work items from idea/intake to completion/resolution.
  • Eliminating waste through continuous improvement.

Check out the rest of that post to read an example of how Kanban helps you make this happen.
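One way to see whether the flow of work items from intake to resolution is actually improving is to measure how long resolved items took and which open items have been waiting longest. This is a minimal sketch in Python, assuming tickets exported from your tracking tool as dictionaries with hypothetical `id`, `opened`, and `resolved` fields; measuring lead time in hours and comparing it by median is an assumption of this sketch, not something Kanban prescribes.

```python
from datetime import datetime
from statistics import median


def lead_times_hours(tickets):
    """Hours from intake to resolution for every resolved ticket."""
    return {
        t["id"]: (t["resolved"] - t["opened"]).total_seconds() / 3600
        for t in tickets
        if t["resolved"] is not None
    }


def still_open(tickets, now):
    """Unresolved tickets with their age in hours, oldest first."""
    ages = [
        (t["id"], (now - t["opened"]).total_seconds() / 3600)
        for t in tickets
        if t["resolved"] is None
    ]
    return sorted(ages, key=lambda pair: pair[1], reverse=True)


# Hypothetical export from a ticketing tool.
tickets = [
    {"id": "INC-1", "opened": datetime(2024, 3, 4, 9, 0), "resolved": datetime(2024, 3, 4, 13, 0)},
    {"id": "INC-2", "opened": datetime(2024, 3, 4, 9, 30), "resolved": datetime(2024, 3, 5, 9, 30)},
    {"id": "INC-3", "opened": datetime(2024, 3, 4, 10, 0), "resolved": datetime(2024, 3, 4, 12, 0)},
    {"id": "INC-4", "opened": datetime(2024, 3, 5, 8, 0), "resolved": None},
    {"id": "INC-5", "opened": datetime(2024, 3, 4, 16, 0), "resolved": None},
]
lead_times = lead_times_hours(tickets)
print(lead_times)                        # {'INC-1': 4.0, 'INC-2': 24.0, 'INC-3': 2.0}
print(median(list(lead_times.values()))) # 4.0
print(still_open(tickets, datetime(2024, 3, 5, 12, 0)))
# [('INC-5', 20.0), ('INC-4', 4.0)]
```

Watching the median week over week, and walking through the outliers (INC-2 here) in a retrospective, is one way to find where work waits and where waste hides.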

By Dim Nikolov

Jack of all trades and master of none. Dim is a Certified Scrum Product Owner (CSPO) and Certified Scrum Master (CSM). He has a decade of experience as a stakeholder, member, leader, and coach for agile teams.