
DevOps – how to find the constraints in your IT services? Part I

One of the key quotes in “The Phoenix Project” was, for me, “Any improvement not made at the constraint is an illusion”.


The logic is clear – any improvement in delivery upstream of the constraint just causes more work to queue up at the bottleneck, and any improvement downstream just means more idle time for the downstream resources as they wait for work to be released from the bottleneck.
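To make that logic concrete, here is a toy Python simulation of a three-stage delivery pipeline in which the middle stage is the constraint. It is our own illustration rather than anything from The Phoenix Project. The stage capacities, the arrival rate of 20 items a day and the simplification that work can pass through every stage on the same day are all assumptions.

```python
def simulate(capacities, arrivals_per_day, days):
    """Toy serial pipeline: each stage finishes at most `capacity` items per day.

    Returns (items completed, items still queued at each stage, idle capacity per stage).
    """
    n = len(capacities)
    queues = [0] * n
    idle = [0] * n
    completed = 0
    for _ in range(days):
        queues[0] += arrivals_per_day            # new work arrives at the first stage
        for i, cap in enumerate(capacities):
            processed = min(queues[i], cap)      # can't do more than is queued or than capacity allows
            queues[i] -= processed
            idle[i] += cap - processed           # unused capacity = idle time
            if i + 1 < n:
                queues[i + 1] += processed       # hand the work on to the next stage
            else:
                completed += processed
    return completed, queues, idle


scenarios = {
    "baseline": [10, 5, 8],
    "improve upstream": [20, 5, 8],
    "improve downstream": [10, 5, 16],
    "improve constraint": [10, 8, 8],
}
for name, caps in scenarios.items():
    completed, queues, idle = simulate(caps, arrivals_per_day=20, days=20)
    print(f"{name:20} completed={completed:4} queued={queues} idle={idle}")
```

Over 20 days the baseline completes 100 items. Doubling the first stage still completes 100 and just grows the queue in front of the constraint from 100 to 300 items; doubling the last stage also completes 100 and raises its idle capacity from 60 to 220; only raising the constraint from 5 to 8 items a day lifts completions to 160.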

So, how can you identify the constraint(s) in your IT services?

Well, in The Phoenix Project it is obvious – the constraint is “Brent”, the uber-geek with his hands in every project and far too much undocumented knowledge in his head. However, in your organisation the constraints might be much more subtle and may need a more methodical investigation to uncover.

I am sure that the business process re-engineering and Six Sigma experts out there have a wealth of methodologies for discovering these constraints, but the purpose of this article is to outline a pragmatic approach, derived from first principles, that you can use to get started. (P.S. If you ARE a business process re-engineering or Six Sigma expert, please feel free to point us to relevant techniques and models via the comments!)

Where should you start? Clearly, your focus needs to be on the IT services (people, process and technology) that are key to your business success and contribute most to your organisation’s strategic objectives. (BTW, one of the other key lessons from The Phoenix Project was how many “mission critical projects” actually had no real link to the company’s strategic objectives or business plans.)

So, step #1 is to dig out your organisation’s annual report and most recent strategy presentation and make sure that YOU clearly know what the organisation’s goals are. This is your “BS filter” for the rest of the process… if someone complains about an IT service but can’t link that service or project back to a key business objective, then put that one at the bottom of the pile!

Step #2 is to do a bit of exploratory research with your key business users (note: business users, not IT staff!), which should quickly give you a list of the “Top Ten IT services that get in the way of business success”. This list might include services that you DON’T currently deliver, e.g. “if we had a more flexible and responsive laptop support team, like a genius bar, our mobile sales teams could get their laptop problems fixed fast and get back on the road selling to our customers.”

Step #3 is to try to validate these subjective opinions with some objective data.

Now, assuming you have some type of helpdesk logging system, you should have a ready source of data about the performance of your key IT services. If you DON’T have some form of logging tool, then find a way to start logging your work immediately, even if it’s just one big shared Excel spreadsheet. Personally, I like and have used that very successfully at a number of sites, but the key point is that “you can’t manage what you can’t measure”, and to have any hope of demonstrating improvement you need to be able to measure the before-and-after impact of whatever changes you make.


What sort of things can you look at? Well, things like: which tasks/services/processes are used the most? Which take the longest? Which breach their SLAs the most (days overdue)? Who in your team has the most items assigned to them? Which systems/services have the lowest availability? All of these (and many others) should start pointing you in the right direction for things that need improvement.
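As a sketch of what that analysis might look like, assume your helpdesk tool can export tickets with a service, an assignee, opened/resolved timestamps and an SLA target in hours. The field names and sample tickets below are hypothetical; adapt them to whatever your tool (or shared spreadsheet) actually exports. Availability would need outage data as well, so it is left out here.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical helpdesk export; the field names are assumptions, not any real tool's schema.
tickets = [
    {"service": "Laptop support", "assignee": "Brent", "opened": "2013-05-01 09:00",
     "resolved": "2013-05-03 09:00", "sla_hours": 24},
    {"service": "Laptop support", "assignee": "Brent", "opened": "2013-05-02 10:00",
     "resolved": "2013-05-02 14:00", "sla_hours": 24},
    {"service": "Email", "assignee": "Sam", "opened": "2013-05-01 08:00",
     "resolved": "2013-05-01 10:00", "sla_hours": 8},
    {"service": "Order entry", "assignee": "Brent", "opened": "2013-05-01 12:00",
     "resolved": "2013-05-04 12:00", "sla_hours": 24},
]

FMT = "%Y-%m-%d %H:%M"


def hours_to_resolve(ticket):
    """Elapsed hours between opening and resolving a ticket."""
    opened = datetime.strptime(ticket["opened"], FMT)
    resolved = datetime.strptime(ticket["resolved"], FMT)
    return (resolved - opened).total_seconds() / 3600


def summarise(tickets):
    volume = Counter(t["service"] for t in tickets)          # which services are used the most?
    per_assignee = Counter(t["assignee"] for t in tickets)   # who holds the most items?
    durations = defaultdict(list)
    breaches = Counter()
    days_overdue = defaultdict(float)
    for t in tickets:
        hours = hours_to_resolve(t)
        durations[t["service"]].append(hours)
        if hours > t["sla_hours"]:                            # SLA breached
            breaches[t["service"]] += 1
            days_overdue[t["service"]] += (hours - t["sla_hours"]) / 24
    mean_hours = {s: sum(h) / len(h) for s, h in durations.items()}  # which take the longest?
    return volume, mean_hours, breaches, dict(days_overdue), per_assignee


volume, mean_hours, breaches, overdue, per_assignee = summarise(tickets)
print("Most used:", volume.most_common())
print("Slowest (mean hours):", sorted(mean_hours.items(), key=lambda kv: kv[1], reverse=True))
print("SLA breaches:", breaches.most_common(), "days overdue:", overdue)
print("Most items assigned:", per_assignee.most_common())
```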

So, now you have a list of the stuff that’s most important to the business, backed up with objective data about how frequently the issues occur and how many users are impacted. That should enable you to make a first pass at what’s important and where to start.
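One way to make that first pass repeatable is to apply the step #1 “BS filter” first and then rank by impact. The scoring rule here (services linked to a business objective first, then incidents per month multiplied by users impacted) and the sample data are assumptions for illustration, not a prescribed method.

```python
# Hypothetical candidates from the business-user interviews plus helpdesk data.
candidates = [
    {"service": "Laptop support", "objective": "Grow field sales",
     "incidents_per_month": 40, "users_impacted": 120},
    {"service": "Intranet wiki", "objective": None,
     "incidents_per_month": 60, "users_impacted": 300},
    {"service": "Order entry", "objective": "Grow online revenue",
     "incidents_per_month": 10, "users_impacted": 2000},
]


def priority_key(candidate):
    """Sort key: services linked to a business objective first, then highest impact."""
    linked = candidate["objective"] is not None          # the step #1 "BS filter"
    impact = candidate["incidents_per_month"] * candidate["users_impacted"]
    return (not linked, -impact)                          # False sorts before True


ranked = sorted(candidates, key=priority_key)
for position, c in enumerate(ranked, start=1):
    print(position, c["service"], c["objective"] or "no linked objective")
```

The intranet wiki has a high impact score but no link to a business objective, so it ends up at the bottom of the pile.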

So, write them up on cards, stick them up on your Kanban board, move your top pick to “in progress” and start working on the detail of finding and fixing the constraints in that service.

How we’ll do that will be covered in Part II!

– TheOpsMgr


Photo: Flickr/bjornmeansbear

DevOps – How do you measure team success?

In an earlier blog post we talked about the importance of re-structuring your Dev & Ops organisation to remove silos and in another post we touched on how incentives and metrics influence how the teams work together.

Organisationally, however, we remained siloed – we were incentivised in different ways (Operations emphasising availability, Development emphasising feature delivery), we remained essentially in a waterfall delivery model, and Ops vs. Dev was a constant struggle for manpower and resources. All the usual problems that the DevOps movement is trying to address.

So how do we create alignment in incentives across our merged DevOps team and what Metrics should we be tracking to measure success?

Jesse Robbins from OpsCode wants us to “make more awesome” and proposes “time to value” as a measure of DevOps success.

Rich Steer gets excited about the “time to value” concept, but I’ve yet to see a truly operational definition of how it works.

As someone points out in the comments on Rich’s article, when do you start and stop the “time to value” clock?

  • Do you start the clock when the business first has the idea (timing being vague and nebulous) or when it becomes a Story in the backlog (more quantifiable but late)?
  • How do you measure “value” (which is intangible) and how long after “deployment” do you decide “value” has been achieved, if it ever does? Or does “deployment=value”?
  • Some changes will have more “value” than others so how do you weight the metric to account for that?
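The start/stop ambiguity is easy to see if you record several candidate events per change and make the clock configurable. The event names and dates below are hypothetical; the point is that the same change yields very different “time to value” figures depending on which events you pick, and may have no value date at all.

```python
from datetime import date

# Hypothetical timeline for a single change; the event names are assumptions.
change = {
    "idea": date(2013, 3, 1),           # the business first has the idea (vague in practice)
    "backlog": date(2013, 4, 15),       # it becomes a Story in the backlog
    "deployed": date(2013, 5, 20),      # it ships to production
    "value_confirmed": None,            # value may never be demonstrably achieved
}


def time_to_value(events, start="backlog", end="deployed"):
    """Days between two chosen events, or None if either event hasn't happened."""
    if events.get(start) is None or events.get(end) is None:
        return None
    return (events[end] - events[start]).days


print(time_to_value(change, "idea", "deployed"))
print(time_to_value(change))
print(time_to_value(change, "deployed", "value_confirmed"))
```

For this change the clock reads 80 days from idea to deployment, 35 days from backlog to deployment, and nothing at all if you insist on waiting for value to be confirmed. How to weight changes of different value is still left open.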

So if “time to value” is a bit nebulous what other metrics are people proposing?

Puppet Labs’ State of DevOps report focuses on improvement in 4 key metrics: Change Frequency, Change Lead Time, Change Failure Rate and Mean Time To Recover (MTTR). We suggest that classic metrics like Availability, Performance (page load time) and Mean Time Between Failures (MTBF) should be in there too, to give a more rounded measure of overall site performance.
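As a sketch, the four report metrics plus availability and MTBF can all be derived from two simple logs: deployments (when the change was committed, when it was deployed, whether it caused a failure) and incidents (when the outage started, when service was restored). The data structures and sample values are assumptions. The code also assumes both logs are non-empty, every incident counts as a failure and incidents don't overlap; page load time would have to come from separate monitoring data.

```python
from datetime import datetime

# Hypothetical logs for June 2013.
deployments = [
    # (change committed, change deployed, did it cause a failure?)
    (datetime(2013, 6, 3, 9), datetime(2013, 6, 4, 9), False),
    (datetime(2013, 6, 5, 9), datetime(2013, 6, 7, 9), True),
    (datetime(2013, 6, 10, 9), datetime(2013, 6, 11, 9), False),
    (datetime(2013, 6, 17, 9), datetime(2013, 6, 18, 9), False),
]
incidents = [
    # (outage started, service restored)
    (datetime(2013, 6, 7, 10), datetime(2013, 6, 7, 12)),
    (datetime(2013, 6, 20, 8), datetime(2013, 6, 20, 9)),
]
period_start, period_end = datetime(2013, 6, 1), datetime(2013, 7, 1)


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def delivery_metrics(deployments, incidents, start, end):
    period_hours = hours(end - start)
    downtime = sum(hours(restored - began) for began, restored in incidents)
    uptime = period_hours - downtime
    return {
        "change_frequency_per_week": len(deployments) / (period_hours / (24 * 7)),
        "change_lead_time_hours": sum(hours(d - c) for c, d, _ in deployments) / len(deployments),
        "change_failure_rate": sum(1 for _, _, failed in deployments if failed) / len(deployments),
        "mttr_hours": downtime / len(incidents),     # mean time to restore service
        "mtbf_hours": uptime / len(incidents),       # operating time per failure
        "availability_pct": 100 * uptime / period_hours,
    }


for name, value in delivery_metrics(deployments, incidents, period_start, period_end).items():
    print(f"{name:26} {value:.2f}")
```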

ThoughtWorks have re-worked the old “function point” or Agile “story point” concept by adding a business spin, and came up with “Business Value Points”. (As an aside, for why a points-based approach might be a bad idea, read this article on Agile story points and why they might be harming your velocity rather than helping it.)

I’m not convinced that we can find a single metric that can effectively incentivise or measure a DevOps team’s performance – I suspect that we’ll have to create a weighted equation of some type to derive a synthetic metric, or use a “balanced scorecard” approach to weigh different metrics from different perspectives.
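A minimal sketch of such a synthetic metric, assuming each measure is first normalised onto a 0 to 1 scale between a “worst acceptable” and a “best target” value and then combined as a weighted average. The bounds, weights and sample values are hypothetical; choosing them is exactly the hard (and political) part of the exercise.

```python
# Hypothetical scorecard: metric -> (actual, worst acceptable, best target, weight).
# For lower-is-better metrics, write worst > best; normalise() handles both directions.
scorecard = {
    "change_lead_time_hours": (30.0, 120.0, 8.0, 0.25),
    "change_failure_rate": (0.25, 0.5, 0.0, 0.25),
    "availability_pct": (99.58, 99.0, 99.99, 0.25),
    "net_promoter_score": (20.0, -100.0, 100.0, 0.25),
}


def normalise(value, worst, best):
    """Map value onto 0..1 where worst -> 0 and best -> 1, clamped at both ends."""
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))


def synthetic_metric(scorecard):
    """Weighted average of the normalised scores (weights need not sum to 1)."""
    total_weight = sum(weight for _, _, _, weight in scorecard.values())
    weighted = sum(normalise(actual, worst, best) * weight
                   for actual, worst, best, weight in scorecard.values())
    return weighted / total_weight


for name, (actual, worst, best, _) in scorecard.items():
    print(f"{name:24} {normalise(actual, worst, best):.2f}")
print(f"synthetic metric: {synthetic_metric(scorecard):.2f}")
```

Normalising first stops a metric with large raw numbers from swamping the others, and writing lower-is-better metrics with worst > best keeps every score pointing in the same direction.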

Regardless of the metrics we pick, we have to ensure that they incentivise the behaviour we want to reward, because, as we know from the Met Police and the NHS, the wrong incentives result in the wrong behaviour.

This is where an interesting piece of management theory comes into play. It’s called “Vroom’s Expectancy Theory”, and it outlines 3 factors I always like to keep in mind when I am setting team incentive metrics.

Basically, it boils down to 3 simple things.

“I am only going to be motivated to work harder IF…”:

(1) Expectancy = My effort will really make a difference to the overall performance (conversely, “why bust a gut if, no matter how hard I work, it doesn’t make a difference to the overall performance?”, which is why getting the weighting in the overall equation right is critical).

(2) Instrumentality = If we achieve our performance goals, I will get the reward/bonus (i.e. do I trust my boss enough to deliver on the bonus and not weasel out with some BS about the “state of the economy”?).

(3) Valence = The reward is something I care about enough to make the incremental effort worthwhile (e.g. it might be an order of magnitude harder to go from 99.9% to 99.99% availability, but if it only earns an extra 1% on my bonus, is that really worth all that effort?).
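Vroom’s model is commonly summarised as Motivational Force = Expectancy × Instrumentality × Valence, with expectancy and instrumentality treated as probabilities (0 to 1) and valence as a preference from −1 to +1. The example values below are made up; what the sketch shows is that the factors multiply, so if any one of them is zero (say, nobody believes the bonus will actually be paid) motivation is zero however strong the other two are.

```python
def motivational_force(expectancy, instrumentality, valence):
    """Vroom's expectancy model: MF = expectancy x instrumentality x valence."""
    if not (0 <= expectancy <= 1 and 0 <= instrumentality <= 1):
        raise ValueError("expectancy and instrumentality are probabilities in [0, 1]")
    if not -1 <= valence <= 1:
        raise ValueError("valence is a preference in [-1, 1]")
    return expectancy * instrumentality * valence


# Made-up values for illustration only.
print(motivational_force(0.8, 0.9, 0.8))  # believes effort counts, trusts the boss, wants the reward
print(motivational_force(0.1, 0.9, 0.8))  # "no matter how hard I work it doesn't make a difference"
print(motivational_force(0.8, 0.0, 0.8))  # doesn't believe the bonus will ever be paid
```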

The real key here is finding metrics that EVERYONE in the DevOps team can contribute to, and that, overall, balance out the different contributions from the different roles within the team.

If you get the “balance” wrong, it can lead to frustration for team members.

For example, a systems administrator might think “no matter how hard I work (say, on infrastructure automation), the crappy code quality will still make us miss our targets, but I’m not a developer and I have few (if any) ways to improve that…”.

Conversely, from a developer’s perspective: “I can write awesome code, but if the ops guys keep messing up the server configuration and causing downtime, then what’s the point?”

As a thought experiment, I tried to think of what sort of “equation” might be the basis for a meaningful synthetic metric.

What I came up with was this:

$$\text{DevOps\_Incentive\_Bonus\_Metric} = \text{Development Velocity} - \big(100 - \text{Availability (\%)}\big) \times 100 + \text{Apdex}(3) \times 100 + \text{NetPromoterScore}$$
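The sample equation can be sketched in code under one reading of it. We assume availability enters as a penalty of (100 − availability %) × 100, that Apdex(3) means an Apdex score with a 3-second threshold (matching the page load objective in the user stories below), and that velocity is in whatever unit your team already tracks. Those readings, the function names and the sample figures are our assumptions, not a definitive version of the author’s formula. The Apdex part follows the standard definition: samples at or under the threshold T are “satisfied”, samples up to 4T are “tolerating” and count half, and anything slower is “frustrated”.

```python
def apdex(load_times_s, threshold_s=3.0):
    """Apdex_T = (satisfied + tolerating / 2) / total; assumes at least one sample."""
    satisfied = sum(1 for t in load_times_s if t <= threshold_s)
    tolerating = sum(1 for t in load_times_s if threshold_s < t <= 4 * threshold_s)
    return (satisfied + tolerating / 2) / len(load_times_s)


def devops_incentive_bonus_metric(velocity, availability_pct, apdex_score, nps):
    """One assumed reading of the sample equation (hypothetical, not the author's code)."""
    downtime_penalty = (100 - availability_pct) * 100   # e.g. 99.9% costs about 10 points
    return velocity - downtime_penalty + apdex_score * 100 + nps


page_loads = [1.2, 2.5, 3.0, 4.0, 13.0]   # made-up page load samples, in seconds
score = apdex(page_loads)                 # 3 satisfied, 1 tolerating, 1 frustrated
for availability in (99.9, 99.99):
    print(availability, round(devops_incentive_bonus_metric(40, availability, score, 20), 2))
```

With a velocity of 40, an Apdex of 0.7 and an NPS of 20, moving from 99.9% to 99.99% availability raises the metric from about 120 to about 129, which is exactly the kind of trade-off the valence question asks the team to weigh up.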

Taking my sample equation, let’s see how some of it works if we turn it into “user stories” for our DevOps team:

As an “Operations Person”
I want “to create automated test environment build scripts using Chef”
So that “Developers can work faster and increase their Development Velocity”

As a “Development Person”
I want “to incorporate page load time non-functional requirements into my user stories”
So that “we only ship code that meets the 3-second page load time objective”

As an “Operations Person”
I want “to create a load-balanced production environment”
So that “we can remove any single points of failure and improve availability”

As a “Development Person”
I want “to remove any dependency on per-server session state”
So that “my code works better in a load-balanced environment and improves availability”

As a “Development Person”
I want “to ship awesome feature XYZ”
So that “customers are delighted and our Net Promoter Score increases”

These “team user stories” seem to work pretty well to me (but I am sure that many of you could come up with something better!).

So the questions for the DevOps community at large are:

  1. What metrics do you use to motivate and measure the success of your DevOps team?
  2. How are those metrics shared across the different roles within the Team?
  3. How well are those metrics working for your DevOps team?

I am sure lots of people would like to know, so please give us your thoughts in the comments or link over to your blogs etc.