
DevOps, Antifragility and the Borg Collective

Whilst researching how to reconcile ITIL with DevOps I came across this interesting blog post from the IT Skeptic entitled “Kamu: a unified theory of IT management – reconciling DevOps and ITSM/ITIL”. This led me to Jez Humble’s post “On Antifragility in Systems and Organizational Architecture”, which references Nassim Nicholas Taleb’s book “Antifragile”, and generally led to a lot of intense cogitation on fragility versus robustness versus antifragility.

The IT Skeptic (Rob England) expands on his thoughts in this presentation, which introduces the diagram below:

Kamu: reconciling DevOps and ITSM/ITIL

However, I struggled to conceptualise the differences between the 3 points of the triangle until I came up with the following analogies (and please bear with me while I explain my thoughts behind them!):

  • Fragile = Humpty Dumpty
  • Robust = A medieval castle
  • Anti-fragile = The Borg collective

Fragile

“Fragile” systems are those (often legacy) systems that you really, really don’t want to touch if you don’t have to! Like Humpty Dumpty, one good push and all the King’s horses and all the King’s men and a 24-hour, round-the-clock marathon from the Ops team won’t get that pile of crap system up and running again.

It’s a snowflake – not documented properly, there are dependencies you can’t trace, the hardware’s out of warranty, the platform is 3 versions behind and can’t be upgraded because of some customisation that no-one understands, the code is spaghetti and the guy that wrote it retired last year.

We all know what fragile looks like!

Robust

“Robust” systems are those that have been through the ITIL life-cycle, and for most of us they are probably our pride & joy. Monitored, instrumented and well documented, with their own run book and wiki pages, they are highly available with redundancy at every level. We “know” they can withstand the slings and arrows of outrageous fortune.

Just like a medieval castle they are impregnable. The very essence of robust!

And then comes along the “black swan” event… something we haven’t anticipated, a failure mode we can’t have foreseen, a cascade of errors that we did not plan for.

Our predecessor, the medieval castle owner, didn’t foresee the invention of gunpowder and cannons that reduced his impregnable castle to rubble. Likewise, the builders of the Maginot Line didn’t anticipate Blitzkrieg and mechanised warfare, nor did the defenders of the dams of the Ruhr Valley anticipate a bomb that bounces.

This is the key message of Taleb’s book and Jez’s post – that the “robustness” mindset often leads to a resistance to change. As Jez explains in the context of organisations:

“The problem with robust organizations is that they resist change. They aren’t quickly killed by changes to their environment, but they don’t adapt to them either – they die slowly.” – Jez Humble

A castle is robust… but it’s fixed, immobile, and its very robustness to “normal” assaults reduces the incentives to change and adapt.

Anti-fragile

Contrast these with the “anti-fragile” system (or organisation) typified by the Borg Collective. The Borg seek out new life and new civilisations to assimilate into the Collective in order to improve.

With each change and adaptation the system (the Collective) becomes more resilient – it improves as the result of the external stress (the essence of an adaptive, evolutionary system).

Anti-fragile organisations seek out and embrace change – they are inherently “outward-focused” and seek to be continually learning, adapting and assimilating (not hiding behind the walls of their castle, content in their robust impregnability).

Likewise, DevOps seeks to be “anti-fragile” by embracing change (and disorder, à la Chaos Monkey) whilst incorporating feedback mechanisms (the “3rd Way” of DevOps) to ensure that learning is correctly assimilated.
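
To make the “disorder plus feedback” idea a little more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how Chaos Monkey itself is implemented: the instance names, the `min_healthy` threshold and the “add one replica” feedback rule are all hypothetical and chosen purely for illustration.

```python
import random


def run_chaos_round(instances, rng, min_healthy=2):
    """Terminate one randomly chosen instance and check whether enough survive.

    `min_healthy` is an assumed threshold for "the service is still up".
    """
    victim = rng.choice(sorted(instances))  # sorted() gives a stable order to pick from
    survivors = instances - {victim}
    return victim, survivors, len(survivors) >= min_healthy


def apply_feedback(survived, desired_count):
    """Feedback step (hypothetical rule): add a replica if the experiment hurt us."""
    return desired_count if survived else desired_count + 1


def simulate(rounds=3, start_count=2, seed=42):
    """Run several chaos rounds, letting each failure change the next deployment."""
    rng = random.Random(seed)
    desired = start_count
    history = []
    for _ in range(rounds):
        instances = {f"web-{i}" for i in range(desired)}
        victim, survivors, survived = run_chaos_round(instances, rng)
        desired = apply_feedback(survived, desired)
        history.append((victim, survived, desired))
    return desired, history


if __name__ == "__main__":
    final_count, history = simulate()
    for victim, survived, desired in history:
        print(f"killed {victim}: survived={survived}, replicas next round={desired}")
```

Starting with two replicas, the first round fails and the system responds by growing to three, after which it survives the later rounds. The stress changed the system for the better, which is the distinction from a castle that merely repels the attack and stays the same.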

The DevOps mindset encourages continual learning through experimentation and collaboration in order to improve the current system, as opposed to the codified, fixed “one way of doing things” mindset implied by a formulaic, rigid ITIL worldview.

In this way DevOps encourages what Schumpeter called “creative destruction” – clearing out the old to make way for the new (and hopefully improved) system.

Summary

I’ve summarised these 3 points of the triangle in the following table:

 

                    | Fragile            | Robust                  | Anti-Fragile
--------------------+--------------------+-------------------------+--------------------------------
Icon                | Humpty Dumpty      | Medieval Castle         | The Borg
Methodology         | “Spaghetti”        | ITIL                    | DevOps
Attitude to change  | Fear Change        | Resist Change           | Embrace Change
Response to change  | Break              | Repel                   | Adapt
Rate of Change      | Ideally never!     | Slow                    | Rapid
Change initiated by | Needs CEO approval | Change Management Board | User-initiated (via automation)
Focuses on          | Survival           | Process*                | Business Value

* Yes, I know that ITIL v3 in particular *IS* in theory very focused on business value and benefits realisation BUT in my experience the end result of an “ITIL implementation” is often the triumph of process over outcome.

If anyone has any ideas for more rows to add to the table please let us know in the comments!

-TheOpsMgr

20 thoughts on “DevOps, Antifragility and the Borg Collective”

  1. Yes, that is precisely what the diagram means. I wish I was familiar enough with popular media to know what the Borg Collective is but I’ll go look it up :)
    You had me right down to the last line of the table. I’ve written before about how it is unfair to compare worst-case ITIL with best-case DevOps. They both seek value; it is how they seek it that differs. In your table I would say DevOps seeks automagic, the magical solving of problems with technology. That is a crude negative characterisation of DevOps analogous to saying ITIL seeks process.
    IT exists for two reasons: to protect and serve. ITIL over-emphasises the protect, DevOps the serve.

  2. ITIL (like Agile, Lean, etc) is a framework and if it is too rigid for your organisation, then that’s a result of the implementation, not the framework. The implementation of these frameworks should also be re-assessed to ensure they are assisting in generating business value. Like a car, if you don’t maintain/tune it, it will run poorly and cost you time and money down the track (technical debt). ITIL highlights this as Continual Service Improvement – a point that is missing in the post but highlighted for DevOps. I agree with itskeptic that it is unfair to claim that all changes must go through a CAB. In my organisation, only a small percentage do and the vast majority are standard or low-risk changes only needing team leader approval. We’re supporting Continuous Delivery by providing scripts to automate this with our ITSM tool.

  3. Thanks for putting into words a nagging doubt I have had about ITIL since starting to explore DevOps concepts. Great article (if you get the Star Trek reference :-))

  4. Thanks for the feedback Rob, Ian & skipster2k2.

    I totally agree I am not being very fair on ITIL in that last line either – as I said, “* Yes, I know that ITIL v3 in particular *IS* in theory very focused on business value and benefits realisation BUT in my experience the end result of an “ITIL implementation” is often the triumph of process over outcome.”

    However in my personal experience ITIL shops tend to get captured by the “process police” – those people who for reasons of training or personality put adherence to processes over business outcomes.

    But I take Ian’s point in that in a well-tuned ITIL shop you should not have onerous process because it should be weeded out by CSI.

    Sadly, again in my experience, CSI (1) is under-resourced and tends to get ignored in order to keep fighting fires and (2) is itself seen “as a change to be resisted”.

    I’ve seen people “fight to the death” to defend a process that is obviously not working nor delivering business value.

    That said, DevOps is immature and has a lot to learn from ITIL, and perhaps lends itself to a too laissez-faire approach that can destroy just as much business value as over-rigid adherence to a process-overloaded ITIL approach.

    Thanks again for your feedback!

    1. If devops is as widely adopted as itil in 10 years time – which is not a given – then it will be just as debased in the execution with just as many horror stories. Right now it is still in the honeymoon period, on top of the Hype Curve, executed mainly by passionate true believers. You wait.

    2. The most important thing about ITIL is that the outcome is the goal; the processes are there to ensure it occurs within expectations, assuming reasonable parameters. Anything else is not ITIL and should not be called ITIL.

  5. Nice post!
    I am currently working at an organisation that decided to implement ITIL and Continuous Delivery at the same time. What this post says is exactly what I feel, but I must admit that I am more of a DevOps person and I am not very involved with the guys in charge of the ITIL implementation. Give me a couple of months and I will tell you how this evolves.

  6. I will subscribe to your RSS feed right away, as I cannot find your email subscription link or newsletter service. Do you have one? Please let me know so that I can subscribe. Thanks.

  7. I realise this is a somewhat old post now but I just wanted to comment on the ITIL/DevOps reconciliation and the analogy of the medieval castle that resists change. Whilst it’s often the case that ITIL appears to resist change, it is not actually ITIL that is to blame but the people, i.e. the practitioners implementing it, and I have a little anecdote to illustrate it.

    I’m an Ops guy. I have worked in ISPs and am now in gaming. I do all the DevOps-type stuff and have actually been doing it since before it had that name, I think. Many moons ago, back in 2004, I joined a global household-name ISP in their UK office. We were in the process of building a new backend broadband provisioning and management system that was gearing up for the Local Loop Unbundling project (that was another system we subsequently built).

    We were also using some Agile evangelists from the development side and were implementing continuous integration/delivery with fast iteration etc. Part of that work also involved me becoming “embedded” with the delivery aspect of the system to collaborate and ensure that, from an Ops point of view, we could deploy fast, repeatably, etc.

    A tool had been developed (shell scripts and functions mainly, later also a bit of Python) that would build the entire software stack on a target system wholesale. It would not just deploy the code (Java) but also build the application server and the proxied web servers, run some smoke tests and then, in production, flip traffic over to it, with the ability to roll it back quickly too.

    What does this have to do with ITIL? Well, we were also implementing ITIL and, not long after a sale of the UK part of the business and the beginning of a long three-year transition programme, we got a new ITIL Change Manager. All previous Change Managers had been anal about the process: the process was King, no deviation, etc etc.

    The new Change Manager, however, had a continual saying from Heraclitus that “the only thing that is constant is change”. The implication being that you can’t resist change; you must embrace it because it is the only thing that is always happening.

    So she built a process, and one of her starting points for any process building was, I guess, part of the “First Way”. She wanted to understand how the entire system and systems hung together, and the touchpoints between them. In other words, she wanted to know the answer to “if work is carried out on X, what other systems could it impact?”. She then looked at the tools and how we did things.

    The result was what I would call a “DevOps ITIL change process”. It worked by being flexible and accepting the fact that the process was there to be moulded to aid delivery, not to stop people doing work. So we had the concept of “pre-approved changes”, which included deployments of rapid-iteration bugfixes that used our ever-growing deployment toolset. The tools were trusted, so why bother with a full-on CAB? She got the business to buy into the fact that Ops and Dev knew what they were doing, and unless it was some massive infrastructure change, or something on the “fragile” systems, then JFDI.

    There would be Post Implementation Reviews and retrospectives to ask if anything went wrong and if it did what could we do to stop it happening next time. We’d then go off, implement any new fixes and carry on.

    Sadly it didn’t remain like this, as the UK business was bought by Waterfall junkies beholden to large contract suppliers of other systems, which meant creaking releases and so on. The lights were kept on for the other systems, which continued to be deployed to rapidly with little to no downtime thanks to the tools, whilst the “new” systems regularly fell over when they did massive monthly deployments where they knowingly introduced more bugs than they were fixing.

    Eventually, sometime after I left, they made the Change Manager redundant as they were moving to the ITIL world you describe above, where form-filling was King and productivity was dead.

    So what’s my point? ITIL is not the resistor of change per se, it’s the people that do ITIL that do that, but occasionally you get an ITIL practitioner that “gets it”, as I experienced when I was so often reminded that “change is the only constant”.

    I suppose at this point I should mention that subsequently the Change Manager and I had a little boy and are getting married next year.

  8. For antifragile, instead of the Borg I would recommend nature. Evolution/adaptation is core to antifragility.

    This works particularly well if you wish to include a grand narrative of progress in the antifragility concept.

    Otherwise hunter-gatherer societies that continue to exist to this day are a good example of antifragility too.

  9. The Anti-fragile concept is daft. Google “antonym of fragile” and you immediately get “robust”, so it does not even make sense.

    The core idea that ITIL is all about rigidly protecting production is as absurd as saying Agile Development is all about lashing it together and sod the quality, just do it as quickly as possible.

    I think this is essentially misleading, and shows a basic misunderstanding of what ITIL v3 is. ITIL provides a suggested framework on how to manage the application/service lifecycle. It does not prescribe anything at all, and so if someone decides to make the transition from dev to ops very rigid, it was their decision, not ITIL’s.

    ITIL neither allows nor disallows a piece of code to be transitioned entirely automatically from dev to ops. It just lists the suggested steps to be taken. Whether they are automated or not is up to the implementer.

    For me devops is simply another (loosely) defined framework for transitioning from dev to ops in a controlled but highly automated manner. Which you can do in ITIL as well.

    What is absolutely right is that developers should have a better understanding of the requirements of operations, and vice-versa. That way developers can design for operations, and operators know what they are actually looking after and can feed back useful data to the designers. The transition and improvement processes are sped up immensely.

    What is true is that ITIL is often rigidly implemented by people with a poor understanding of what it really is, and with little idea of the real world of ops and dev. But that’s usually driven by the environment, and the framework of choice is irrelevant. Banks and Hospitals are very rigid, the consequences of cock-ups are potentially huge. They want robust. But robust does not have to mean safe but slow, in the same way that Agile does not have to mean fast but flaky.
