What does DevOps look like in UK job ads

#DevOps – Organic versus Transformational DevOps

Despite what some people seem to think, there is more to DevOps than just Continuous Delivery and Infrastructure Automation with Puppet, Chef or Ansible.

To me, DevOps is “an alternative model for the creation of business value from the software development life-cycle that encompasses a product-centric view across the entire product life-cycle (from inception to retirement) and recognises the value in close collaboration, experimentation and rapid feedback”.

Moving from one model of value creation to another can be either an organic process or a transformational one – you can “grow into” the new model or you can plan a strategy of change to transform your organisation from one to the other.

It’s in this “Organic DevOps” versus “Transformational DevOps” that I see a growing disconnect between different sectors of the DevOps community, particularly between “DevOps for Start-ups” and “DevOps for Enterprise”.

IMHO, “Start-Ups DevOps” normally follows the “organic DevOps” path – you’re often starting from a relatively “greenfields” approach based on a cloud infrastructure. You probably already have a very close, collaborative culture because there’s only 20 of you anyway and you all work in the same office and you spend 18hrs a day there. Automation is part of your DNA because you’ve never had the staffing levels to do it manually.

“Enterprise DevOps” is normally “Transformational DevOps” – you have large, distributed IT teams that cross geographic locations, time-zones and probably organisational boundaries (due to outsourcing). You have extensive legacy applications and infrastructure estates (JVM 1.4 on Tomcat 5 anyone?) and you’re likely to have well-developed Prince2/ITIL/SixSigma delivery models rigidly enforced by a centralised command-and-control mindset, backed by an army of highly-paid consultants from the Big 5 telling your CEO, CIO and CTO the best way to manage their IT budget.

Moving an enterprise to DevOps via a transformation programme is a very different challenge to introducing DevOps concepts into a receptive start-up and watching them grow organically. When the DevOps community evangelises DevOps to the world, it needs to be aware of the differences and challenges inherent in each approach.

If you want to debate this idea of “Start-up Organic versus Enterprise Transformational DevOps” we’re taking part in a Webinar tonight with the great folks over at ScriptRock that’s focussing on Enterprise DevOps. It’s at 1900 BST,  11:00am PT / 2:00pm ET (60 minutes).

We’d really like to get your thoughts on this by asking a question on the webinar or by leaving a comment below as these concepts are still experimental and, just like DevOps itself, the faster we get feedback and the more we iterate around the concept the stronger it will be!

http://info.scriptrock.com/devops_webinar-2

Enterprise DevOps Webinar

A scientific basis for #DevOps success?

A fascinating TED talk from “Predictably Irrational” author Dan Ariely has some interesting pointers to some of the underlying psychological mechanisms that make the DevOps model a better way to structure work within IT departments.

“we care much more about a product if we’ve participated from start to finish rather than producing a single part over and over.”

The accompanying article lists 7 insights into our motivations when performing tasks (and the research that supports those insights).

  1. Seeing the fruits of our labour may make us more productive
  2. The less appreciated we feel our work is, the more money we want to do it
  3. The harder a project is, the prouder we feel of it
  4. Knowing that our work helps others may increase our unconscious motivation
  5. The promise of helping others makes us more likely to follow rules
  6. Positive reinforcement about our abilities may increase performance
  7. Images that trigger positive emotions may actually help us focus

Many of these tie directly back to key DevOps principles.

For example the “First Way of DevOps” encourages “systems thinking” which relates directly to #1 above – if we are looking at the entire system (not just our small part) we will inherently be looking at the “fruits of our labour”.

Similarly fostering a team-based DevOps culture where we can see “how our work impacts on others” is closely aligned with #4.

For me, #2 and #6 tie directly back to “leadership” (as opposed to “management”). Good leaders know that praise (either private 1:1 praise with individuals or public praise in front of the team) can have a huge impact on morale, with a subsequent impact on productivity and quality.

It’s fascinating to see how behavioural science is increasing our understanding of human motivation. The challenge for us in the DevOps movement is to take these science-based insights and see how we can apply them with our teams to create a better way of working.

DevOps 101 for Recruiters

We put together this “DevOps Intro” for the recruitment team at a London recruitment consultancy to help them understand the DevOps marketplace.

The goal was to help them understand both the WHY and the WHAT of DevOps and what that might mean for recruiters.

Hopefully you will find this interesting as well!

DevOps and the Product Owner

In a previous post we talked a lot about the “Product-centric” approach to DevOps but what does this mean for the role of the Agile “Product Owner”?

So what is the traditional role of the Product Owner? Agile author Mike Cohn from MountainGoat Software defines it thus:

“The Scrum product owner is typically a project’s key stakeholder. Part of the product owner responsibilities is to have a vision of what he or she wishes to build, and convey that vision to the scrum team. This is key to successfully starting any agile software development project. The agile product owner does this in part through the product backlog, which is a prioritized features list for the product.

The product owner is commonly a lead user of the system or someone from marketing, product management or anyone with a solid understanding of users, the market place, the competition and of future trends for the domain or type of system being developed” – Mike Cohn

The definition above is very “project-centric” – the Product Owner’s role appears to be tied to the existence and duration of the project and their focus is on the delivery of “features”.

DevOps, conversely, asks us (in the “First Way of DevOps”) to use “Systems Thinking” and focus on the bigger picture (not just “feature-itis”) and the “Product-centric” approach says we need to focus on the entire lifecycle of the product, not just the delivery of a project/feature/phase.

Whilst decomposing the “big picture” into “features” is something we completely agree with, as features should be the “unit of work” for your Scrum teams or “Agile Software Development Factory”, it needs to be within the context of the Product Lifecycle (and the “feature roadmap”).

So the key shift here then is to start talking about the “Product Lifecycle Owner”, not just the Product Owner, and ensure that Systems Thinking is a critical skill for that role.

The second big shift with DevOps is that “Non-Functional Requirements” proposed by Operations as being critical to the manageability and stability of the product across its full lifecycle “from inception to retirement” must be seen as equally important as the functional requirements proposed by the traditional Product Owner role.

In fact, we’d like to ban the term “Non-Functional Requirements” (NFRs) completely, as the name itself seems to carry an inherent “negativity” that we feel contributes to the lack of importance placed on NFRs in many organisations.

We propose the term “Operational Requirements” (ORs) as we feel that this conveys the correct “product lifecycle-centric” message about why these requirements are in the specification – “This is what we need to run and operate this product in Production across the product’s lifecycle in order to maximise the product’s likelihood of meeting the business objectives set for it”.

For the slightly more pessimistic or combative amongst you the “OR” in Operational Requirements can stand for “OR this doesn’t get deployed into Production…” .

The unresolved question is do we need an “Operational Product Owner” or does the role of the traditional, business-focussed Product Owner extend to encompass the operational requirements?

You could argue that the “Operational Product Owner” already partly exists as the “Service Delivery Manager” (SDM) within the ITIL framework, but SDMs rarely get involved in the software development lifecycle as they are focussed on the “delivery” part at the end of the SDLC. Their role could, however, be extended to include driving Operational Requirements into the SDLC as part of the continual service improvement (CSI) process.

That said, having two Product Owners might be problematic and confusing from the Agile development team’s perspective, so it would probably be preferable if the traditional business Product Owner was responsible for the operational requirements as well as the functional requirements. This may require the Product Owner to have a significantly deeper understanding of technology and operations than previously; otherwise, trying to understand why “loosely-coupled session state management” is important to “horizontal scalability” might result in some blank faces!

So in summary a “DevOps Product Owner” needs to:

  • Embrace “System Thinking” and focus on the “Product Lifecycle” not just projects or features
  • Understand the “Operational Requirements” (and just say “No to NFRs”!)
  • Ensure that the “ORs” are seen as being as important as the “Functional Requirements” in the Product roadmap and champion their implementation

In future posts we’ll examine the impact of DevOps on other key roles in the SDLC & Operations. We’d love to get your opinions in the comments section below!

-TheOpsMgr

image source: – CC  - CannedTuna via Flickr - http://www.flickr.com/photos/cannedtuna/7348317140/sizes/m/

DevOps and the “Product-centric” approach

When doing the research for our BrightTalk Webinar on DevOps I came across this quote from Jez Humble on “products not projects”, which really struck a chord with our thinking about what we call the “product-centric” approach to DevOps.

Products not Projects quotation

Figure 1 – DevOpsGuys BrightTalk Webinar

One of the key elements of DevOps is to ensure that IT strategy is directly linked with Business strategy.

One of the critical scenes in “The Phoenix Project” is where our hero Bill gets to meet with the CFO and they work through the list of “pet IT projects” in the organisation and find that many of them can’t really be tied back to any business strategy or organisational goal.

This is, we feel, one of the major problems with the “project-centric” view of IT and why we need to push for a “product-centric” view.

The “project-centric” view, whilst important for mobilising resources and organising activities, can easily get disconnected from the original business objectives. Any “Project”, just like the “Phoenix Project” in the novel, runs the risk of becoming its own “raison d’etre” as it becomes more about getting “the Project” over the line than whatever business benefits were originally proposed.

In contrast a “Product-centric” viewpoint is focussed on the Product (or Service) that you are taking to market. By keeping the focus on “the Product” you are constantly reminded that you are building a product, for customers, as part of an organisational objective that ties back to over-arching business strategy.

For example if you were in the travel sector you might be adding a new “Online Itinerary Manager” product to your website to enable your customers to view (and possibly update) their itinerary online as part of your business strategy to both empower customers online and reduce the number of “avoidable contacts” to your call centre (and hence reduce costs).

One of the other benefits of the “product-centric” view is also highlighted in Jez’s quote above – “From Inception to Retirement”.

The “product-centric” view reminds us to think about the “Product lifecycle” and not just the “software development lifecycle”.

You need to understand how this product is going to be deployed, managed, patched, upgraded, enhanced and ultimately retired… and that means you need close cooperation between the business, the developers and operations (= DevOps!).

So how do you introduce the “Product-centric” view into an organisation that might already have an existing website that offers products/services to customers?

Well, firstly, you need to stop referring to it as “the website”.

A “website” is a platform and a channel to market, it’s not a “product”.

A product is a good or service that you offer to the market in order to meet a perceived market need in the hope that you will in return receive an economic reward.

For some “websites” there might be a single product, e.g. Dominos sells pizza online (food products), and for others there might be multiple products, e.g. theAA.com sells membership (breakdown services), Insurance, Financial Services, Driver Services (mostly training) and Travel Services. If you’re ever in doubt about the products your website sells, your top navigation menu will probably give you a pretty good indication!

Figure 2 – what products does the AA sell?

It’s worth mentioning that the AA has another product too – the “RoutePlanner”. Although the route planner is “free”, it has a value to the organisation both indirectly (by drawing traffic to the site that you might then cross-sell to) and intangibly (by offering a valuable service for “free” it enhances the brand).

Secondly, you need to identify the “product owners” for each of the core products that use your website as a channel to market, so in our AA example you’d probably have separate product owners for Membership, Insurance, Finance etc.

If you’re ever in doubt about who is, or isn’t, the right product owner then there is a simple test – “Do you have a significant financial incentive (bonus) for Product “X” to succeed in the market?”.

If the answer is “no” keep going up the hierarchy until you find someone who says “yes”. A product owner who doesn’t have any “skin in the game” regarding the success of their product line is a bad idea!

Thirdly, you need to work with the product owner to map out the product lifecycle of that product, and then identify the IT dependencies and deliverables at every stage along that product lifecycle. Within your product lifecycle you might also want to map out the “feature roadmap” if your product breaks down into multiple features that will be released over time. Creating the “big picture” can be vital in motivating your teams and helping them to understand what you’re trying to achieve, and this in turn helps them to make better decisions.

Fourthly, you need to “sense check” your product lifecycle and feature roadmap against your business strategy and organisational goals. If they don’t align you either need to re-work the plan or you might decide to drop the product altogether and re-deploy those resources to a product that *IS* part of your core strategy.

Lastly you need to re-organise your DevOps teams around these products and align your delivery pipeline and processes with the product lifecycle (and feature roadmap). Your DevOps teams are responsible for the “inception to retirement” management of that product (*Top Tip – just like your “product owner” it might be a great idea to incentivise your DevOps teams with some “product success” (business) metrics in their bonus, not just technical metrics like “on-time delivery” or “system availability”. It never hurts for them to have some “skin in the game” to promote a sense of ownership in what they are delivering!).

So, to summarise, the key elements in a “product-centric approach” are:

(1)    Break down your “website” into the key “Products” (that generate business value)

(2)    Identify “Product Owners” with “skin in the game”

(3)    Map out the product lifecycle (and ideally feature roadmap) for each product with them

(4)    Sense check this product strategy with your organisational strategy

(5)    Align your DevOps teams with the core products (and incentivise them accordingly!)

So next time you’re in a meeting and someone proposes a new “Project” see if you can challenge them to create a new “Product” instead!

 

image source: - CC  - jeremybrooks via Flickr - http://www.flickr.com/photos/jeremybrooks/5401494223/sizes/s/

Is DevOps your defence against Shadow IT?

We’ve been evangelising DevOps quite a bit lately amongst customers and partners, and one of the arguments that seems to resonate the most with people about why the current paradigm of IT Development and Operations is “broken” is around the rise of “Shadow IT”.

“Shadow IT is a term often used to describe IT systems and IT solutions built and used inside organizations without organizational approval. It is also used, along with the term “Stealth IT,” to describe solutions specified and deployed by departments other than the IT department.” – Wikipedia

“Shadow IT” is nothing new – it’s been around pretty much since the invention of the PC and client/server computing – but what is new is the speed and ease at which “Shadow IT” can be deployed, and the performance, reliability and stability of that “Shadow IT” solution.

“Sally from Marketing”, armed with nothing more than a company credit card, can instantiate an arbitrary number of servers, of varying capacity and specification, from a wide variety of Cloud hosting providers. Depending on the quality of the internal IT and her choice of Cloud provider it’s quite possible that she will get better uptime and performance from her Shadow IT solution than what she might get internally.

She can then find a 3rd-party software house to write some bespoke software (and then someone like DevOpsGuys to deploy and manage it… Oops!) and her “Shadow IT solution” is up and running (not to mention the many other SaaS solutions that she could consume).

Figure 1 – DevOpsGuys – BrightTalk Webinar

In our recent BrightTalk webinar we spoke about how Gartner predicts that “Shadow IT” is expected to grow, and that there is some evidence (the PwC survey) that there is a negative correlation between “IT control” and organisational performance.

So, in summary, the traditional silo-mentality model of IT clearly isn’t meeting the customer’s needs for flexibility, innovation and time-to-market, and Cloud Computing is enabling the growth of “Shadow IT” on a scale never seen before.

To take a military analogy this is like the invention of highly mobile, mechanised “manoeuvre warfare” during WW II. The entrenched positions of the Maginot Line (think “traditional IT departments”) were rendered irrelevant by the “blitzkrieg” tactics (think “Shadow IT”) of the Wehrmacht as they simply bypassed the fixed fortifications with their more manoeuvrable, dare I say “agile”, mechanised infantry.

What is particularly fascinating, if Wikipedia can be believed, is that “blitzkrieg”, contrary to popular belief, was never a formal warfighting “doctrine” of the German Army (emphasis mine):

“Naveh states, “The striking feature of the blitzkrieg concept is the complete absence of a coherent theory which should have served as the general cognitive basis for the actual conduct of operations”. Naveh described it as an “ad hoc solution” to operational dangers, thrown together at the last moment”  – Wikipedia

“An ad hoc solution to operational dangers, thrown together at the last moment” is probably a pretty good definition of “Shadow IT” too, but the important fact to remember is that Blitzkrieg worked (whether it was a formal doctrine or not). It crushed the opposition and subsequently became the cornerstone of modern military “combined arms” doctrine.

So, what’s this got to do with DevOps?

Well, clearly Gartner think that “Shadow IT” is working well too, and will continue to “outflank” traditional IT Departments.

Our view is that DevOps can be seen as the perfect defence to “Shadow IT” as it co-opts many of the key “manoeuvre warfare” concepts to provide the user with the speed, flexibility and time-to-market they want, but still within the control of IT to ensure standards, security and compliance.

DevOps, by breaking down the silos between Development and Operations, seeks to create unified cross-functional teams organised around specific objectives (ideally specific products that generate value for your organisation).

Compare this to the Wikipedia definition of “combined arms” doctrine (bold emphasis mine):

“Combined arms is an approach to warfare which seeks to integrate different combat arms of a military to achieve mutually complementary effects (for example, using infantry and armor in an urban environment, where one supports the other, or both support each other). Combined arms doctrine contrasts with segregated arms where each military unit is composed of only one type of soldier or weapon system. Segregated arms is the traditional method of unit/force organisation, employed to provide maximum unit cohesion and concentration of force in a given weapon or unit type.”

Let’s paraphrase this for DevOps…

“[DevOps] is an approach to [IT Service delivery] which seeks to integrate different [technical silos] of an [IT Department] to achieve mutually complementary effects (for example, using [Development] and [Operations] in an [e-commerce] environment, where one supports the other, or both support each other). [DevOps] doctrine contrasts with [Traditional IT] where each [IT Team] is composed of only one type of [technical specialist] or [Technology] system. [Traditional IT] is the traditional method of [team/department] organisation, employed to provide maximum [team] cohesion and concentration of [technical expertise] in a given [technology] or [team] type.”

That seems to be a pretty good definition of DevOps doctrine, to me!

The only true defence to “Shadow IT” is to offer a level of service that meets internal customers’ needs for speed, flexibility and time-to-market. If they can get it “in-house” then the impetus to build a “Shadow IT” organisation is reduced.

The best way to deliver this level of service is, in our view, to adopt the lessons of Blitzkrieg and “combined arms” doctrine as embodied within DevOps by “integrating different teams… to achieve mutually complementary effects” and leveraging new technologies (like Cloud, APM and continuous delivery) to ensure ability AND stability.

image source: - CC  - faungg's via Flickr - http://www.flickr.com/photos/44534236@N00/2461920698

DevOps, Adam Smith and the legend of the Generalist

Is DevOps at risk of being misinterpreted as “everyone should be able to do everything”? 

That somehow DevOps means you should be able to switch seamlessly between Development and Operations tasks, writing some Java application code in the morning, re-configuring the SAN in the afternoon and finishing off with a bit of light firewall maintenance in the evening?

Take this great quote:

I’ll tell you EXACTLY what DevOps means.

DevOps means giving a shit about your job enough to not pass the buck. DevOps means giving a s**t about your job enough to want to learn all the parts and not just your little world.

Developers need to understand infrastructure. Operations people need to understand code. People need to f***ing work with each other and not just occupy space next to each other. -  John E. Vincent (@Lusis)

Whilst we wholeheartedly agree with the sentiments in the quote, I think that there is a risk that the phrase “need to understand” infrastructure or code will be misinterpreted by some as “we need generalist staff who can do it all”.

Sadly, this would be a tragic mistake for several reasons but first and foremost because division of labour exists for a reason.

There are literally hundreds of years of economic theory, dating back to Adam Smith’s “Wealth of Nations” in 1776 and beyond, that show how & why division of labour increases overall productivity.

Whether it’s making pins, cars or software applications the correctly co-ordinated activity between a sequence of specialists will massively increase overall productivity. In the context of making pins Smith quoted an increase in productivity of 24,000% (in comparison to one person performing all of the tasks required to make a pin).
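As a rough check on that figure, here is a minimal sketch of the arithmetic. It assumes the numbers Smith gives in his pin-factory passage: ten workers together making upwards of 48,000 pins a day, against at most 20 pins a day for one person working alone. The names used (`TEAM_PINS_PER_DAY`, `productivity_ratio` and so on) are purely illustrative.

```python
# Figures as given in Smith's pin-factory passage (treated here as assumptions).
TEAM_PINS_PER_DAY = 48_000   # output of the ten-person workshop
TEAM_SIZE = 10
SOLO_PINS_PER_DAY = 20       # Smith's upper bound for one person doing every step


def pins_per_worker(team_output: int, team_size: int) -> float:
    """Daily output attributable to each specialist in the team."""
    return team_output / team_size


def productivity_ratio(specialised: float, solo: float) -> float:
    """How many times more a specialised worker produces than a lone worker."""
    return specialised / solo


per_worker = pins_per_worker(TEAM_PINS_PER_DAY, TEAM_SIZE)
ratio = productivity_ratio(per_worker, SOLO_PINS_PER_DAY)

print(f"Pins per specialised worker per day: {per_worker:,.0f}")
print(f"Specialised output is {ratio:.0f}x solo output ({ratio * 100:,.0f}%)")
```

On these assumptions each specialised worker produces 240 times what a lone worker could, which is where the 24,000% comes from. Since Smith thought a lone worker might not manage even one pin a day, the real gap could be far larger.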

Note that there is a critical caveat in the paragraph above – “correctly coordinated”.

Poorly coordinated activities between specialists impede productivity and create bottlenecks, mistakes and re-work. Reinertsen’s ideas of “flow” in product development, Goldratt’s Theory of Constraints (TOC) and the whole Lean movement are about ensuring that activities in the “value chain” are correctly aligned and smoothly flow from one stage (or specialism) to another to ensure that maximal productivity and value are achieved.

So what does this mean in terms of DevOps and the quote above?

Coordination requires effective communication so I think that the quote above could be re-stated to say “need to know enough about Infrastructure/software development in order to be able to communicate effectively with other specialists”.

For example, Developers need to know enough about infrastructure and web operations to understand why a scalable user session state management solution is critical when running a distributed system, because “just running it all on one box like we do in my dev environment” isn’t a good solution. Conversely, Operations need to know enough about the language stack they support to be able to read a stack trace as part of their troubleshooting workflows to help identify the root cause of a problem.

So whilst hopefully we agree that DevOps doesn’t mean the end of specialists performing specialised skills, and ergo the creation of hybrid generalists, it’s worth mentioning that Adam Smith did have some warnings on the downsides of over-specialisation that have some resonance for DevOps:

“The man whose whole life is spent in performing a few simple operations, of which the effects are perhaps always the same, or very nearly the same, has no occasion to exert his understanding or to exercise his invention in finding out expedients for removing difficulties which never occur. He naturally loses, therefore, the habit of such exertion, and generally becomes as stupid and ignorant as it is possible for a human creature to become. The torpor of his mind renders him not only incapable of relishing or bearing a part in any rational conversation, but of conceiving any generous, noble, or tender sentiment, and consequently of forming any just judgment concerning many even of the ordinary duties of private life” – Adam Smith (Wealth of Nations)

Alexis de Tocqueville agreed with Smith:

“Nothing tends to materialize man, and to deprive his work of the faintest trace of mind, more than extreme division of labour.” Alexis de Tocqueville

Smith warned that workers “become ignorant and insular as their working lives are confined to a single repetitive task” which is the silo-oriented mental state that @Lusis rails against in the opening quote when he says DevOps “means giving a shit about your job enough to not pass the buck”.

DevOps “First Way” emphasises “Systems Thinking” – looking at the larger system as a whole not just your narrow silo view – so the risks raised by Smith and De Tocqueville are valid concerns for DevOps.

So there is a tension here… we need the productivity of specialisation but we don’t want the insular worldview of “extreme division of labour”.

Scrum in Agile software development addresses similar concerns by emphasising cross-functional teams, “Individuals and interactions over processes and tools” and generally focussing on high-bandwidth communication as a way to tear down silos.

DevOps can (and should) emphasise the same solutions to resolve this tension.

DevOps, Death Marches and the Hero Hacker Myth

I’ve been thinking a lot about DevOps culture over the last few weeks which is probably why my ears pricked up when I overheard someone telling a story about an IT project they’d worked on where they did “10 weeks straight, 7 days a week” in order to get the project in on time.

These types of stories are a key part of any organisation’s culture – they form part of the mythology that defines the cultural norms and “the way things are done” within the organisation and can tell you a lot about its values.

In this case the speaker, judging by his tone of voice, was very proud of the effort they’d put in and the fact they’d managed to “pull it out of the bag” with such a super-human effort.

After all, the project was a success, right?

Death Marches

My perception, from the outside, was a bit different. To me it sounded like a “death march project”, which Ed Yourdon defines as:

“A death march project is one for which an unbiased, objective risk assessment (which includes an assessment of technical risks, personnel risks, legal risks, political risks, etc.) determines that the likelihood of failure is ~ 50 percent.” – Edward Yourdon

Wikipedia goes on to say:

Often, the death march will involve desperate attempts to right the course of the project by asking team members to work especially grueling hours (14-hour days, 7-day weeks, etc) or by attempting to “throw (enough) bodies at the problem”, often causing burnout.
- Wikipedia, “Death march (project management)”

Sound familiar?

Hero Hacker Myths

Closely related to the “Death March” is the “Hero Hacker Myth”, neatly described here by Rob Mee:

“The myth of the hero hacker is one of the most pervasive pathologies to be found in Silicon Valley start-ups: the idea that a lone programmer, fuelled by pizza and caffeine, swaddled in headphones, works all hours of the night to build a complex system, all by himself. Time out. Software development, it turns out, is a team sport. All start-ups grow, if they experience any meaningful success. What works for a lone programmer will not work in a company of 10. And what’s worse, encouraging the hero mentality leads to corrosive dysfunction in software teams. Invariably the developers who do a yeoman’s 9-to-5, week after week, cranking out solid features that the business is built on, lose out to the grasping egomaniacs who stay up all night (usually just one night) looking to garner lavish praise. Rather than reward the hero, it’s better to cultivate a true esprit de corps.” – Rob Mee, Pivotal Labs

The bold and red highlighting in the quote above is mine because that sentence is central to the message of this post – if your organisational culture mythologises “Death March”-type projects or the “Hero Hacker” then you have a dysfunctional culture that needs to be fixed.

Once the Death March mythology is ingrained into your organisational culture EVERY project will end up being a Death March. Why? Because if a project that was only allocated 50% of the time, resources or budget it needed to have a realistic chance of success has “succeeded” in the past (because of super-human effort on the part of the project team), then that 50% becomes the new norm.

What manager is going to scope the next project, of equivalent size, with TWICE the time, resources or budget of their last Death March? It would probably be career suicide.

Of course, this is only possible because the hidden costs of Death March projects like staff turnover, sickness, burn-out, divorce, poor quality / huge technical debt, etc. don’t appear on the project’s balance sheet. The narrow ROI view of the project budget doesn’t include these intangibles which are normally shouldered by the wider IT Department budget, or borne solely by the individuals whose health, marriages and families are harmed in the process.

Similarly the Hero Hacker Myth is an enabler for the Death March by mythologizing the long hours and hero mentality that any Death March project requires to have any chance of success.

In addition, as the highlighted section of Rob Mee’s post makes clear, the Hero Hacker Myth is corrosive to the idea of teamwork and shared responsibility that methodologies like Agile and DevOps seek to engender. Almost by definition the “Hero Hacker” is anti-social in his work practices and is likely to hoard his knowledge in a way that hurts the team’s success.

DevOps

So what does this have to do with DevOps?

Well, if we view the Death March and the Hero Hacker through the lens of the “Three ways of DevOps” we can see that DevOps should stop Death Marches after a few miles and show the Hero Hacker the door.

The “First Way” emphasises “Systems Thinking” – seeing the big picture about how value is delivered to the organisation and the customer.

Systems thinking should encompass the intangibles discussed above and see that you can’t deliver sustainable long-term value on a foundation of projects that have a 50% chance of failure and that require super-human effort to have any chance to succeed.

It’s like tossing a coin and expecting it to come down heads time and time again – eventually it will come down tails and you’ll have a massive project failure that will probably destroy any value generated by your earlier “successful” Death March projects.
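To put rough numbers on the coin-toss analogy, here is a minimal sketch. It takes the 50% figure used above at face value and assumes (my assumption, purely for illustration) that each project succeeds or fails independently of the others.

```python
# Illustrative only: 0.5 is the rough "50% chance" figure from the post,
# and treating projects as independent coin tosses is an assumption.
def chance_all_succeed(p_success, n_projects):
    """Probability that n independent projects ALL succeed."""
    return p_success ** n_projects


for n in (1, 2, 3, 5, 10):
    print(f"{n:>2} Death Marches in a row: {chance_all_succeed(0.5, n):.1%} chance of no failure")
```

The exact numbers matter less than the shape of the curve: the chance of an unbroken run of “successes” halves with every additional project.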

The “Second Way” says “Amplify Feedback Loops” – and the key way to do this is to shorten the feedback cycle by iterating faster. This is the essence of the “fail fast, fail often” mantra which is the antithesis of the long, destructive Death March. With short feedback loops the message that the goal is likely to be unachievable should be received faster.

Of course, just because you get rapid feedback that your project is turning into a Death March (or that your Hero Hacker is on his 3rd all-nighter and is running out of Jolt Cola) doesn’t necessarily mean anyone will heed the message and take corrective action.

That’s where the Third Way of DevOps comes in – foster a “Culture of Continual Experimentation and Learning”. DevOps should foster a culture of learning from your mistakes, whether it’s your previous Death March or the fast feedback on your current project that you receive from amplifying your feedback loops.

“Those who do not learn from history are doomed to repeat it”, and the essence of the Third Way is to create the “virtuous spiral” by taking the lessons from the previous iteration and applying them to the next cycle.

Similarly a culture of learning is ipso facto a culture of sharing which means the knowledge hoarding of the Hero Hacker should be anathema to a DevOps culture.

In summary, listen to the stories your organisation mythologises and see if those stories are taking you down the path of DevOps learning or yet another destructive Death March waiting for a Hero Hacker to save it…

DevOps, Antifragility and the Borg Collective

Whilst researching how to reconcile ITIL with DevOps I came across this interesting blog post from the IT Skeptic entitled “Kamu: a unified theory of IT management – reconciling DevOps and ITSM/ITIL”. This led me to Jez Humble’s post “On Antifragility in Systems and Organizational Architecture”, which references Nassim Nicholas Taleb’s book “Antifragile”, and generally led to a lot of intense cogitation on fragility versus robustness versus antifragility.

The IT Skeptic (Rob England) expands on his thoughts in this presentation, which introduces the diagram below:

Kamu: reconciling DevOps and ITSM/ITIL

However I struggled to mentally conceptualise the differences between the 3 points of the triangle until I came up with the following analogies (and please bear with me while I explain my thoughts behind them!):

  • Fragile = Humpty Dumpty
  • Robust = A medieval castle
  • Anti-fragile = The Borg collective

Fragile

“Fragile” systems are those (often legacy) systems that you really, really don’t want to touch if you don’t have to! Like Humpty Dumpty, one good push and all the King’s horses and all the King’s Men and a 24 hour round-the-clock marathon from the Ops team won’t get that pile of crap system up and running again.

It’s a snowflake – not documented properly, there are dependencies you can’t trace, the hardware’s out of warranty, the platform is 3 versions behind and can’t be upgraded because of some customisation that no-one understands, the code is spaghetti and the guy that wrote it retired last year.

We all know what fragile looks like!

Robust

“Robust” systems are those that have been through the ITIL life-cycle, and for most of us they are probably our pride & joy. Monitored, instrumented and well documented, with their own run book and wiki pages, they are highly available with redundancy at every level. We “know” they can withstand the slings and arrows of outrageous fortune.

Just like a medieval castle they are impregnable. The very essence of robust!

And then comes along the “black swan” event… something we haven’t anticipated, a failure mode we can’t have foreseen, a cascade of errors that we did not plan for.

Our predecessor, the medieval castle owner, didn’t foresee the invention of gunpowder and the cannons that reduced his impregnable castle to rubble. Likewise, the builders of the Maginot Line didn’t anticipate Blitzkrieg and mechanised warfare, nor the defenders of the dams of the Ruhr Valley a bomb that bounces.

This is the key message of Taleb’s book and Jez’s post – that the “robustness” mindset often leads to a resistance to change. As Jez explains in the context of organisations:

“The problem with robust organizations is that they resist change. They aren’t quickly killed by changes to their environment, but they don’t adapt to them either – they die slowly.” – Jez Humble

A castle is robust… but it’s fixed, immobile, and its very robustness to “normal” assaults reduces the incentives to change and adapt.

Anti-fragile

Contrast these to the “anti-fragile” system (or organisation) typified by the Borg Collective. The Borg seek out new life and new civilisations to assimilate into the Collective in order to improve.

With each change and adaptation the system (the Collective) becomes more resilient – it improves as the result of the external stress (the essence of an adaptive, evolutionary system).

Anti-fragile organisations seek out and embrace change – they are inherently “outward-focused” and seek to be continually learning, adapting and assimilating (not hiding behind the walls of their castle, content in their robust impregnability).

Likewise, DevOps seeks to be “anti-fragile” by embracing change (and disorder a la Chaos Monkey) whilst incorporating feedback mechanisms (the “3rd Way” of DevOps) to ensure that learning is correctly assimilated.

The DevOps mindset encourages continual learning through experimentation and collaboration in order to improve the current system, as opposed to the codified “one way of doing things” mindset implied by a formulaic, rigid ITIL worldview.

In this way DevOps encourages what Schumpeter called “creative destruction” – clearing out the old to make way for the new (and hopefully improved) system.

Summary

I’ve summarised these 3 points of the triangle in the following table:

 

                    | Fragile            | Robust                  | Anti-Fragile
--------------------+--------------------+-------------------------+--------------------------------
Icon                | Humpty Dumpty      | Medieval Castle         | The Borg
Methodology         | “Spaghetti”        | ITIL                    | DevOps
Attitude to change  | Fear Change        | Resist Change           | Embrace Change
Response to change  | Break              | Repel                   | Adapt
Rate of Change      | Ideally never!     | Slow                    | Rapid
Change initiated by | Needs CEO approval | Change Management Board | User-initiated (via automation)
Focuses on          | Survival           | Process*                | Business Value

* Yes, I know that ITIL v3 in particular *IS* in theory very focused on business value and benefits realisation BUT in my experience the end result of an “ITIL implementation” is often the triumph of process over outcome.

If anyone has any ideas for more rows to add to the table please let us know in the comments!

-TheOpsMgr

DevOps – how to find the constraints in your IT services? Part I

One of the key quotes in “The Phoenix Project” was, for me, “Any improvement not made at the constraint is an illusion”.

The logic is clear – any improvement in delivery upstream of the constraint just causes more work to queue up at the bottleneck, and any improvement downstream just means more idle time for the downstream resources as they wait for work to be released from the bottleneck.
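That logic can be sketched in a few lines. This is a deliberately simplified model, assuming work flows through the stages strictly in sequence; the stage names and weekly capacities are made up for illustration.

```python
# Hypothetical stages and capacities (changes per week); not real data.
def throughput(stage_rates):
    """A serial pipeline can only deliver as fast as its slowest stage."""
    return min(stage_rates.values())


def constraint(stage_rates):
    """Name of the slowest stage, i.e. the bottleneck."""
    return min(stage_rates, key=stage_rates.get)


baseline = {"develop": 10, "test": 6, "release": 3, "operate": 8}

# Double capacity at a stage that is NOT the constraint...
faster_upstream = dict(baseline, develop=20)
# ...versus a smaller improvement made AT the constraint.
faster_constraint = dict(baseline, release=5)

for label, rates in [
    ("baseline", baseline),
    ("faster develop", faster_upstream),
    ("faster release", faster_constraint),
]:
    print(f"{label:<15} bottleneck={constraint(rates):<8} throughput={throughput(rates)}/week")
```

In this toy model doubling the “develop” capacity leaves overall throughput stuck at 3 per week, while a much smaller improvement at “release” lifts it to 5.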

So, how can you identify the constraint(s) in your IT services?

Well, in the Phoenix Project it’s obvious – the constraint is “Brent”, the uber-geek with his hands in every project and too much un-documented knowledge in his head. However, in your organisation the constraints might be much more subtle and may need a more methodical investigation to uncover.

I am sure that the business process re-engineering and Six-Sigma experts out there have a wealth of methodologies to discover these constraints, but the purpose of this article is to outline a pragmatic approach, derived from first principles, that you can use to get started. (p.s. if you ARE a business process re-engineering and Six-Sigma expert please feel free to point us to relevant techniques and models via the comments!).

Where should you start? Clearly, your focus needs to be on those IT services (people, process and technology) that are key to your business success and contribute the most to your organisation’s strategic objectives. (BTW one of the other key lessons from the Phoenix Project was how many “mission critical projects” actually had no real linkage to the company’s strategic objectives or business plans).

So, step #1 is to dig out your organisation’s annual report and most recent strategy presentation and make sure that YOU know clearly what the organisation’s goals are. This is your “BS filter” for the rest of the process… if someone complains about an IT service but can’t link that service or project back to a key business objective then put that one at the bottom of the pile!

Step #2 is to do a bit of exploratory research with your key business users (note, business users, not IT staff!) which should quickly provide you with a list of the “Top Ten IT services that get in the way of business success”. This list might include services that you DON’T currently deliver, e.g. “if only we had a more flexible and responsive laptop support team, like a genius bar, where our mobile sales teams could get their laptop problems fixed fast and get back on the road selling to our customers”.

Step #3 is to try to validate these subjective opinions with some objective data.

Now, assuming you have some type of helpdesk logging system you should have a ready source of data about the performance of your key IT services. If you DON’T have some form of logging tool then find a way to start logging your work, immediately, even if it’s just one big shared Excel spreadsheet. Personally, I like Service-now.com and have used that very successfully at a number of sites but the key point is “you can’t manage what you can’t measure” and to have any hope of demonstrating improvement you need to be able to measure the before & after impact of whatever changes you make.

What sort of things can you look at? Well, which tasks/services/processes are used the most? Which take the longest? Which breach their SLAs the most (days overdue)? Who in your team has the most items assigned to them? Which systems/services have the lowest availability? All of these (and many others) should start pointing you in the right direction for things that need improvement.
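As a rough illustration of the kind of first-pass numbers I mean, here is a minimal sketch that works on a list of tickets exported from whatever logging tool you have (even that shared spreadsheet). The field names, services, assignees and dates are all hypothetical sample data, not the schema of any particular helpdesk product.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean

# Hypothetical export: field names and values are assumptions for illustration.
TICKETS = [
    {"service": "laptop support", "assignee": "brent",
     "opened": date(2013, 5, 1), "resolved": date(2013, 5, 9), "due": date(2013, 5, 3)},
    {"service": "laptop support", "assignee": "brent",
     "opened": date(2013, 5, 2), "resolved": date(2013, 5, 6), "due": date(2013, 5, 4)},
    {"service": "email", "assignee": "alice",
     "opened": date(2013, 5, 3), "resolved": date(2013, 5, 4), "due": date(2013, 5, 5)},
    {"service": "vpn", "assignee": "brent",
     "opened": date(2013, 5, 6), "resolved": date(2013, 5, 8), "due": date(2013, 5, 8)},
]


def summarise(tickets):
    """Return (volume, average days to resolve, days overdue, workload)."""
    volume = Counter(t["service"] for t in tickets)     # used the most?
    workload = Counter(t["assignee"] for t in tickets)  # who has the most items?
    durations = defaultdict(list)
    overdue = Counter()
    for t in tickets:
        durations[t["service"]].append((t["resolved"] - t["opened"]).days)
        # Only count days past the due date; early finishes add nothing.
        overdue[t["service"]] += max(0, (t["resolved"] - t["due"]).days)
    avg_days = {service: mean(days) for service, days in durations.items()}
    return volume, avg_days, overdue, workload


volume, avg_days, overdue, workload = summarise(TICKETS)
print("Most used:      ", volume.most_common(1))
print("Slowest service:", max(avg_days, key=avg_days.get))
print("Most overdue:   ", overdue.most_common(1))
print("Busiest person: ", workload.most_common(1))
```

A person who tops the workload count across several services may well be your own “Brent”, which is exactly the sort of hint you’re looking for at this stage.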

So, now you have a list of the stuff that’s most important to the business, backed up with objective data about how frequently the issues occur and how many users are impacted. That should enable you to make a first pass at what’s important and where to start.

So, write them up on cards, stick them up on your Kanban Board, and move your top pick to “in progress” and start working on the detail of finding and fixing the constraints in that service.

How we’ll do that step will be in Part II!

-TheOpsMgr

 

Photo: Flickr/bjornmeansbear