Outages occur from time to time in high-profile, critical services such as communications, finance and cloud provision. The recent events in the BlackBerry network are the latest example. Such outages always attract attention because their impact is so wide. The underlying cause often appears to be insufficient disaster planning, a view supported by analysts such as Gartner, so that a single technical or environmental problem can compromise the whole system.
I discussed poor disaster planning, and especially the failure to test disaster recovery processes, in my previous blog post, published on 4th October. The BlackBerry event got me thinking more about why organisations are not as thorough as they need to be when it comes to planning for the unplanned. It is not that they fail to understand the consequences: suppliers of such critical services are well aware of the importance of availability.
Possible explanations include wishful thinking (it can’t happen to me) and hubris (my plans are so good they can’t fail). But there seems to be another cause: a tendency to focus on the provision of new and expanded services while paying insufficient attention to the consequences for resilience.
CIOs and other senior managers want to increase the agility of their IT systems, so that they can extend the coverage of existing services and create new ones. This increases complexity, for instance through more connections between internal and external systems, and greater complexity brings a greater risk of instability.
At the same time, commercial and other pressures mean that the time available to deliver is shrinking. The emphasis therefore falls on delivering the new, with less attention paid to the consequences for overall resilience. Yet the need for robust systems grows with every expansion of service, because the organisation becomes ever more dependent on its IT. If self-service is the business model, the systems delivering the service had better be available; if they are not, customers will go elsewhere.
What can be done to alleviate these problems? Unisys has developed a two-part approach to help its clients keep agility and resilience in step. The first part is a simple model of the state of an IT environment, in which agility and resilience are represented by the degree of integration and the level of automation respectively. Integration ranges from little or none up to a complete service-oriented architecture (SOA) implementation. Automation ranges from none at all to a fully automated data centre that requires no operator intervention. The state of any IT environment lies somewhere in the space defined by these two dimensions.
The second part is the appraisal process. Unisys runs a workshop with the client, using the model to define the current state of the environment and the desired future state, and hence what is needed to close the gap between them. There often is a gap. Any lack of automation shows up clearly, so remedial action can be proposed.
Although the approach does not depend on any specific type of system, appraisals have so far been carried out with a number of ClearPath clients. Among the many benefits, one clear theme has emerged: the importance of keeping automation in step with integration. It is not a case of expansion or resilience, but of expansion and resilience. Both are necessary.
To find out more, download the white paper "Understanding IT System State – Experiences from the ClearPath Appraisal Process".