In my ‘planning for the unplanned’ pieces, I examined what IT organisations do – or all too often don’t do – to prepare for unexpected and unpleasant events. It could be a sudden traffic surge that overloads the systems. Or it could be something far worse, such as a flood that wipes out the data centre. How effective is the response likely to be? For systems that aspire to be called mission-critical, it has to be pretty comprehensive.
But what exactly do we mean by mission-critical? I think it’s characterised by four attributes: availability, reliability, performance and security. A shortcoming in any one of them seriously compromises the system concerned. And they are not independent of each other: problems in one area may affect others. A security breach, for example, can lead to a loss of availability.
Let’s look at each attribute in turn.
Availability means that the system must be functional when required. What “when required” means depends on the business, which may need the system only at certain times. An electronic stock exchange may operate from 08.00 to 17.00, Monday to Friday. During those hours, system availability is absolutely vital; at other times, it is not critical. Systems supporting seasonal activities may be critical for only part of the year. Systems recording and tracking fresh produce such as fruit are critical during the few weeks of the picking season; if they fail then, the produce will be lost.
But 24 x 7 critical operation is becoming the norm. Obviously, systems used in response to emergencies must be available round the clock. Globalisation and rising user expectations are causing more and more commercial systems to require continuous availability.
Reliability goes hand in hand with availability. While they are in service, critical systems must not fail. And in the event of a (rare) failure, they must recover quickly. Not all failures are the fault of the system or its operators. External events such as the loss of a data centre call for an efficient disaster recovery process.
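To make the link between failure and recovery concrete, here is a minimal sketch using the standard steady-state approximation, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to recover. The MTBF and MTTR figures are hypothetical, chosen only for illustration; they are not measurements of ClearPath or any other platform.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical: the same failure rate, but two different recovery times.
for mttr in (4.0, 0.25):
    a = availability(mtbf_hours=2000.0, mttr_hours=mttr)
    print(f"MTTR {mttr:>5} h -> availability {a:.5f}, "
          f"~{downtime_hours_per_year(a):.1f} h down per year")
```

With the failure rate held constant, cutting recovery time from four hours to fifteen minutes reduces expected downtime from roughly 17 hours a year to about one, which is why quick recovery matters as much as rare failure.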
Performance, the third attribute, means consistently responding in the time required by the user. For interactive systems serving people, speed of response should be accompanied by minimal variation. Users quickly find uneven response times frustrating and will go elsewhere if there is an option to do so. Truly real-time systems, in process control for instance, impose more stringent conditions. Not responding quickly enough – including as a result of system failure – can be very expensive if continuous processes such as steel hot-rolling or aluminium smelting come to a stop. It may even be disastrous – think of reactor control in a nuclear power station!
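The point about variation can be illustrated with a small sketch: two hypothetical sets of response times with the same average, summarised by mean, standard deviation and 95th percentile using Python’s standard `statistics` module. The sample values are invented for illustration, and the choice of the 95th percentile as the tail measure is an assumption, not a prescribed metric.

```python
import statistics


def summarise(samples_ms: list[float]) -> dict[str, float]:
    """Mean, spread and tail latency of a set of response times (ms)."""
    return {
        "mean": statistics.mean(samples_ms),
        # Population standard deviation: how far responses stray from the mean.
        "stdev": statistics.pstdev(samples_ms),
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        "p95": statistics.quantiles(samples_ms, n=20, method="inclusive")[-1],
    }


# Hypothetical samples (ms) with the same average response time.
steady = [200, 210, 190, 205, 195, 200, 210, 190, 205, 195]
uneven = [100, 120, 90, 110, 100, 500, 100, 120, 80, 680]

for name, samples in (("steady", steady), ("uneven", uneven)):
    s = summarise(samples)
    print(f"{name}: mean {s['mean']:.0f} ms, stdev {s['stdev']:.0f} ms, "
          f"p95 {s['p95']:.0f} ms")
```

Both samples average 200 ms, but the uneven one has a far larger spread and a 95th percentile nearly three times higher; it is that tail, not the average, that users notice.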
The final attribute is security. Mission-critical systems must protect against data corruption, theft and other forms of attack, and must record attempted violations. Although the level required will vary according to the application, security is moving to the top of CIOs’ lists of concerns.
There are two key factors to consider in delivering mission-critical IT services: the system platforms used and the way they are deployed. As long experience has shown, ClearPath systems are engineered to deliver on all four attributes. As I discussed in an earlier blog post, The Benefits of Integrated Stacks, the integrated stack is in large part responsible for this quality.
But even the best systems have to be deployed correctly. Network design, appropriate provision for disaster recovery and, especially, high levels of automation are required to capitalise on system qualities. And attitude and understanding play as much a part as technology in creating the right environment.