Ensuring Continuity of Services During Change Incidents

Friday, January 25, 2013

Bozidar Spirovski


Services are most vulnerable during change. Continuity of service needs to be ensured during change, and large portions of several ISO and BSI standards are focused on proper management of change.

However well controlled a change is, an incident can still occur during it and cause a failure of service. Here we discuss the IT Service change planning process from the point of view of preventing unplanned downtime when problems arise.

Ensuring continuity in the IT Service change planning process

Most major changes require some sort of planned downtime for an IT service. This downtime may actually be required by the change, or may be of a preventive nature, to reduce pressure on the parties involved during the change.

But extending this downtime is very undesirable, and the effects of such extensions can range from customer dissatisfaction to regulatory or contractual penalties. In order to ensure continuity of service as planned, the following process can be applied:

  • Identify the time window available to apply the change – this is the time period of the ‘planned downtime’, or period during which the change will impact a minimal number of customers. Breaking this time window will put the service in an undesirable failed state for customers.
  • Have a very detailed plan of the change – Each step of the change needs to be described with actual actions, responsible persons and tasks.
  • Time every step and confirm – calculate the time required for each step of the plan. Lean towards pessimistic timing when unsure of the actual time required.
  • Assess risks – Assess risks at each step, and identify mitigating measures. After that, identify which steps have remained critical and can cause significant problems and delays.
  • Define corrective measures – for each critical risk, define corrective measures and steps. For each corrective measure, calculate the time required.
  • Prepare a back-out plan – prepare a very detailed plan that will be applied if an incident or problem during the change prevents the change from being applied successfully within the defined time window of planned downtime. The back-out plan must include any activities that need to be performed by business stakeholders (for example, data entry after restoring an older database copy). We call this the ‘if all else fails we need to keep working’ plan.
  • Time the back-out plan – Calculate the time needed to implement the back-out activities
  • Check whether your plan and back-out plan fit within the time window of planned downtime – With all the elements prepared and timed, add up the timing. Your time window must accommodate both the time needed for the plan of change and the time needed for the back-out plan. The total timing window can be viewed in the diagram below.
  • Calculate the point of mandatory back-out – In any change, should something go wrong, you can attempt to fix the issues as long as you still have enough time to back out before the time window of planned downtime expires. In other words, the point of mandatory back-out is the latest moment at which you must start the back-out plan in order to be up and running again before the window expires.
  • Start implementing
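
The timing checks in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step names, durations, window times and function names below are all hypothetical, and the sketch assumes that each step has a single pessimistic estimate in minutes and that the back-out plan must finish exactly by the end of the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    name: str
    minutes: int  # pessimistic estimate for this step


def total_duration(steps):
    """Add up the pessimistic estimates of all steps in a plan."""
    return timedelta(minutes=sum(step.minutes for step in steps))


def fits_window(window_start, window_end, change_steps, backout_steps):
    """True if the change plan plus the back-out plan fit in the window."""
    needed = total_duration(change_steps) + total_duration(backout_steps)
    return needed <= window_end - window_start


def mandatory_backout_point(window_end, backout_steps):
    """Latest moment at which the back-out plan must be started."""
    return window_end - total_duration(backout_steps)


# Hypothetical example: a six-hour planned downtime window.
window_start = datetime(2013, 1, 25, 22, 0)
window_end = datetime(2013, 1, 26, 4, 0)

change_plan = [
    Step("Stop application and take backup", 60),
    Step("Upgrade database schema", 120),
    Step("Start application and verify", 30),
]
backout_plan = [
    Step("Restore database from backup", 45),
    Step("Start old application version and verify", 45),
]

# Time left for corrective measures if the change itself runs as estimated.
slack = (window_end - window_start) - (
    total_duration(change_plan) + total_duration(backout_plan)
)

print(fits_window(window_start, window_end, change_plan, backout_plan))  # True
print(mandatory_backout_point(window_end, backout_plan))  # 2013-01-26 02:30:00
print(slack)  # 1:00:00
```

In this made-up example the change needs 210 minutes and the back-out 90 minutes, so both fit in the 360-minute window with 60 minutes to spare for corrective measures. Whatever happens during the change, the back-out must start no later than 02:30.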

Cross-posted from ClearMorning

Rafal Los Bozidar,
I believe you are right, services are most vulnerable during change. What we must also consider is that in many cases, and more every day, outage windows are a relic of outdated processes... so then what?

Mikko Jakonen Good stuff. Yet again, important topic - though very difficult to adopt in real life.

I would like to raise one important element for the discussion: the things that are considered or thought to be functioning, but are never actually touched during exercises because they are believed to be "under margin".

I believe that requirements management plays a crucial role in all the operations included in change management, especially in making them "secure" and able. That means understanding which other services, components, processes, technologies and resources the target of the change needs, and how it behaves when one or more of them fail, even slightly.

Then comes the X - the unknowns. Planning continuity in change management (I do agree, the most important part considering vulnerabilities for the end2end service process) requires special attention to what to do when the beliefs originally drafted turn out not to match the facts.

That means extending the 'What If part' and keeping in mind that when disaster declaration is in effect during changes, the 'what ifs' should have a defined rolling back / restoration timetable, process and level.

Only extending the REQUIREMENTS management process through the whole service life-cycle can help here. I cannot see any other way. Remember to obtain the most qualified people who are able to extend it as well.

All in all - an important topic to navigate in a world where business runs 100% of the time.
