Disaster Recovery Site vs. Multiple Live Sites?
Several of our clients have approached us recently to talk about upgrading or replacing their disaster recovery (DR) systems with full production systems.
Upon investigation, it seems that this trend is driven by a number of factors, only one of which is business continuity. Others include cost reduction, system upgrades, regulatory requirements and greater manageability. In this article we review some of the drivers for this change and the benefits that accrue.
Business Continuity
The primary issue is clearly business continuity: if your primary data centre became inoperable, how long would it take to get your DR site up and working? If your DR site is a rented facility owned by a third party, when did you last check the communications links? How long would it take to bring them up and replicate your critical data so that you can resume operations? And how much business might you lose while you sort this out?
Setting aside the business impact of your systems being down for any period of time, can you be sure that all the system upgrades, software patches and data feed changes you've made to the live systems have been applied to the systems in the dark and dusty DR site?
Several of our clients have concluded that there's a better way. They have decided to own the second site and make it fully operational, so that it's always there and working. Each site has the capacity to handle 100% of the firm's peak traffic. Connecting each site to two communications providers, with a different primary provider at each site, means you can keep operating at full capacity even if you lose one site and the primary connection to the other!
I asked the market data manager at a leading European investment bank to explain how they decide which traffic to route through which site. He said, "We don't. As each user logs on, they're connected to whichever site is least loaded."
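The bank's answer describes least-loaded routing. As a minimal sketch, assuming each site reports a hypothetical count of connected sessions against a fixed capacity, the decision made at logon might look like this in Python. The `Site` class, the `connect_user` function and the site names are illustrative assumptions, not the bank's actual system.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A live data centre. All fields are hypothetical, for illustration only."""
    name: str
    capacity: int            # maximum concurrent user sessions the site can carry
    sessions: int = 0        # sessions currently connected
    available: bool = True   # False if the site (or its links) is down

    @property
    def load(self) -> float:
        # Fraction of capacity in use; lower means less loaded.
        return self.sessions / self.capacity


def connect_user(sites: list[Site]) -> Site:
    """Attach a logging-on user to the least-loaded available site."""
    candidates = [s for s in sites if s.available and s.sessions < s.capacity]
    if not candidates:
        raise RuntimeError("no live site can accept another session")
    # min() returns the first site on ties, so equally loaded sites
    # are filled in list order.
    chosen = min(candidates, key=lambda s: s.load)
    chosen.sessions += 1
    return chosen


if __name__ == "__main__":
    site_a = Site("site_a", capacity=1000)
    site_b = Site("site_b", capacity=1000)
    sites = [site_a, site_b]

    for _ in range(4):
        connect_user(sites)
    print(site_a.sessions, site_b.sessions)  # 2 2

    site_a.available = False          # simulate losing one site
    print(connect_user(sites).name)    # site_b
```

Because each site is sized for 100% of peak traffic, marking one site unavailable in this sketch simply sends every new logon to the surviving site, with no separate failover procedure.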
Regulatory Requirements
There is also a general trend, driven by regulators, to locate the bulk of the users away from the primary data centre, so that an event that denies users access to their normal place of work does not also compromise the physical systems at either live data centre. Obviously it makes sense to have the two data centres physically remote from one another so that an event that affected one wouldn't compromise the other. In the US the regulator requires that the sites are at least 200 miles apart, but in the UK the FSA has chosen not to adopt a prescriptive approach to business continuity; it merely suggests that standard practice is for them to be at least 10km apart. The FSA participates in the financial sector continuity project run by the Tripartite Authorities (FSA, Bank of England, HM Treasury). The project has conducted a major consultation that produced the “Business Continuity Management Practice Guide”, available from the Financial Sector Continuity web site.
The guide details both standard practice and best practice. In particular, even standard practice suggests spreading critical IT systems across different buildings, while best practice suggests that “if buildings, content and non-replicated data were destroyed this would create no noticeable backlogs or impact on operations”. It is this kind of operational requirement that’s driving the review of these facilities.
Cost Reduction
As for costs, one of our clients has found that upgrading all of its systems to the latest rack servers has reduced both maintenance and depreciation costs. As a result, replacing all the equipment once it has been written off, even though it is duplicated at each site, actually reduces ongoing costs compared with running the existing data centre plus the DR site. Furthermore, the new equipment supports gigabit networking as standard, which has reduced latency within the system.
Going Green
They’ve also found further savings from reduced power consumption and cooling requirements. At a time when reducing the corporate carbon footprint is vital, and when being seen to be green is increasingly a market differentiator, this benefit should not be dismissed lightly.
Future Trends
With the increased risk of terrorism, particularly in London and New York, we expect the trend towards multiple live data centres to continue. One of our clients is even going a step further and commissioning three replicated data centres around the world. Other organisations may see this as overkill, but this client is determined not to let any form of disaster, natural or man-made, affect its operations.