Single-Site or Highly-Available?

“Simplicity is prerequisite for reliability.”
Edsger Dijkstra

This is the last post in this “So, what is it going to be?” series.

Do you need your OctopOSS to be highly available (HA), with multi-site failover, redundancy mechanisms, constant data sync, etc.? Or is your OctopOSS non-mission-critical, with no need for a DR (Disaster Recovery) site or complex data duplication, requiring only best-effort support and recovery mechanisms?

This is where your OctopOSS gets interesting. There are a number of techniques available to ensure business continuity if there is a failure within your OSS solution. However, duplication of resources generally means an increase in costs (hardware, network, HA software, virtualisation, storage, database, power, human resources, etc).

Your decision here will basically come down to the level of reliability you can afford.

The list below gives examples in order of increasing reliability (all else being equal):

  • Single site, single server, single database instance
  • Single site, clustered servers, HA storage, clustered database
  • Dual site, clustered solution
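To show why each step in the list above tends to raise reliability, here is a minimal Python sketch of standard availability arithmetic. It assumes failures are independent and uses hypothetical figures: 99% availability for each server, storage unit and database instance, and 99.9% for a site's shared power, cooling and network. Real numbers, and correlated failures such as a shared power feed or a common software bug, will change the results considerably.

```python
def series_availability(*stages: float) -> float:
    """Availability of a chain in which every stage (server, storage, database) must be up."""
    result = 1.0
    for a in stages:
        result *= a
    return result


def parallel_availability(component: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, any one of which can carry the load.

    Assumes independent failures: the group is down only if every copy is down at once.
    """
    if not 0.0 <= component <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - component) ** replicas


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per (365-day) year."""
    return (1.0 - availability) * 365 * 24


# Hypothetical figures, for illustration only.
NODE = 0.99   # one server, storage unit or database instance
SITE = 0.999  # a site's shared power, cooling and network

# Single site, single server, single database instance.
single = SITE * series_availability(NODE, NODE, NODE)
# Single site, each tier duplicated (clustered servers, HA storage, clustered database).
clustered = SITE * series_availability(*[parallel_availability(NODE, 2)] * 3)
# Two independent clustered sites, either of which can carry the full load.
dual_site = parallel_availability(clustered, 2)

for name, a in [("single", single), ("clustered", clustered), ("dual site", dual_site)]:
    print(f"{name:>10}: {a:.6f} availability, ~{downtime_hours_per_year(a):.1f} h/year down")
```

The dual-site figure treats the two sites as fully independent with instant failover, which is optimistic; in practice, data replication, failover testing and staffing all eat into that gain, and they are exactly where the extra cost goes.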

Clustered solutions can follow a number of different models:

  • Active-Standby – Only one of two devices is active (i.e. carrying traffic) at any time. The standby device becomes active only if the primary fails (i.e. failover). The former standby then remains the primary until an event triggers a switch back to the initial state (i.e. fail-back).
  • Active-Active – If a server within the cluster fails, its traffic is diverted to another active server and/or load-balanced across the remaining servers.
  • N+1 – One spare server is equally capable of replacing any of the other N servers in the cluster.
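The three clustering models can be sketched in a few lines of Python. This is a simplified, hypothetical model (the class and function names are illustrative and not taken from any HA product): it ignores health checking, fencing and state replication, and only shows how roles and load move when a node fails. Note that in the active-standby case the repaired node does not take over again automatically; fail-back is a separate, deliberate step.

```python
class ActiveStandbyPair:
    """Two-node active-standby pair with automatic failover and manual fail-back."""

    def __init__(self, primary: str, standby: str) -> None:
        self.original_primary = primary
        self.active = primary
        self.standby = standby
        self.healthy = {primary: True, standby: True}

    def set_health(self, node: str, is_healthy: bool) -> None:
        self.healthy[node] = is_healthy
        # Failover: the active node has failed and the standby is available.
        if not self.healthy[self.active] and self.healthy[self.standby]:
            self.active, self.standby = self.standby, self.active

    def fail_back(self) -> bool:
        """Return to the original primary if it is healthy; True if roles changed."""
        if self.active != self.original_primary and self.healthy[self.original_primary]:
            self.active, self.standby = self.standby, self.active
            return True
        return False


def active_active_failover(load: dict[str, float], failed: str) -> dict[str, float]:
    """Spread a failed server's load evenly across the remaining active servers."""
    survivors = [s for s in load if s != failed]
    if not survivors:
        raise RuntimeError("no active servers left")
    share = load[failed] / len(survivors)
    return {s: load[s] + share for s in survivors}


def n_plus_one_failover(active: list[str], spare: str, failed: str) -> list[str]:
    """Swap the single spare server into the failed server's slot."""
    return [spare if s == failed else s for s in active]


pair = ActiveStandbyPair("oss-a", "oss-b")
pair.set_health("oss-a", False)  # failover to the standby
print(pair.active)               # oss-b
pair.set_health("oss-a", True)   # oss-a repaired, but it stays on standby
print(pair.active)               # oss-b
pair.fail_back()                 # deliberate switch back to the original primary
print(pair.active)               # oss-a

print(active_active_failover({"a": 30.0, "b": 30.0, "c": 30.0}, "c"))  # {'a': 45.0, 'b': 45.0}
print(n_plus_one_failover(["s1", "s2", "s3"], "spare", "s2"))          # ['s1', 'spare', 's3']
```

Keeping fail-back manual is a common design choice: it avoids a flapping node bouncing traffic back and forth, at the cost of needing an operator (or a scheduled window) to restore the original layout.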

A dual-site clustered solution adds an extra layer of complexity: human resourcing and the related processes. Will one site be the primary site, with all the operators? Will the DR site be used only upon catastrophic failure of the primary, or will staff on both sites share the load?

Blah, blah, blah. But which variant is best for you? There isn’t a “best” answer. As mentioned above, it comes down to your tolerance of risk and available budget.

For small CSPs, it will usually be a single-site, two-node cluster similar to the one in this link on Wikipedia.
For larger CSPs, there is usually a DR data centre to provide greater business continuity.
