Redundancy and Failover

Among the more useful ideas in computing, both invented decades ago, are redundancy and failover. These fancy words name common-sense concepts: when one computer (or part) fails, switch to another. Whether that switch happens quickly and seamlessly or slowly and disruptively is one difference between good hosting and bad.

Network redundancy is the most widely used example. The Internet is just what its name says: an interconnected set of networks. Between and within networks are paths that carry page requests, file transfers and other data from one point (called a 'node') to the next. If there are two or more paths between a user's computer and the server, one becoming unavailable is not much of a problem. Closing one street is not so bad if you can drive down another just as easily.

Of course, there's the catch: 'just as easily'. When one path fails, the total load (how much data is requested, by how many users, within what time frame) doesn't change. Now the same number of 'cars' is using fewer 'roads'. That can lead to traffic jams.
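
To put rough numbers on the 'cars and roads' picture, here is a minimal sketch in Python. It assumes traffic is split evenly across the available paths, which real networks rarely do exactly, and the figure of 900 requests per second is made up purely for illustration.

```python
def load_per_path(total_load, paths_available):
    """Return the load each path carries if traffic is split evenly."""
    if paths_available <= 0:
        raise ValueError("no paths available: the server is unreachable")
    return total_load / paths_available

total = 900  # hypothetical requests per second, for illustration only

before = load_per_path(total, 3)  # all three paths working
after = load_per_path(total, 2)   # one path has failed

print(before, after)  # 300.0 450.0
```

With one of three paths gone, each remaining path carries half again as much traffic as before. Whether that causes a jam depends on how much spare capacity each path had to begin with.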

A different but related phenomenon occurs when there are suddenly more 'cars', as happens in a widespread virus attack, for example. Then a large number of useless and destructive programs flood the network with traffic. Making matters worse, at a certain point parts of the network may be shut down to prevent further spread, leaving more 'cars' on fewer 'roads'.

A related form of redundancy and failover can be carried out with servers, which are in essence the 'end-nodes' of a network path.

Servers can fail because of a hard drive failure, motherboard overheating, memory malfunction, operating system bug, web server software overload or any of a hundred other causes. Whatever the cause, when two or more servers are configured so that another can take up the slack from one that has failed, that is redundancy, and the switch from the failed server to the working one is failover.
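
The basic idea of switching to a working server can be sketched in a few lines of Python. This is only an illustration, not how any particular hosting company does it: the server names and the health states below are invented, and a real health check would contact the server over the network rather than look it up in a table.

```python
def pick_server(servers, is_healthy):
    """Return the first server that passes the health check, or None."""
    for server in servers:
        if is_healthy(server):
            return server
    return None

# Hypothetical server names and health states, for illustration only.
status = {"primary": False, "standby": True}

chosen = pick_server(["primary", "standby"], lambda name: status[name])
print(chosen)  # standby
```

Servers are tried in the order listed, so the standby is used only when the primary fails its check. If nothing is healthy the function returns None, which is the situation redundancy is meant to make rare.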

Server redundancy is more difficult to achieve than network redundancy, but it is still very common. It is not as common as it should be, since many times a failed server is simply rebooted, repaired or replaced with another piece of hardware. But more sophisticated web hosting companies will have such redundancy in place.

And that's one lesson for anyone weighing one web hosting company against another, similarly priced one. Look at which company can offer competent assistance when things fail, as they always do sooner or later.

One company may have a habit of simply rebooting. Others may have redundant disk arrays: hardware containing multiple disk drives to which the server has access, so that one or more drives can fail without bringing the system down. The failed drive is replaced and no one but the administrator is even aware there was a problem.
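
One simple form of disk redundancy is mirroring, where every drive holds a full copy of the data. The toy Python model below assumes that scheme. Real arrays are implemented in hardware or in the operating system and often use other layouts, and the class and method names here are made up for the example.

```python
class MirroredArray:
    """Toy mirror: writes go to every working drive, reads use any working drive."""

    def __init__(self, drive_count):
        self.drives = [dict() for _ in range(drive_count)]
        self.failed = set()  # indexes of drives that have failed

    def write(self, key, value):
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[key] = value

    def read(self, key):
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[key]
        raise IOError("all drives have failed")

    def fail(self, index):
        """Simulate a drive failure."""
        self.failed.add(index)

    def replace(self, index):
        """Swap in a new drive and copy the data from a surviving one."""
        survivors = [d for i, d in enumerate(self.drives) if i not in self.failed]
        if not survivors:
            raise IOError("no surviving drive to copy from")
        self.drives[index] = dict(survivors[0])
        self.failed.discard(index)


array = MirroredArray(2)
array.write("page", "hello")
array.fail(0)               # first drive dies
value = array.read("page")  # still served from the second drive
array.replace(0)            # administrator swaps in a new drive
print(value)  # hello
```

The reader of the data never sees the failure. Only the administrator, who calls replace in this model, knows that anything went wrong.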

Other companies may have still more sophisticated systems in place. Failover servers can take up the load of a crashed computer without the end user noticing anything. In fact, in better installations, they're the norm. When they're in place, the user has at most to refresh the browser and, bingo, everything is fine.

The more a web site owner knows about redundancy and failover, the better he or she can understand why things go wrong, and what options are available when they do. That knowledge can lead to better choices for a better web site experience.
