Failover and NetEqualizer: The Whys and Why Nots

Do you want failover on your NetEqualizer, or have you wondered why it’s not available? Let me share a story that shaped our philosophy on failover.

A long time ago, back in 1993 or so, I was the Unix and operating system point person for the popular AT&T (later Lucent and Avaya) voice messaging product called Audix. It was my job to make sure that the Unix operating system was bug-free and to troubleshoot any issues.

At the time, Audix sales accounted for about $300 million in business and included many Fortune 500 companies around the world. One of the features which I investigated, tested, and certified was our RAID technology. The data on our systems consisted of the archives of all those saved messages that were so important, even more so before e-mail became the standard.

I had a lab set up with all sorts of disk arrays and would routinely yank a drive from the rack while an Audix system was running. The RAID software we’d integrated worked flawlessly in every test. We were one of the largest companies in the world and spared no expense to ensure quality in our equipment. We also charged a premium for everything we sold. If the RAID line item was included with an Audix system, the price could run as high as $100,000.

Flash forward. We get a call that a customer has lost all their data. A RAID system had failed. It was a well-known insurance company in the Northeast. Needless to say, they were not pleased that their $100,000 insurance policy against disk failure did not pan out.

I had certified this mechanism and stood behind it. So, I called together the RAID manufacturer and several Unix kernel experts to do a postmortem. After several days locked in a room, what we found was that the real-world failure did not follow the pattern of our lab tests, where we had pulled live disk drives. Instead, the system failed in such a way as to slowly corrupt the customer data on all disk drives, rendering it useless.

I did some follow-up research on failover strategies over the years and discovered that many people implement them for political reasons, to cover their asses. I do not mean to demean people covering their asses; it is an important part of business. The problem is that the real cost of testing and validating failover is not practical for most manufacturers.

Many customers ask, “If a NetEqualizer fails, will the LAN cards still pass data?” The answer is that we could certainly engineer our product this way, but no fail-safe system comes with a guarantee.

Here are the pros and cons of such a technology:

1) Just like my disk drive failure experience, a system can fail in many different ways, and the failover mechanism is likely not foolproof. So, I don’t want to repeat history with something we cannot (nor can anybody) reliably test under real-world conditions.

2) NetEqualizer’s failure rate is about two percent over two years, which is mostly attributed to harsh operating conditions. That means you have a 1 in 50 chance of having a failure over a two-year period. Put simply, the odds are against this happening.

3) If a NetEqualizer fails, getting the network back up is usually a matter of moving a cable, which is an easy fix. So, if you, or anyone with access to the NetEqualizer, can get to your facility within an hour, you have a 1 in 50 chance of your network being down for one hour every two years because of a NetEqualizer.

4) Customers that really need a fully redundant failover for their operation duplicate their entire infrastructure and purchase two NetEqualizers. These customers are typically brokerage houses where large revenue could be lost. Since they already have a fully tested strategy at the macro level, a failover card on the NetEqualizer is not needed.

5) Customers who are just starting to dabble in failover have gone to Cisco’s spanning tree protocol. Cisco has many years and billions of dollars invested in its switching technology, and it is rock solid.

6) Putting LAN failover cards in our product would likely raise our base price by about $1,000. That would be a significant price increase for most customers, and one that would most likely not be worth paying.

7) Most equipment failures are software or system related. We take pride in the fact that our boxes run forever and don’t lock up or need rebooting. A failover LAN card does not typically protect against system-type failures.
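For readers who want to see the arithmetic behind points 2 and 3, here is a rough sketch in Python. It uses only the figures quoted above (about a two percent failure rate over two years, and roughly one hour to get the network passing traffic again). The function name and the simplification of at most one failure per two-year window are assumptions made for illustration, not a formal reliability model.

```python
# Back-of-the-envelope downtime estimate using the figures from points 2 and 3.
# Assumption: at most one failure per two-year window, each costing one hour.

def expected_downtime_hours(failure_probability, hours_to_repair):
    """Expected hours of outage over the period the probability refers to."""
    return failure_probability * hours_to_repair

FAILURE_PROBABILITY = 0.02          # about two percent over two years (point 2)
HOURS_TO_REPAIR = 1.0               # someone reaches the unit within an hour (point 3)
HOURS_IN_TWO_YEARS = 2 * 365 * 24   # 17,520 hours

downtime = expected_downtime_hours(FAILURE_PROBABILITY, HOURS_TO_REPAIR)
availability = 1 - downtime / HOURS_IN_TWO_YEARS

print(f"Expected downtime: {downtime * 60:.1f} minutes per two years")
print(f"Implied availability: {availability:.5%}")
```

Under those assumptions, the expected outage works out to a little over a minute across two years, which is the scale of risk a failover card would be insuring against.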

So, yes, we could sell our system as failsafe with a failover LAN card, but we would rather educate than exploit fears and misunderstandings. Hopefully we’ve accomplished that here.
