Network Redundancy must start with your provider

By Art Reisman

Art Reisman CTO

Editor’s note: Art Reisman is the CTO of APconnections. APconnections designs and manufactures the popular NetEqualizer bandwidth shaper.

The chances of being killed by a shark are 1 in 264 million. The chance of being mauled by a bear on your weekend outing in the woods are even less.   Fear is a strange emotion rooted deep within our brains. Despite a rational understanding of risks people are programmed to lose sleep and exhaust their adrenaline supply worrying about events that will never happen.

It is this same lack of rational risk evaluation that makes it possible  for vendors to sell unneeded equipment to otherwise budget conscious businesses.  The current , in vogue,  unwarranted  fears used to move network equipment    are IPv6 preparedness, and  equipment redundancy.

Equipment vendors tend to push customers toward internal redundant hardware solutions , not because they have your best interest in mind ,  if they did, they would first encourage you to get a redundant link to your ISP.

Twenty years of practical hands on experience tells us  that your Internet router’s chance of catastrophic failure is about 1 percent over a three-year period. On the other hand, your internet provider has a 95-percent chance of having a full-day outage during that same three-year period.

If you are truly worried about a connectivity failure into your business, you MUST source two separate paths to the Internet to have any significant reduction in risk. Requiring fail-over on individual pieces of equipment, without first securing complete redundancy in your network from your provider is like putting a band-aid on your finger while pleading from your jugular vein.

Some other useful tips on making your network more reliable include

Do not turn on unneeded bells and whistles on your router and firewall equipment.

Many router and device failures are not absolute. Equipment will get cranky, slow, or belligerent based on human error or system bugs. Although system bugs are rare when these devices are used in the default set-up, it seems turning on bells and whistles is often an irresistible enticement for a tech. The more features you turn on, the less standard your configuration becomes, and all too often the mission of the device is pushed well beyond its original intent. Routers doing billing systems, for example.

These “soft” failure situations are common, and the fail-over mechanism likely will not kick in, even though the device is sick and not passing traffic as intended. I have witnessed this type of failure first-hand at major customer installations. The failure itself is bad enough, but the real embarrassment comes from having to tell your customer that the fail-over investment they purchased is useless in a real-life situation. Fail-over systems are designed with the idea that the equipment they route around will die and go belly up like a pheasant shot point-blank with a 12-gauge shotgun. In reality, for every “hard” failure, there are 100 system-related lock ups where equipment sputters and chokes but does not completely die.

Start with a high-quality Internet line.

T1 lines, although somewhat expensive, are based on telephone technology that has long been hardened and paid for. While they do cost a bit more than other solutions, they are well-engineered to your doorstep.

Make sure all your devices have good UPS sources and surge protectors.

Consider this when purchasing redundant equipment,  what is the cost of manually moving a wire to bypass a failed piece of equipment?

Look at this option before purchasing redundancy options on single point of failure. We often see customers asking for redundant fail-over embedded in their equipment. This tends to be a strategy of purchasing hardware such as routers, firewalls, bandwidth shapers, and access points that provide a “fail open” (meaning traffic will still pass through the device) should they catastrophically fail. At face value, this seems like a good idea to cover your bases. Most of these devices embed a failover switch internally to their hardware. The cost of this technology can add about $3,000 to the price of the unit.

If equipment is vital to your operation, you’ll need a spare unit on hand in case of failure. If the equipment is optional or used occasionally, then take it out of your network.

Again, these are just some basic tips, and your final Internet redundancy plan will ultimately depend on your specific circumstances. But, these tips and questions should put you on your way to a decision based on facts rather than one based on unnecessary fears and concerns.

Failover and NetEqualizer: The Whys and Why Nots

Do you want failover on your NetEqualizer or wondered why it’s not available? Let me share a story with you that has developed our philosophy on failover.

A long time ago, back in 1993 or so, I was the Unix and operating system point person for the popular AT&T (i.e. Lucent and Avaya) voice messaging product called Audix. It was my job to make sure that the Unix operating system was bug free and to trouble shoot any issues.

At the time, Audix sales accounted for about $300 million in business and included many Fortune 500 companies around the world. One of the features which I investigated, tested, and certified was our RAID technology. The data on our systems consisted of the archives of all those saved messages that were so important, even more so before e-mail became the standard.

I had a lab setup with all sorts of disk arrays and would routinely yank one from the rack while an Audix system was running. The RAID software we’d integrated worked flawlessly in every test. We were one of the largest companies in the world and we spared no expense to ensure quality in our equipment, and we also charged a premium for everything we sold. If the RAID line item feature was included with an Audix system, it could run as high as $100,000.

Flash forward to the future. We get a call that a customer has lost all their data. A RAID system had failed. It was a well-known insurance company in the Northeast. Needless to say, they were not pleased that their 100 K insurance policy against disk failure did not pan out.

I had certified this mechanism and stood behind it. So, I called together the RAID manufacturer and several Unix kernel experts to do a postmortem. After several days locked in a room, we found was that the real world failure did not follow the lab testing where we had pulled live disk drives in our lab. In fact, it failed in such a way as to slowly corrupt the customer data on all disk drives rendering it useless.

I did some follow up research on failover strategies over the years and discovered that many people implement them for political reasons to cover their asses. I do not mean to demean people covering their asses, it is an important part of business, but the problem is the real cost of testing and validating failover is not practical for most manufacturers.

Many customers ask, “If a NetEqualizer fails, will the LAN cards still pass data?” The answer is, we could certainly engineer our product this way, but there is no guarantee for fail safe systems.

Here are the pros and cons of such a technology:

1) Just like my disk drive failure experience, a system can fail many different ways and the failover mechanism is likely not foolproof. So, I don’t want to recreate history for something we cannot (nor can anybody) reliably real-world test.

2) NetEqualizer’s failure rate is about two percent over two years, which is mostly attributed to harsh operating conditions. That means you have a 1 in 50 chance of having a failure over a two-year period. Put simply, the odds are against this happening.

3) If a NetEqualizer fails, it is usually a matter of moving a cable, which can be easily fixed. So, if you, or anyone with access to the NetEqualizer, are within an hour of your facility, that means you have a 1 in 50 chance of your network being down for one hour every two years because of a NetEqualizer.

4) Customers that really need a fully redundant failover for their operation duplicate their entire infrastructure and purchase two NetEqualizers. These customers are typically brokerage houses where large revenue could be lost. Since they already have a fully tested strategy at the macro level, a failover card on the NetEqualizer is not needed.

5) For customer that is just starting to dabble, they have gone to Cisco spanning tree protocol. Cisco has many years and billions of dollars invested in their switching technology and is rock solid.

6) Putting LAN failover cards in our product would likely raise our base price by about $1000. That would be a significant price increase for most customers, and one that would most likely not be worth paying for.

7) Most equipment failures are software or system related. We take pride in the fact that our boxes run forever and don’t lock up or need rebooting. A failover LAN card does not typically protect against system-type failures.

So, yes, we could sell our system as failsafe with a failover LAN card, but we would rather educate than exploit fears and misunderstandings. Hopefully we’ve accomplished that here.

%d bloggers like this: