Network Bottlenecks – When Your Router Drops Packets, Things Can Get Ugly


By Art Reisman

CTO – APconnections

As a general rule, when a network router sees more packets than it can send or receive on a link, it will drop the extra packets. Intuitively, one would assume that when your router is dropping packets, the perceived slowdown for each user would be gradual.

What happens in reality is far worse…

1) Distant users experience progressively slower responses.

Martin Roth, a colleague of ours who founded one of the top performance analysis companies in the world, provided this explanation:

“Any device which is dropping packets “favors” streams with the shortest round trip time, because (according to the TCP protocol) the time after which a lost packet is recovered depends on the round trip time. So when a company in Copenhagen/Denmark has a line to Australia and a line to Germany on the same internet router, and this router is discarding packets because of bandwidth limits/policing, the stream to Australia is getting much bigger “holes” per lost packet (up to 3 seconds) than the stream to Germany or another office in Copenhagen. This effect then increases when the TCP window size to Australia is reduced (because of the retransmissions), so there are fewer bytes per round trip and more holes between two round trips.”

In the screen shot above (courtesy of avenida.dk), the bandwidth limit is 10 Mbit (= 1 Mbyte/s net traffic), so everything on top of that will get discarded. The problem is not the discards (this is standard TCP behaviour), but the connections that are forcefully closed because of the discards. After the peak in closed connections, there is a “dip” in bandwidth utilization, because we cut too many connections.
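
To put rough numbers on Martin's explanation: TCP derives its retransmission timeout from the measured round trip time (RFC 6298 uses SRTT + 4 x RTTVAR, which works out to roughly 3 x RTT on the first sample) and doubles it on each further loss. Here is a minimal sketch, with assumed RTT values, of why the Australia stream sees much bigger holes per drop:

```python
# A back-of-envelope look at Martin's point, using RFC 6298 arithmetic.
# RTT values are assumptions for illustration, not measurements.

def initial_rto(rtt: float) -> float:
    """First retransmission timeout: SRTT + 4 * RTTVAR, where the first
    RTT sample sets SRTT = RTT and RTTVAR = RTT / 2, i.e. about 3 * RTT."""
    srtt, rttvar = rtt, rtt / 2
    return srtt + 4 * rttvar

for route, rtt in [("Copenhagen -> Germany", 0.030),
                   ("Copenhagen -> Australia", 0.330)]:
    rto = initial_rto(rtt)
    # Each successive loss doubles the timeout (exponential backoff).
    print(f"{route}: first retransmit after {rto*1000:.0f} ms, "
          f"after two more losses {rto*4*1000:.0f} ms")
```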

2) Once you hit a congestion point, where your router is forced to drop packets, overall congestion actually gets worse before it gets better.

When applications don’t get a response due to a dropped packet, instead of backing off and waiting, they tend to start sending re-tries, and this is why you may have noticed prolonged periods (30 seconds or more) of no service on a congested network. We call this the rolling brownout. Think of this situation as sort of a doubling down on bandwidth at the moment of congestion. Instead of easing into a full network and lightly bumping your head, all the devices demanding bandwidth ramp up their requests at precisely the moment when your network is congested, resulting in an explosion of packet dropping until everybody finally gives up.
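
Here is a toy model of that doubling-down effect. Every number is invented for illustration; only the shape of the result matters:

```python
# Toy model of the "rolling brownout": dropped packets trigger eager
# re-tries, so offered load explodes right at the congestion point.

CAPACITY = 100                  # packets per tick the link can carry
retries = 0

for tick in range(8):
    demand = 80 + 10 * tick     # organic demand ramps past capacity
    offered = demand + retries
    delivered = min(offered, CAPACITY)
    dropped = offered - delivered
    retries = dropped * 2       # each drop spawns impatient re-tries
    print(f"tick {tick}: offered={offered:4d} delivered={delivered:3d} "
          f"dropped={dropped:4d}")
```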

How do you remedy outages caused by congestion?

We have written extensively about solutions to prevent bottlenecks. Here is a quick summary with links:

1) The most obvious is to increase the size of your link.

2) Enforce rate limits per user (a sketch of one common mechanism follows this list).

3) Use something more sophisticated like a NetEqualizer, a device that is designed specifically to counter the effects of congestion.
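
For reference, option 2 usually means something like a per-user token bucket. A minimal sketch (the rates and address below are made up):

```python
# Minimal per-user token bucket, the mechanism behind most router
# rate limits.
import time

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bits: float):
        self.rate, self.capacity = rate_bps, burst_bits
        self.tokens, self.last = burst_bits, time.monotonic()

    def allow(self, packet_bits: int) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return True
        return False            # over the limit: drop or queue

# One bucket per user, e.g. 2 Mbit/s sustained with a 1 Mbit burst:
buckets = {"10.0.0.42": TokenBucket(2_000_000, 1_000_000)}
```

Note that the sum of everyone's permitted rates can still exceed the pipe on an oversold network, which is exactly why rate limits alone only buy you time.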

From Martin Roth of Avenida.dk

“With NetEqualizer we may get the same number of discards, but we get fewer connections closed, because we “kick” the few connections with the high bandwidth, so we do not get the “dip” in bandwidth utilization.

The graphs (above) were recorded using 1 second intervals, so here you can see the bandwidth is reached. In a standard SolarWinds graph with 10 minute averages the bandwidth utilization would be under 20% and the customer would not know they are hitting the limit.”

———————————————————————-

The excerpt below is a message from a reseller who had been struggling with congestion issues at a hotel; he tried basic rate limits on his router first. Rate limits will buy you some time, but on an oversold network you can still hit the congestion point, and for this you need a smarter device.

“…NetEq delivered a 500% gain in available bandwidth by eliminating rate caps, possible through a mix of connection limits and Equalization. Both are necessary. The hotel went from 750 Kbit max per access point (the entire hotel lobby fighting over 750 Kbit, divided between who knows how many users) to 7 Mbit or more available bandwidth for single users with heavy needs.

The ability to fully load the pipe, then reach out and instantly take back up to a third of it for an immediate need like a speed test was also really eye-opening. The pipe is already maxed out, but there is always a third of it that can be immediately cleared in time to perform something new and high-priority like a speed test.

Rate Caps: nobody ever gets a fast Internet connection.
Equalized: the pipe stays as full as possible, yet anybody with a business-class need gets served a major portion of the pipe on demand.”
– Ben Whitaker – jetsetnetworks.com

Are those rate limits on your router good enough?

NetGladiator: A Layer 7 Shaper in Sheep’s Clothing


When we were explaining our NetGladiator technology the other day, a customer was very intrigued by our Layer 7 engine. He likened it to a caged tiger under the hood, gobbling up and spitting out data packets with the speed and cunning of the world’s most powerful feline.

He was surprised to see this level of capability, and this speed, in equipment offered at our price point (more on this later in the article)…

In order to create a rock-solid IPS (Intrusion Prevention System), capable of handling network speeds of up to 1 gigabit with standard Intel hardware, we had to devise a technology breakthrough in Layer 7 processing. Existing technologies were just too slow to keep up with network speed expectations.

In order to support higher speeds, most vendors use semi-custom chip sets, a technology known as ASIC (application-specific integrated circuit). This works well but is very expensive to manufacture.

How do typical Layer 7 engines work?

Our IPS story starts with our old Layer 7 engine. It was sitting idle on our NetEqualizer product; we had shelved it when we moved away from Layer 7 shaping in favor of Equalizing technology, which is a superior solution for traffic shaping. However, when we decided to move ahead with our new IPS this year, we realized we needed a fast analysis engine, one that could look at all data packets in real time. Our existing Layer 7 shaper only analyzed headers, because that was adequate for its previous mission (detecting P2P streams). For our new IPS system, we needed a solution that could do a deep dive into the data packets. The IPS mission requires that you look at all the data – every packet crossing into a customer network.

The first step was to revamp the older engine and configure it to look at every packet. The results were disappointing.  With the load of analyzing every packet, we could not get throughput any higher than about 20 megabits, far short of our goal of 1 gigabit.

What do we do differently with our updated Layer 7 engine?

Necessity is the mother of invention, and so we invented a better Layer 7 engine.

The key was to take advantage of multiple processors for analysis of data without delaying data packets. The way the old technology worked was that it would intercept a data packet on a data link, hold it, analyze it for P2P patterns, and then send it on. With this method, as packets come faster and faster, you end up not having enough CPU time to do the analysis and still send the packet on without adding latency. Many customers find this out the hard way when they upgrade their data speeds from older, slower T1 technology. Typical analysis engines on affordable routers and firewalls often just can’t keep up with line speeds.

What we did was take advantage of a facility in the Linux kernel called “skb_clone”. This allows you to make a lightweight copy of a packet’s descriptor without copying the packet data itself. More importantly, it allows us to send the packet on without delay and do the analysis within a millisecond (not quite line speed, but fast enough to stop an intruder).

We then combined the cloning with kernel threading, a newer capability in the Linux kernel. This is different from the technology that large multi-threaded HTTP servers use, because it happens at the kernel level, and we do not have to copy the packet up to some higher-level server for analysis. Copying a packet up for analysis is a huge bottleneck and very time-consuming.
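
The kernel-threading details are internal to our product, but the general pattern is easy to show in userspace: forward the packet immediately, and hand a copy to a pool of analysis threads. This Python sketch is purely illustrative (the signature check is a stand-in), not the NetGladiator code:

```python
# Userspace illustration of the pattern (NOT kernel code): forward the
# packet immediately, analyze a copy on worker threads out-of-band.
import queue
import threading

analysis_q: "queue.Queue[bytes]" = queue.Queue()

def analyzer() -> None:
    while True:
        pkt = analysis_q.get()
        if b"/etc/passwd" in pkt:        # stand-in for a real signature
            print("suspicious packet flagged")
        analysis_q.task_done()

for _ in range(4):                       # a small pool of analysis threads
    threading.Thread(target=analyzer, daemon=True).start()

def handle(pkt: bytes, forward) -> None:
    forward(pkt)                         # sent on with no added latency
    analysis_q.put(pkt)                  # analyzed a moment later

handle(b"GET /etc/passwd HTTP/1.0", forward=lambda p: None)
analysis_q.join()
```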

What were our Results?

With kernel threading, cloning, and a high-end Intel SMP processor, we can have 16 CPUs doing packet analysis at the same time, and we have now attained speeds close to our 1 gigabit target.

When we developed our bandwidth shaping technology in 2003/2004, we leveraged technology innovation to create a superior bandwidth control appliance (read our NetEqualizer Story).  With the NetGladiator IPS, we have once again leveraged technology innovation to enable us to provide an intrusion prevention system at a very compelling price (register to get our price list), hence our customer’s remark about great speed for the price.

What other benefits does our low-cost, high-speed Layer 7 engine allow for? Is it just for IPS?

The sky is the limit here. Any type of pattern you want to look at in real time can now be done at one-tenth (1/10th) the cost of the ASIC class of shapers. Although we are not fans of unauthorized intrusion into private data on the public Internet (we support Net Neutrality), there are hundreds of other uses which can be configured with our engine.

Some that we might consider in the future include:

– Spam filtering
– Unwanted protocols in your business
– Content blocking
– Keyword spotting

If you are interested in testing and experimenting in any of these areas with our raw technology, feel free to contact us at ips@netgladiator.net.

Economic Check List for Bandwidth Usage Enforcement


I just got off the phone with a good friend of mine who contracts out IT support for about 40 residential college housing apartment buildings. He was asking about the merits of building a quota tool to limit the amount of total consumption, per user, in his residential buildings. I ended up talking him out of building an elaborate quota-based billing system, and I thought it would be a good idea to share some of the business logic of our discussion.

Some background on the revival of usage-based billing (and quotas)

Although they never went away completely, quotas have recently revived themselves as the tool of choice for deterring bandwidth usage and, secondarily, as a cash-generation tool for ISPs. There was never any doubt that they were mechanically effective as a deterrent. Historically, the hesitation in implementing quotas was that nobody wanted to tell a customer they had a limit on their bandwidth. Previously, quotas existed only in fine print, as providers kept their bandwidth quota policies close to the vest. Prior to the wireless data craze, they only selectively and quietly enforced them in extreme cases. Times have changed since we addressed the debate with our article, quota or not to quota, several years ago.

Combine the content wars of Netflix, Hulu, and YouTube with the massive over-promising of 4G networks from providers such as Verizon, AT&T and Sprint, and it seems that quotas on data have moved in where unlimited plans used to reign supreme. Consumers seem to have accepted the idea of a quota on their data plan. This new acclimation of consumers to quotas may open the door for traditional fixed-line carriers to offer different quota plans as well.

That brings us to the question of how to implement a quota system: what is cost-effective?

In cases where you have just a few hundred subscribers (as in my discussion with our customer above), it just does not make economic sense to build a full-blown usage-based billing and quota system.

For example, it is pretty easy to just eyeball a monthly usage report with a tool such as ntop and see who is over their quota. A reasonable quota limit, perhaps 16 gigabytes a month, will likely have only a small percentage of users exceeding it. These users can be warned manually with an e-mail quite economically.
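
As a sketch of how little software the manual approach needs, assuming you can export per-user monthly usage to a CSV file (the file layout below is hypothetical):

```python
# Flag over-quota users from a monthly usage export (e.g. dumped from
# ntop). The CSV layout is hypothetical: columns "ip,gigabytes".
import csv

QUOTA_GB = 16

with open("monthly_usage.csv") as f:
    over = [(row["ip"], float(row["gigabytes"]))
            for row in csv.DictReader(f)
            if float(row["gigabytes"]) > QUOTA_GB]

for ip, gb in sorted(over, key=lambda r: -r[1]):
    print(f"{ip} used {gb:.1f} GB this month -- send a warning e-mail")
```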

Referencing a recent discussion thread, the IT Administrator of the University of Tennessee at Chattanooga chimed in…

“We do nothing to the first 4Gb, allowing for some smoking “occasional” downloads/uploads, but then apply rate limits in a graduated fashion at 8/12/16Gb. Very few reach the last tier, a handful may reach the 2nd tier, and perhaps 100 pass the 4Gb marker. Netflix is a monster.”

I assume UTC has thousands of users on its network, so if you translate this down to a smaller ISP with perhaps 400 users, it means only a handful are going to exceed their 16 GB quota. Most users will cut back on the first warning.

What you can do if you have 1000+ customers (you are a large ISP)

For a larger ISP, you’ll need an automated usage-based billing and quota system, and with that comes a bit more overhead. However, with the economies of scale of a larger ISP, the cost of such a system should start to reach payback at 1,000+ users. Here are some things to consider:

1) You’ll need to have a screen where users can login and see their remaining data limits for the billing period.

2) Have some way for users to get turned back on automatically once the quota system starts to restrict them.

3) Send out automated warnings at 50 and 80 percent (or any predefined levels of your choice); a sketch of this check follows the list.

4) You may need a 24-hour call center to help them, as they won’t be happy when their service comes to a halt without warning on a Sunday night (yes, this happened to me once) and they have no idea why.

5) You will need automated billing and security on your systems, as well as record back-up and logging.
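
Item 3 is the easiest to prototype. A sketch, with a placeholder send_warning() standing in for your mail system:

```python
# Warn once as usage crosses each predefined threshold of the quota.

def send_warning(level: float) -> None:
    print(f"warning: you have used {level:.0%} of your monthly quota")

def check_thresholds(used_gb: float, quota_gb: float, warned: set) -> set:
    for level in (0.5, 0.8):
        if used_gb >= quota_gb * level and level not in warned:
            send_warning(level)
            warned.add(level)
    return warned

warned: set = set()
for reading in (6.0, 9.1, 13.5):    # GB used, sampled over the month
    warned = check_thresholds(reading, 16, warned)
```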

What you can do if you have < 1000 customers (you are a small ISP)

It’s not that this can’t be done, but the cost of such a set of features needs to be amortized over a large set of users. For the smaller ISP, there are simpler things you can try first.

I like to first look at what a customer is trying to accomplish with their quota tool, and then take the easiest path to accomplish their goal. Usually the primary goal is just to keep total bandwidth consumption down; secondarily, the goal is to sell incremental plans and charge for higher amounts of usage.

Send out a notice announcing a quota plan
The first thing I pointed out from experience is that if you simply threaten a quota limitation in your policy, with serious consequences, most of your users will modify their behavior, as nobody wants to get hit with a giant bill. In other words, the easiest way to get started is to send out an e-mail about some kind of vague quota plan and abusers will be scaled back. The nice part of this plan is it costs nothing to implement and may cut your bandwidth utilization overnight.

I have also noticed that once a notice is sent out you will get a 98 percent compliance rate. That is 8 notices needed per 400 customers. Your standard reporting tool (in our case ntop) can easily and quickly show you the overages over a time period, and with a couple of e-mails you have your system – without creating a new software implementation. Obviously, this manual method is not practical for an ISP with 1 million subscribers, but for the small operator it is a great alternative.

NetEqualizer User-Quota API (NUQ-API)

If we have not convinced you, and you feel that you MUST have a quota plan in place, we do offer a set of APIs with the NetEqualizer to help you build your own customized quota system. Warning: these APIs are truly for tech geeks to play with. If that is not you, you will need to hire a consultant to write your code for you. Learn more about our NUQ-API (NetEqualizer User-Quota API).

Have you tried something else that was cost-effective? Do you see other alternatives for small ISPs? Let us know your thoughts!

APconnections Releases FREE Version of Intrusion Detection and Prevention Device


APconnections quietly released a free version of its IPS device yesterday. Codenamed StopHack, this full-featured IPS can be installed on your own hardware with a little elbow grease. This powerful technology is used to detect and block hacker intrusion attempts before they get into your network.

Although this version is free, under the hood the StopHack software can handle about 10,000 simultaneous streams (users) hitting your network and will check every query for malformed and invasive URLs. These types of attacks are the most dangerous and are typically exploited by probing bots to knock holes in your servers. StopHack also has a nice log where you can see who has attempted to breach your network, and a white list to exempt users from being scrutinized at all.

It comes with 16 of the most common intrusion techniques blocked (more can be purchased with a support contract), and uses behavior-based techniques to differentiate a friendly IP from a non-friendly IP.

Click here for the StopHack FAQ.

Click here to get the download and installation instructions.

NOTE: StopHack is free to use but support must be purchased if you need help for any reason, including installation.

FCC is the Latest Dupe in Speed-Test Shenanigans


Shenanigans: deception or tomfoolery on the part of carnival stand operators. In the case of the Internet speed claims made in the latest Wall Street Journal article, the tomfoolery is in the lack of detail on how these tests were carried out.

According to the article, all the providers tested by the FCC delivered 50 megabits or more of bandwidth consistently for 24 hours straight. Fifty megabits should be enough for 50 people to continuously watch a YouTube stream at the same time. With my provider, in a large metro area, I often can’t even watch one one-minute clip for more than a few seconds without that little time-out icon spinning in my face. By the time the video queues up enough content to play all the way through, I have long since forgotten about it and moved on. And then, when it finally starts playing again, I have to go back and frantically find and kill the YouTube window that is barking at me from somewhere in the background.

So what gives here? Is there something wrong with my service?

I am supposed to have 10-megabit service. When I run a test I get 20 megabits of download, enough to run 20 YouTube streams without issue. So far, so good.

The problem with translating speed test claims to your actual Internet experience is that there are all kinds of potentially real problems once you get away from the simplicity of a speed test, and yes, plenty of deceptions as well.

First, let’s look at the potentially honest problems with your actual speed when watching a YouTube video:

1) Remote server is slow: The YouTube server itself could actually be overwhelmed and you would have no way to know.

How to determine: Try various YouTube videos at once; you will likely hit different servers and see different speeds if this is the problem.

2) Local wireless problems: I have been the victim of this problem. Running two wireless access points and a couple of wireless cameras jammed one of my access points to the point where I could hardly connect to an Internet site at all.

How to determine: Plug your computer directly into your modem, thus bypassing the wireless router, and test your speed.

3) Local provider link is congested: Providers have shared distribution points for your neighborhood or area, and these can become congested and slow.

How to determine: Run a speed test. If the local link to your provider is congested, it will show up on the speed test, and there cannot be any deception.

 

The Deceptions

1) Caching

I have done enough testing first hand to confirm that my provider caches heavily trafficked sites whenever they can. I would not really call this a true deception, as caching benefits both provider and consumer; however, if you end up hitting a YouTube video that is not currently in the cache, your speed will suffer at certain times during the day.

How to Determine: Watch a popular YouTube video, and then watch an obscure, seldom-watched YouTube video.

Note: Do not watch the same YouTube video twice in a row, as it may end up in your local cache, or your provider’s local cache, after the first viewing.
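
If you want to be a little more rigorous than eyeballing the player, you can time the first couple of megabytes of each download. The URLs below are placeholders; substitute real video URLs of your own choosing:

```python
# Compare fetch speed of a popular (likely cached) file against an
# obscure one. URLs are placeholders, not real endpoints.
import time
import urllib.request

def fetch_speed(url: str, nbytes: int = 2_000_000) -> float:
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        data = resp.read(nbytes)
    return len(data) * 8 / (time.monotonic() - start) / 1e6  # Mbit/s

for url in ("https://example.com/popular-video",
            "https://example.com/obscure-video"):
    print(url, f"{fetch_speed(url):.1f} Mbit/s")
```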

2) Exchange Point Deceptions

The main congestion point between you and the open Internet is your provider’s exchange point. Most likely your cable company or DSL provider has a dedicated wire direct to your home. This wire most likely has a clean path back to the NOC central location. The advertised speed of your service is most likely a declaration of the speed from your house to your provider’s NOC, hence one could argue this is your Internet speed. This would be fine except that most of the public Internet content lies beyond your provider, through an exchange point.

The NOC exchange point is where you leave your local provider’s wires and go out to access information hosted on other provider networks. Providers pay extra costs when you leave their network, in both fees and equipment costs. A few of the things they can do to deceive you are:

– Give special priority to your speed tests through their site to ensure the speed test runs as fast as possible.

– Re-route local traffic for certain applications back onto their network, essentially limiting and preventing traffic from leaving their network.

– They can locally host the speed test themselves.

How to determine: Use a speed test tool that cannot be spoofed.

See also:

Is Your ISP Throttling your Bandwidth

NetEqualizer YouTube Caching

How to Speed Up Your Internet Connection with a Bandwidth Controller



It occurred to me today that in all the years I have been posting about common ways to speed up your Internet, I have never really written a plain and simple consumer explanation of how a bandwidth controller can speed up your Internet. After all, it seems intuitive that a bandwidth controller is something an ISP would use to slow down your Internet; but there can be a beneficial side to a bandwidth controller, even at the home-consumer level.

Quite a few slow Internet problems are due to contention on your link to the Internet. Even if you are the only user on your connection, a simple update to your virus software running in the background can dominate your Internet link. A large download will often cause everything else you try (email, browsing) to slow to a crawl.

What causes slowness on a shared link?

Everything you do on the Internet creates a connection from inside your network to the Internet, and all of these connections compete for the limited amount of bandwidth your ISP provides.

Your router (cable modem) connection to the Internet provides first-come, first-served service to all the applications trying to access the Internet. To make matters worse, the heavier users (the ones with the larger persistent downloads) tend to get more than their fair share of router cycles. Large downloads are like the schoolyard bully – they tend to butt in line, and not play fair.

So how can a bandwidth controller make my Internet faster?

A smart bandwidth controller will analyze all your Internet connections on the fly. It will then selectively take away some bandwidth from the bullies. Once the bullies are removed, other applications will get much needed cycles out to the Internet, thus speeding them up.
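
To make that concrete, here is a toy version of the idea. This illustrates equalizing in general, not NetEqualizer's actual algorithm, and the thresholds are invented:

```python
# Act only when the pipe is nearly full, and only on flows big
# enough to be "bullies"; small interactive flows are never touched.
TRUNK_MBPS = 10.0
RATIO = 0.85            # start acting above 85% utilization
HOGMIN_MBPS = 1.0       # flows below this are left alone

def pick_penalty_targets(flows: dict) -> list:
    """flows maps flow-id -> current Mbit/s (e.g. an 8-second average)."""
    if sum(flows.values()) < TRUNK_MBPS * RATIO:
        return []       # no congestion: leave everyone alone
    hogs = {f: r for f, r in flows.items() if r > HOGMIN_MBPS}
    return sorted(hogs, key=hogs.get, reverse=True)  # biggest bullies first

print(pick_penalty_targets(
    {"voip": 0.1, "web": 0.4, "download": 6.2, "p2p": 3.1}))
# -> ['download', 'p2p']: add latency to these, spare VoIP and browsing
```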

What application benefits most when a bandwidth controller is deployed on a network?

The most noticeable beneficiary will be your VoIP service. VoIP calls typically don’t use that much bandwidth, but they are incredibly sensitive to a congested link. Even small quarter-second gaps in a VoIP call can make a conversation unintelligible.

Can a bandwidth controller make my YouTube videos play without interruption?

In some cases yes, but generally no. A YouTube video will require anywhere from 500 kbps to 1,000 kbps of your link, and is often the bully on the link; however, in some instances there are bigger bullies crushing YouTube performance, and a bandwidth controller can help in those instances.

Can a home user or small business with a slow connection take advantage of a bandwidth controller?

Yes, but the choice is a time-cost-benefit decision. For about $1,600 there are some products out there that come with support that can solve this issue for you, but that price is hard to justify for the home user – even a business user sometimes.

Note: I am trying to keep this article objective and hence am not recommending anything in particular.

On a home-user network it might be easier just to police it yourself, shutting off background applications, and unplugging the kids’ computers when you really need to get something done. A bandwidth controller must sit between your modem/router and all the users on your network.

Related Article: Ten Things to Consider When Choosing a Bandwidth Shaper.

Dynamic Reporting With The NetEqualizer


Update: February 2014

The spreadsheet reporting features described below as an Excel integration have been integrated into the NetEqualizer GUI as of 2013. We have also added protocol reporting for common applications. We generally do not break links to old articles, hence we have not taken this article down.

 

 

Have you ever wanted an inexpensive real-time bandwidth reporting tool?

The following Excel integration totally opens up the power of the NetEqualizer bandwidth data. Even I love watching my NetEqualizer data on my spreadsheet. Last night, I had it up and watched as the bandwidth spiked all of a sudden, so I looked around to see why – it turns out my son had started watching Netflix on his Nintendo DS! Too funny, but very persuasive in terms of enhancing your ability to do monitoring.

This blog shows just one example, but suffice it to say that the reporting options are endless. You could easily write a VBA routine in Excel to bring this data down every second. You could automatically log the day’s top 10 highest streams, or top 10 highest connections. You could graph the last 60 seconds (or another timeframe) of per-second peak usage. You could update this graph, watching it scroll by in real time. It’s endless what you can do, with relatively little effort (because Excel does all the computationally hard work with pre-programmed routines for reporting and display).

Here’s a picture of what’s happening on my NetEqualizer right now as I write this:

Fig-1

Pretty slick, eh? Now that I have put this spreadsheet together, I don’t have to do anything to have it report current data every minute or sooner. Let me explain how you can do it too.

Did you know that there’s a little-known feature in Microsoft Excel called an Excel Web Query? This facility allows you to specify an http:// address on the web and use the data off the resulting web page for automatic insertion into Excel. Further, you can tell Excel that you want your spreadsheet to be updated regularly – as frequently as every minute – or whenever you hit the “Refresh All” key. If you combine this capability with the ability to run a NetEqualizer report from your browser using the embedded command, you can automatically download just about any NetEqualizer data into a spreadsheet for reporting, graphing and analysis.

Fig-1 above shows some interesting information, all of it gathered from my NetEqualizer, as well as some information that has been programmed into my spreadsheet. Here’s what’s going on: cells B4 & B5 contain information pulled from my NetEqualizer – the total bandwidth, up and down respectively, going through the unit right now. The sheet compares this with cells C4 & C5, which are the TrunkUp & TrunkDown settings (also pulled from the NetEqualizer’s configuration file and downloaded automatically), and calculates cells D4 & D5, showing the % of trunk used. Cells B8:K show all the data from the NetEqualizer’s Active Connections Report. The column titled “8 Second Rolling Average Bandwidth” shows Wavg, and this data is also automatically plotted in a pie chart showing the bandwidth composition of my individual flows. Also, I put a conditional rule on my bandwidth flows that says that because I’m greater than 85% of my TrunkDown speed, all flows greater than HOGMIN should be highlighted in red. All of this is updated every minute, or sooner if I hit the refresh key.

I’ll take you through a step-by-step of how I created the page above so you can unlock the power of Excel on your critical bandwidth data.

The steps I outline are for Excel 2007; this can be done in earlier versions of Excel, but the steps will be slightly different. All I ask is that if you create a spreadsheet like this and do something you really like, let us know about it (email: sales@apconnections.net).

I’m going to assume that you know how to construct a basic spreadsheet. This document would be far too long if I took you through each little step to create the report above. Instead, I’ll show you the important part – how to get the data from the NetEqualizer into the spreadsheet and have it automatically and regularly refresh itself.

In this page there are two links: One at B4:B5, and another at B8:K (K has no ending row because it depends on how many connections it pulls – thus K could range from K8 to K99999999 – you get the idea).

Let’s start by linking my total up and down bandwidth to cells B4:B5 from the NetEqualizer.  To do this, follow these steps:

Select cell B4 with your cursor.

Select the “Data” tab and click “From Web”.


Click “No” and erase the address in the address bar:

Put the following in the Address Bar instead – make sure to put the IP Address of your NetEqualizer instead of “YourNetEqualizersIPAddress” – and hit return:

—Please contact us (support@apconnections.net) if you are a current NetEqualizer user and want the full doc—

You may get asked for your User ID and Password – just use your normal NetEqualizer User ID and Password.

Now you should see this:


Click on the 2nd arrow in the form, which turns it into a check mark after it’s been clicked (as shown in the picture above). This highlights the data returned, which is the “Peak” bandwidth (Up & Down) on the NetEqualizer. Click the Import button. In a few seconds this will populate the spreadsheet with this data in cells B4 & B5.

Now, let’s tell the connection that we want the data updated every 1 minute. Right-click on B4 (or B5), and you will see this:


Click on Data Range Properties.

Change “Refresh every” to 1 minute. Also, set the other check marks as shown. Hit “OK”.

Done! Total Bandwidth flow data from the NetEqualizer bridge will now automatically update into the spreadsheet every 60 seconds.

For the Active Connections portion of this report, follow the same instructions starting by selecting cell B8. Only for this report, use the following web address (remember to use your NetEqualizer’s IP):

—Please contact us (support@apconnections.net) if you are a current NetEqualizer user and want the full doc—

(note: we’ve had some reports that this command doesn’t cut and paste well, probably because of the “wrap”; you may need to type it in)

Also, please copy and paste this exactly (unless you’re a Linux expert – and if you are, send me a better command!) since there are many special formatting characters that have been used to make this import work in a well-behaved manner. Trust me on this, there was plenty of trial and error spent on getting this to come in reliably.

Also, remember to set the connection properties to update every 1 minute.

At this point you may be noticing one of the cool things about this procedure is that I can run my own “custom” reports via a web http address that also issues Linux commands like “cat” & “awk” – being able to do this allows me to take just about any data off the NetEqualizer for automatic import into Excel.

So that’s how it’s done. Here’s a list of a few other handy web connection reports:

For your NetEqualizer’s configuration file use:

—Please contact us (support@apconnections.net) if you are a current NetEqualizer user and want the full doc—

For your NetEqualizer’s log file use:

—Please contact us (support@apconnections.net) if you are a current NetEqualizer user and want the full doc—

(note: we’ve had some reports that this command doesn’t cut and paste well, probably because of the “wrap”; you may need to type it in)

Once you get all the data you need into your Excel, you can operate on the data using any Excel commands including macros, or Excel Visual Basic.

Lastly, if you want to see what’s happening right now and don’t want to wait up to 60 seconds, hit the “Refresh All” button on the “Data” tab – that will refresh everything as of this second:

Good luck, and let us know how it goes…

Caveat – this feature is unsupported by APconnections.

Five More Tips on Testing Your Internet Speed


By Art Reisman

Art Reisman is currently CTO and co-founder of NetEqualizer

Imagine if every time you went to a gas station the meters were adjusted to exaggerate the amount of fuel pumped, or the gas contained inert additives. Most consumers count on the fact that state and federal regulators monitor their local gas station to ensure that a gallon is a gallon and the fuel is not a mixture of water and rubbing alcohol. But in the United States, there are no rules governing truth in bandwidth claims. At least none that we are aware of.

Given there is no standard in regulating Internet speed, it’s up to the consumer to take the extra steps to make sure you’re getting what you pay for. In the past, we’ve offered some tips both on speeding up your Internet connection and on questions you should ask your provider. Here are some additional tips on how to fairly test your Internet speed.

1. Use a speed test site that mimics the way you actually access the Internet.

Why?

Using a popular speed test tool is too predictable, and your Internet provider knows this. In other words, they can optimize their service to show great results when you use a standard speed test site. To get a better measure of your speed, your test must be unpredictable. Think of a movie star going to the Oscars. With time to plan, they are always going to look their best. But the candid pictures captured by the tabloids never show quite as well.

To get a candid picture of your provider’s true throughput, we suggest using a tool such as the speed test utility from M-Lab.

2. Try a very large download to see if your speed is sustained.

We suggest downloading a full Knoppix CD. Most download utilities will give you a status bar on the speed of your download. Watch the download speed over the course of the download and see if the speed backs off after a while.

Why?

Some providers will start slowing your speed after a certain amount of data is passed in a short period, so the larger the file in the test the better. The common speed test sites likely do not use large enough downloads to trigger a slower download speed enforced by your provider.
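
If you'd rather script tip 2 than stare at a status bar, here is a small sketch that prints the running average speed of a large download. Pass it any big-file URL, such as a Knoppix ISO mirror of your choosing:

```python
# Watch whether a big download's speed sags over time.
import sys
import time
import urllib.request

url = sys.argv[1]                    # any large-file URL
start = last = time.monotonic()
total = 0
with urllib.request.urlopen(url) as resp:
    while chunk := resp.read(1 << 20):       # 1 MiB at a time
        total += len(chunk)
        now = time.monotonic()
        if now - last >= 5:                  # report every ~5 seconds
            print(f"{total/1e6:8.1f} MB in {now-start:5.0f} s, "
                  f"avg {total*8/1e6/(now-start):6.2f} Mbit/s")
            last = now
```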

3. If you must use a standard speed test site, make sure to repeat your tests with at least three different speed test sites.

Different speed test sites use different methods for passing data and results will vary.

4. Run your tests during busy hours — typically between 5 and 9 p.m. — and try running them at different times.

Oftentimes ISPs have trouble providing their top advertised speeds during busy hours.

5. Make sure to shut off other activities that use the Internet when you test. 

This includes other computers in your house, not just the computer you are testing from.

Why?

All the computers in your house share the same Internet pipe to your provider. If somebody is watching a Netflix movie while you run your test, the movie stream will skew your results.

Created by APconnections, the NetEqualizer is a plug-and-play bandwidth control and WAN/Internet optimization appliance that is flexible and scalable. When the network is congested, NetEqualizer’s unique “behavior shaping” technology dynamically and automatically gives priority to latency sensitive applications, such as VoIP and email. Click here for a full price list.

Just How Fast Is Your 4G Network?


By Art Reisman, CTO, www.netequalizer.com


The subject of Internet speed and how to make it go faster is always a hot topic. So that begs the question, if everybody wants their Internet to go faster, what are some of the limitations? I mean, why can’t we just achieve infinite speeds when we want them and where we want them?

Below, I’ll take on some of the fundamental gating factors of Internet speeds, primarily exploring the difference between wired and wireless connections. As we have “progressed” from a reliance on wired connections to a near-universal expectation of wireless Internet options, we’ve also put some limitations on what speeds can be reliably achieved. I’ll discuss why the wired Internet to your home will likely always be faster than the latest fourth generation (4G) wireless being touted today.

To get a basic understanding of the limitations with wireless Internet, we must first talk about frequencies. (Don’t freak out if you’re not tech savvy. We usually do a pretty good job at explaining these things using analogies that anybody can understand.) The reason why frequencies are important to this discussion is that they’re the limiting factor to speed in a wireless network.

The FCC allows cell phone companies and other wireless Internet providers to use a specific range of frequencies (channels) to transmit data. For the sake of argument, let’s just say there are 256 frequencies available to the local wireless provider in your area. So in the simplest case of the old analog world, that means a local cell tower could support 256 phone conversations at one time.

However, with the development of better digital technology in the 1980s, wireless providers have been able to juggle more than one call on each frequency. This is done by using a time-sharing system where bits are transmitted over the frequency in round-robin fashion, so that several users share the channel at one time.

The wireless providers have overcome the problem of having multiple users sharing a channel by dividing it up in time slices. Essentially this means when you are talking on your cell phone or bringing up a Web page on your browser, your device pauses to let other users on the channel. Only in the best case would you have the full speed of the channel to yourself (perhaps at 3 a.m. on a deserted stretch of interstate). For example, I just looked over some of the mumbo jumbo and promises of one-gigabit speeds for 4G devices, but only in a perfect world would you be able to achieve that speed.
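
A toy sketch of that time slicing, with made-up users; the ~80 Mbit/s channel figure is the one derived later in the article:

```python
# Round-robin time slicing: one channel, several users, each gets the
# channel for a slice in turn.
from itertools import cycle

CHANNEL_BPS = 80_000_000
users = ["alice", "bob", "carol", "dave"]

for user, slice_no in zip(cycle(users), range(8)):
    print(f"slice {slice_no}: {user} transmits")

# Averaged over time, each user sees the channel divided by the sharers:
print(f"average per user: {CHANNEL_BPS / len(users) / 1e6:.0f} Mbit/s")
```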

In the real world of wireless, we need to know two things to determine the actual data rates to the end user.

  1. The maximum amount of data that can be transmitted on a channel
  2. The number of users sharing the channel

The answer to part one is straightforward: A typical wireless provider has channel licenses for frequencies in the 800 megahertz range.

A rule of thumb for transmitting digital data over the airwaves is that you can only send bits of data at 1/2 the frequency. For example, 800 megahertz is 800 million cycles per second, and 1/2 of that is 400 million cycles per second. This translates to a theoretical maximum data rate of 400 megabits. Realistically, with noise and other environmental factors, 1/10 of the original frequency is more likely. This gives us a maximum carrying capacity per channel of 80 megabits, and a ballpark estimate for our answer to part one above.

However, the actual answer to variable two, the number of users sharing a channel, is a closely guarded secret among service providers. Conservatively, let’s just say you’re sharing a channel with 20 other users on a typical cell tower in a metro area. With 80 megabits to start from, this would put your individual maximum data rate at about four megabits during a period of heavy usage.
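
The same arithmetic as a tiny script, so you can plug in your own assumptions for the sharing factor:

```python
# The article's back-of-envelope math, spelled out. Every factor is a
# rule of thumb or an assumption, not a radio-engineering result.
freq_hz = 800e6                  # licensed channel around 800 MHz
theoretical_bps = freq_hz / 2    # "bits at 1/2 the frequency"
realistic_bps = freq_hz / 10     # noise and environment: ~1/10
users_per_channel = 20           # assumed subscribers sharing a tower

print(f"theoretical : {theoretical_bps / 1e6:.0f} Mbit/s")
print(f"realistic   : {realistic_bps / 1e6:.0f} Mbit/s")
print(f"per user    : {realistic_bps / users_per_channel / 1e6:.0f} Mbit/s")
# -> 400, 80, and 4 Mbit/s, the figures used in the text
```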

So getting back to the focus of the article, we’ve roughly worked out a realistic cap on your super-cool new 4G wireless device at four megabits. By today’s standards, this is a pretty fast connection. But remember, this is a conservative, benefit-of-the-doubt best case. Wireless providers are now talking about usage quotas and charging severely for overages, which suggests they must be teetering on gridlock with their data networks now. There is limited frequency real estate and high demand for content data services. This is likely to only grow as more and more users adopt mobile wireless technologies.

So where should you look for the fastest and most reliable connection? Well, there’s a good chance it’s right at home. A standard fiber connection, like the one you likely have with your home network, can go much higher than four megabits. However, as with the channel sharing found in wireless, you must also share the main line coming into your central office with other users. But assuming your cable operator runs a point-to-point fiber line from their office to your home, gigabit speeds would certainly be possible, and thus wired connections to your home will always be faster than the frequency-limited world of wireless.

Related Article: Commentary on Verizon quotas

Interesting side note: in this article by Deloitte, they do not mention limitations of frequency spectrum as a limiting factor to growth.

NetEqualizer P2P Locator Technology


Editor’s Note: The NetEqualizer has always been able to thwart P2P behavior on a network. However, our new utility can now pinpoint an individual P2P user or gamer without any controversial layer-7 packet inspection. This is an extremely important step from a privacy point of view, as we can actually spot P2P users without looking at any private data.

A couple of months ago, I was doing a basic health check on a customer’s heavily used residential network. In the process, I instructed the NetEqualizer to take a few live snapshots. I then used the network data to do some filtering with custom software scripts. Within just a few minutes, I was able to inform the administrator that eight users on his network were doing some heavy P2P, and one in particular looked to be hosting a gaming session. This was news to the customer, as his previous tools didn’t provide that kind of detail.

A few days later, I decided to formally write up my notes and techniques for monitoring a live system to share on the blog. But as I got started, another lightbulb went on… In the end, many customers just want to know the basics — who is using P2P, hosting game servers, etc. They don’t always have the time to follow a manual diagnostic recipe.

So, with this in mind, instead of writing up the manual notes, I spent the next few weeks automating and testing an intelligent utility to provide this information. The utility is now available with NetEqualizer 5.0.

The utility provides: 

  • A list of users that are suspected of using P2P
  • A list of users that are likely hosting gaming servers
  • A confidence rating for each user (from high to low)
  • The option of tracking users by IP and MAC address

The key to determining a user’s behavior is the analysis of the fluctuations in their connection counts and total number of connections. We take snapshots over a few seconds, and like a good detective, we’ve learned how to differentiate P2P use from gaming, Web browsing and even video. We can do this without using any deep packet inspection. It’s all based on human-factor heuristics and years of practice.
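
A drastically simplified sketch of that style of heuristic; the thresholds below are invented for illustration, and the real utility's rules are more involved:

```python
# Classify a host from its connection-count snapshots taken a few
# seconds apart: P2P churns through many peers, game hosts hold many
# stable connections, browsing and video use few.
def classify(snapshots: list) -> str:
    avg = sum(snapshots) / len(snapshots)
    churn = max(snapshots) - min(snapshots)
    if avg > 50 and churn > 20:
        return "likely P2P (many, rapidly changing peers)"
    if avg > 20 and churn <= 5:
        return "possible game host (many stable connections)"
    return "normal (browsing and video use few connections)"

print(classify([62, 85, 47, 91]))   # -> likely P2P
print(classify([24, 25, 24, 26]))   # -> possible game host
```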

Enclosed is a screen shot of the new P2P Locator, available under our Reports & Graphing menu.

Our new P2P Locator technology

Contact us to learn more about the NetEqualizer P2P Locator Technology or NetEqualizer 5.0. For more information about ongoing changes and challenges with BitTorrent and P2P, see Ars Technica’s “BitTorrent Has New Plan to Shape Up P2P Behavior.”

VLAN tags made simple


By Art Reisman, CTO, www.netequalizer.com


Why am I writing a post on VLAN tags?

VLAN tags and bandwidth control are often intimately related, but before I can post on the relationship, I thought it prudent to comment on VLAN tags themselves. I definitely think they are way overused, and I hope to comment on that in a future post as well.

I generally don’t like VLAN tags. The original idea behind them was to solve the issue of Ethernet broadcasts saturating a network segment. Wikipedia explains it like this…

After successful experiments with voice over Ethernet from 1981 to 1984, Dr. W. David Sincoskie joined Bellcore and turned to the problem of scaling up Ethernet networks. At 10 Mbit/s, Ethernet was faster than most alternatives of the time; however, Ethernet was a broadcast network and there was not a good way of connecting multiple Ethernets together. This limited the total bandwidth of an Ethernet network to 10 Mbit/s and the maximum distance between any two nodes to a few hundred feet.

What does that mean and why do you care?

First let’s address how an Ethernet broadcast works, and then we can discuss Dr. Sincoskie’s solution and make some sense of it.

When a bunch of computers share a single Ethernet segment of a network (connected by switches), everybody can hear everybody else talking.

Think of two people in a room yelling back and forth to communicate. That might work if one person pauses after each yell to give the other person a chance to yell back. With three people in a room, they can still yell at each other, pause, and listen for other people yelling, and that might still work. But if you had 1,000 people in the room trying to talk to people on the other side of the room, the pausing technique of waiting for other people to talk does not work very well. And that is exactly the problem with Ethernet: as it grows, everybody is trying to talk on the same wire at once. VLAN tags work by essentially creating a bunch of smaller virtual rooms, where only the noise and yelling from the people in the same virtual room can be heard at one time.

Now when you set up a VLAN tag (virtual room), you have to put up the dividers. On a network this is done by having the switches (the things the computers plug into) be aware of what virtual room each computer is in. The Ethernet tag specifies the identifier for the virtual room, and so once set up, you have a bunch of virtual rooms and everybody can talk.
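
For the curious, here is what the "room number" actually looks like on the wire: 802.1Q inserts a small header carrying a 12-bit VLAN ID into the Ethernet frame. A sketch using the scapy library (assumes scapy is installed; the addresses and VLAN ID are arbitrary examples):

```python
# Build a VLAN-tagged broadcast frame and inspect its layers.
from scapy.all import Dot1Q, Ether, IP

frame = (Ether(src="00:11:22:33:44:55", dst="ff:ff:ff:ff:ff:ff")
         / Dot1Q(vlan=100)            # "room number" 100
         / IP(dst="10.0.100.7"))

frame.show()
# A VLAN-aware switch floods this broadcast only to ports in VLAN 100.
```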

This sort of begs the question

Does everybody attached to the Internet live in a virtual room?

No. Virtual rooms (VLANs) exist so a single organization, like a company, can put a box around its network segments to protect them with a common set of access rules (a firewall or router). The Internet works fine without VLAN tags.

So a VLAN tag is only appropriate when a group of users sits behind a common router?

Yes, that is correct. Ethernet broadcasts (yelling, as per our analogy) do not cross router boundaries on the Internet.

Routers handle public IP addresses to figure out where to send things. A router does not use broadcast (yelling); it is much more selective – it only sends data on to another router if it knows that the data is supposed to go there.

So why do we have two mechanisms – one for local computers sending Ethernet broadcasts, and another for routers using point-to-point routing?

This post was supposed to be about VLAN tags… but I’ll take it one step further to explain the difference.

Perhaps you have heard about the layers of networking, layer 2 is Ethernet and Layer 3 is IP.

Answers.com gave me the monologue below, which is technically correct, but does not really make much sense unless you already have a good understanding of networking in the first place. So I’ll finish by breaking it down into something a little more relevant, with some in-line comments.

Basically a layer 2 switch operates utilizing Mac addresses in it’s caching table to quickly pass information from port to port. A layer 3 switch utilizes IP addresses to do the same.

What this means is that an Ethernet switch looks at MAC addresses, which are used for local addressing to a computer on your network. Think back to people shouting in the room to communicate: the MAC address would be a nickname that only their closest friends use when they shout at each other. At the head end of your network is a router; this is where you connect to the Internet, and other Internet users send data to you via your IP address, which is essentially the well-known public address at your router. The IP address could be thought of as the address of the building where everybody is inside shouting at each other. The router’s job is to get information sent by IP address, destined for somebody inside the room, to the door. If you are a Comcast home user, you likely have a modem where your cable plugs in; the modem is the gateway to your house and is addressed by IP address by the outside world.


Essentially, a layer 2 switch is a multiport transparent bridge. A layer 2 switch will learn about the MAC addresses connected to each port and pass frames marked for those ports.

The above paragraph is referring to how an Ethernet switch sends data around: everybody in the room registers their nickname with the switch so it can shout in the direction of the right person in the room when new data comes in.

It also knows that if a frame is sent out a port looking for the MAC address of the port it arrived on, it should drop that frame. Whereas a single-CPU bridge runs in serial, today’s hardware-based switches run in parallel, translating to extremely fast switching.


I left this paragraph in because it is completely unrelated to the question I asked that Answers.com responded to, so ignore it. It is a commentary about how modern switches can be reading and sending from multiple interfaces at the same time.

Layer 3 switching is a hybrid, as one can imagine, of a router and a switch. There are different types of layer 3 switching: route caching and topology-based. In route caching the switch requires both a Route Processor (RP) and a Switch Engine (SE). The RP must listen to the first packet to determine the destination. At that point the Switch Engine makes a shortcut entry in the caching table for the rest of the packets to follow.

More random stuff unrelated to the question “What is the difference between layer 3 and layer 2?”

Due to advancements in processing power and drastic reductions in the cost of memory, today’s higher-end layer 3 switches implement topology-based switching, which builds a lookup table and populates it with the entire network’s topology. The database is held in hardware and is referenced there to maintain high throughput. It utilizes the longest address match as the layer 3 destination.

This is talking about how a router translates between the local nickname addresses of people yelling in the room and the public address of data leaving the building.

Now when and why would one use an L2 switch vs. an L3 switch vs. a router? Simply put, a router will generally sit at the gateway between a private and a public network. A router can perform NAT whereas an L3 switch cannot (imagine a switch that had the topology entries for the ENTIRE Internet!!).

Network Redundancy must start with your provider


By Art Reisman


Editor’s note: Art Reisman is the CTO of APconnections. APconnections designs and manufactures the popular NetEqualizer bandwidth shaper.

The chances of being killed by a shark are 1 in 264 million. The chance of being mauled by a bear on your weekend outing in the woods is even smaller. Fear is a strange emotion rooted deep within our brains. Despite a rational understanding of risks, people are programmed to lose sleep and exhaust their adrenaline supply worrying about events that will never happen.

It is this same lack of rational risk evaluation that makes it possible for vendors to sell unneeded equipment to otherwise budget-conscious businesses. The current in-vogue, unwarranted fears used to move network equipment are IPv6 preparedness and equipment redundancy.

Equipment vendors tend to push customers toward internal redundant hardware solutions, but not because they have your best interest in mind. If they did, they would first encourage you to get a redundant link to your ISP.

Twenty years of practical hands-on experience tells us that your Internet router’s chance of catastrophic failure is about 1 percent over a three-year period. On the other hand, your Internet provider has a 95-percent chance of having a full-day outage during that same three-year period.

If you are truly worried about a connectivity failure into your business, you MUST source two separate paths to the Internet to have any significant reduction in risk. Requiring fail-over on individual pieces of equipment without first securing complete redundancy in your network from your provider is like putting a band-aid on your finger while bleeding from your jugular vein.
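
Some back-of-envelope math with the failure rates above shows why the provider link dominates the risk. This is a crude model that assumes each provider's one full-day outage lands on a random day and that two providers fail independently:

```python
days = 3 * 365           # the three-year window from the text

p_router_fail = 0.01     # catastrophic router failure over 3 years
print(f"risk addressed by redundant hardware: {p_router_fail:.0%}")

# Chance two independent providers are both out on the same day:
p_both_same_day = days * (1 / days) ** 2
print(f"chance both links are down at once: {p_both_same_day:.2%}")
# -> internal fail-over addresses a 1% risk; dual links shrink the
#    95% provider risk to roughly a tenth of a percent
```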

Some other useful tips on making your network more reliable include:

Do not turn on unneeded bells and whistles on your router and firewall equipment.

Many router and device failures are not absolute. Equipment will get cranky, slow, or belligerent based on human error or system bugs. Although system bugs are rare when these devices are used in the default set-up, it seems turning on bells and whistles is often an irresistible enticement for a tech. The more features you turn on, the less standard your configuration becomes, and all too often the mission of the device is pushed well beyond its original intent. Routers doing billing systems, for example.

These “soft” failure situations are common, and the fail-over mechanism likely will not kick in, even though the device is sick and not passing traffic as intended. I have witnessed this type of failure first-hand at major customer installations. The failure itself is bad enough, but the real embarrassment comes from having to tell your customer that the fail-over investment they purchased is useless in a real-life situation. Fail-over systems are designed with the idea that the equipment they route around will die and go belly up like a pheasant shot point-blank with a 12-gauge shotgun. In reality, for every “hard” failure, there are 100 system-related lock ups where equipment sputters and chokes but does not completely die.

Start with a high-quality Internet line.

T1 lines, although somewhat expensive, are based on telephone technology that has long been hardened and paid for. While they do cost a bit more than other solutions, they are well-engineered to your doorstep.

Make sure all your devices have good UPS sources and surge protectors.

Consider this when purchasing redundant equipment: what is the cost of manually moving a wire to bypass a failed piece of equipment?

Look at this option before purchasing redundancy options on a single point of failure. We often see customers asking for redundant fail-over embedded in their equipment. This tends to be a strategy of purchasing hardware such as routers, firewalls, bandwidth shapers, and access points that provides a “fail open” mode (meaning traffic will still pass through the device) should they catastrophically fail. At face value, this seems like a good idea to cover your bases. Most of these devices embed a fail-over switch internally in their hardware. The cost of this technology can add about $3,000 to the price of the unit.

If equipment is vital to your operation, you’ll need a spare unit on hand in case of failure. If the equipment is optional or used occasionally, then take it out of your network.

Again, these are just some basic tips, and your final Internet redundancy plan will ultimately depend on your specific circumstances. But, these tips and questions should put you on your way to a decision based on facts rather than one based on unnecessary fears and concerns.

The Facts and Myths of Network Latency


There are many good references that explain how some applications such as VoIP are sensitive to network latency, but there is also some confusion as to what latency actually is as well as perhaps some misinformation about the causes. In the article below, we’ll separate the facts from the myths and also provide some practical analogies to help paint a clear picture of latency and what may be behind it.

Fact or Myth?

Network latency is caused by too many switches and routers in your network.

This is mostly a myth.

Yes, an underpowered router can introduce latency, but most local network switches add minimal latency — a few milliseconds at most. Anything under about 10 milliseconds is, for practical purposes, not humanly detectable. A router or switch (even a low-end one) may add about 1 millisecond of latency, so you would need ten or more hops just to reach 10 milliseconds, and even then you would barely be at the edge of anything noticeable.

The faster your link (Internet) speed, the less latency you have.

This is a myth.

The speed of your network is measured by how much data (IP packets) arrives per second; latency is the measure of how long those packets took to get there. So it's really throughput versus delay. An example of latency is when NASA sends commands to a Mars orbiter. The information travels at the speed of light, yet commands sent from Earth take many minutes to reach the orbiter. This is data moving at high speed with extreme latency.
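
A quick calculation makes the distinction concrete. The Earth-to-Mars distance below is a rough average; the real figure swings widely as the planets move.

    # Speed vs. latency: the signal moves at the speed of light,
    # yet the latency is still measured in minutes.
    SPEED_OF_LIGHT_KM_S = 299_792
    EARTH_MARS_AVG_KM = 225_000_000   # rough average; varies from ~55M to ~400M km

    one_way_s = EARTH_MARS_AVG_KM / SPEED_OF_LIGHT_KM_S
    print(f"One-way latency to Mars: about {one_way_s / 60:.1f} minutes")
    # ~12.5 minutes of latency, no matter how much bandwidth the link has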

VoIP is very sensitive to network latency.

This is a fact.

Can you imagine talking in real time to somebody on the moon? Your voice would take well over a second to get there (a radio signal needs about 1.3 seconds to cover the distance), and the reply just as long to come back. For VoIP networks, it is generally accepted that anything over about 150 milliseconds of latency can be a problem. When latency climbs past 150 milliseconds, issues emerge, especially for fast talkers and rapid conversations.
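
If you want a rough sanity check of your own latency against that 150-millisecond guideline, a sketch like the one below will do. Two caveats: the guideline is usually quoted for one-way delay while a TCP handshake measures a round trip, and the host name is just a placeholder for your own VoIP server.

    # Rough check: time a TCP handshake as a proxy for round-trip latency.
    import socket
    import time

    def rtt_ms(host, port=443, timeout=2.0):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; roughly one round trip has elapsed
        return (time.monotonic() - start) * 1000.0

    latency = rtt_ms("example.com")   # placeholder host
    verdict = "fine for VoIP" if latency < 150 else "likely to degrade VoIP"
    print(f"Measured RTT: {latency:.0f} ms ({verdict})")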

Xbox games are sensitive to latency.

This is another fact.

For example, in many collaborative combat games, participants battle players from other locations. Low latency on your network is everything when it comes to beating your opponent to the draw. If you and your opponent shoot at exactly the same time, but your shot takes 200 milliseconds to register at the host server while your opponent's gets there in 100 milliseconds, you die.
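
A toy example, with made-up latencies, shows why the lower-latency player always wins the simultaneous draw:

    # Both players fire at t=0; the host registers whichever shot arrives first.
    shot_latency_ms = {"you": 200, "opponent": 100}   # assumed network latencies

    winner = min(shot_latency_ms, key=shot_latency_ms.get)
    print(f"Both fired at the same instant; '{winner}' registers first and wins.")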

Does a bandwidth shaping device such as NetEqualizer increase latency on a network?

This is true, but only for the “bad” traffic that’s slowing the rest of your network down anyway.

Ever hear of the firefighting technique where you light a backfire to slow the fire down? This is similar to the NetEqualizer approach. NetEqualizer deliberately adds latency to certain bandwidth-intensive applications, such as large downloads and p2p traffic, so that chat, email, VoIP, and gaming get the bandwidth they need. The backfire (latency) is used to choke off the unwanted, or non-time-sensitive, applications. (For more information on how the NetEqualizer works, click here.)
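
NetEqualizer's actual algorithm is its own, but purely as a conceptual illustration of the backfire idea, a sketch might look like the following. The trigger and "hog" thresholds here are made-up values for the example.

    # Conceptual sketch only -- not NetEqualizer's real implementation.
    # Once the link nears capacity, pick out the largest flows for added latency
    # so that small, time-sensitive flows keep moving.

    def flows_to_penalize(flows_bps, link_bps, trigger=0.85, hog_share=0.10):
        """Return the flow ids that should receive added latency (illustrative)."""
        utilization = sum(flows_bps.values()) / link_bps
        if utilization < trigger:
            return []   # no congestion: leave every flow alone
        return [fid for fid, bps in flows_bps.items() if bps / link_bps > hog_share]

    flows = {"voip-call": 12_000, "web-browsing": 40_000, "big-download": 900_000}
    print(flows_to_penalize(flows, link_bps=1_000_000))   # ['big-download']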

Video is sensitive to latency.

This is a myth.

Video is sensitive to the speed of the connection but not to latency. Let's go back to our man-on-the-moon example, where data takes more than a second to travel from the Earth to the Moon. Latency creates a problem with two-way voice communication because, in normal conversation, even a couple of seconds of delay in hearing what was said makes it difficult to carry on a conversation. What generally happens with voice and long latency is that both parties start talking at the same time, and moments later two people are talking over each other. You see this happen a lot on television with interviews done via satellite. However, most video is one-way. For example, when watching a Netflix movie, you're not sending video back to Netflix. In fact, almost all video transmissions run on a delay and nobody notices, since it is usually a one-way transmission.

New APconnections Corporate Speed Test Tool Released for NetEqualizer


For many Internet users, one of the first troubleshooting steps when online access seems to slow is to run a simple speed test. And, under the right circumstances, speed tests can be an effective way to pinpoint the problem.

However, slowing Internet speeds aren't just an issue for the casual user. Over our years of troubleshooting thousands of corporate and other commercial links, a recurring issue has been customers not getting the full advertised bandwidth from their upstream provider. Some customers become aware something is amiss by examining bandwidth reports on their routers; other cases we stumble upon while troubleshooting network congestion issues.

But what if you have a busy, shared corporate Internet connection, with hundreds or thousands of users on the link at one time? Should a traditional speed test be the first place to turn? In this situation, the answer is "no." Running a speed test under these conditions is neither meaningful nor useful.

Let me explain.

The problem starts with the overall design of the speed test itself. Speed tests usually transfer a small file over a short duration. For example, a 10-megabit file sent over a 100-megabit link completes in 0.1 seconds, and the tool reports the link speed to the operator as 100 megabits per second. Statistically, however, this is a snapshot of one very small moment in time, which is of little value when the demands on a network are constantly changing. Furthermore, for this type of test to be accurate, the link must be free of other active users, which is nearly impossible when an entire office, for example, is accessing the network at once.
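
The arithmetic behind that example is simple enough to show directly:

    # Why a short speed test is only a snapshot (numbers from the example above).
    test_bits = 10_000_000     # 10-megabit test file
    link_bps = 100_000_000     # 100-megabit link

    duration_s = test_bits / link_bps
    print(f"Test completes in {duration_s} seconds")   # 0.1 seconds

    # A 0.1-second sample says almost nothing about sustained capacity
    # when demand on the link changes from moment to moment.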

On these larger shared links, the true speed can only be measured during peak times with users accessing a wide variety of applications persistently over a significant period. But, there is no easily controlled Web speed test site that can measure this type of performance on your link.

Yes, a sophisticated IT administrator can run reports and see trends and make assumptions. And many do. Yet, for some businesses, this isn’t practical.

For this reason, we’ve introduced the NetEqualizer Speed Test Utility.

How Does the NetEqualizer Speed Test Utility Work?

The NetEqualizer Speed Test Utility is an intelligent tool embedded in your NetEqualizer that can be activated from your GUI. On high-traffic networks, there is always a busy hour background load on the link – a baseline if you will. When you set up the speed test tool, you simply tell the NetEqualizer some basics about your network, including:

  • Link Speed
  • Number of Users
  • Busy Hours

Once the tool is turned on, it keeps track of your network's bandwidth usage. If your usage drops below expected levels, it will present a mild warning on the GUI screen that your bandwidth may be compromised and give an explanation of the deviation. The operator can also be notified by e-mail.
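
To make the idea concrete, here is a minimal sketch of a busy-hour baseline check. The names, thresholds, and structure are all illustrative assumptions, not the utility's actual code.

    # Hypothetical sketch of the busy-hour baseline idea.
    from datetime import datetime

    LINK_BPS = 100_000_000       # operator-entered link speed (100 Mbps)
    BUSY_HOURS = range(9, 17)    # operator-entered busy hours
    EXPECTED_SHARE = 0.60        # assumed baseline utilization during busy hours

    def check_usage(observed_bps, now):
        """Return a warning string if busy-hour usage falls below the baseline."""
        if now.hour not in BUSY_HOURS:
            return None
        expected_bps = EXPECTED_SHARE * LINK_BPS
        if observed_bps < expected_bps:
            return (f"Warning: seeing {observed_bps / 1e6:.0f} Mbps during busy "
                    f"hours, expected at least {expected_bps / 1e6:.0f} Mbps; "
                    "your upstream bandwidth may be compromised.")
        return None

    print(check_usage(35_000_000, datetime(2024, 1, 10, 11, 0)))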

This setup allows bandwidth to be monitored without depending on unreliable speed tests or running time-consuming reports, so problems can be identified and addressed more quickly.

For more information about the NetEqualizer Speed Test Utility, contact APconnections at sales@apconnections.net.

Seven Points to Consider When Planning Internet Redundancy


By Art Reisman

Editor’s note: Art Reisman is the CTO of APconnections. APconnections designs and manufactures the popular NetEqualizer bandwidth shaper.

The chances of being killed by a shark are 1 in 264 million. Despite those low odds, most people worry about sharks when they enter the ocean, and yet the same people do not think twice about getting into a car without a passenger-side airbag.

And so it is with networking redundancy solutions. Many equipment purchase decisions are driven by irrational fear (often created by vendors) rather than by actual business-risk mitigation.

The solution to this problem is simple. It's a matter of being informed and making decisions based on facts rather than fear or emotion. While every situation is different, here are a few basic tips and questions to consider when planning Internet redundancy.

1) Where is your largest risk of losing Internet connectivity?

Vendors tend to push customers toward internal hardware solutions to reduce risk. For example, most customers want a circuit design within their inline equipment that will allow traffic to pass should the equipment fail. Yet polling data from our customers shows that your Internet router's chance of catastrophic failure is about 1 percent over a three-year period. On the other hand, your Internet provider has an almost 100-percent chance of a full-day outage during that same three-year period.
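
Putting those two numbers side by side makes the point plainly:

    # The article's own figures, compared directly.
    p_router_dies_3yr = 0.01    # catastrophic router failure over three years
    p_provider_out_3yr = 1.00   # near-certain full-day provider outage

    ratio = p_provider_out_3yr / p_router_dies_3yr
    print(f"A provider outage is roughly {ratio:.0f}x more likely than a router failure.")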

Perhaps the cost of sourcing two independent providers is prohibitive, and there is no choice but to live with this risk. All well and good, but if you are truly worried about a connectivity failure into your business, you cannot meaningfully mitigate this risk by sourcing hot failover equipment at your site. You MUST source two separate paths to the Internet to achieve any significant reduction in risk. Requiring failover on individual pieces of equipment, without complete redundancy in your network from your provider on down, is, with all due respect, a mitigation of political risk rather than actual risk.

2) Do not turn on unneeded bells and whistles on your router and firewall equipment.

Many router and device failures are not absolute. Equipment will get cranky, slow, or belligerent based on human error or system bugs. Although system bugs are rare when these devices are used in the default set-up, it seems turning on bells and whistles is often an irresistible enticement for a tech. The more features you turn on, the less standard your configuration becomes, and all too often the mission of the device is pushed well beyond its original intent. Routers doing billing systems, for example.

These “soft” failure situations are common, and the fail-over mechanism likely will not kick in, even though the device is sick and not passing traffic as intended.  I have witnessed this type of failure first-hand at major customer installations.  The failure itself is bad enough, but the real embarrassment comes from having to tell your customer that the fail-over investment they purchased is useless in a real-life situation. Fail-over systems are designed with the idea that the equipment they route around will die and go belly up like a pheasant shot point-blank with a 12-gauge shotgun.  In reality, for every “hard” failure, there are 100 system-related lock ups where equipment sputters and chokes but does not completely die.

3) Start with a high-quality Internet line.

T1 lines, although somewhat expensive, are based on telephone technology that has long been hardened and paid for. While they do cost a bit more than other solutions, they are well-engineered to your doorstep.

4) If possible, source two Internet providers and use BGP to combine them.

Since the Internet provider is usually the weakest link in your connection, critical operations should consider this option first, before looking to optimize other aspects of the internal circuit.

5) Make sure all your devices have good UPS sources and surge protectors.

6) What is the cost of manually moving a wire to bypass a failed piece of equipment?

Look at this option before purchasing redundancy options on a single point of failure. We often see customers asking for redundant fail-over embedded in their equipment. This tends to be a strategy of purchasing hardware such as routers, firewalls, bandwidth shapers, and access points that provide a "fail open" capability (meaning traffic will still pass through the device) should they catastrophically fail. At face value, this seems like a good idea to cover your bases. Most of these devices embed a failover switch internally in their hardware. The cost of this technology can add about $3,000 to the price of the unit.

7) If equipment is vital to your operation, you’ll need a spare unit on hand in case of failure. If the equipment is optional or used occasionally, then take it out of your network.

Again, these are just some basic tips, and your final Internet redundancy plan will ultimately depend on your specific circumstances.  But, these tips and questions should put you on your way to a decision based on facts rather than one based on unnecessary fears and concerns.
