Network Bottlenecks – When Your Router Drops Packets, Things Can Get Ugly

By Art Reisman

CTO – APconnections

As a general rule, when a network router sees more packets than it can send or receive on a link, it drops the extra packets. Intuitively, when your router is dropping packets, one would assume that each user would perceive just a gradual slowdown.

What happens in reality is far worse…

1) Distant users get progressively slower responses.

Martin Roth, a colleague of ours who founded one of the top performance analysis companies in the world, provided this explanation:

“Any device which is dropping packets “favors” streams with the shortest round trip time, because (according to the TCP protocol) the time after which a lost packet is recovered depends on the round trip time. So when a company in Copenhagen, Denmark has a line to Australia and a line to Germany on the same Internet router, and this router is discarding packets because of bandwidth limits/policing, the stream to Australia gets much bigger “holes” per lost packet (up to 3 seconds) than the stream to Germany or another office in Copenhagen. This effect then increases when the TCP window size to Australia is reduced (because of the retransmissions), so there are fewer bytes per round trip and more holes between two round trips.”
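A rough back-of-the-envelope sketch (not a full TCP model) makes the RTT dependence concrete: after a loss, TCP halves its congestion window and then grows it back by roughly one segment per round trip, so recovery time scales linearly with round trip time. The RTT and window numbers below are illustrative assumptions, not measurements from the scenario above.

```python
# Illustrative sketch: why packet loss hurts long-RTT flows more.
# After a loss, TCP congestion avoidance grows the window back from
# W/2 to W at ~1 segment per round trip, so the recovery "hole"
# scales linearly with RTT.

def recovery_time(rtt_s: float, window_segments: int) -> float:
    """Seconds to grow the window back from W/2 to W,
    gaining one segment per round trip."""
    return (window_segments / 2) * rtt_s

# Assumed RTTs: ~30 ms Copenhagen-Germany, ~300 ms Copenhagen-Australia,
# with a 40-segment window on both paths.
germany = recovery_time(0.030, 40)    # ~0.6 s lost per loss event
australia = recovery_time(0.300, 40)  # ~6.0 s lost per loss event
```

With the same drop rate on both streams, each loss costs the Australian flow ten times as much, which is the "bigger holes" effect described above.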

In the screen shot above, the bandwidth limit is 10 Mbit (= 1 MByte/s net traffic), so everything on top of that gets discarded. The problem is not the discards (this is standard TCP behaviour), but the connections that are forcefully closed because of the discards. After the peak in closed connections, there is a “dip” in bandwidth utilization, because we cut too many connections.

2) Once you hit a congestion point, where your router is forced to drop packets, overall congestion actually gets worse before it gets better.

When applications don’t get a response due to a dropped packet, instead of backing off and waiting, they tend to start sending retries, and this is why you may have noticed prolonged periods (30 seconds or more) of no service on a congested network. We call this the rolling brownout. Think of this situation as a sort of doubling down on bandwidth at the moment of congestion. Instead of easing into a full network and lightly bumping your head, all the devices demanding bandwidth ramp up their requests at precisely the moment when your network is congested, resulting in an explosion of packet dropping until everybody finally gives up.
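The "doubling down" dynamic can be sketched in a few lines. This toy model (the numbers are illustrative assumptions, not measurements) shows how offered load climbs past capacity when every dropped packet is simply retried next round with no backoff:

```python
# Toy model of the rolling brownout: clients retry dropped packets
# immediately (no backoff), so drops feed back into more offered load.

def offered_load(base_load: float, capacity: float, rounds: int) -> list[float]:
    """Offered load per round when every dropped packet from the
    previous round is retried on top of the new base load."""
    loads = []
    load = base_load
    for _ in range(rounds):
        loads.append(load)
        dropped = max(0.0, load - capacity)
        load = base_load + dropped  # retries pile onto fresh traffic
    return loads

# 12 Mbps of demand on a 10 Mbps link: load ratchets upward each round
# (12 -> 14 -> 16 -> 18 ...) instead of settling at capacity.
```

Real TCP stacks do back off exponentially, but chatty applications and impatient users re-issuing requests produce exactly this kind of amplification at the congestion point.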

How do you remedy outages caused by congestion?

We have written extensively about solutions to prevent bottlenecks. Here is a quick summary with links:

1) The most obvious: increase the size of your link.

2) Enforce rate limits per user.

3) Use something more sophisticated like a NetEqualizer, a device that is designed specifically to counter the effects of congestion.
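Remedy 2, per-user rate limits, is classically implemented with a token bucket. Here is a minimal Python sketch of that mechanism (the class name and numbers are illustrative; a real limiter lives inside the router or shaper, not in application code):

```python
import time

# Minimal token-bucket rate limiter: tokens refill at a fixed rate,
# and a packet is admitted only if enough tokens are available.

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bits: float):
        self.rate = rate_bps        # refill rate, bits per second
        self.capacity = burst_bits  # maximum burst size, bits
        self.tokens = burst_bits    # start full
        self.last = time.monotonic()

    def allow(self, packet_bits: float) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return True
        return False  # over the limit: drop (or queue) the packet
```

One bucket per user gives you the per-user cap; as the article notes later, this buys time but does not by itself prevent the congestion point on an oversold link.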

From Martin Roth:

“With NetEqualizer we may get the same number of discards, but we get fewer connections closed, because we “kick” the few connections with the high bandwidth, so we do not get the “dip” in bandwidth utilization.

The graphs (above) were recorded using 1-second intervals, so here you can see when the bandwidth limit is reached. In a standard SolarWinds graph with 10-minute averages, the bandwidth utilization would be under 20% and the customer would not know they are hitting the limit.”


The excerpt below is a message from a reseller who had been struggling with congestion issues at a hotel; he tried basic rate limits on his router first. Rate limits will buy you some time, but on an oversold network you can still hit the congestion point, and for that you need a smarter device.

“…NetEq delivered a 500% gain in available bandwidth by eliminating rate caps, possible through a mix of connection limits and Equalization. Both are necessary. The hotel went from 750 Kbit max per access point (the entire hotel lobby fighting over 750 Kbit, divided among who knows how many users) to 7 Mbit or more available bandwidth for single users with heavy needs.

The ability to fully load the pipe, then reach out and instantly take back up to a third of it for an immediate need like a speed test was also really eye-opening. The pipe is already maxed out, but there is always a third of it that can be immediately cleared in time to perform something new and high-priority like a speed test.

Rate caps: nobody ever gets a fast Internet connection.

Equalized: the pipe stays as full as possible, yet anybody with a business-class need gets served a major portion of the pipe on demand.”
– Ben Whitaker –

Are those rate limits on your router good enough?

Pros and Cons of Using Your Router as a Bandwidth Controller

So, you already have a router in your network, and rather than take on the expense of another piece of equipment, you want to double up on functionality by implementing your bandwidth control within your router. While this is sound logic and may be your best decision, as always, there are some other factors to consider.

Here are a few things to think about:

1. Routers are optimized to move packets from one network to another with utmost efficiency. To perform this function, there is often minimal inspection of the data: the router does one table look-up and sends the data on its way. However, as soon as you start doing some form of bandwidth control, your router must perform a higher level of analysis on the data, and implementing non-routing features, such as protocol sniffing, creates conditions that are much more complex than the original routing mission. For simple rate limiting there should be no problem, but more complex bandwidth control can overwhelm, without warning, the processing power your router was designed for.

2. The more complex the system, the more likely it is to lock up. For example, that old analog desktop phone set probably never once crashed. It was a simple device and hence extremely reliable. On the other hand, when you load up an IP phone application on your Windows PC, you reduce reliability even though the function is the same as the old phone system. The problem is that your Windows PC is an unreliable platform: it runs out of memory, and buggy applications lock it up.

This is not news to a Windows PC owner, but the complexity of a mission will have the same effect on your once-reliable router. So, when you start loading up your router with additional missions, it becomes increasingly likely that it will become unstable and lock up. Worse yet, you might cause a subtle network problem (intermittent slowness, etc.) that is less likely to be identified and fixed. When you combine a bandwidth controller, router, and firewall in one box, it can become nearly impossible to isolate problems.

3. Routing with TOS bits? Setting priority on your router generally only works when you control both ends of the link, which isn’t always an option. However, products such as the NetEqualizer can supply priority for VoIP in both directions on your Internet link.

4. A stand-alone bandwidth controller can be moved around your network or easily removed without affecting routing. This is possible because a bandwidth controller is generally not a routable device but rather a transparent bridge. Rearranging your network setup may not be an option, or simply becomes much more difficult, when using your router for other functions, including bandwidth control.
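To make point 3 concrete: an application can mark its own packets with a DSCP/TOS value, but any router along the path only honors the mark if it is configured to, which is why TOS-based priority usually requires control of both ends of the link. A small Python sketch (the constant names are ours; EF, decimal 46, is the DSCP value conventionally used for VoIP):

```python
import socket

# Mark outgoing datagrams with DSCP "Expedited Forwarding" (EF = 46).
# DSCP occupies the upper 6 bits of the IP TOS byte.
DSCP_EF = 46
TOS_EF = DSCP_EF << 2  # 0xB8

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)
# Datagrams sent on this socket now carry the EF mark in the IP header;
# whether any router prioritizes them depends on that router's policy.
```

Setting the bits is the easy half; coordinating queueing policy on every hop you don't own is the hard half, and that is the gap the article is pointing at.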

These four points don’t necessarily mean using a router for bandwidth control isn’t the right option for you. However, as is the case when setting up any network, the right choice ultimately depends on your individual needs. Taking these points into consideration should make your final decision on routing and bandwidth control a little easier.

Seven Points to Consider When Planning Internet Redundancy

By Art Reisman


Editor’s note: Art Reisman is the CTO of APconnections. APconnections designs and manufactures the popular NetEqualizer bandwidth shaper.

The chances of being killed by a shark are 1 in 264 million. Despite those low odds, most people worry about sharks when they enter the ocean, and yet the same people do not think twice about getting into a car without a passenger-side airbag.

And so it is with networking redundancy solutions. Many equipment purchase decisions are driven by an irrational fear (created by vendors) rather than by actual business-risk mitigation.

The solution to this problem is simple. It’s a matter of being informed and making decisions based on facts rather than fear or emotion. While every situation is different, here are a few basic tips and questions to consider when it comes to planning Internet redundancy.

1) Where is your largest risk of losing Internet connectivity?

Vendors tend to push customers toward internal hardware solutions to reduce risk. For example, most customers want a circuit design within their servers that will allow traffic to pass should the equipment fail. Yet polling data from our customers shows that your Internet router’s chance of catastrophic failure is about 1 percent over a three-year period. On the other hand, your Internet provider has an almost 100-percent chance of having a full-day outage during that same three-year period.

Perhaps the cost of sourcing two independent providers is prohibitive, and there is no choice but to live with this risk. All well and good, but if you are truly worried about a connectivity failure into your business, you cannot meaningfully mitigate this risk by sourcing hot failover equipment at your site.  You MUST source two separate paths to the Internet to have any significant reduction in risk.  Requiring failover on individual pieces of equipment, without complete redundancy in your network from your provider down, with all due respect, is a mitigation of political and not actual risk.
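Back-of-the-envelope arithmetic makes the point above concrete. The failure probabilities come from the article; the assumed outage durations (two days to source a replacement router, one day for the provider outage) are illustrative assumptions:

```python
# Expected downtime over three years: failed router vs. provider outage.
# Probabilities are from the article; durations are assumptions.

HOURS_PER_DAY = 24

router_failure_prob = 0.01    # ~1% chance of catastrophic failure (3 yrs)
router_outage_days = 2.0      # assumed time to swap in a replacement
provider_failure_prob = 1.0   # ~100% chance of a full-day outage (3 yrs)
provider_outage_days = 1.0

router_expected_hours = router_failure_prob * router_outage_days * HOURS_PER_DAY
provider_expected_hours = provider_failure_prob * provider_outage_days * HOURS_PER_DAY
# ~0.5 hours of expected downtime from the router vs ~24 hours from the
# provider: roughly a 50x difference.
```

Under these assumptions, a second provider path addresses about fifty times more expected downtime than hot-failover hardware at the same site, which is the article's point.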

2) Do not turn on unneeded bells and whistles on your router and firewall equipment.

Many router and device failures are not absolute. Equipment will get cranky, slow, or belligerent based on human error or system bugs. Although system bugs are rare when these devices are used in the default set-up, turning on bells and whistles seems to be an irresistible enticement for a tech. The more features you turn on, the less standard your configuration becomes, and all too often the mission of the device is pushed well beyond its original intent (routers doing billing systems, for example).

These “soft” failure situations are common, and the fail-over mechanism likely will not kick in, even though the device is sick and not passing traffic as intended.  I have witnessed this type of failure first-hand at major customer installations.  The failure itself is bad enough, but the real embarrassment comes from having to tell your customer that the fail-over investment they purchased is useless in a real-life situation. Fail-over systems are designed with the idea that the equipment they route around will die and go belly up like a pheasant shot point-blank with a 12-gauge shotgun.  In reality, for every “hard” failure, there are 100 system-related lock ups where equipment sputters and chokes but does not completely die.

3) Start with a high-quality Internet line.

T1 lines, although somewhat expensive, are based on telephone technology that has long been hardened and paid for. While they do cost a bit more than other solutions, they are well-engineered all the way to your doorstep.

4) If possible, source two Internet providers and use BGP to combine them.

Since Internet providers are usually the weakest link in your connection, critical operations should consider this option first before looking to optimize other aspects of your internal circuit.

5) Make sure all your devices have good UPS sources and surge protectors.

6) What is the cost of manually moving a wire to bypass a failed piece of equipment?

Look at this option before purchasing redundancy options on a single point of failure. We often see customers asking for redundant fail-over embedded in their equipment. This tends to be a strategy of purchasing hardware such as routers, firewalls, bandwidth shapers, and access points that will “fail open” (meaning traffic will still pass through the device) should they catastrophically fail. At face value, this seems like a good idea to cover your bases. Most of these devices embed a failover switch internally in their hardware, and the cost of this technology can add about $3,000 to the price of the unit.

7) If equipment is vital to your operation, you’ll need a spare unit on hand in case of failure. If the equipment is optional or used occasionally, then take it out of your network.

Again, these are just some basic tips, and your final Internet redundancy plan will ultimately depend on your specific circumstances.  But, these tips and questions should put you on your way to a decision based on facts rather than one based on unnecessary fears and concerns.
