Editor’s Note: It was a long road to get here (building the NetEqualizer Caching Option (NCO) a new feature offered on the NE3000 & NE4000), and for those following in our footsteps or just curious on the intricacies of YouTube caching, we have laid open the details.
This evening, I’m burning the midnight oil. I’m monitoring Internet link statistics at a state university with several thousand students hammering away on their residential network. Our bandwidth controller, along with our new NetEqualizer Caching Option (NCO), which integrates Squid for caching, has been running continuously for several days and all is stable. From the stats I can see, about 1,000 YouTube videos have been played out of the local cache over the past several hours. Without the caching feature installed, most of the YouTube videos would have played anyway, but there would be interruptions as the Internet link coughed and choked with congestion. Now, with NCO running smoothly, the most popular videos will run without interruptions.
Getting the NetEqualizer Caching Option to this stable product was a long and winding road. Here’s how we got there.
First, some background information on the initial problem.
To use a Squid proxy server, your network administrator must put hooks in your router so that all Web requests go the Squid proxy server before heading out to the Internet. Sometimes the Squid proxy server will have a local copy of the requested page, but most of the time it won’t. When a local copy is not present, it sends your request on to the Internet to get the page (for example the Yahoo! home page) on your behalf. The squid server will then update a local copy of the page in its cache (storage area) while simultaneously sending the results back to you, the original requesting user. If you make a subsequent request to the same page, the Squid will quickly check it to see if the content has been updated since it stored away the first time, and if it can, it will send you a local copy. If it detects that the local copy is no longer valid (the content has changed), then it will go back out to the Internet and get a new copy.
Now, if you add a bandwidth controller to the mix, things get interesting quickly. In the case of the NetEqualizer, it decides when to invoke fairness based on the congestion level of the Internet trunk. However, with the bandwidth controller unit (BCU) on the private side of the Squid server, the actual Internet traffic cannot be distinguished from local cache traffic. The setup looks like this:
Internet->router->Squid->bandwidth controller->users
The BCU in this example won’t know what is coming from cache and what is coming from the Internet. Why? Because the data coming from the Squid cache comes over the same path as the new Internet data. The BCU will erroneously think all the traffic is coming from the Internet and will shape cached traffic as well as Internet traffic, thus defeating the higher speeds provided by the cache.
In this situation, the obvious solution would be to switch the position of the BCU to a setup like this:
Internet->router->bandwidth controller->Squid->users
This configuration would be fine except that now all the port 80 HTTP traffic (cached or not) will appear like it is coming from the Squid proxy server and your BCU will not be able to do things like put rate limits on individual users.
Fortunately, with the our NetEqualizer 5.0 release, we’ve created an integration with NetEqualizer and co-resident Squid (our NetEqualizer Caching Option) such that everything works correctly. (The NetEqualizer still sees and acts on all traffic as if it were between the user and the Internet. This required some creative routing and actual bug fixes to the bridging and routing in the Linux kernel. We also had to develop a communication module between the NetEqualizer and the Squid server so the NetEqualizer gets advance notice when data is originating in cache and not the Internet.)
Which do you need, Bandwidth Control or Caching?
At this point, you may be wondering if Squid caching is so great, why not just dump the BCU and be done with the complexity of trying to run both? Well, while the Squid server alone will do a fine job of accelerating the access times of large files such as video when they can be fetched from cache, a common misconception is that there is a big relief on your Internet pipe with the caching server. This has not been the case in our real world installations.
The fallacy for caching as panacea for all things congested assumes that demand and overall usage is static, which is unrealistic. The cache is of finite size and users will generally start watching more YouTube videos when they see improvements in speed and quality (prior to Squid caching, they might have given up because of slowness), including videos that are not in cache. So, the Squid server will have to fetch new content all the time, using additional bandwidth and quickly negating any improvements. Therefore, if you had a congested Internet pipe before caching, you will likely still have one afterward, leading to slow access for many e-mail, Web chat and other non-cachable content. The solution is to include a bandwidth controller in conjunction with your caching server. This is what NetEqualizer 5.0 now offers.
In no particular order, here is a list of other useful information — some generic to YouTube caching and some just basic notes from our engineering effort. This documents the various stumbling blocks we had to overcome.
It seemed that the URL tags on these files change with each access, like a counter, and a normal Squid server is fooled into believing the files have changed. By default, when a file changes, a caching server goes out and gets the new copy. In the case of YouTube files, the content is almost always static. However, the caching server thinks they are different when it sees the changing file names. Without modifications, the default Squid caching server will re-retrieve the YouTube file from the source and not the cache because the file names change. (Read more on caching YouTube with Squid…).
2. We had to move to a newer Linux kernel to get a recent of version of Squid (2.7) which supports the hooks for YouTube caching.
A side effect was that the new kernel destabilized some of the timing mechanisms we use to implement bandwidth control. These subtle bugs were not easily reproduced with our standard load generation tools, so we had to create a new simulation lab capable of simulating thousands of users accessing the Internet and YouTube at the same time. Once we built this lab, we were able to re-create the timing issues in the kernel and have them patched.
3. It was necessary to set up a firewall re-direct (also on the NetEqualizer) for port 80 traffic back to the Squid server.
This configuration, and the implementation of an extra bridge, were required to get everything working. The details of the routing within the NetEqualizer were customized so that we would be able to see the correct IP addresses of Internet sources and users when shaping. (As mentioned above, if you do not take care of this, all IPs (traffic) will appear as if they are coming from the Proxy server.
4. The firewall has a table called ConnTrack (not be confused with NetEqualizer connection tracking but similar).
The connection tracking table on the firewall tends to fill up and crash the firewall, denying new requests for re-direction if you are not careful. If you just go out and make the connection table randomly enormous that can also cause your system to lock up. So, you must measure and size this table based on experimentation. This was another reason for us to build our simulation lab.
5. There was also the issue of the Squid server using all available Linux file descriptors.
Linux comes with a default limit for security reasons, and when the Squid server hit this limit (it does all kinds of file reading and writing keeping descriptors open), it locks up.
Tuning changes that we made to support Caching with Squid
a. To limit the file size of a cached object of 2 megabytes (2MB) to 40 megabytes (40MB)
- minimum_object_size 2000000 bytes
- maximum_object_size 40000000 bytes
If you allow smaller cached objects it will rapidly fill up your cache and there is little benefit to caching small pages.
b. We turned off the Squid keep reading flag
- quick_abort_min 0 KB
- quick_abort_max 0 KB
This flag when set continues to read a file even if the user leave the page, for example when watching a video if the user aborts on their browser the Squid cache continues to read the file. I suppose this could now be turned back on, but during testing it was quite obnoxious to see data transfers talking place to the squid cache when you thought nothing was going on.
c. We also explicitly told the Squid what DNS servers to use in its configuration file. There was some evidence that without this the Squid server may bog down, but we never confirmed it. However, no harm is done by setting these parameters.
d. You have to be very careful to set the cache size not to exceed your actual capacity. Squid is not smart enough to check your real capacity, so it will fill up your file system space if you let it, which in turn causes a crash. When testing with small RAM disks less than four gigs of cache, we found that the Squid logs will also fill up your disk space and cause a lock up. The logs are refreshed once a day on a busy system. With a large amount of pages being accessed, the log will use close to one (1) gig of data quite easily, and then to add insult to injury, the log back up program makes a back up. On a normal-sized caching system there should be ample space for logs
e. Squid has a short-term buffer not related to caching. It is just a buffer where it stores data from the Internet before sending it to the client. Remember all port 80 (HTTP) requests go through the squid, cached or not, and if you attempt to control the speed of a transfer between Squid and the user, it does not mean that the Squid server slows the rate of the transfer coming from the Internet right away. With the BCU in line, we want the sender on the Internet to back off right away if we decide to throttle the transfer, and with the Squid buffer in between the NetEqualizer and the sending host on the Internet, the sender would not respond to our deliberate throttling right away when the buffer was too large (Link to Squid caching parameter).
f. How to determine the effectiveness of your YouTube caching statistics?
I use the Squid client cache statistics page. Down at the bottom there is a entry that lists hits verses requests.
TOTALS
- ICP : 0 Queries, 0 Hits (0%)
- HTTP: 21990877 Requests, 3812 Hits (0%)
At first glance, it may appear that the hit rate is not all that effective, but let’s look at these stats another way. A simple HTTP page generates about 10 HTTP requests for perhaps 80K bytes of data total. A more complex page may generate 500k. For example, when you go to the CNN home page there are quite a few small links, and each link increments the HTTP counter. On the other hand, a YouTube hit generates one hit for about 20 megabits of data. So, if I do a little math based on bytes cached we get, the summary of HTTP hits and requests above does not account for total data. But, since our cache is only caching Web pages from two megabits to 40 megabits, with an estimated average of 20 megabits, this gives us about 400 gigabytes of regular HTTP and 76 Gigabytes of data that came from the cache. Abut 20 percent of all HTTP data came from cache by this rough estimate, which is a quite significant.
Like this:
Like Loading...
Our Take on Network Instruments 5th Annual Network Global Study
March 27, 2012 — netequalizerEditors Note: Network Instruments released their “Fifth Annual State of the Network Global study” on March 13th, 2o12. You can read their full study here. Their results were based on responses by 163 network engineers, IT directors, and CIOs in North America, Asia, Europe, Africa, Australia, and South America. Responses were collected from October 22, 2011 to January 3, 2012.
What follows is our take (or my .02 cents) on the key findings around Bandwidth Management and Bandwidth Monitoring from the study.
Finding #1: Over the next two years, more than one-third of respondents expect bandwidth consumption to increase by more than 50%.
Part of me says “well, duh!” but that is only because we hear that from many of our customers. So I guess if you were an Executive, far removed from the day-to-day, this would be an important thing to have pointed out to you. Basically, this is your wake up call (if you are not already awake) to listen to your Network Admins who keep asking you to allocate funds to the network. Now is the time to make your case for more bandwidth to your CEO/President/head guru. Get together budget and resources to build out your network in anticipation of this growth – so that you are not caught off guard. Because if you don’t, someone else will do it for you.
Finding #2: 41% stated network and application delay issues took more than an hour to resolve.
You can and should certainly put monitoring on your network to be able to see and react to delays. However, another way to look at this, admittedly biased from my bandwidth shaping background, is get rid of the delays!
If you are still running an unshaped network, you are missing out on maximizing your existing resource. Think about how smoothly traffic flows on roads, because there are smoothing algorithms (traffic lights) and rules (speed limits) that dictate how traffic moves, hence “traffic shaping.” Now, imagine driving on roads without any shaping in place. What would you do when you got to a 4-way intersection? Whether you just hit the accelerator to speed through, or decided to stop and check out the other traffic probably depends on your risk-tolerance and aggression profile. And the result would be that you make it through OK (live) or get into an ugly crash (and possibly die).
Similarly, your network traffic, when unshaped, can live (getting through without delays) or die (getting stuck waiting in a queue) trying to get to its destination. Whether you look at deep packet inspection, rate limiting, equalizing, or a home-grown solution, you should definitely look into bandwidth shaping. Find a solution that makes sense to you, will solve your network delay issues, and gives you a good return-on-investment (ROI). That way, your Network Admins can spend less time trying to find out the source of the delay.
Finding #3: Video must be dealt with.
24% believe video traffic will consume more than half of all bandwidth in 12 months.
47% say implementing and measuring QoS for video is difficult.
49% have trouble allocating and monitoring bandwidth for video.
Again, no surprise if you have been anywhere near a network in the last 2 years. YouTube use has exploded and become the norm on both consumer and business networks. Add that to the use of video conferencing in the workplace to replace travel, and Netflix or Hulu to watch movies and TV, and you can see that video demand (and consumption) has risen sharply.
Unfortunately, there is no quick, easy fix to make sure that video runs smoothly on your network. However, a combination of solutions can help you to make video run better.
1) Get more bandwidth.
This is just a basic fact-of-life. If you are running a network of < 10Mbps, you are going to have trouble with video, unless you only have one (1) user on your network. You need to look at your contention ratio and size your network appropriately.
2) Cache static video content.
Caching is a good start, especially for static content such as YouTube videos. One caveat to this, do not expect caching to solve network congestion problems (read more about that here) – as users will quickly consume any bandwidth that caching has freed up. Caching will help when a video has gone viral, and everyone is accessing it repeatedly on your network.
3) Use bandwidth shaping to prioritize business-critical video streams (servers).
If you have a designated video-streaming server, you can define rules in your bandwidth shaper to prioritize this server. The risk of this strategy is that you could end up giving all your bandwidth to video; you can reduce the risk by rate capping the bandwidth portioned out to video.
As I said, this is just my take on the findings. What do you see? Do you have a different take? Let us know!
Share this:
Like this: