Editor’s Note: It was a long road to get here (building the NetEqualizer Caching Option (NCO), a new feature offered on the NE3000 & NE4000), and for those following in our footsteps, or just curious about the intricacies of YouTube caching, we have laid open the details.
This evening, I’m burning the midnight oil. I’m monitoring Internet link statistics at a state university with several thousand students hammering away on their residential network. Our bandwidth controller, along with our new NetEqualizer Caching Option (NCO), which integrates Squid for caching, has been running continuously for several days and all is stable. From the stats I can see, about 1,000 YouTube videos have been played out of the local cache over the past several hours. Without the caching feature installed, most of the YouTube videos would have played anyway, but there would be interruptions as the Internet link coughed and choked with congestion. Now, with NCO running smoothly, the most popular videos will run without interruptions.
Getting the NetEqualizer Caching Option to this level of stability was a long and winding road. Here’s how we got there.
First, some background information on the initial problem.
To use a Squid proxy server, your network administrator must put hooks in your router so that all Web requests go to the Squid proxy server before heading out to the Internet. Sometimes the Squid proxy server will have a local copy of the requested page, but most of the time it won’t. When a local copy is not present, it sends your request on to the Internet to get the page (for example, the Yahoo! home page) on your behalf. The Squid server then updates the local copy of the page in its cache (storage area) while simultaneously sending the results back to you, the original requesting user. If you make a subsequent request for the same page, Squid quickly checks whether the content has changed since it was first stored away, and if the local copy is still valid, it sends you that copy. If it detects that the local copy is no longer valid (the content has changed), it goes back out to the Internet and gets a fresh copy.
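The decision flow just described can be sketched in a few lines of Python. This is a toy model for illustration only; `SimpleCache`, `origin_fetch`, and `origin_validate` are hypothetical names, not Squid internals (real Squid revalidates with HTTP conditional requests and expiry headers):

```python
class SimpleCache:
    """Toy model of a caching proxy's hit/miss/revalidate flow (not real Squid code)."""

    def __init__(self):
        self.store = {}  # url -> (content, etag)

    def fetch(self, url, origin_fetch, origin_validate):
        entry = self.store.get(url)
        if entry is not None:
            content, etag = entry
            # Ask the origin whether our stored copy is still current
            # (analogous to an HTTP conditional GET).
            if origin_validate(url, etag):
                return content  # cache hit: serve the local copy
        # Cache miss or stale copy: fetch from the origin and store it.
        content, etag = origin_fetch(url)
        self.store[url] = (content, etag)
        return content
```

The second request for the same unchanged page is served locally without touching the origin, which is exactly the behavior that makes cached YouTube playback fast.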
Now, if you add a bandwidth controller to the mix, things get interesting quickly. In the case of the NetEqualizer, it decides when to invoke fairness based on the congestion level of the Internet trunk. However, with the bandwidth controller unit (BCU) on the private side of the Squid server, the actual Internet traffic cannot be distinguished from local cache traffic. The setup looks like this:
The BCU in this example won’t know what is coming from cache and what is coming from the Internet. Why? Because the data coming from the Squid cache comes over the same path as the new Internet data. The BCU will erroneously think all the traffic is coming from the Internet and will shape cached traffic as well as Internet traffic, thus defeating the higher speeds provided by the cache.
In this situation, the obvious solution would be to switch the position of the BCU to a setup like this:
This configuration would be fine except that now all the port 80 HTTP traffic (cached or not) will appear to be coming from the Squid proxy server, and your BCU will not be able to do things like put rate limits on individual users.
Fortunately, with our NetEqualizer 5.0 release, we’ve integrated the NetEqualizer with a co-resident Squid (our NetEqualizer Caching Option) such that everything works correctly. (The NetEqualizer still sees and acts on all traffic as if it were between the user and the Internet. This required some creative routing and actual bug fixes to the bridging and routing code in the Linux kernel. We also had to develop a communication module between the NetEqualizer and the Squid server so the NetEqualizer gets advance notice when data originates in the cache rather than the Internet.)
Which do you need, Bandwidth Control or Caching?
At this point, you may be wondering: if Squid caching is so great, why not just dump the BCU and be done with the complexity of trying to run both? While a Squid server alone will do a fine job of accelerating access times for large files such as video when they can be fetched from cache, it is a common misconception that a caching server provides major relief to your Internet pipe. That has not been the case in our real-world installations.
The fallacy of caching as a panacea for all things congested assumes that demand and overall usage are static, which is unrealistic. The cache is of finite size, and users will generally start watching more YouTube videos when they see improvements in speed and quality (prior to Squid caching, they might have given up because of slowness), including videos that are not in the cache. So the Squid server has to fetch new content all the time, using additional bandwidth and quickly negating any improvements. Therefore, if you had a congested Internet pipe before caching, you will likely still have one afterward, leading to slow access for e-mail, Web chat and other non-cacheable content. The solution is to run a bandwidth controller in conjunction with your caching server, which is what NetEqualizer 5.0 now offers.
In no particular order, here is a list of other useful information — some generic to YouTube caching and some just basic notes from our engineering effort. This documents the various stumbling blocks we had to overcome.
1. The URL tags on YouTube files change with each access, like a counter, and a normal Squid server is fooled into believing the files have changed. By default, when a file changes, a caching server goes out and gets the new copy. In the case of YouTube files, the content is almost always static; however, the caching server sees the changing file names and assumes the files are different. Without modifications, the default Squid caching server will re-retrieve a YouTube file from the source rather than the cache because the file name has changed. (Read more on caching YouTube with Squid…)
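Squid 2.7 addresses this with an external helper (the `storeurl_rewrite_program` directive) that rewrites the URL Squid uses as its cache key, so all variants of the same video map to one stored object. Below is a minimal sketch of such a helper in Python; the regex, the `id=` query parameter, and the `youtube.invalid` canonical host are illustrative assumptions, since YouTube’s actual URL format changes over time:

```python
#!/usr/bin/env python
# Sketch of a Squid 2.7 storeurl_rewrite_program helper: map YouTube
# videoplayback URLs with varying tags onto one canonical cache key
# per video id. The pattern below is an assumption, not production rules.
import re
import sys

VIDEO_ID = re.compile(r'[?&]id=([0-9a-zA-Z_-]+)')

def canonical(url):
    m = VIDEO_ID.search(url)
    if m and 'videoplayback' in url:
        # Every request for the same video id shares one store URL.
        return 'http://youtube.invalid/videoplayback?id=' + m.group(1)
    return url  # leave all other URLs untouched

if __name__ == '__main__':
    # Squid feeds the helper one URL (plus metadata) per line on stdin.
    for line in sys.stdin:
        parts = line.split()
        if not parts:
            continue
        sys.stdout.write(canonical(parts[0]) + '\n')
        sys.stdout.flush()
```

With a rule like this in place, two fetches of the same video from different YouTube cache hosts resolve to the same stored object instead of triggering a re-retrieval.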
2. We had to move to a newer Linux kernel to get a recent version of Squid (2.7), which supports the hooks for YouTube caching.
A side effect was that the new kernel destabilized some of the timing mechanisms we use to implement bandwidth control. These subtle bugs were not easily reproduced with our standard load generation tools, so we had to create a new simulation lab capable of simulating thousands of users accessing the Internet and YouTube at the same time. Once we built this lab, we were able to re-create the timing issues in the kernel and have them patched.
3. It was necessary to set up a firewall re-direct (also on the NetEqualizer) for port 80 traffic back to the Squid server.
This configuration, and the implementation of an extra bridge, were required to get everything working. The details of the routing within the NetEqualizer were customized so that we could see the correct IP addresses of Internet sources and users when shaping. (As mentioned above, if you do not take care of this, all traffic will appear as if it is coming from the proxy server.)
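A transparent port 80 redirect of this kind is typically done with a netfilter NAT rule on the box running Squid. The interface name and port below are assumptions (3128 is Squid’s conventional default listener; use whatever `http_port` you configured):

```shell
# Redirect inbound HTTP (port 80) traffic to the local Squid listener.
# eth0 and 3128 are example values, not NetEqualizer-specific settings.
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
```

On Squid 2.7 the matching listener would be declared as a transparent one (e.g. `http_port 3128 transparent`) so Squid knows the requests were intercepted rather than explicitly proxied.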
4. The firewall has a table called ConnTrack (not to be confused with NetEqualizer connection tracking, although it is similar).
The connection tracking table on the firewall tends to fill up and crash the firewall, denying new requests for redirection, if you are not careful. If you just make the connection table arbitrarily enormous, that can also lock up your system. So you must measure and size this table based on experimentation. This was another reason for us to build our simulation lab.
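On a modern Linux kernel the conntrack ceiling is exposed as a sysctl. The value below is purely illustrative; as noted above, the right number should come from measuring your own peak connection counts, not from copying an example:

```shell
# Watch current usage against the ceiling (sysctl names vary by kernel;
# older kernels use net.ipv4.netfilter.ip_conntrack_max instead).
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max

# Raise the ceiling -- 262144 is an example value, not a recommendation.
sysctl -w net.netfilter.nf_conntrack_max=262144
```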
5. There was also the issue of the Squid server using all available Linux file descriptors.
Linux ships with a default limit for security reasons, and when the Squid server hits this limit (it does all kinds of file reading and writing, keeping descriptors open), it locks up.
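The per-process descriptor limit can be checked and raised from the shell that launches Squid. The figure below is an example, not a tuned value; size it to your peak concurrent connections plus cache file activity:

```shell
# Show the current per-process file descriptor limit (commonly 1024).
ulimit -n

# Raise it for this shell and its children -- 8192 is illustrative.
ulimit -n 8192
```

Depending on the Squid build, the limit compiled in at configure time can also cap descriptors regardless of ulimit, so it is worth verifying the effective limit in Squid’s own logs after startup.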
Tuning changes that we made to support Caching with Squid
a. To limit the size of a cached object to between 2 megabytes (2MB) and 40 megabytes (40MB):
- minimum_object_size 2000000 bytes
- maximum_object_size 40000000 bytes
If you allow smaller cached objects, they will rapidly fill up your cache, and there is little benefit to caching small pages.
b. We turned off the Squid keep-reading flag:
- quick_abort_min 0 KB
- quick_abort_max 0 KB
When set, this flag tells Squid to keep reading a file even after the user leaves the page; for example, if a user watching a video aborts in their browser, the Squid cache continues to read the file. I suppose this could now be turned back on, but during testing it was quite obnoxious to see data transfers taking place to the Squid cache when you thought nothing was going on.
c. We also explicitly told Squid which DNS servers to use in its configuration file. There was some evidence that without this the Squid server may bog down, but we never confirmed it. However, no harm is done by setting these parameters.
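In squid.conf this is the `dns_nameservers` directive. The addresses below are documentation placeholders; substitute your own resolvers:

```shell
# squid.conf -- pin Squid to specific resolvers instead of relying
# on /etc/resolv.conf. Example addresses only.
dns_nameservers 192.0.2.53 198.51.100.53
```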
d. You have to be very careful to set the cache size so that it does not exceed your actual capacity. Squid is not smart enough to check your real capacity, so it will fill up your file system if you let it, which in turn causes a crash. When testing with small RAM disks (less than four gigabytes of cache), we found that the Squid logs will also fill up your disk space and cause a lockup. The logs are refreshed once a day on a busy system. With a large number of pages being accessed, the log can easily approach one gigabyte, and then, to add insult to injury, the log backup program makes a backup. On a normal-sized caching system there should be ample space for logs.
e. Squid has a short-term buffer not related to caching; it is just a buffer where Squid stores data from the Internet before sending it to the client. Remember, all port 80 (HTTP) requests go through Squid, cached or not, and controlling the speed of a transfer between Squid and the user does not mean the Squid server immediately slows the rate of the transfer coming from the Internet. With the BCU in line, we want the sender on the Internet to back off right away if we decide to throttle the transfer; with too large a Squid buffer sitting between the NetEqualizer and the sending host, the sender would not respond to our deliberate throttling right away. (Link to Squid caching parameter)
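The parameter in question is most likely Squid’s `read_ahead_gap` directive, which caps how far ahead of the client Squid will read from the origin server; keeping it small lets client-side throttling propagate back to the Internet sender quickly. The value shown is Squid’s commonly cited default, given here only as an example:

```shell
# squid.conf -- limit server-side read-ahead so that slowing the
# client side pushes back on the upstream sender promptly.
read_ahead_gap 16 KB
```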
f. How do you determine the effectiveness of your YouTube caching?
I use the Squid client cache statistics page. Down at the bottom there is an entry that lists hits versus requests.
- ICP : 0 Queries, 0 Hits (0%)
- HTTP: 21990877 Requests, 3812 Hits (0%)
At first glance, it may appear that the hit rate is not very effective, but let’s look at these stats another way. A simple HTTP page generates about 10 HTTP requests for perhaps 80K bytes of data in total; a more complex page may generate 500K. For example, when you go to the CNN home page there are quite a few small links, and each link increments the HTTP counter. On the other hand, a YouTube hit generates one hit for about 20 megabytes of data. So the summary of HTTP hits and requests above does not account for total data. Doing a little math based on bytes cached, with our cache holding only objects from 2 megabytes to 40 megabytes at an estimated average of 20 megabytes, we get roughly 400 gigabytes of regular HTTP data and 76 gigabytes of data served from the cache. By this rough estimate, about 20 percent of all HTTP data came from the cache, which is quite significant.
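That back-of-the-envelope arithmetic can be written out explicitly. The figures are the article’s own rough estimates (note the units are bytes, not bits):

```python
# Rough estimate of cached vs. total HTTP data, using the stats above.
cache_hits = 3_812                      # HTTP hits from the Squid stats page
avg_cached_object_bytes = 20_000_000    # ~20 MB average (2 MB - 40 MB range)

cached_bytes = cache_hits * avg_cached_object_bytes   # ~76 GB served from cache
regular_http_bytes = 400_000_000_000                  # ~400 GB, the rough estimate

fraction = cached_bytes / (cached_bytes + regular_http_bytes)
print(round(cached_bytes / 1e9))   # ~76 (GB)
print(round(fraction * 100))       # ~16 percent, i.e. roughly one fifth
```

The raw request counter (0% hit rate) hides this, because one cached video moves as many bytes as thousands of small page requests.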
The Promise of Streaming Video: An Unfunded Mandate
June 1, 2010 — netequalizer
By Art Reisman, CTO, www.netequalizer.com
Art Reisman is a partner and co-founder of APconnections, a company that provides bandwidth control solutions (NetEqualizer) to ISPs, Universities, Libraries, Mining Camps, and any organization where groups of users must share their Internet resources equitably. What follows is an objective educational journey on how consumers and ISPs can live in harmony with the explosion of YouTube video.
The following is written primarily for the benefit of small-to-mid-sized Internet service providers (ISPs). However, home consumers may also find the details interesting. Please follow along as I break down the costs required to keep up with growing video demand.
In the past few weeks, two factors have come up in conversations with our customers which have encouraged me to investigate this subject further and outline the challenges here:
1) Many of our ISP customers are struggling to offer video at competitive levels during the day, and yet are being squeezed by high bandwidth costs. Many look to the NetEqualizer to alleviate video congestion problems. As you know, there are always trade-offs to be made in handling any congestion issue, which I will discuss at the end of this article. But back to the subject at hand. What I am seeing from customers is an underlying fear that they (IT administrators) are behind the curve. As I have an opinion on this, I decided I needed to lay out what is “normal” in terms of contention ratios for video, as well as what is “practical” for video in today’s world.
2) My Internet service provider, a major player that heavily advertises how fast its speed to the home is, periodically slows down standard YouTube videos. I should be fair with my accusation; with the Internet, you can never be quite certain who is at fault. Whether I am being throttled or not, the point is that there is an ever-growing number of video content providers who are pushing ahead with plans that do not take into account, nor care about, a last-mile provider’s ability to handle the increased load. A good analogy would be a travel agency booking tourists onto a cruise ship without keeping a tally of tickets sold, nor caring, for that matter. When all those tourists show up to board the ship, some form of chaos will ensue (and some will not be able to get on the ship at all).
Some ISPs are also adding to this issue by building out infrastructure without regard to content demand and hoping for the best. They are in a tight spot, caught in a challenging balancing act between customers, profit, and their ability to actually deliver video at peak times.
The Business Cost Model of an ISP trying to accommodate video demands
Almost all ISPs rely on the fact that not all customers will pull their full allotment of bandwidth all the time. Hence, they can map out an appropriate subscriber ratio for their network, and also advertise bandwidth rates that are sufficient enough to handle video. There are four main governing factors on how fast an actual consumer circuit will be:
1) The physical speed of the medium to the customer’s front door (this is often the speed cited by the ISP)
2) The combined load of all customers sharing their local circuit and the local circuit’s capacity (subscriber ratio factors in here)
3) How much bandwidth the ISP contracts out to the Internet (from the ISP’s provider)
4) The speed at which the source of the content can be served (YouTube’s servers). We’ll assume this is not a source of contention in our examples below, but it certainly should remain a suspect in any finger-pointing over a slow circuit.
The actual limit on the amount of bandwidth a customer gets at one time, which dictates whether they can run live streaming video, usually depends on how oversold their ISP is (based on the “subscriber ratio” mentioned in points 1 and 2 above). If your ISP can predict the peak loads on their entire circuit correctly, and purchase enough bulk bandwidth to meet that demand (point 3 above), then customers should be able to run live streaming video without interruption.
The problem arises when providers put together a static set of assumptions that break down as consumer appetite for video grows faster than expected. The numbers below typify the trade-offs a mid-sized provider is playing with in order to make a profit, while still providing enough bandwidth to meet customer expectations.
1) In major metropolitan areas, as of 2010, bandwidth can be purchased in bulk for about $3,000 per 50 megabits. Some localities are less, some more.
2) ISPs must cover a fixed, amortized cost per customer: billing, sales staff, support staff, customer premise equipment, interest on investment, and licensing, which comes out to about $35 per month per customer.
3) We assume market competition fixes price at about $45 per month per customer for a residential Internet customer.
4) This leaves $10 per month for profit margin and bandwidth fees. We assume an even split: $5 a month per customer for profit, and $5 per month per customer to cover bandwidth fees.
With 50 megabits costing $3,000 and each customer contributing $5 per month, this dictates that you must share the 50 megabit pipe among 600 customers to be viable as a business. This is the governing factor on how much bandwidth is available to all customers for all uses, including video.
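The cost model in points 1 through 4 can be spelled out in a few lines, using the article’s own figures:

```python
# Provider economics from the scenario above, spelled out.
bulk_cost_per_50mbit = 3000   # $ per month for a 50 megabit bulk purchase
price_per_customer = 45       # $ per month, set by market competition
fixed_cost_per_customer = 35  # billing, staff, equipment, licensing
profit_per_customer = 5       # the assumed even split of the remainder

# What is left per customer each month to pay for bandwidth:
bandwidth_budget = price_per_customer - fixed_cost_per_customer - profit_per_customer

# How many customers must share the 50 megabit pipe to cover its cost:
customers_needed = bulk_cost_per_50mbit // bandwidth_budget
print(bandwidth_budget, customers_needed)   # 5 dollars, 600 customers
```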
So how many simultaneous YouTube Videos can be supported given the scenario above?
Live streaming YouTube video needs on average about 750 kbps, or about 3/4 of a megabit, in order to run without breaking up.
On a 50 megabit shared link provided by an ISP, in theory you could support about 70 simultaneous YouTube sessions, assuming nothing else is running on the network. In the real world there would always be background traffic other than YouTube.
In reality, you are always going to have a minimum fixed load of Internet usage from 600 customers of approximately 10 to 20 megabits. This load supports everything else: web surfing, downloads, Skype calls, etc. So realistically you can support about 40 YouTube sessions at one time. This implies that if 10 percent of your customers (60 customers) start to watch YouTube at the same time, you will need more bandwidth, or else you are going to get some complaints. ISPs that desperately want to support video must count on no more than about 40 simultaneous videos running at one time, or a little less than 10 percent of their customers.
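The session arithmetic above works out as follows (the 20 megabit background figure is the top of the article’s 10-to-20 range):

```python
# How many simultaneous streams fit, per the assumptions above.
link_mbit = 50          # the shared 50 megabit pipe
stream_mbit = 0.75      # ~750 kbps per YouTube stream
background_mbit = 20    # fixed load from everything else (10-20 megabit range)

theoretical = int(link_mbit / stream_mbit)                    # ~66 on an idle link
realistic = int((link_mbit - background_mbit) / stream_mbit)  # 40 with background load
print(theoretical, realistic)
```

So the "about 70 sessions" theoretical ceiling drops to roughly 40 once normal background traffic is accounted for, which is well under 10 percent of the 600 customers sharing the pipe.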
Based on the scenario above, if more than 40 customers simultaneously run YouTube, the link will be exhausted and all 600 customers will be wishing they had their dial-up back. At last check, YouTube traffic accounted for 10 percent of all Internet traffic. If left completely unregulated, a typical rural ISP could find itself on the brink of saturation from normal YouTube usage already. With tier-1 providers in major metro areas, there is usually more bandwidth, but with that comes higher expectations of service, and hence some saturation is inevitable.
This is why we believe that Video is currently an “unfunded mandate”. Based on a reasonable business cost model, as we have put forth above, an ISP cannot afford to size their network to have even 10% of their customers running real-time streaming video at the same time. Obviously, as bandwidth costs decrease, this will help the economic model somewhat.
However, if you still want to tune for video on your network, consider the options below…
NetEqualizer and Trade-offs to allow video
If you are not a current NetEqualizer user, please feel free to call our engineering team for more background. Here is my short answer on “how to allow video on your network” for current NetEqualizer users:
1) You can determine the IP address ranges for popular sites and give them priority via setting up a “priority host”.
This is not recommended for customers with 50 megabits or less, as it may push you into a gridlock situation.
2) You can raise your HOGMIN to 50,000 bytes per second.
This will generally let in the lower-resolution video sites. However, they may still incur penalties should they start buffering at a rate higher than 50,000 bytes per second. Again, we would not recommend this change for customers with pipes of 50 megabits or less.
With either of the above changes, you run the risk of crowding out web surfing and other interactive uses, as described above. You can only balance so much video before you run out of room. Please remember that the default settings on the NetEqualizer are designed to slow video before the entire network comes to a halt.
For more information, you can refer to another of Art’s articles on the subject of Video and the Internet: How much YouTube can the Internet Handle?