Stick a Fork in Third Party Caching (Squid Proxy)


I was just going through our blog archives and noticed that many of the caching articles we promoted circa 2011 are still getting hits. Many of the hits are coming from less developed countries, where bandwidth is expensive relative to the western world. I hope that businesses and ISPs counting on caching to work a miracle will find this article. It applies to all third-party caching engines, not just the one we used to offer as an add-on to the NetEqualizer.

So why do I make such a bold statement about third-party caching becoming obsolete?

#1) There have been some recent changes in the way Google delivers YouTube content, and they make caching it almost impossible. All of their YouTube videos are now generated dynamically and broken up into segments to allow custom, differentiated advertising. (I yearn for the days without the ads!)

#2) Almost all pages and files on the Internet are marked “Do not Cache” in their HTTP headers. Some of them will cache effectively, but you must assume the designer plans to make dynamic, on-the-fly changes to the content. Caching an obsolete page and delivering it to an end user could cause serious problems, perhaps even a lawsuit, if ignoring the “do not cache” directive results in some form of economic harm.

#3) Streaming content, as well as most HTML content, is now encrypted. Since we are not the NSA, we have no back door to decrypt it and deliver it from our caching engines.

As you may have noticed, I have been careful to say that caching is obsolete on third-party caching engines, not on all caching engines. So what gives?

Some of the larger content providers, such as Netflix, will work with larger ISPs to provide large caching servers for their proprietary and encrypted content. This is a win-win for both Netflix and the last-mile ISP. There are some restrictions on whom Netflix will support with this technology. The point is that Netflix provides the caching engine, for its own content only, running its own proprietary software, and a third-party engine cannot offer this service. Other content providers may offer similar technology. For now, however, you can stick a fork in any generic third-party caching server.

Caching Your iOS Updates Made Easy


If you have talked to us about caching in recent months, you probably know that we are now lukewarm on open-ended third-party caching servers. The simple, unencrypted content of the Internet circa 2010 has been replaced by dynamically generated pages and increased encryption. It’s not that caching servers don’t work; it’s that, if they follow good-practice rules, the amount of data they can cache has diminished greatly over the last few years.

The good news is that Apple has recognized the strain its updates put on business and ISP networks when they come out. It recently released an easy-to-implement, low-cost caching solution specifically for Apple content. In fact, one of our customers noted in a recent discussion group that they are using an old Mac mini to cache iOS updates for an entire college campus.

Other notes on Caching Options

Akamai offers a cloud solution. It is usually hosted at larger providers, but if you buy bandwidth in bulk, you can sometimes piggyback on their savings and get a discount on cached traffic.

There is also a service offered by Netflix for larger providers. However, last I checked, you must be carrying 10 gigabits per second of sustained Netflix traffic to qualify.

So You Think you Have Enough Bandwidth?


There are actually only two tiers of bandwidth: video for all, and video not for all. It is a fairly black-and-white problem. If you secure enough bandwidth that 25 to 30 percent of your users can watch video feeds simultaneously while still leaving some headroom on your circuit, congratulations: you have reached bandwidth nirvana.

Why is video the lynchpin in this discussion?

Aside from the occasional iOS/Windows update, most consumers really don’t use that much bandwidth on a regular basis. Skype, chat, email, and gaming, all used together, do not consume as much bandwidth as video. Hence, the marker species for congestion is video.

Below are some metrics to help you decide whether you can mothball your bandwidth shaper.

1) How to determine the future bandwidth demand.
Believe it or not, you can outrun your bandwidth demand. If your latest bandwidth upgrade is large enough to handle the average video load per customer, it is possible that no further upgrades will be needed, at least for the foreseeable future.

In the “video for all” scenario, the rule of thumb is to assume that 25 percent of your subscribers are watching video at any one time. If you still have 20 percent of your bandwidth left over under that load, you have reached the video-for-all threshold.

To put some numbers to this:
Assume 2000 subscribers and a 1 gigabit link. The average video feed requires about 2 megabits per second (note that some HD video requires more). Supporting video for 25 percent of your subscribers (500 users at 2 megabits each) would consume the entire 1 gigabit, leaving nothing for anybody else, so you will run out of bandwidth.

Now, if you have 1.5 gigabits for 2000 subscribers, the same 1 gigabit video load leaves about a third of the link free. You have likely reached the video-for-all threshold, and you will most likely be able to support your subscribers without any advanced intelligent bandwidth control. A simple 10 megabit rate cap per subscriber is likely all you would need.
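The arithmetic above fits in a quick check you can rerun with your own numbers. This is a minimal Python sketch that assumes the rules of thumb from this article: 25 percent of subscribers on video at once, about 2 megabits per feed, and 20 percent headroom measured against link capacity. The function names are hypothetical.

```python
def video_load_mbps(subscribers, video_share=0.25, feed_mbps=2.0):
    """Peak video demand under the rule of thumb (all values are assumptions)."""
    return subscribers * video_share * feed_mbps


def video_for_all(subscribers, link_mbps, headroom=0.20, **kw):
    """Return (meets_threshold, spare_fraction_of_link)."""
    load = video_load_mbps(subscribers, **kw)
    spare = (link_mbps - load) / link_mbps
    return spare >= headroom, spare


def min_link_mbps(subscribers, headroom=0.20, **kw):
    """Smallest link that carries the video load and still keeps the headroom."""
    return video_load_mbps(subscribers, **kw) / (1 - headroom)


for link in (1000, 1500):
    ok, spare = video_for_all(2000, link)
    print(f"{link} Mbps: spare {spare:.0%}, video for all: {ok}")
print(f"minimum link: {min_link_mbps(2000):.0f} Mbps")
```

With 2000 subscribers, the 1 gigabit link has no spare capacity at all, while 1.5 gigabits leaves about a third of the link free. The minimum link for the threshold works out to 1.25 gigabits under these assumptions.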

2) Honeymoon periods are short-lived.
The reprieve from congestion after a bandwidth upgrade is usually short-lived because the operator either does not have a good intelligent bandwidth control solution, or removes the existing one, mistakenly thinking they have reached the “video for all” level. In reality, they are still in the “video not for all” tier. Things are okay right after the upgrade, because it takes a while for the user base to fill the void created by the new bandwidth, and in the meantime the operator is lulled into a false sense of security for a brief honeymoon period.

Bottom line: Unless you have the numbers to support 25 to 30 percent of your user base running video you will need some kind of bandwidth control.

Surviving iOS updates


The birds outside my office window are restless. I can see the strain in the Comcast cable wires as they droop, heavy with the burden of additional bits weighing them down like a freak ice storm. It is time, once again, for Apple to update every device in the universe with its latest iOS update.

Assuming you are responsible for a network with a limited Internet pipe, and you are staring down 100 or more users about to hit the accept button for their update, what can you do to keep your network from gridlocking?

The most obvious option to gravitate toward is caching. I found this nice article (thanks, Luke) on the Squid settings used for a previous iOS update in 2013. Having worked with Squid quite a bit helping our customers, I was not surprised by the amount of tuning required to get this to work, and I suspect additional changes will be needed to make it work in 2014.

If you already have a Squid caching solution up and running, it is worth a try, but I am on the fence about recommending a Squid install from scratch. Why? Because we are seeing diminishing returns from Squid caching each year due to the amount of dynamic content. Translation: very few things on the Internet come from the same place with the same filename anymore, and many content providers are marking much of their content as non-cacheable.

If you have a NetEqualizer in place, you can easily blunt the effects of the data crunch with a standard default setup. The NetEqualizer will automatically push the updates further out in time, especially during peak hours when there is contention. This allows other applications on your network to function normally during the day, and I doubt anybody doing the update will notice the difference.

Finally, if you are desperate, you might be able to block access to iOS updates on your firewall. This might seem a bit harsh, but then again, Apple did not consult with you, and besides, isn’t that what the free Internet at Starbucks is for?

Here is a snippet pulled from a forum on how to block it.

iOS devices check for new versions by polling the server mesu.apple.com. This is done via HTTP, port 80. Specifically, the URL is:

http://mesu.apple.com/assets/com_apple_MobileAsset_SoftwareUpdate/com_apple_MobileAsset_SoftwareUpdate.xml

If you block or redirect mesu.apple.com, you will inhibit the check for software updates. If you are really ambitious, you could redirect the query to a cached copy of the XML, but I haven’t tried that. Please remove the block soon; you wouldn’t want to prevent those security updates, would you?
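In practice the block lives in your firewall or DNS, but the matching logic is simple enough to sketch. The Python below is a hypothetical filter predicate, assuming your proxy or filter hands it the full request URL. Only the host mesu.apple.com comes from the forum snippet; also matching its subdomains is our own assumption.

```python
from urllib.parse import urlsplit

# Host named in the forum snippet; the subdomain rule below is an assumption.
BLOCKED_HOSTS = {"mesu.apple.com"}


def is_update_check(url: str) -> bool:
    """True if the URL targets a blocked update-check host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)


url = ("http://mesu.apple.com/assets/com_apple_MobileAsset_SoftwareUpdate/"
       "com_apple_MobileAsset_SoftwareUpdate.xml")
print(is_update_check(url), is_update_check("http://www.apple.com/"))
```

Matching on the exact host, or on a dot-prefixed suffix, avoids accidentally blocking look-alike names such as a host that merely ends in the same letters.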

Stuck on a Desert Island, Do You Take Your Caching Server or Your NetEqualizer?


Caching is a great idea and works well, but I’ll take my NetEqualizer with me if forced to choose between the two on my remote island with a satellite link.

Yes, there are a few circumstances where a caching server might have a nice impact. Our most successful deployments are in educational environments where the same video is watched repeatedly as an assignment. For most wide-open installations, however, expectations of performance far outweigh reality. Let’s have a look at what works, and also drill down on expectations that are based on marginal assumptions.

From my personal archive of experience here are some of the expectations attributed to caching that perhaps are a bit too optimistic.

“Most of my users go to their Yahoo or Facebook home page every day when they log in, and that is the bulk of what they do.”

– I doubt this customer’s user base is that conformist :), and they’ll find out once they install their caching solution. But even if it were true, only some of the content on Facebook and Yahoo is static. A good portion of these pages is dynamic by default and constantly changing. Those portions are marked as dynamic in their URLs, which means the bulk of the page must be reloaded each time. For caching to have an impact in this scenario, the users would have to stick to their home pages and not look at friends’ photos or other pages.

“We expect to see a 30 percent hit rate when we deploy our cache.”

You won’t see a 30 percent hit rate unless somebody designs a robot army specifically to test your cache by hitting the same pages over and over again. Perhaps on iOS update day you might see a large share of your hits going to the same large file and get a significant performance boost for a day. Overall, though, you will be doing well if you get a 3 or 4 percent hit rate.

“I expect the cache hits to take pressure off my Internet link.”

Assuming you want your average user to experience fast-loading Internet, this is where you really want your NetEqualizer (or a similar intelligent bandwidth controller) over your caching engine. A smart bandwidth controller can rearrange traffic on the fly, ensuring that interactive requests get the best response. A caching engine does not have that intelligence.

Let’s suppose you have a 100 megabit link to the Internet, and you install a cache engine that effectively gets a 6 percent hit rate. That would be an exceptional hit rate.

So what is the end-user experience with a 6 percent hit rate compared to before the cache?

– First off, it is not the hit rate that matters when looking at total bandwidth. Many of those hits will likely be smallish image files from the Yahoo home page or other common sites, and those account for less than 1 percent of your actual traffic. Most of your traffic is likely dominated by large file downloads, and only a portion of those may be coming from cache.

– A 6 percent hit rate means a 94 percent miss rate. If your Internet was slow from congestion before the caching server, it will still be slow 94 percent of the time.

– Putting in a caching server would be like upgrading your bandwidth from 100 megabits to 104 megabits to relieve congestion. The cache hits may add to the total throughput in your reports, but the 100 megabit bottleneck is still there, and to the end user there is little or no perceptible difference. A portion of your Internet access is still marginal or unusable during peak times, and other than the occasional web page or video loading nice and snappy, users are getting duds most of the time.
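To see why a 6 percent hit rate barely moves the needle, it helps to separate the request hit ratio (the share of requests served from cache) from the byte hit ratio (the share of bytes served from cache). Only the byte hit ratio relieves the link. The sketch below uses a made-up traffic mix, assumed purely for illustration: six small cached images, ninety ordinary misses, and four large downloads that miss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    size_bytes: int
    cache_hit: bool


def hit_ratios(requests):
    """Return (request_hit_ratio, byte_hit_ratio) for a list of requests."""
    total_bytes = sum(r.size_bytes for r in requests)
    hits = [r for r in requests if r.cache_hit]
    request_hit = len(hits) / len(requests)
    byte_hit = sum(r.size_bytes for r in hits) / total_bytes
    return request_hit, byte_hit


def effective_capacity_mbps(link_mbps, byte_hit):
    """Delivered throughput at which the link saturates.

    Only cache misses cross the link, so it fills up when
    (1 - byte_hit) * delivered == link_mbps.
    """
    return link_mbps / (1 - byte_hit)


# Hypothetical traffic mix, for illustration only.
mix = (
    [Request(20_000, True)] * 6          # small cached images
    + [Request(30_000, False)] * 90      # ordinary page misses
    + [Request(5_000_000, False)] * 4    # large downloads, not cached
)
request_hit, byte_hit = hit_ratios(mix)
print(f"request hit ratio: {request_hit:.1%}, byte hit ratio: {byte_hit:.2%}")
print(f"4% byte hits on 100 Mbps: {effective_capacity_mbps(100, 0.04):.1f} Mbps")
```

In this made-up mix, the request hit ratio is 6 percent but the byte hit ratio is only about 0.5 percent. Even a 4 percent byte hit ratio stretches a 100 megabit link to only about 104 megabits of delivered traffic, which matches the comparison in the last point.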

Even the largest caching server can store only an insignificant fraction of the Internet’s content.

– The Internet is vast, and your cache is not. Think of a tiny ant standing on top of Mount Everest. YouTube puts up 100 hours of new content every minute of every day. A small commercial caching server can store about 1/1000 of what YouTube uploads in a day, not to mention yesterday, the day before, and last year. It’s just not going to be in your cache.
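The storage gap is easy to estimate with back-of-the-envelope arithmetic. This sketch assumes each uploaded hour is stored once, at the roughly 2 megabit average video feed mentioned earlier. Real video platforms keep several encodings of each upload, so the true figure is larger. The constant and function names are illustrative.

```python
UPLOAD_HOURS_PER_MINUTE = 100   # figure quoted in the text
AVG_FEED_MBPS = 2.0             # assumption: average video feed rate


def daily_upload_bytes(hours_per_minute=UPLOAD_HOURS_PER_MINUTE,
                       feed_mbps=AVG_FEED_MBPS):
    """Rough bytes of new video per day, one encoding per upload."""
    hours_per_day = hours_per_minute * 60 * 24          # 144,000 hours
    bytes_per_hour = feed_mbps * 1e6 / 8 * 3600         # 0.9 GB per hour
    return hours_per_day * bytes_per_hour


def cache_share_of_one_day(cache_bytes, **kwargs):
    """Fraction of a single day's uploads that a cache of this size could hold."""
    return cache_bytes / daily_upload_bytes(**kwargs)


print(f"{daily_upload_bytes() / 1e12:.1f} TB per day")
```

Under these assumptions a single day of uploads is about 130 terabytes, so 1/1000 of one day is on the order of 130 gigabytes.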

So why is a NetEqualizer bandwidth controller so much better than a caching server at changing users’ perception of speed? Because the NetEqualizer is designed to keep Internet access from crashing, and it accomplishes this by reducing the footprint of large file transfers and video downloads during peak times. Yes, these videos and downloads may be slow or sporadic, but they weren’t going to work anyway, so why let them crush the interactive traffic? In the end, neither caching nor equalizing is perfect. In real-world trials, though, the equalizer changes the user experience from slow to fast for all interactive transactions, while caching is hit or miss (pun intended).
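The general idea of equalizing, leaving small interactive flows alone and slowing large flows only when the link is congested, can be sketched in a few lines. This is not the NetEqualizer’s actual algorithm. The Flow type, the 85 percent trigger, and the 1 megabit cutoff for a “large” flow are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    rate_mbps: float


def flows_to_slow(flows, link_mbps, trigger=0.85, large_flow_mbps=1.0):
    """Return names of large flows to penalize, largest first.

    Nothing is penalized until total usage exceeds `trigger` of the link,
    and small (interactive) flows are never penalized.
    """
    total = sum(f.rate_mbps for f in flows)
    if total <= trigger * link_mbps:
        return []
    large = [f for f in flows if f.rate_mbps >= large_flow_mbps]
    return [f.name for f in sorted(large, key=lambda f: f.rate_mbps, reverse=True)]


flows = [Flow("download", 40.0), Flow("video", 6.0),
         Flow("web", 0.3), Flow("voip", 0.1)]
print(flows_to_slow(flows, link_mbps=50))
print(flows_to_slow(flows, link_mbps=100))
```

On a 50 megabit link the 46.4 megabits of traffic crosses the trigger, so only the download and the video are singled out; on a 100 megabit link nothing is touched. The web and VoIP flows are never candidates, which is the behavior described above.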

Squid Caching Can be Finicky


Editor’s Note: For the past few weeks we have been tuning and testing our caching engine, working closely with some of the developers who contribute to the Squid open source project.

Following are some of my observations and discoveries about Squid caching from our testing process.

Our primary mission was to make sure YouTube files cache correctly (which we have done). One of the tricky aspects of caching a YouTube file is that many of these files are considered dynamic content. Basically, this means their content contains a portion that may change with each access; sometimes the URL itself is just a pointer to a server where the content is generated fresh with each new access.

An extreme example of dynamic content would be your favorite stock quote site. During the business day, much of the information on these pages changes constantly, so it is obsolete within seconds. A poorly designed caching engine would do much more harm than good if it served up out-of-date stock quotes.

Caching engines by default try not to cache dynamic content, and for good reason. There are two different methods a caching server uses to decide whether or not to cache a page:

1) The web designer can explicitly set directives in the HTTP response headers that accompany a URL to tell caching engines whether a page is safe to cache.

In a recent test I set up a crawler to walk through the Excite website and all its URLs. I use this crawler to create load in our test lab, as well as to fill our caching engine with repeatable content. I set my Squid configuration file to cache all content smaller than 4 KB. Normally this would generate a great many cache hits, but for some reason none of the Excite content would cache. Upon further analysis, our Squid consultant found the problem:

“I have completed the initial analysis. The problem is the excite.com server(s). All of the “200 OK” excite.com responses that I have seen among the first 100+ requests contain Cache-Control headers that prohibit their caching by shared caches. There appears to be only two kinds of Cache-Control values favored by excite:

Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

and

Cache-Control: private,public

Both are deadly for a shared Squid cache like yours. Squid has options to overwrite most of these restrictions, but you should not do that for all traffic as it will likely break some sites.”
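The consultant’s diagnosis can be expressed as a simple rule. The Python sketch below checks only the Cache-Control header, following the HTTP caching rules (RFC 9111): no cache may store a response carrying no-store, and a shared cache may not store one marked private. A real cache such as Squid weighs many more factors (request method, status code, Authorization, Vary). The function names are hypothetical.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: argument} pairs.

    Simplified: commas inside quoted arguments are not handled.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def shared_cache_may_store(header: str) -> bool:
    d = parse_cache_control(header)
    # no-store forbids storing the response in any cache.
    if "no-store" in d:
        return False
    # private forbids storing in shared caches. The qualified form
    # (private="field") is treated conservatively here, like plain private.
    if "private" in d:
        return False
    # no-cache does not forbid storage; it forces revalidation before reuse.
    return True


for value in ("no-store, no-cache, must-revalidate, post-check=0, pre-check=0",
              "private,public",
              "public, max-age=3600"):
    print(f"{value!r}: {shared_cache_may_store(value)}")
```

Both Excite values fail the check. The first fails because of no-store; the second because private wins over public for a conservative shared cache.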

2) The second method is more passive than deliberate directives. Caching engines look at the actual URL of a page for clues about its permanence. A “?” in the URL implies dynamic content and is generally a red flag to the caching server. Herein lies the issue with caching YouTube files: almost all of them have a “?” embedded in their URL.

Fortunately, YouTube videos are normally permanent and unchanging once they are uploaded. I am still getting a handle on these pages, but it seems the dynamic part is used to insert different advertisements at the front of the video. Our Squid caching server uses a normalizing technique to keep the root of the URL consistent and thus serve up the correct base YouTube video every time. Over the past two years we have had to replace our normalization technique twice in order to consistently cache YouTube files.
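Both ideas, the “?” heuristic and URL normalization, can be sketched briefly. The Python below is a hypothetical illustration, not our production normalizer and not YouTube’s real URL scheme. It assumes the video is identified by made-up id and quality query parameters, and that everything else (ad tokens, expiry stamps) plus the serving hostname can be dropped when building the cache key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical: query parameters assumed to identify the underlying video.
KEY_PARAMS = ("id", "quality")


def looks_dynamic(url: str) -> bool:
    """Classic heuristic: any '?' in the URL suggests dynamic content."""
    return "?" in url


def normalize_key(url: str, key_params=KEY_PARAMS) -> str:
    """Build a cache key from the path plus only the identifying parameters.

    The hostname and all other query parameters are dropped, so copies of the
    same video served from different edge hosts map to one cache entry.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    kept = [(name, params[name]) for name in key_params if name in params]
    if not kept:
        return parts.path
    return f"{parts.path}?{urlencode(kept)}"


u1 = "http://edge1.video.example/video?id=abc123&quality=hd&ad=xyz&expire=111"
u2 = "http://edge7.video.example/video?expire=222&quality=hd&id=abc123&ad=qqq"
print(looks_dynamic(u1), normalize_key(u1) == normalize_key(u2))
```

The two URLs differ only in hostname and volatile parameters, so they map to the same key and the second request can be served from cache. The catch, as we found, is that whenever the provider changes which parameters identify the content, the normalizer has to be rewritten.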

Caching in the Cloud is Here


By Art Reisman, CTO APconnections (www.netequalizer.com)

I just got a note from a customer, a university, that their ISP is offering them 200 megabit Internet at a fixed price. The kicker is that they can also have access to a 1 gigabit feed specifically for YouTube at no extra cost. The only explanation for this is that their upstream ISP has an extensive in-network YouTube cache. I am just kicking myself for not seeing this coming!

I was well aware that many of the larger ISPs cached Netflix and YouTube on a large scale, but this is the first I have heard of a bandwidth provider offering a special reduced rate for YouTube to a downstream customer. I am just mad at myself for not predicting this type of offer and having to hear about it from a third party.

As for the NetEqualizer, we have already adjusted our licensing so that this differential traffic comes through at no extra charge beyond your regular license level, in this case 200 megabits. So, for example, if you have a 350 megabit license but have access to a 1 Gbps YouTube feed, you will pay for a 350 megabit license, not 1 Gbps. We will not charge you for the overage while accessing YouTube.
