Caching is a bit overrated as a means of eliminating congestion and speeding up Internet access. Yes, there are some nice caching tricks that create fleeting illusions of speed, but in the end, caching alone will fail to mitigate problems due to congestion. The following article, adapted from our November 2011 posting, details why.
You might be surprised to learn that Internet link congestion cannot be mitigated with a caching server alone. Contention can only be eliminated by:
1) Increasing bandwidth
2) Some form of intelligent bandwidth control
3) Or a combination of 1) and 2)
A common assumption about caching is that you will be able to cache a large portion of common web content, so that with a decent caching solution a significant amount of your user traffic will never traverse your backbone. Unfortunately, our real-world experience has shown us that after a caching solution is implemented, the overall congestion on your Internet link shows no improvement.
For example: Let’s take the case of an Internet trunk that delivers 100 megabits, and is heavily saturated prior to implementing a caching solution. What happens when you add a caching server to the mix?
From our experience, a good hit rate to cache will likely not exceed 5 percent. Yes, we have heard claims of 50 percent, but we have not seen this in practice and suspect it is either best-case vendor hype or a very specialized solution targeted at Netflix (not general caching). We have been selling a caching solution and discussing other caching solutions with customers for almost 3 years, and like any urban myth, claims of high percentage caching hits are impossible to track down.
Why is the hit rate at best only 5 percent?
The Internet is huge relative to a cache, and you can only cache a tiny fraction of total Internet content. Even Google, with billions invested in data storage, does not come close. You can attempt to keep trending popular content in the cache, but the majority of access requests to the Internet will tend to be somewhat random and impossible to anticipate. Yes, a good number of hits will be resolved locally for something like the Yahoo home page, but many more users are going to do unique things. For example, common destinations like email and Facebook are very different for each user, so they are not a shared resource that can be maintained in the cache. User hobbies are also all different, and thus users traverse different web pages and watch different videos. The point is you can’t anticipate this data and keep it in a local cache any more reliably than you can guess the long-term weather. You can get a small statistical advantage, and that accounts for the 5 percent that you get right.
Even with caching at a 5 percent hit rate, your backbone link usage will not decline.
With caching in place, any gain in efficiency will be countered by a corresponding increase in total usage. Why is this?
If you assume an optimistic 10 percent hit rate to cache, you will get a boost and handle 10 percent more traffic than you did prior to caching. Your main pipe, however, will be just as full as it was before.
This is worth repeating: if you cache 10 percent of your data, that does not mean your Internet pipe usage will go from 100 percent to 90 percent. It is not a zero-sum game. The net effect is that your main pipe will remain 100 percent full, and you will get 10 percent on top of that from your cache. Thus your net usage to the Internet appears to be 110 percent. The problem is you still have a congested pipe, and the web pages and files that are not stored in cache will still be slow. You have not solved your congestion problem!
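The arithmetic above can be sketched in a few lines of Python. This is a rough illustration only, not a traffic model we have validated. The function name and the notion of "latent demand" (what users would pull if nothing were congested) are assumptions introduced for the example, as are the demand figures of 105, 120 and 150 megabits.

```python
def link_load_with_cache(capacity_mbps, latent_demand_mbps, hit_rate):
    """Return (link_mbps, cache_mbps) under a simple saturation model.

    latent_demand_mbps is a hypothetical input: the traffic users would
    generate if nothing were congested. It is not a measured figure.
    """
    # Cache misses are the only traffic that must cross the Internet link.
    miss_demand = latent_demand_mbps * (1.0 - hit_rate)
    # The link can never carry more than its capacity.
    link_mbps = min(capacity_mbps, miss_demand)
    # Cache hits are served locally and never touch the link.
    cache_mbps = latent_demand_mbps * hit_rate
    return link_mbps, cache_mbps


if __name__ == "__main__":
    capacity = 100  # the 100 megabit trunk from the example
    for demand in (105, 120, 150):  # assumed latent demand, in Mbps
        link, cache = link_load_with_cache(capacity, demand, 0.10)
        print(f"demand={demand} Mbps  link={link:.1f} Mbps "
              f"({link / capacity:.0%} full)  cache={cache:.1f} Mbps")
```

Under these assumptions, the link only drops below full when the traffic that misses the cache fits inside the pipe. On a heavily saturated link, where users want far more than 100 megabits, a 10 percent hit rate leaves the pipe completely full, and the cache traffic simply rides on top.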
Perhaps I am beating a dead horse with examples, but just one more.
Let’s start with a very congested 100 megabit Internet link. Web hits are slow, YouTube takes forever, email responses are slow, and Skype calls break up. To solve these issues, you put in a caching server.
Now 10 percent of your hits come from cache, but since you did nothing to mitigate overall bandwidth usage, your users will simply eat up the extra 10 percent from cache and then some. It is like giving a drug addict a free hit of their preferred drug. If you serve up a fast YouTube, it will just encourage more YouTube usage.
Even with a good caching solution in place, if somebody tries to access Grandma’s Facebook page, it will have to come over the congested link, and it may time out and not load right away. Or, if somebody makes a Skype call, it will still break up. In other words, the 90 percent of the hits not in cache are still slow, even though some videos and some pages load fast, so the question is:
If 10 percent of your traffic is really fast, and 90 percent is doggedly slow, did your caching solution help?
The answer is yes, of course it helped: 10 percent of users are getting nice, uninterrupted YouTube. It just may not seem that way when the complaints keep rolling in. :)