The pros and cons of Disk (Web) Caching


Eli Riles an independent consultant and former VP of sales for NetEqualizer has extensively investigated the subject of caching with many of  ISPs from around the globe. What follows are some useful observations on disk/web caching.

Effective use of Disk Caching

Suppose you are the administrator for a network, and you have a group of a 1000 users that wake up promptly at 7:00 am each morning and immediately go to MSNBC.com to retrieve the latest news from Wall Street. This synchronized behavior would create 1000 simultaneous requests for the same remote page on the Internet.

Or, in the corporate world, suppose the CEO of a multinational 10,000 employee business, right before the holidays put out an all points 20 page PDF file on the corporate site describing the new bonus plan? As you can imagine all the remote WAN links might get bogged down for hours while each and every employee tried to download this file.

Well it does not take a rocket scientist to figure out that if somehow the MSNBC home page could be stored locally on an internal server that would alleviate quite a bit of pressure on your WAN or Internet link.

And in the case of the CEO memo, if a single copy of the PDF file was placed locally at each remote office it would alleviate the rush of data.

Local Disk Caching does just that.

Offered by various vendors Caching can be very effective in many situations, and vendors can legitimately make claims of tremendous WAN speed improvement in some situations. Caching servers have built in intelligence to store the most recently and most frequently requested information, thus preventing future requests from traversing the WAN link unnecessarily .

You may know that most desktop browsers do their own form caching already. Many web servers keep a time stamp of their last update to data , and browsers such as the popular Internet Explorer will use a cached copy of a remote page after checking the time stamp.

So what is the downside of caching?

There are two main issues that can arise with caching:

1) Keeping the cache current. If you access a cache page that is not current then you are at risk of getting old and incorrect information. Some things you may never want to be cached, for example the results of a transactional database query. It’s not that these problems are insurmountable, but there is always the risk that the data in cache will not be synchronized with changes.

2) Volume. There are some 100 millions of web sites out on the Internet alone. Each site contains upwards of several megabytes of public information. The amount of data is staggering and even the smartest caching scheme cannot account for the variation in usage patterns among users and the likely hood they will hit an un-cached page. If you have a diverse set of users it is unlikely the Cache will have much effect on a given day

Formal definition of Caching

%d bloggers like this: