The pros and cons of Disk (Web) Caching


Eli Riles, an independent consultant and former VP of sales for NetEqualizer, has extensively investigated the subject of caching with many ISPs from around the globe. What follows are some useful observations on disk/web caching.

Effective use of Disk Caching

Suppose you are the administrator for a network, and you have a group of 1,000 users who wake up promptly at 7:00 am each morning and immediately go to MSNBC.com to retrieve the latest news from Wall Street. This synchronized behavior would create 1,000 simultaneous requests for the same remote page on the Internet.

Or, in the corporate world, suppose the CEO of a 10,000-employee multinational business posts an all-points, 20-page PDF file describing the new bonus plan on the corporate site right before the holidays. As you can imagine, all the remote WAN links might get bogged down for hours while each and every employee tries to download this file.

Well, it does not take a rocket scientist to figure out that if the MSNBC home page could somehow be stored locally on an internal server, that would relieve quite a bit of pressure on your WAN or Internet link.

And in the case of the CEO memo, if a single copy of the PDF file were placed locally at each remote office, it would alleviate the rush of data.

Local Disk Caching does just that.

Offered by various vendors, caching can be very effective in many situations, and vendors can legitimately claim tremendous WAN speed improvements in some cases. Caching servers have built-in intelligence to store the most recently and most frequently requested information, preventing future requests from traversing the WAN link unnecessarily.
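To make the "most recently requested" idea concrete, here is a minimal sketch of a least-recently-used (LRU) cache in Python. It is not any vendor's implementation: the `LRUCache` class, the `fetch_from_origin` function, and the URL are hypothetical, requests are assumed to arrive one at a time, and it tracks recency only (the caching servers described above also weigh how frequently an object is requested).

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Hypothetical sketch: keeps at most `capacity` objects and,
    when full, evicts the least recently used one."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch          # called only on a miss, i.e. a trip across the WAN
        self.store = OrderedDict()  # url -> body, ordered least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)     # mark as most recently used
            return self.store[url]
        self.misses += 1
        body = self.fetch(url)              # only misses leave the local network
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return body


# Hypothetical origin fetch that just records each time the WAN is crossed.
wan_requests = []


def fetch_from_origin(url: str) -> bytes:
    wan_requests.append(url)
    return b"<html>front page</html>"


cache = LRUCache(capacity=100, fetch=fetch_from_origin)
for _ in range(1000):                       # the 7:00 am rush, one request per user
    cache.get("http://www.msnbc.com/")

print(len(wan_requests), cache.hits)        # 1 999
```

Of the 1,000 requests, only the first crosses the WAN; the other 999 are served from the local copy. A production proxy also has to cope with many clients asking at the same instant, for example by letting concurrent requests for the same page wait on a single outbound fetch.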

You may know that most desktop browsers already do their own form of caching. Many web servers keep a time stamp of the last update to their data, and browsers such as the popular Internet Explorer will use a cached copy of a remote page after checking that time stamp to confirm the page has not changed.
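In HTTP, that time-stamp check works roughly like this: the server sends a `Last-Modified` header with the page, and on the next visit the browser sends the same value back in an `If-Modified-Since` header; if nothing has changed, the server answers `304 Not Modified` with no body. The sketch below simulates that exchange in plain Python with no networking. `Origin` and `BrowserCache` are hypothetical stand-ins, and time stamps are plain integers rather than the HTTP-date strings real servers use.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Origin:
    """Hypothetical stand-in for a remote web server."""
    body: bytes
    last_modified: int  # seconds since epoch; real HTTP sends an HTTP-date string

    def get(self, if_modified_since: Optional[int]) -> Tuple[int, Optional[bytes], int]:
        # Mirrors HTTP: 304 Not Modified (no body) when the client's copy is current.
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None, self.last_modified
        return 200, self.body, self.last_modified


class BrowserCache:
    """Keeps one copy of each page plus the time stamp it was served with."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[bytes, int]] = {}
        self.bytes_transferred = 0  # page bodies only; header overhead is ignored

    def fetch(self, url: str, origin: Origin) -> bytes:
        cached = self.entries.get(url)
        # The check itself still costs a round trip, but no body on a 304.
        status, body, stamp = origin.get(cached[1] if cached else None)
        if status == 304:
            assert cached is not None
            return cached[0]        # time stamp unchanged: reuse the local copy
        self.bytes_transferred += len(body)
        self.entries[url] = (body, stamp)
        return body


site = Origin(body=b"x" * 50_000, last_modified=1000)
browser = BrowserCache()
first = browser.fetch("http://example.com/", site)   # first visit: full download
second = browser.fetch("http://example.com/", site)  # unchanged: 304, no body sent
site.body, site.last_modified = b"y" * 60_000, 2000  # the page is updated
latest = browser.fetch("http://example.com/", site)  # changed: full download again
print(browser.bytes_transferred)                     # 110000
```

Only the two versions of the page are ever downloaded; the repeat visit in between costs a small request and a `304` reply.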

So what is the downside of caching?

There are two main issues that can arise with caching:

1) Keeping the cache current. If you access a cached page that is not current, you are at risk of getting old and incorrect information. Some things you may never want cached, for example the results of a transactional database query. It's not that these problems are insurmountable, but there is always the risk that the data in the cache will not be synchronized with changes at the source.

2) Volume. There are some 100 million web sites on the Internet alone, and each site contains upwards of several megabytes of public information. The amount of data is staggering, and even the smartest caching scheme cannot account for the variation in usage patterns among users and the likelihood that they will hit an uncached page. If you have a diverse set of users, it is unlikely the cache will have much effect on a given day.
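For the first issue, one common way to bound the staleness risk is to give each cached entry a time-to-live (TTL) and refetch it once that expires, while marking some requests, such as a transactional query, as never cacheable. The sketch below is hypothetical: the `TTLCache` class, the 60-second TTL, and the fake clock are illustrative assumptions. It shows that inside the TTL window a reader can still get out-of-date data, which is exactly the trade-off described in item 1.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical sketch: each entry is trusted for `ttl` seconds after it was fetched."""

    def __init__(self, ttl: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.fetch = fetch
        self.clock = clock  # injectable so the example can fake the passage of time
        self.entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str, cacheable: bool = True) -> str:
        if not cacheable:   # e.g. a transactional database query: always go to the source
            return self.fetch(key)
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # served from cache; may be stale if the source changed
        value = self.fetch(key)
        self.entries[key] = (value, now)
        return value


source = {"/quote": "price=100"}
fake_now = [0.0]
cache = TTLCache(ttl=60, fetch=lambda k: source[k], clock=lambda: fake_now[0])

first = cache.get("/quote")                  # fetched from the source
source["/quote"] = "price=105"               # the data changes at the source
fake_now[0] = 30.0
stale = cache.get("/quote")                  # still the old value: inside the TTL window
fake_now[0] = 61.0
fresh = cache.get("/quote")                  # entry expired, so it is refetched
live = cache.get("/quote", cacheable=False)  # bypasses the cache entirely
```

A shorter TTL narrows the window in which stale data can be served, at the cost of more trips back to the source.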

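The volume problem in item 2 can be illustrated with a small simulation: replay the same number of requests through an LRU cache of fixed size, once for users who all read the same handful of pages and once for users whose requests are spread across a huge number of pages. All the figures here (10,000 requests, a 1,000-object cache, 10 popular pages versus 1,000,000 distinct pages, uniform random choice) are made-up assumptions for illustration; real traffic usually falls somewhere in between, with a few very popular pages and a long tail of rarely requested ones.

```python
import random
from collections import OrderedDict
from typing import Iterable


def hit_ratio(requests: Iterable[int], capacity: int) -> float:
    """Replay a request stream through an LRU cache and return the fraction of hits."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = total = 0
    for page in requests:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / total if total else 0.0


rng = random.Random(42)  # fixed seed so the run is repeatable
N_REQUESTS, CAPACITY = 10_000, 1_000

# Focused users: everyone reads the same 10 pages (the 7:00 am MSNBC rush).
focused = [rng.randrange(10) for _ in range(N_REQUESTS)]
# Diverse users: requests spread evenly over 1,000,000 distinct pages.
diverse = [rng.randrange(1_000_000) for _ in range(N_REQUESTS)]

print(f"focused: {hit_ratio(focused, CAPACITY):.3f}")  # close to 1
print(f"diverse: {hit_ratio(diverse, CAPACITY):.3f}")  # close to 0
```

With the same cache and the same number of requests, the focused population is served almost entirely from the cache, while the diverse population almost never repeats a page before it is evicted.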

3 Responses to “The pros and cons of Disk (Web) Caching”

  1. Enhance your ISP offering with YouTube Caching « NetEqualizer News Blog Says:

    […] my provider, is keeping a local copy of the popular YouTube videos (caching) and then forcing me to the YouTube source for the less popular ones.  When my downloads are […]

  2. Ejaz Mehmood Says:

    Pros:
    1- Faster access to information: in-memory operations are 15-20 times faster than disk operations
    2- Fewer I/O calls and less network traffic (if the resource is on the network)
    Cons:
    1- Resource locking (if the cache is built synchronously)
    2- Higher access cost when the desired information is not in the cache, since the I/O call to the disk or network resource is still needed
    3- Overhead of cache refresh, i.e. if a cached resource changes, we need to check it continuously
    4- If the cached resource is larger than the available memory, then a lot of swapping takes place
    5- High resource utilization during cache building

    Best Practices
    1- The cache should be a separate service
    2- Allocate ample resources (CPU, RAM, virtual memory) to it
    3- Build the cache gradually
    4- If highly static content is cached, also dump it to local disk so the same resources can be reloaded on restart; this reduces network traffic
    5- Use a common cache shared among multiple applications

    Ejaz Mehmood
    Manager Performance Testing

