Caching Your iOS Updates Made Easy


If you have talked to us about caching in recent months, you probably know that we are now lukewarm on open-ended third-party caching servers. The simple unencrypted content of the Internet circa 2010 has been replaced by dynamically generated pages and increasingly encrypted content. It’s not that caching servers don’t work; it’s just that, if they follow the rules of good practice, the amount of data they can cache has diminished greatly over the last few years.

The good news is that Apple has realized the strain its updates put on business and ISP networks when they come out. The company has recently released an easy-to-implement, low-cost caching solution specifically for Apple content. In fact, one of our customers noted in a recent discussion group that they are using an old Mac mini to cache iOS updates for an entire college campus.

Other notes on Caching Options

Akamai offers a cloud solution. It is usually hosted at larger providers, but if you are buying bandwidth in bulk you can sometimes piggyback on their savings and get a discount on cached traffic.

There is also a service offered by Netflix for larger providers. However, last I checked, you must be carrying 10 gigabits of sustained Netflix traffic to qualify.

So You Think You Have Enough Bandwidth?


There are actually only two tiers of bandwidth: video for all, and not video for all. It is a fairly black-and-white problem. If you secure enough bandwidth that 25 to 30 percent of your users can simultaneously watch video feeds and you still have some headroom on your circuit, congratulations: you have reached bandwidth nirvana.

Why is video the lynchpin in this discussion?

Aside from the occasional iOS/Windows update, most consumers really don’t use that much bandwidth on a regular basis. Skype, chat, email, and gaming, all used together, do not consume as much bandwidth as video. Hence, the marker species for congestion is video.

Below, I present some metrics to help you decide whether you can mothball your bandwidth shaper.

1) How to determine the future bandwidth demand.
Believe it or not, you can outrun your bandwidth demand. If your latest bandwidth upgrade is large enough to handle the average video load per customer, it is possible that no further upgrades will be needed, at least for the foreseeable future.

In the “video for all” scenario, the rule of thumb is to assume that 25 percent of your subscribers are watching video at any one time. If you still have 20 percent of your bandwidth left over after serving them, you have reached the video for all threshold.

To put some numbers to this:
Assume 2000 subscribers and a 1 gigabit link. The average video feed requires about 2 megabits per second (note that some HD video is higher than this). Supporting video for 25 percent of your subscribers (500 simultaneous streams at 2 megabits each) would use the entire 1 gigabit, leaving nothing for anybody else; hence you will run out of bandwidth.

Now, if you have 1.5 gigabits for 2000 subscribers, you have likely reached the video for all threshold, and you will most likely be able to support them without any advanced intelligent bandwidth control. A simple 10 megabit rate cap per subscriber is likely all you would need.
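The arithmetic above is easy to rerun for your own subscriber count. The following is a minimal sketch in Python, not a sizing tool: the function name and parameters are made up for illustration, and the defaults simply encode the rules of thumb from the text (25 percent of subscribers watching, about 2 megabits per stream, 20 percent of the link left over).

```python
def video_for_all(subscribers, link_mbps, video_mbps=2.0,
                  watching_fraction=0.25, headroom_fraction=0.20):
    """Apply the rule of thumb from the text.

    Returns (video_load_mbps, headroom_left, reached_threshold), where
    headroom_left is the fraction of the link not consumed by video.
    """
    video_load_mbps = subscribers * watching_fraction * video_mbps
    headroom_left = (link_mbps - video_load_mbps) / link_mbps
    return video_load_mbps, headroom_left, headroom_left >= headroom_fraction


# The two cases discussed above: 2000 subscribers on 1 and 1.5 gigabits.
for link in (1000, 1500):
    load, headroom, ok = video_for_all(2000, link)
    print(f"{link} Mbps link: video load {load:.0f} Mbps, "
          f"headroom {headroom:.0%}, video for all: {ok}")
```

With the numbers from the text, the 1 gigabit link has no headroom at all, while the 1.5 gigabit link keeps about a third of its capacity free and clears the 20 percent bar.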

2) Honeymoon periods are short-lived.
The reprieve in congestion after a bandwidth upgrade is usually short-lived because the operator either does not have a good intelligent bandwidth control solution, or takes the existing solution out, mistakenly thinking they have reached the “video for all” level. In reality, they are still in “not video for all” territory. After the upgrade things are okay for a while, because it takes time for a user base to fill the void of new bandwidth, and the operator is lulled into a false sense of security for a brief honeymoon period.

Bottom line: Unless you have the numbers to support 25 to 30 percent of your user base running video you will need some kind of bandwidth control.

Surviving iOS updates


The birds outside my office window are restless. I can see the strain in the Comcast cable wires as they droop, heavy with the burden of additional bits, weighing them down like a freak ice storm. It is time, once again, for Apple to update every device in the universe with its latest iOS update.

Assuming you are responsible for a network with a limited Internet pipe, and you are staring down 100 or more users about to hit the accept button for their update, what can you do to prevent your network from being gridlocked?

The most obvious option to gravitate to is caching. I found this nice article (thanks Luke) on the Squid settings used for a previous iOS update in 2013. Having worked with Squid quite a bit helping our customers, I was not surprised by the amount of tuning required to get this to work, and I suspect there will be additional changes to make it work in 2014.

If you have a Squid caching solution already up and running, it is worth a try, but I am on the fence about recommending a Squid install from scratch. Why? Because we are seeing diminishing returns from Squid caching each year due to the amount of dynamic content. Translation: very few things on the Internet come from the same place with the same filename anymore, and many content providers are marking much of their content as non-cacheable.

If you have a NetEqualizer in place you can easily blunt the effects of the data crunch with a standard default set-up. The NetEqualizer will automatically push the updates out further into time, especially during peak hours when there is contention. This will allow other applications on your network to function normally during the day. I doubt anybody doing the update will notice the difference.

Finally, if you are desperate, you might be able to block access to anything related to iOS updates on your firewall. This might seem a bit harsh, but then again Apple did not consult with you, and besides, isn’t that what the free Internet at Starbucks is for?

Here is a snippet pulled from a forum on how to block it.

iOS devices check for new versions by polling the server mesu.apple.com. This is done via HTTP, port 80. Specifically, the URL is:

http://mesu.apple.com/assets/com_apple_MobileAsset_SoftwareUpdate/com_apple_MobileAsset_SoftwareUpdate.xml

If you block or redirect mesu.apple.com, you will inhibit the check for software updates. If you are really ambitious, you could redirect the query to a cached copy of the XML, but I haven’t tried that. Please remove the block soon; you wouldn’t want to prevent those security updates, would you?
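To make the matching logic concrete, here is a small Python sketch of how a filtering script or proxy helper might recognize the update check. It is only an illustration: the function name and the blocklist variable are hypothetical, the only hostname used is the one quoted in the forum snippet, and a real firewall would express the same rule in its own syntax.

```python
from urllib.parse import urlsplit

# Hostname taken from the forum snippet above; everything else is illustrative.
BLOCKED_HOSTS = {"mesu.apple.com"}


def is_update_check(url):
    """Return True if the URL points at a host on the blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    return host in BLOCKED_HOSTS


check_url = ("http://mesu.apple.com/assets/"
             "com_apple_MobileAsset_SoftwareUpdate/"
             "com_apple_MobileAsset_SoftwareUpdate.xml")
print(is_update_check(check_url))
print(is_update_check("http://www.apple.com/"))
```

Matching on the hostname rather than the full URL keeps the rule working even if the path of the XML file changes.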

Stuck on a Desert Island, Do You Take Your Caching Server or Your NetEqualizer?


Caching is a great idea and works well, but I’ll take my NetEqualizer with me if forced to choose between the two on my remote island with a satellite link.

Yes, there are a few circumstances where a caching server might have a nice impact. Our most successful deployments are in educational environments where the same video is watched repeatedly as an assignment; but for most wide-open installations, expectations of performance far outweigh reality. Let’s have a look at what works, and also drill down on expectations that are based on marginal assumptions.

From my personal archive of experience, here are some of the expectations attributed to caching that are perhaps a bit too optimistic.

“Most of my users go to their Yahoo or Facebook home page every day when they log in and that is the bulk of all they do”

– I doubt this customer’s user base is that conformist :), and they’ll find out once they install their caching solution. But even if true, only some of the content on Facebook and Yahoo is static. A good portion of these pages is dynamic by default, with ever-changing content. They are marked as dynamic in their URLs, which means the bulk of the page must be reloaded each time. For caching to have an impact, the users in this scenario would have to stick to their home pages and not look at friends’ photos or other pages.

“We expect to see a 30 percent hit rate when we deploy our cache.”

You won’t see a 30 percent hit rate unless somebody designs a robot army specifically to test your cache, hitting the same pages over and over again. Perhaps on iOS update day you might see the bulk of your hits going to the same large file and get a significant performance boost for a day. But overall you will be doing well if you get a 3 or 4 percent hit rate.

“I expect the cache hits to take pressure off my Internet link”

Assuming you want your average user to experience a fast-loading Internet, this is where you really want your NetEqualizer (or a similar intelligent bandwidth controller) over your caching engine. The smart bandwidth controller can rearrange traffic on the fly, ensuring interactive hits get the best response. A caching engine does not have that intelligence.

Let’s suppose you have a 100 megabit link to the Internet, and you install a cache engine that effectively gets a 6 percent hit rate. That would be an exceptional hit rate.

So what is the end user experience with a 6 percent hit rate compared to pre-cache?

– First off, it is not the hit rate that matters when looking at total bandwidth. Many of those hits will likely be smallish image files from the Yahoo home page or other common sites, which account for less than 1 percent of your actual traffic. Most of your traffic is likely dominated by large file downloads, and only a portion of those may be coming from cache.

– A 6 percent hit rate means a 94 percent miss rate, and if your Internet was slow from congestion before the caching server, it will still be slow 94 percent of the time.

– Putting in a caching server would be like upgrading your bandwidth from 100 megabits to 104 megabits to relieve congestion. The cache hits may add to the total throughput in your reports, but the 100 megabit bottleneck is still there, and to the end user there is little or no perceptible difference. A portion of your Internet access is still marginal or unusable during peak times, and other than the occasional web page or video loading nice and snappy, users are getting duds most of the time.
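The difference between counting hits and counting bytes is easy to see with a toy calculation. The Python sketch below uses invented traffic (a handful of small cached images and a few large uncached downloads) and hypothetical function names. It also assumes, as a simplification, that every byte served from cache frees the same amount of link capacity; under that assumption, a gain from 100 to about 104 megabits corresponds to roughly 4 percent of bytes coming from cache.

```python
def hit_rates(requests):
    """requests: iterable of (size_bytes, served_from_cache) pairs.

    Returns (request_hit_rate, byte_hit_rate).
    """
    total_reqs = hits = total_bytes = hit_bytes = 0
    for size, from_cache in requests:
        total_reqs += 1
        total_bytes += size
        if from_cache:
            hits += 1
            hit_bytes += size
    if total_reqs == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total_reqs, hit_bytes / total_bytes


def effective_capacity_mbps(link_mbps, byte_hit_rate):
    """Capacity the link appears to have if cached bytes never cross it."""
    return link_mbps / (1.0 - byte_hit_rate)


# Invented sample: 6 cached 50 KB images, 90 uncached 50 KB objects,
# and 4 uncached 50 MB downloads.
sample = ([(50_000, True)] * 6 +
          [(50_000, False)] * 90 +
          [(50_000_000, False)] * 4)

request_rate, byte_rate = hit_rates(sample)
print(f"request hit rate: {request_rate:.1%}, byte hit rate: {byte_rate:.2%}")
print(f"effective capacity at a 4% byte hit rate: "
      f"{effective_capacity_mbps(100, 0.04):.1f} Mbps")
```

In this made-up sample the cache answers 6 percent of requests but carries well under 1 percent of the bytes, which is the gap the first bullet describes.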

Even the largest caching server is insignificant in how much data it can store.

– The Internet is vast and your cache is not. Think of a tiny ant standing on top of Mount Everest. YouTube puts up 100 hours of new content every minute of every day. A small commercial caching server can store about 1/1000 of what YouTube uploads in a day, not to mention yesterday and the day before and last year. It’s just not going to be in your cache.

So why is a NetEqualizer bandwidth controller so much better than a caching server at changing user perception of speed? Because the NetEqualizer is designed to keep Internet access from crashing, and it accomplishes this by reducing the footprint of large file transfers and video downloads during peak times. Yes, these videos and downloads may be slow or sporadic, but they weren’t going to work anyway, so why let them crush the interactive traffic? In the end, neither caching nor equalizing is perfect, but in real-world trials the equalizer changes the user experience from slow to fast for all interactive transactions, while caching is hit or miss (pun intended).
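For a rough sense of scale, the upload figure can be turned into storage. The Python sketch below is a back-of-the-envelope estimate only: the 100 hours per minute comes from the text, while the 2 megabit per second bitrate is an assumption borrowed from the average video feed used in the bandwidth discussion, and the function name is hypothetical.

```python
def daily_upload_terabytes(hours_per_minute=100, bitrate_mbps=2.0):
    """Estimate one day of uploads in decimal terabytes.

    bitrate_mbps is an assumed average; real uploads vary widely.
    """
    seconds_of_video = hours_per_minute * 60 * 24 * 3600  # per day
    megabits = seconds_of_video * bitrate_mbps
    megabytes = megabits / 8
    return megabytes / 1_000_000


per_day_tb = daily_upload_terabytes()
print(f"one day of uploads: about {per_day_tb:.1f} TB")
print(f"1/1000 of that: about {per_day_tb * 1000 / 1000:.1f} GB")
```

Under those assumptions a single day of uploads is on the order of a hundred terabytes or more, so a cache holding a thousandth of it covers only a sliver of one day, let alone the back catalog.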

Squid Caching Can be Finicky


Editor’s Note: The past few weeks we have been working on tuning and testing our caching engine. We have been working closely with some of the developers who contribute to the Squid open source program.

Following are some of my observations and discoveries regarding Squid caching from our testing process.

Our primary mission was to make sure YouTube files cache correctly (which we have done). One of the tricky aspects of caching a YouTube file is that many of these files are considered dynamic content. Basically, this means their content contains a portion that may change with each access; sometimes the URL itself is just a pointer to a server where the content is generated fresh with each new access.

An extreme example of dynamic content would be your favorite stock quote site. During the business day much of the information on these pages is changing constantly, and thus it is obsolete within seconds. A poorly designed caching engine would do much more harm than good if it served up out-of-date stock quotes.

Caching engines by default try not to cache dynamic content, and for good reason. There are two different methods a caching server uses to decide whether or not to cache a page:

1) The web designer can specifically set flags, in the format of the actual URL or in the headers the server sends with the page, to tell caching engines whether a page is safe to cache or not.

In a recent test I set up a crawler to walk through the Excite web site and all its URLs. I use this crawler to create load in our test lab as well as to fill up our caching engine with repeatable content. I set my Squid configuration file to cache all content less than 4k. Normally this would generate a great many cache hits, but for some reason none of the Excite content would cache. Upon further analysis, our Squid consultant found the problem.

  “I have completed the initial analysis. The problem is the excite.com
server(s). All of the “200 OK” excite.com responses that I have seen
among the first 100+ requests contain Cache-Control headers that
prohibit their caching by shared caches. There appears to be only two
kinds of Cache-Control values favored by excite:

Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
               pre-check=0

and

Cache-Control: private,public

Both are deadly for a shared Squid cache like yours. Squid has options
to overwrite most of these restrictions, but you should not do that for
all traffic as it will likely break some sites.”

2) The second method is more passive than deliberate directives. Caching engines look at the actual URL of a page to gain clues about its permanence. A “?” used in the URL implies dynamic content and is generally a red flag to the caching server. And herein lies the issue with caching YouTube files: almost all of them have a “?” embedded within their URL.

Fortunately, YouTube videos are normally permanent and unchanging once they are uploaded. I am still getting a handle on these pages, but it seems the dynamic part is used for the insertion of different advertisements on the front end of the video. Our Squid caching server uses a normalizing technique to keep the root of the URL consistent and thus serve up the correct base YouTube video every time. Over the past two years we have had to replace our normalization technique twice in order to consistently cache YouTube files.
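The two checks can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Squid actually decides: the function names are hypothetical, the header check only looks for the `no-store` and `private` directives (which tell a shared cache not to keep a response), and the URL check treats any query string as a sign of dynamic content.

```python
from urllib.parse import urlsplit


def shared_cache_may_store(cache_control):
    """Method 1: honor explicit Cache-Control directives (simplified)."""
    directives = {d.strip().split("=", 1)[0].lower()
                  for d in cache_control.split(",") if d.strip()}
    return not ({"no-store", "private"} & directives)


def looks_dynamic(url):
    """Method 2: treat a query string after '?' as a hint of dynamic content."""
    return bool(urlsplit(url).query)


# The two header values quoted by the consultant above.
print(shared_cache_may_store(
    "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"))
print(shared_cache_may_store("private,public"))
print(shared_cache_may_store("public, max-age=3600"))

# example.com URLs are placeholders, not real content.
print(looks_dynamic("http://example.com/watch?v=abc"))
print(looks_dynamic("http://example.com/logo.png"))
```

Both header values from the Excite responses fail the first check, which matches the consultant's finding that a shared cache would refuse to store them.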
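As an illustration of what normalizing a URL can mean, here is a hedged Python sketch. It is not the technique used in our product, and the URL layout and parameter names (`id`, `expire`, `sig`) are hypothetical placeholders rather than YouTube's real scheme. The idea is simply to build the cache key from the parts of the URL that identify the content and to ignore the parts that vary between requests.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical: query parameters assumed to identify the content itself.
IDENTIFYING_PARAMS = ("id",)


def normalized_cache_key(url, keep=IDENTIFYING_PARAMS):
    """Build a cache key that ignores the host and volatile parameters."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep)
    return f"{parts.path}?{urlencode(kept)}"


# Two made-up URLs for the same video, served from different hosts.
a = "http://cache1.video.example/videoplayback?id=abc123&expire=111&sig=xyz"
b = "http://cache7.video.example/videoplayback?sig=qrs&id=abc123&expire=222"
print(normalized_cache_key(a))
print(normalized_cache_key(a) == normalized_cache_key(b))
```

Dropping the host is only safe when the rule is applied to a single known site; a real deployment would scope the normalization to specific domains so unrelated sites cannot collide on the same key.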

Caching in the Cloud is Here


By Art Reisman, CTO APconnections (www.netequalizer.com)

I just got a note from a customer, a university, that their ISP is offering them 200 megabit Internet at a fixed price. The kicker is, they can also have access to a 1 gigabit feed specifically for YouTube at no extra cost. The only explanation for this is that their upstream ISP has an extensive in-network YouTube cache. I am just kicking myself for not seeing this coming!

I was well aware that many of the larger ISPs cached Netflix and YouTube on a large scale, but this is the first I have heard of a bandwidth provider offering a special reduced rate for YouTube to a customer downstream. I am just mad at myself for not predicting this type of offer, and for having to hear about it from a third party.

As for the NetEqualizer, we have already made adjustments in our licensing for this differential traffic to come through at no extra charge beyond your regular license level, in this case 200 megabits. So if, for example, you have a 350 megabit license but have access to a 1Gbps YouTube feed, you will pay for a 350 megabit license, not 1Gbps. We will not charge you for the overage while accessing YouTube.

A Novel Idea on How to Cache Data Completely Transparently


By Art Reisman

Recently I got a call from a customer claiming our Squid proxy was not retrieving videos from cache when expected.

This prompted me to set up a test in our lab where I watched  four videos over and over. With each iteration, I noticed that the proxy would  sometimes go out and fetch a new copy of a video, even though the video was already in the local cache, thus confirming the customer’s observation.

Why does this happen?

I have not delved into the specific Squid code yet, but I think it has to do with the dynamic redirection performed by YouTube in the cloud, and the way the Squid proxy interprets the URL. If you look closely at YouTube URLs, there is a CGI component in the name: the word “watch” followed by a question mark “?”. The URLs are not static. Even though I may be watching the same YouTube video on successive tries, the cloud is getting the actual video from a different place each time, and so the Squid proxy thinks it is new.

Since caching old copies of data is a big no-no, my Squid proxy, when in doubt, errs on the side of caution and fetches a new copy.

The other hassle with using a proxy caching server is the complexity of setting up port re-direction (special routing rules). By definition, the proxy must fake out the client making the request for the video. Getting this re-direction to work requires some intimate network knowledge and good troubleshooting techniques.

My solution for the above issues is to just toss the traditional Squid proxy altogether and invent something easier to use.

Note: I have run the following idea by the naysayers (all of my friends who think I am nuts), and yes, there are still some holes in this idea. I’ll present their points after I present my case.

My caching idea

To get my thought process started, I tossed all that traditional tomfoolery with re-direction and URL name caching out the window.

My caching idea is to cache streams of data without regard to URL or filename.  Basically, this would require a device to save off streams of characters as they happen.  I am already very familiar with implementing this technology; we do it with our CALEA probe.  We have already built technology that can capture raw streams of data, store, and then index them, so this does not need to be solved.

Figuring out if a subsequent stream matched a stored stream would be a bit more difficult but not impossible.

The benefits of this stream-based caching scheme as I see them:

1) No routing or redirection needed; the device could be plugged into any network link by any weekend warrior.

2) No URL confusion.  Even if a stream (video) was kicked off from a different URL, the proxy device would recognize the character stream coming across the wire to be the same as a stored stream in the cache, and then switch over to the cached stream when appropriate, thus saving the time and energy of fetching the rest of the data from across the Internet.

The pure beauty of this solution is that just about any consumer could plug it in without any networking or routing knowledge.

How this could be built

Some rough details on how this would be implemented…

The proxy would cache the most recent 10,000 streams.

1) A stream would be defined as occurring when continuous data was transferred in one direction from an IP and port to another IP and port.

2) The stream would terminate and be stored when the port changed.

3) The server would compare the beginning parts of new streams to streams already in cache, perhaps the first several thousand characters.  If there was a match, it would fake out the sender and receiver and step in the middle and continue sending the data.
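The three steps above can be sketched as a small in-memory structure. The Python below is a thought experiment under stated assumptions, not a working proxy: the class and method names are invented, streams are assumed to arrive already captured and reassembled as byte strings, and matching is done by hashing a fixed-length prefix (4096 bytes here, standing in for "the first several thousand characters").

```python
import hashlib
from collections import OrderedDict


class StreamCache:
    """Keep the most recently stored streams, indexed by their opening bytes."""

    def __init__(self, max_streams=10_000, prefix_len=4096):
        self.max_streams = max_streams
        self.prefix_len = prefix_len
        self._streams = OrderedDict()  # prefix digest -> full stream bytes

    def _key(self, data):
        return hashlib.sha256(data[:self.prefix_len]).hexdigest()

    def store(self, stream):
        """Save a finished stream; streams shorter than the prefix are skipped."""
        if len(stream) < self.prefix_len:
            return
        key = self._key(stream)
        self._streams[key] = stream
        self._streams.move_to_end(key)
        while len(self._streams) > self.max_streams:
            self._streams.popitem(last=False)  # evict the oldest stored stream

    def lookup(self, beginning):
        """Return the cached stream whose opening bytes match, else None."""
        if len(beginning) < self.prefix_len:
            return None
        return self._streams.get(self._key(beginning))


# Tiny demonstration with a 4-byte prefix so the data stays readable.
cache = StreamCache(max_streams=2, prefix_len=4)
cache.store(b"AAAA-video-one")
print(cache.lookup(b"AAAA") is not None)
print(cache.lookup(b"ZZZZ") is None)
```

Because the key depends only on the opening bytes, two different streams that begin identically (for example, with the same advertisement) would map to the same entry, so a real design would need a stronger match than a single prefix.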

What could go wrong

Now for the major flaws in this technology that must be overcome.

1) Since there is no title on the stream from the sender, there would always be the chance that the match was a coincidence.  For example, an advertisement appended to multiple YouTube videos might fool the caching server. The initial sequence of bytes would match the advertisement and not the following video.

2) Since we would be interrupting a client-server transaction mid-stream, the server would have to be cut off in the middle of the stream when the proxy took over. That might get ugly as the server tries to keep sending. Faking an ACK back to the sending server would also not be viable, as the sending server would continue to send data, which is what we are trying to prevent with the cache.

The next step (after I fix our traditional URL matching problem for the customer) is to build an experimental version of stream-based caching.

Stay tuned to see if I can get this idea to work!
