Tracking Traffic by DNS

The video rental industry of the early 80’s was comprised of 1000’s of independent stores.  Corner video rental shops were as numerous as today’s Starbucks.  In the late 1990’s, consolidation took over.  Blockbuster with its bright blue canopy lighting up the night sky swallowed them up like doggy treats.   All the small retail outlets were gone. Blockbuster had changed everything, their economy of scale, and their chain store familiarity, had overrun the small operators.

In a similar fashion to the fledgling video rental industry, circa 1990’s Internet content was scattered across the spectrum of the web, ripe for consolidation.  I can still remember all of the geeks at my office creating and hosting their own personal websites. They used primitive tools and their own public IP’s to weave these sites together.  Movies  and music were bootlegged, and shared across a network of underground file-sharing sites.

Although we do not have one Internet “Blockbuster” today, there has been major consolidation.  Instead of all traffic coming from 100’s of thousands of personal or small niche content providers, most of it comes from the big content providers. Google, Amazon, Netflix, Facebook, Pinterest are all familiar names today.

So far I have reminisced about a nice bit of history, and I suspect you might be wondering how all of this prelude relates to tracking traffic by DNS?

Three years ago we added a DNS (domain name system) server lookup from our GUI interface, as more of a novelty than anything else. Tracking traffic by content was always a high priority for our customers, but most techniques had relied on a technology called “deep packet inspection” to identify traffic.  This technology was costly, and ineffective on its best day, but it was the only way to chase down nefarious content such as P2P.

Over the last couple of years I noticed again the world had changed. With the consolidation of content from a small number of large providers, you could now count on some consistency in the domain from which it originated.  I would often click on our DNS feature and notice a common name for my data.   For example, my YouTube videos resolved to one or two DNS names,  and I found the same to be true with my Facebook video.  We realized that this consolidation might make DNS tracking useful for our customers, and so we have now put DNS tracking into our current NetEqualizer 8.5 release.

Another benefit of tracking by domain is the fact that most encrypted data will report a valid domain.  This should help to identify traffic patterns on a network.

It will be interesting to get feedback on this feature as it hits the real world, stay tuned!

Latest Notes on the Peer to Peer Front and DMCA Notices

Just getting back from our tech talk seminar today at Western Michigan University. The topic of DMCA requests came up in our discussions, and here are some of my notes on the subject.

Background: The DMCA, which is the enforcement arm of the motion picture copyright conglomerate, tracks down users with illegal content.

They seem to sometimes shoot first and ask questions later when sending out their notices more specific detail to follow.

Unconfirmed Rumor has it, that one very large University in the State of Michigan just tosses the requests in the garbage and does nothing with them, I have heard of other organizations taking this tact. They basically claim  this problem for the DMCA is not the responsibility of the ISP.

I also am aware of a sovereign Caribbean country that also ignores them. I am not advocating this as a solution just an observation.

There was also a discussion on how the DMCA discovers copyright violators from the outside.

As standard practice,  most network administrators use their firewall to block UN-initiated requests  into the network from the outside. With this type of firewall setting, an outsider cannot just randomly probe a network  to find out what copyrighted material is being hosted. You must get invited in first by an outgoing request.

An analogy would be that if you show up at my door  uninvited, and knock, my doorman is not going to let you in, because there is no reason for you to be at my door. But if I order a pizza and you show up wearing  a pizza delivery shirt, my doorman is going to let you in.  In the world of p2p, the invite into the network is a bit more subtle, and most users are not aware they have sent out the invite, but it turns out any user with a p2p client is constantly sending out requests to p2p super nodes to attain information on what content is out there.  Doing so, opens the door on the firewall to let the P2p super node into the network.  The DMCA p2p super nodes just look like another web site to the firewall so it lets it in. Once in the DMCA reads directories of p2p clients.

In one instance, the DMCA is not really inspecting files for copyrighted material, but was only be checking for titles. A  music student who recorded their own original music, but named their files after original artists and songs based on the style of the song.  Was flagged erroneously with DMCA notifications based on his naming convention   The school security examined his computer and determined the content was not copyrighted at all.   What we can surmise from this account was that the DMCA was probing the network directories and not actually looking at the content of the files to see if they were truly in violation of copying original works.
Back to the how does the DMCA probe theory ? The consensus was that it is very likely that DMCA is actually running  super nodes, so they will get access to client directories.  The super  node is a server node that p2p clients contact to get advice on where to get music and movie content ( pirated most likely). The speculation among the user group , and these are very experienced front line IT administrators that have seen just about every kind  of p2p scheme.  They suspect that the since the DMCA super node is contacted by their student network first, it opens the door from the super node to come back and probe for content. In other words the super node looks like the Pizza delivery guy where you place your orders.
It was also further discussed and this theory is still quite open, that sophisticated p2p  networks try to cut out the DMCA  spy super nodes.  This gets more convoluted than peeling off character masks at a mission impossible movie. The p2p network operators need super nodes to distribute content, but these nodes cannot be permanently hosted, they must live in the shadows and are perhaps parasites themselves on client computers.

So questions that remain for future study on this subject are , how do the super nodes get picked , and how does the p2p network disable a spy DMCA super node ?

Layer 7 Application Shaping Dying with Increased SSL

By Art Reisman

When you put a quorum of front line IT administrators  in a room, and an impromptu discussion break out, I become all ears. For example, last Monday, the discussion at our technical seminar at Washington University turned to the age-old subject of controlling P2P.

I was surprised to hear from several of our customers about just how difficult it has become to implement Layer 7 shaping. The new challenge stems from fact that SSL traffic cannot be decrypted and identified from a central bandwidth controller. Although we have known about this limitation for a long time, my sources tell me there has been a pick up in SSL adoption rates over the last several years. I don’t have exact numbers, but suffice it to say that SSL usage is way up.

A traditional Layer 7 shaper will report SSL traffic as “unknown.” A small amount of unknown traffic has always been considered tolerable, but now, with the pick up SSL traffic, rumor has it that some vendors are requiring a module on each end node to decrypt SSL pages. No matter what side of the Layer 7 debate you are on, this provision can be a legitimate show stopper for anybody providing public or semi-open Internet access, and here is why:

Imagine your ISP is requiring you to load a special module on your laptop or iPad to decrypt all your SSL information and send them the results? Obviously, this will not go over very well on a public Internet. This relegates Layer 7 technologies to networks where administrators have absolute control over all the end points in their network. I suppose this will not be a problem for private businesses, where recreational traffic is not allowed, and also in countries with extreme controls such as China and Iran, but for a public Internet providers in the free world,  whether it be student housing, a Library, or a municipal ISP, I don’t see any future in Layer 7 shaping.

The Evolution of P2P and Behavior-Based Blocking

By Art Reisman

CTO – APconnections

I’ll get to behavior-based blocking soon, but before I do that, I encourage anybody dealing with P2P on their network to read about the evolution of P2P outlined below. Most of the methods historically used to thwart P2P, are short lived pesticides, and resistance is common. Behavior-based control is a natural wholesome predator of P2P which has proved to be cost effective over the past 10 years.

The evolution of P2P

P2P as it exists today is a classic example of Darwinian evolution.

In the beginning there was Napster. Napster was a centralized depository for files of all types. It also happened to be a convenient place to distribute unauthorized, copyrighted material. And so, the music industry, unable to work out a licensing distribution agreement with Napster basically closed it down. So now, you had all these consumers used to getting free music, and like a habituated wild animal, they were in no mood to pay 15.99 per CD from their local retailer.

P2P technology was already in existence when Napster was closed down; however until that time, it was intended to be a distribution system for legitimate content which came out of academia. By decentralizing the content to many multiple distribution points, the cost of distribution was much less than hosting content distribution on a private server. Decentralized content, good for legitimate distribution of academic content, quickly became a nightmare for the Music Industry.  Instead of having one cockroach of illegal content to deal with, they now had millions of little P2P cockroaches all over the world to contend with.

The Music industry had a multi-billion dollar leak in their revenue stream and went after enforcing copyright policy by harassing ISPs and threatening consumers with jail time. For the ISP, the legal liability of having copyrighted material on your network was a hassle, but the bigger problem was the congestion. When content was distributed by a single point supplier, there were natural cost barriers to prevent bandwidth utilization from rising unchecked. For example, when you buy a music file from Amazon or iTunes, both ends of the transaction require some form of payment. The supplier pays for a large bandwidth pipe, and the consumer pays money for the file. With P2P, the distributors and the clients are all consumers with essentially unlimited data usage on their home accounts, and the content is free. As P2P file sharing rose, ISPs had no easy way of changing their pricing model to deal with the orgy of file sharing. Although invisible to the public, it was a cyber party that rivaled 10 cent beer night fiasco of the 1970’s.

Resistant P2P pesticides

In order to thwart p2p usage, ISPs and businesses started spending hundreds of millions of dollars in technology that tracked specific P2P applications and blocked those streams. This technology is referred to as layer 7 blocking. Layer 7 blocking involves looking at the specific content traversing the Internet and identifying P2P applications by their specific footprint. Intuitively, this solution was a no-brainer* – spot P2P and block it. Most of these installations with layer 7 blocking showed some initial promise, however, as was the case with the previous cockroach infestation, P2P again evolved to meet the challenge and then some.

How does newer evolved P2P thwart layer 7 shaping?

1) There are now encrypted P2P clients where their footprint is hidden, and thus all the investment in the layer 7 shaper can go up in smoke once encrypted P2P infects your network. It can’t be spotted.

2) P2P clients open and close connections much faster than their first generation of the early 2000’s. To keep up with a the flurry of connections over a short time, the layer 7 engine must have many times the processing power of a traditional router, and must do the analysis quickly. The cost of layer 7 shaping is rising much faster than the cost of adding additional bandwidth to a circuit.

Also: Legally there also problems with eavesdropping on customer data without authorization.

How does behavior-based shaping P2P blocking keep up?

1) It uses a progressive rate limit on suspected P2P users.

P2P has the footprint of creating many simultaneous connections to move data across the internet. When behavior-based shaping is in effect, it detects these high connection count users, and slowly implements a progressive rate limit on all their data. This does not completely cut them off per se, but it punishes the speeds of the consumer using p2p and does so progressively as they use more p2p connections. This may seem a bit non specific in target, but when done correctly it rarely affects non P2P users, and even if it does, the behavior of using a large number of downloads is considered rude and abhorrent, and is most like a virus if not a P2P application.

2) It limits the user to a fixed number of simultaneous connections.

Also: It does not violate any privacy policies.

That covers the basics of P2P behavior-based shaping. In practice, we have developed our techniques with a bit of intelligence and do not wish to give away all of our fine tuning secrets, but suffice it to say, I have been implementing behavior-based shaping for 10 years and have empirically seen its effectiveness over time. The cost remains low with respect to licensing (very stable solution), and the results remain consistent.

* Although in some cases there was very little information about how effective the solution was working, companies and ISPs shelled out license fees year after year.

%d bloggers like this: