Why the reports about PRISM are wrong


I feel a bit strange about writing this.

I distrust authority. I’m the sort of person who believes that powerful people tend to be corrupt and that governments manipulate the truth for political ends. I support the free press, and I trust journalists over government spokespeople and large companies.

We are being sold a lie about PRISM. Only it’s not governments selling us the lie, it’s the free press.

I strongly believe that the descriptions of PRISM in The Guardian, The Washington Post and on many websites are inaccurate and misleading. They are little more than conspiracy theories. The claims made are not technically possible or realistic.

Those of you who have a tendency to believe conspiracy theories may disregard my comments. You may believe that I am a mouthpiece of government spin. I am not. I work in media and technology. I have no affiliation with any government or private company. I have no special knowledge of PRISM, but I know how technology and the public sector work. And the PRISM described in the press does not and cannot exist.

In the following article I will explain two things.

Firstly, why PRISM cannot exist in the form the media is portraying, and why these claims are nothing more than conspiracy theories and should be treated as such.

Secondly, that our privacy is being eroded in ways we should genuinely worry about, and that the spurious concerns about PRISM are distracting us from the real problems.

PRISM is not what we think it is

[Image: PRISM logo]

If you’ve been reading the press, you will probably have an idea of what PRISM is. According to The Guardian:

The National Security Agency has obtained direct access to the systems of Google, Facebook, Apple and other US internet giants, according to a top secret document obtained by the Guardian.

The NSA access is part of a previously undisclosed program called Prism, which allows officials to collect material including search history, the content of emails, file transfers and live chats

Their evidence for this is a leaked PowerPoint document.

[Image: leaked PRISM slide 5]

The presentation is amateurish; the formatting and phrasing are imprecise. The Guardian has focused on the phrase “collection directly from the servers” and used it to theorize about a whole range of activities. But this is groundless speculation. The phrase “directly from servers” means nothing. This is not a technical document, and its words are vague. There’s a mix of companies and products: YouTube belongs to Google, for example, and Skype to Microsoft, yet both are listed.

Stewart Baker, former general counsel of the NSA, says:

The PowerPoint is suffused with a kind of hype that makes it sound more like a marketing pitch than a briefing — we don’t know what its provenance is and we don’t know the full context

Why would a top secret US Government programme even have a logo?

Do we seriously think that an organisation that can tap and read “the internet” in real time would produce a presentation as sloppy as this? As ZDNet says:

we strongly suspect that the leaked PowerPoint slides are probably not written by technical people. It’s likely that these slides were prepared as an internal marketing tool for new recruits. So, when the slides say: “direct access to servers,” that statement may well be an oversimplification of the facts.

The US Government clearly has data on individuals. We already know that. They can legally request it from companies by subpoena, and we know they already do that a lot. Too much, in fact.

But that is a legal process. The police go to the courts, get a court order and ask companies like Google or Facebook to export data from their servers and hand it over. At no point do the security services have access to the servers, direct or otherwise.

Other than this vague and sloppy phrase, there is no evidence, either in these documents or in any other information released, that PRISM does anything other than store data collected through court orders. Really, the burden of proof should be on the media to show that something more is going on. But the reports have now reached epidemic levels, and one cannot satisfactorily disprove them by pointing to the absence of evidence, any more than one can disprove the existence of God that way.

A huge number of sources have rejected these reports. Insiders have come forward to multiple journalists:

Recent reports in The Washington Post and The Guardian […] are incorrect and appear to be based on a misreading of a leaked PowerPoint document, according to a former government official who is intimately familiar with this process of data acquisition and spoke today on condition of anonymity.

“It’s not as described in the histrionics in The Washington Post or The Guardian,” the person said. “None of it’s true. It’s a very formalized legal process that companies are obliged to do.”

That former official’s account — that the process was created by Congress six years ago and includes judicial oversight — was independently confirmed by another person with direct knowledge of how this data collection happens at multiple companies.

Larry Page and Mark Zuckerberg have both stated that they are not giving direct access to their servers. Google said:

The U.S. government does not have direct access or a ‘back door’ to the information stored in our data centers. We provide user data to governments only in accordance with the law.

But, hey, they run large companies so we can’t trust them.

The New York Times has cited anonymous sources that cast doubt on the initial reports, but maybe they’re lying too.

Maybe one loosely phrased statement in a sloppy, non-technical PowerPoint presentation is correct, and all of the industry experts, anonymous sources, government statements and publicly available legal records are incorrect, and the government really is doing this.

So let’s dig a bit deeper.

[Image: leaked PRISM slide 4]

The slides say that the budget for PRISM is $20m a year. In large-scale IT projects, $20m is peanuts. The BBC’s DMI project recently failed after spending $100m trying to build an internal database. They even had all the content already and didn’t need to steal it from protected, encrypted sources.

In 2005, the FBI spent $170m trying to build a digital system for managing case work. It failed.

I’ve written before about how IT projects fail. Large-scale IT projects are incredibly complex, too complex for humans to comprehend, and as more developers start working, communication becomes harder and harder to manage.

As ZDNet says:

One source speaking to ZDNet under the condition of anonymity said $20 million — the amount quoted by the NSA in the leaked document that covers the cost of the PRISM program — wouldn’t even cover the air conditioning costs and the electrical bill for the datacenter. Taking the datacenter out of the equation, $20 million would not even cover 3-6 months’ worth of the data storage required to keep copies of the wiretap data.

Even The Guardian struggle to make sense of this:

“The Prism budget – $20m – is too small for total surveillance,” one data industry source told the Guardian. Twitter, which is not mentioned in the Prism slides, generates 5 terabytes of data per day, and is far smaller than any of the other services except Apple. That would mean skyrocketing costs if all the data were stored. “Topsy, which indexes the whole of Twitter, has burned through about $20m in three years, or about $6m a year,” the source pointed out. “With Facebook much bigger than Twitter, and the need to run analysts etc, you probably couldn’t do the whole lot on $20m.”
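To put rough numbers on that, here is a back-of-envelope sketch using the 5 terabytes per day figure from the quote above and a hypothetical 2013-era retail storage price (the price is my assumption, nothing from the slides):

```python
# Back-of-envelope check on the $20m/year budget. 5 TB/day is the
# Guardian source's figure for Twitter; the storage price is a
# hypothetical retail cloud rate, not a real NSA cost.
twitter_tb_per_day = 5
cost_per_gb_month = 0.10  # assumed USD per GB per month

pb_per_year = twitter_tb_per_day * 365 / 1000              # ~1.8 PB/year
storage_cost = pb_per_year * 1_000_000 * cost_per_gb_month * 12

print(f"{pb_per_year:.1f} PB/year from Twitter alone")
print(f"~${storage_cost / 1e6:.1f}m/year just to store it")
```

Over $2m a year just to warehouse Twitter’s output, a service that isn’t even in the slides, before a single analyst or piece of hardware is paid for.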

It is unthinkable that such a project could be run for $20m a year. The press can’t find a single expert to support this. And you can be sure they’ve been desperately looking.

The budget given in the presentation is comparatively tiny – just $20m per year. That has puzzled experts because it’s so low.

But maybe, somehow, the NSA has found a way of cutting costs, way beyond anything anyone can understand. After all, the public sector is famous for being run efficiently and getting the best value for money for the taxpayer.

[Image: Gene Hackman in The Conversation]

Let’s have a think about what PRISM is supposedly doing. The claims are that it is “tapping” the network. Images come to mind of Gene Hackman in The Conversation, listening in with headphones. Unfortunately, that only works with analogue communications. The Internet isn’t analogue. You simply can’t “listen in” and see what websites someone is looking at. The internet does not work that way. Any claims like this show a shocking misunderstanding of the technology. As the PRISM slides say:

A target’s phone call, e-mail or chat will take the cheapest path, not the physically most direct path – you can’t always predict the path

If I send an email to my girlfriend sitting in the living room, that email will be sent in thousands of packets, some of which may go via Australia, or anywhere else in the world. The packets pick the best route. You can’t “listen in on them”.
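You can watch this route-picking happen. Here is a minimal sketch with scapy’s traceroute (scapy is my choice of tool, not anything mentioned in the reports); run it twice and the list of hops can differ:

```python
from scapy.all import traceroute

# Packets follow whatever path the routers prefer right now, not a
# fixed line someone could sit on. Run this twice (as root) and the
# hops can differ. www.example.com is just an illustrative target.
result, unanswered = traceroute(["www.example.com"], maxttl=20, verbose=0)
result.show()
```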

Now, what you could do is connect into my wireless network illegally. To do that you’d need to crack the WEP or WPA key, and there are tools available for that, such as Aircrack-ng or Kismet. You could then mount an ARP spoofing attack with something like Ettercap and view the data packets flowing into and out of my house in Wireshark.
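Purely for illustration, and only on a network you own, the “viewing” step looks something like this scapy sketch:

```python
from scapy.all import sniff

# Print a one-line summary of each packet seen on this machine's
# interface. Seeing *another* host's traffic is what the ARP
# spoofing step is for, and it only works from inside the network.
def show(pkt):
    print(pkt.summary())

sniff(prn=show, count=10)  # requires root; captures ten packets
```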

But to do that you’d need to be physically close enough to my house to connect to my wireless network. And I’m just one person. To “tap” the internet this way, you’d need a surveillance team outside every house in the country. You couldn’t do it to the whole world without seven billion spies in vans. And although the traffic on my road is bad, it’s not that bad.

[Image: undersea internet cables]

The Guardian, however, has described “a couple of methods” that PRISM may be using. Before we start analyzing these, remember: they were thought up by people working in the data-processing business. They have no special knowledge of PRISM, they do not work for the government, and although they are experts in their field, they have no information that we don’t have. These are theories dreamt up on the strength of one phrase in a PowerPoint presentation.

First, lots of data bound for those companies passes over what are called “content delivery networks” (CDNs), which are in effect the backbone of the internet. Companies such as Cisco provide “routers” which direct that traffic. And those can be tapped directly.

I was dubious about this claim. The Guardian links to a Cisco technical document about a specific Cisco router. So I read it (and boy, was it boring). It says:

The Cisco Service Independent Intercept Architecture Version 3.0 document describes implementation of LI for VoIP networks using the Cisco BTS 10200 Softswitch call agent, version 5.0, in a non-PacketCable network.

In layman’s terms, this is a description of how you can connect directly into a specific router to access a VoIP phone call. VoIP, by the way, is things like Skype: using the internet to make a phone call. This does not mean that the government can “tap” into CDNs. It means that someone could technically connect into one particular brand of router, if there were a court order to do so.

Another Guardian source, who said that $20m wasn’t enough to do anything useful, suggests:

“they might have search interfaces (at an administrator level) into things like Facebook, and then when they find something of interest can request a data dump. These localised data dumps are much smaller.”

The other day, I needed to find a receipt in my Gmail inbox. Sadly, it turns out I’ve bought quite a few things, and it was a really big job to find it. Now imagine searching every Gmail inbox in the world for something. You’d never be able to find anything in the noise.

And that’s assuming it is even technically possible. I have no insider knowledge of Google, but I can’t see why they’d build that. I do have knowledge of Exchange (Microsoft’s email server) and can categorically say you cannot do that there (I’ve actually been asked a couple of times, and have been involved in email search operations; they are not easy or cheap). Even if Google had built this admin-level functionality, it would be slow.
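To give a flavour of what a mailbox “search interface” involves, here is a sketch using Python’s standard imaplib against a single account (the address and app password are placeholders):

```python
import imaplib

# Search ONE mailbox for ONE keyword, server-side. Even this can be
# slow on a big inbox. The address and app password are placeholders.
conn = imaplib.IMAP4_SSL("imap.gmail.com")
conn.login("me@example.com", "app-password")
conn.select("INBOX", readonly=True)

status, ids = conn.search(None, 'BODY "receipt"')
print(len(ids[0].split()), "matching messages")
conn.logout()
```

That’s one keyword in one inbox. Now multiply by hundreds of millions of inboxes.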

[Image: Google AdSense logo]

When you use Google search, it is very quick. But that speed comes at a cost: Google spend a huge amount of money optimizing and caching searches to deliver content to you quickly. Why would they spend similar money optimizing search across every Gmail inbox? There’s a reason they optimize search: there’s money in it for them. Lots of money. The more you search, the more ads you see.

There’s no money in it for them to build a snoop search. It’s hardly as if Google are going to advertise to NSA officials alongside their search: “customers who searched for Jihad also bought the Koran”.

But let’s assume that the $20m figure is wrong – that the people who made this presentation were incredibly precise in their wording about the way data was collected, but then missed five zeroes off the end of the budget figure. ZDNet has produced a theory, which begins with the reassuring statement: “The following article should be treated as strictly hypothetical.”

Their suggestion is that PRISM taps into Tier 1 networks:

The Internet may be distributed and decentralized in nature, but there is a foundation web of connectivity that enables major sites and services to operate. These are referred to as “Tier 1” network providers. Think of these as pipes of the main arteries of the Internet, in simple terms.

There are 12 companies that provide Tier 1 networks. ZDNet’s theoretical paper suggests that the NSA could “tap” these networks:

these Tier 1 network providers have a far smaller employee base working in these divisions than the aforementioned companies. This allows the NSA to either send its own employees in as “virtual” employees — working under the guise of these companies — while the NSA gags those companies from disclosing this fact to other staff. They could look like special contractors that only work with the special wiretapping routers.

We’re heading into conspiracy-theory territory again now. The suggestion is that the NSA put undercover staff into twelve private companies, attached equipment to their systems and extracted all the information flowing out of their servers. They then put gagging orders on all of the companies, and somehow stopped every individual who knew about it from leaking it to the press.

Oh yes, and then they built a secret database that could contain the whole Internet.

But let’s ignore that. Let’s pretend they managed to build this magic technology that the rest of the world doesn’t have without anyone knowing.

They still wouldn’t have access to “the Internet”. They’d just see a load of data flowing through CDNs, and most of it would be iPlayer, YouTube and pictures. They wouldn’t even have all of the Internet: only some data flows through these networks.

And that’s ignoring the problem of encryption. Facebook, Google, Hotmail – all the interesting stuff – is encrypted. This means that even if you captured every packet of every request, you wouldn’t be able to read them. Even when I tweet publicly, the tweet is encrypted when it’s sent to Twitter. Although it’s displayed publicly on the website, you wouldn’t be able to read it by “tapping” my internet connection; you’d just get a load of encrypted nonsense.
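Here is a minimal sketch of why, using Python’s standard ssl module. The page is readable at the endpoint, but everything on the wire after the handshake is ciphertext:

```python
import socket, ssl

# Fetch a "public" page over HTTPS. The text is readable at our end,
# but in transit the TCP payload is TLS ciphertext: a tap between us
# and the server sees only noise.
ctx = ssl.create_default_context()
with socket.create_connection(("twitter.com", 443)) as raw:
    with ctx.wrap_socket(raw, server_hostname="twitter.com") as tls:
        print("negotiated:", tls.version())  # e.g. TLSv1.3
        tls.sendall(b"GET / HTTP/1.1\r\nHost: twitter.com\r\n"
                    b"Connection: close\r\n\r\n")
        print(tls.recv(80))  # decrypted here, encrypted on the wire
```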

Maybe they have special servers optimized to crack encryption. And maybe they set them to work decrypting every single encrypted internet session in use. That seems even more unlikely, but let’s imagine it happened.

The Guardian reported that:

last year GCHQ was handling 600m “telephone events” each day, had tapped more than 200 fibre-optic cables and was able to process data from at least 46 of them at a time.

Each of the cables carries data at a rate of 10 gigabits per second, so the tapped cables had the capacity, in theory, to deliver more than 21 petabytes a day

[Image: IBM datacentre]

21 petabytes is big. Really big. And that’s just in one day. In 30 days, this would create 630 petabytes.
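The arithmetic is easy to check from the Guardian’s own figures (rounding down to 21 PB/day and multiplying by 30 gives the 630 figure):

```python
# The Guardian's figures: 200 tapped fibre-optic cables at 10 Gbit/s.
cables = 200
bits_per_second = 10e9

bytes_per_day = cables * bits_per_second / 8 * 86_400
print(f"{bytes_per_day / 1e15:.1f} PB/day")              # ~21.6 PB/day
print(f"{bytes_per_day * 30 / 1e15:.0f} PB in 30 days")  # ~648 PB
```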

To put this in scale, IBM recently built the largest hard drive array in the world. It is highly experimental, and nothing else comes close: it is 120 petabytes, and the next biggest is little more than 40 petabytes.

The storage needed to hold and process this much data just does not exist. We’re suggesting that GCHQ and the NSA have secretly built databases of a similar size to the internet itself, and no one noticed. It is just not possible.

Even The Washington Post is starting to back down on its claims.

And then a funny thing happened the next morning. If you followed the link to that story, you found a completely different story, nearly twice as long, with a slightly different headline. The new story wasn’t just expanded; it had been stripped of key details, with no acknowledgment of the changes. That updated version, time-stamped at 8:51 AM on June 7, backed off from key details in the original story.

Before naming their source as Edward Snowden, The Washington Post and The Guardian both referred to him as “a career intelligence officer [who exposed the materials] in order to expose what he believes to be a gross intrusion on privacy.”

Edward Snowden is not an intelligence officer but an “infrastructure analyst”. He had been in his position with an external contractor for three months. I don’t mean to discredit him, but if The Guardian managed to get his job title wrong, what else did they get wrong?

All evidence from governments, from legal proceedings, from technology experts and from the leaked documents we’ve seen suggests that PRISM is simply gathering up information obtained legally through court orders. There is simply no evidence that anything else is happening. U.S. Director of National Intelligence James Clapper said:

“PRISM is not an undisclosed collection or data mining program […] it is an internal government computer system” designed to “facilitate […] authorized collection of foreign intelligence.” NSA Director Gen. Keith Alexander says of Snowden’s claims: “I know of no way to do that.”

It is simply not feasible that the PRISM described by the Guardian exists.

But you should be worried

However, there is a problem that the hype around PRISM is obscuring. The US Government is requesting huge amounts of private data from companies. Governments are making hundreds of thousands of legal requests for information from Google, Microsoft, Facebook and many others.

These requests are perfectly legal, and all of the government officials questioned say their data is obtained in this way. Apple received 4,000 requests and Facebook 10,000 in the last quarter. Google were forced to respond to 8,000 from the US government alone.

Should governments legally be allowed to make all these requests? Shouldn’t we be more worried about what our current legal system is allowing to happen? Rather than becoming hysterical over conspiracy theories of illegal activities that clearly aren’t happening, maybe we should focus more on stopping what is actually going on.

Government rebuttals of PRISM are actually shocking. “No,” they’re saying, “we didn’t really tap your networks to get all this data illegally. We did it by the perfectly legal method and that’s fine.”

The biggest tragedy of PRISM is not the spurious and ignorant claims being made, but that they have distracted us from the real problem. The press’s claims about PRISM have made governments’ standard activities look reasonable. But they are not. And we should stop being bamboozled by fantasy computer systems that seem like something out of a Hollywood film.

[Image: Operation BlackBriar]
