Program X and Data Mining: How It Might Have Worked


Very, very few people know the extent of the NSA surveillance program, part of which is called the Terrorist Surveillance Program and the rest of which we call Program X. We’d be dishonest if we suggested to you we knew how it works. But take a look at an interview Julian Sanchez did with former NSA analyst Russell Tice for Reason magazine in January 2006.

A brief recap: the NSA fired Tice in 2005 after he alleged that a colleague was a spy for the Chinese. (A DOD inspector general’s report found “no evidence” to support the charge.) In December of that year he outed himself as a source for James Risen, one of the New York Times reporters who broke the NSA surveillance scandal, and he has alleged that the program the president publicly acknowledged was, as he told me last year, “just the tip of the iceberg.” That would certainly fit with Alberto Gonzales and Mike McConnell’s recent revelations. Tice won’t tell reporters exactly what the iceberg is (he’d risk jail time for that), but he did tell me last year that NSA officials weren’t particularly concerned about the risk of abuse after the administration told the agency in 2001 not to bother with FISA warrants. “When I brought up problems, [NSA employees] said, ‘Who’s gonna stop us? Keep your mouth shut.’”

When Tice spoke to Sanchez, he spoke in hypothetical terms, sketching out how a surveillance program would work rather than how what we’re calling Program X actually worked. That’s a legal necessity, allowing Tice to provide his expertise without divulging classified information. As a result, whatever conclusions one can draw about Program X from Tice’s interview are purely inferential. But what he described to Sanchez is extremely broad.

…More than likely you’re talking about picking it up in a digital format and analyzing it depending on how the program is written depending on whether it’s audio or digital recognition you’re talking about, the computing power is phenomenal for that sort of thing. Especially if you’re talking about mass volumes, if you’re talking about hundreds of thousands of, say, telephone communications or something like that, calls of people just like you and me, like we’re talking now.

Then you have things like, and this is where language specialists come in, linguists who specialize in things like accents and inflections and speech patterns and all those things that come into play. Or looking for key phrases or combinations of key words within a block of speech. It becomes, when you add in all the variables, astronomical.

REASON: Do you have a sense of the scale that’s possible, how many phrases and conversations it might be possible to filter?

Tice: Technically it’s limitless. It’s like, you know what a Boolean logic line is? [Yes.] Think of a Boolean logic line with these sorts of parameters in your normal Boolean, built on these filtering parameters. As long as the software is designed to handle however long the Boolean string is in this case, then you have the computing power and the other equipment to crunch the information to put it through the filtering process. Technically you can do as much as you want. It’s going to cost you a lot of money and you’re going to have to buy some big computers and other equipment, bit synchronizers and that sort of thing, monitoring error rates.

You have to be careful to overdo it [sic], because if you overdo the situation, you’ll saturate your bit error rate. So in our hypothetical situation, you could write a program to do this, but you wouldn’t be able to filter enough, say. Ultimately you would have to tweak it over time; you would analyze what your output was and say “no, we’re getting too much garbage, so we need to focus on this particular filter or this particular item, to be able to winnow it down to where you want it to be.”

You run the risk the other way of omitting information you may have wanted, which is where you need specialists, who know exactly the information you want, to work with the software engineers and the language specialists to make sure that everyone’s working in sync so that you get the what you want. Normally a linguist or a software engineer isn’t the intelligence analyst or intelligence specialist who knows the nitty-gritty of the intelligence or the information you’re looking for.
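
To make Tice’s “Boolean logic line” a little more concrete, here is a toy sketch in Python of keyword filters combined with AND, OR and NOT. Everything in it is an assumption made for illustration: the helper names (term, all_of, any_of, none_of, run_filter), the sample messages and the keywords are all invented, and nothing here describes how any real system works. It only shows the trade-off Tice describes, where a loose filter returns too much “garbage” and a tighter one winnows the output down at the risk of omitting something you wanted.

```python
from typing import Callable, List

# A filter is any function that takes a message and answers yes or no.
Filter = Callable[[str], bool]


def term(phrase: str) -> Filter:
    """Match when the phrase appears anywhere in the text (case-insensitive)."""
    needle = phrase.lower()
    return lambda text: needle in text.lower()


def all_of(*filters: Filter) -> Filter:
    """Boolean AND: every sub-filter must match."""
    return lambda text: all(f(text) for f in filters)


def any_of(*filters: Filter) -> Filter:
    """Boolean OR: at least one sub-filter must match."""
    return lambda text: any(f(text) for f in filters)


def none_of(*filters: Filter) -> Filter:
    """Boolean NOT: no sub-filter may match."""
    return lambda text: not any(f(text) for f in filters)


def run_filter(f: Filter, messages: List[str]) -> List[str]:
    """Return the messages that pass the filter, in their original order."""
    return [m for m in messages if f(m)]


# Invented sample data, standing in for transcribed communications.
MESSAGES = [
    "the shipment arrives at the harbor on tuesday",
    "happy birthday, see you on tuesday",
    "harbor tour tickets are on sale",
    "the shipment of birthday cake is late",
]

# A loose filter: any one keyword is enough, so everything gets through.
BROAD = any_of(term("shipment"), term("harbor"), term("tuesday"))

# A tighter filter: a longer Boolean line that winnows the output down.
NARROW = all_of(
    term("shipment"),
    any_of(term("harbor"), term("tuesday")),
    none_of(term("birthday")),
)

if __name__ == "__main__":
    print(len(run_filter(BROAD, MESSAGES)), "messages pass the broad filter")
    print(len(run_filter(NARROW, MESSAGES)), "messages pass the narrow filter")
```

Because each combinator returns another filter, the Boolean line can be nested as deeply as you like, which is roughly the sense in which Tice calls the approach “technically limitless.” The practical limits are the ones he names: computing power and the tuning needed to keep the output useful.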

Assuming that what Tice describes here applies to Program X, the program didn’t start with phone numbers of “known” members of al-Qaeda, which is how President Bush said the TSP operated. Instead, the NSA was allowed to collect intelligence on a huge scale and mine the collected data for suspect words or turns of phrase believed to be connected to terrorism. People flagged by that data-mining would then become targets for further surveillance, and so on. Over the weekend, the New York Times reported that the legal dispute over Program X between James Comey and the rest of the Bush administration centered on the program’s data-mining component.

Again, it’s far from clear that this is how Program X actually operated. But it’s still valuable to recall how such a surveillance program could have been structured — and, possibly, was — according to a prominent former NSA analyst.
