NSA data mining and the false trade-off between privacy and security:

By Dale Carpenter on May 31, 2006 6:44 pm

Defenders of the NSA’s known domestic surveillance programs (listening in on some international calls and collecting records of Americans’ wholly domestic phone calls) sometimes claim that we must sacrifice a little privacy and liberty in order to gain security from future terrorist attacks. That sounds reasonable and pragmatic, as long as the loss of privacy and liberty is worth the gain in security. But in the case of the two programs revealed in the last six months, and especially the massive call-records program revealed by USA Today earlier this month, the trade-off may well be a false one. The program may well be all pain and no gain.

This column by Bruce Schneier, an expert on data systems and privacy, points to a big problem with data mining of the sort the NSA is doing with Americans’ telephone calls. It turns out to be a huge waste of time and resources spent chasing rabbit trails. Writes Schneier:

Data mining works best when you’re searching for a well-defined profile, a reasonable number of attacks per year, and a low cost of false alarms. Credit-card fraud is one of data mining’s success stories: All credit-card companies mine their transaction databases for data for spending patterns that indicate a stolen card.

Many credit-card thieves share a pattern — purchase expensive luxury goods, purchase things that can be easily fenced, etc. — and data mining systems can minimize the losses in many cases by shutting down the card. In addition, the cost of false alarms is only a phone call to the cardholder asking him to verify a couple of purchases. The cardholders don’t even resent these phone calls — as long as they’re infrequent — so the cost is just a few minutes of operator time.

Terrorist plots are different; there is no well-defined profile and attacks are very rare. This means that data-mining systems won’t uncover any terrorist plots until they are very accurate, and that even very accurate systems will be so flooded with false alarms that they will be useless.

Just in the United States, there are trillions of connections between people and events — things that the data-mining system will have to “look at” — and very few plots. This rarity makes even accurate identification systems useless.

Let’s look at some numbers. We’ll be optimistic — we’ll assume the system has a one in 100 false-positive rate (99 percent accurate), and a one in 1,000 false-negative rate (99.9 percent accurate). Assume 1 trillion possible indicators to sift through: that’s about 10 events — e-mails, phone calls, purchases, Web destinations, whatever — per person in the United States per day. Also assume that 10 of them actually indicate terrorists plotting.

This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Clearly ridiculous.

This isn’t anything new. In statistics, it’s called the “base rate fallacy,” and it applies in other domains as well. And this is exactly the sort of thing we saw with the National Security Agency (NSA) eavesdropping program: The New York Times reported that the computers spat out thousands of tips per month. Every one of them turned out to be a false alarm, at enormous cost in money and civil liberties.

Finding terrorism plots is not a problem that lends itself to data mining. It’s a needle-in-a-haystack problem, and throwing more hay on the pile doesn’t make that problem any easier. We’d be far better off putting people in charge of investigating potential plots and letting them direct the computers, instead of putting the computers in charge and letting them decide who should be investigated.

By allowing the NSA to eavesdrop on us all, we’re not trading privacy for security. We’re giving up privacy without getting any security in return.
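
Schneier’s arithmetic is easy to check. The short Python sketch below is not from his column; it simply plugs in the figures he assumes (1 trillion indicators a year, 10 of them real plots, a 1 percent false-positive rate and a 0.1 percent false-negative rate). The function name `base_rate_summary` and its parameter names are made up for illustration.

```python
def base_rate_summary(indicators, real_plots, false_positive_rate, false_negative_rate):
    """Estimate alarm volumes for a detector, using the column's assumed figures.

    All inputs are assumptions taken from the quoted example, not measured data.
    """
    innocent = indicators - real_plots
    # Innocent events that the system wrongly flags.
    false_alarms = innocent * false_positive_rate
    # Real plots that the system correctly flags.
    true_alarms = real_plots * (1 - false_negative_rate)
    return {
        "false_alarms_per_year": false_alarms,
        "false_alarms_per_day": false_alarms / 365,
        "false_alarms_per_real_plot": false_alarms / true_alarms,
        # Share of all alarms that point at a real plot.
        "precision": true_alarms / (true_alarms + false_alarms),
    }


summary = base_rate_summary(
    indicators=10**12,          # 1 trillion events per year (assumed)
    real_plots=10,              # events that actually indicate a plot (assumed)
    false_positive_rate=0.01,   # 1 in 100
    false_negative_rate=0.001,  # 1 in 1,000
)

for name, value in summary.items():
    if name == "precision":
        print(f"{name}: {value:.2e}")
    else:
        print(f"{name}: {value:,.0f}")
```

Under those assumptions the sketch gives roughly 27 million false alarms a day and about a billion false alarms for each real plot caught, which matches the figures in the column. The point of the base rate fallacy is visible in the last number: even with a detector this accurate, only about one alarm in a billion is real.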

With respect to the domestic call-records program, perhaps the NSA has developed a very precise formula for pinpointing patterns of terrorist-related calls, one that reduces the time and resources that would otherwise be wasted. Perhaps there are real and verifiable success stories (foiled plots, arrested would-be terrorists) that have come from the NSA’s activities. If so, we’ve seen little evidence of them, apart from the administration’s unsupported assertions that these NSA programs are needed for national security. Aside from the possible unconstitutionality of one or both of the NSA programs, there’s a deeper problem with the administration’s position. When it comes to the loss of personal privacy and liberty, the history of the abuse of executive power and the ever-present danger of the inadvertent disclosure of Americans’ personal data counsel that “Trust us” shouldn’t be good enough.
