The Volokh Conspiracy

District Court Holds that Running Hash Values on Computer Is A Search:
The case is United States v. Crist, 2008 WL 4682806 (M.D.Pa. October 22, 2008) (Kane, C.J.). It's a child pornography case involving a warrantless search that raises a very interesting and important question of first impression: Is running a hash a Fourth Amendment search? (For background on what a "hash" is and why it matters, see here).

  First, the facts. Crist is behind on his rent payments, and his landlord starts to evict him by hiring Sell to remove Crist's belongings and throw them away. Sell comes across Crist's computer, and he hands over the computer to his friend Hipple, who he knows is looking for a computer. Hipple starts to look through the files, and he comes across child pornography: Hipple freaks out and calls the police. The police then conduct a warrantless forensic examination of the computer:
In the forensic examination, Agent Buckwash used the following procedure. First, Agent Buckwash created an “MD5 hash value” of Crist's hard drive. An MD5 hash value is a unique alphanumeric representation of the data, a sort of “fingerprint” or “digital DNA.” When creating the hash value, Agent Buckwash used a “software write protect” in order to ensure that “nothing can be written to that hard drive.” Supp. Tr. 88. Next, he ran a virus scan, during which he identified three relatively innocuous viruses. After that, he created an “image,” or exact copy, of all the data on Crist's hard drive.

Agent Buckwash then opened up the image (not the actual hard drive) in a software program called EnCase, which is the principal tool in the analysis. He explained that EnCase does not access the hard drive in the traditional manner, i.e., through the computer's operating system. Rather, EnCase “reads the hard drive itself.” Supp. Tr. 102. In other words, it reads every file, bit by bit and cluster by cluster, and creates an index of the files contained on the hard drive. EnCase can, therefore, bypass user-defined passwords, “break[ ] down complex file structures for examination,” and recover “deleted” files as long as those files have not been written over. Supp. Tr. 102-03.

Once in EnCase, Agent Buckwash ran a “hash value and signature analysis on all of the files on the hard drive.” Supp. Tr. 89. In doing so, he was able to “fingerprint” each file in the computer. Once he generated hash values of the files, he compared those hash values to the hash values of files that are known or suspected to contain child pornography. Agent Buckwash discovered five videos containing known child pornography. Attachment 5. He discovered 171 videos containing suspected child pornography.
  One of the interesting questions here is whether the search that resulted was within the scope of Hipple's private search; different courts have approached this question differently. But for now the most interesting question is whether running the hash was a Fourth Amendment search. The Court concluded that it was, and that the evidence of child pornography discovered had to be suppressed:
The Government argues that no search occurred in running the EnCase program because the agents “didn't look at any files, they simply accessed the computer.” 2d Supp. Tr. 16. The Court rejects this view and finds that the “running of hash values” is a search protected by the Fourth Amendment.

Computers are composed of many compartments, among them a “hard drive,” which in turn is composed of many “platters,” or disks. To derive the hash values of Crist's computer, the Government physically removed the hard drive from the computer, created a duplicate image of the hard drive without physically invading it, and applied the EnCase program to each compartment, disk, file, folder, and bit. 2d Supp. Tr. 18-19. By subjecting the entire computer to a hash value analysis, every file, internet history, picture, and “buddy list” became available for Government review. Such examination constitutes a search.
  I think this is generally a correct result: See my article Searches and Seizures in a Digital World, 119 Harv. L. Rev. 531 (2005), for the details. Still, given the lack of analysis here it's somewhat hard to know what to make of the decision. Which stage was the search: the creation of the duplicate? The running of the hash? It's not really clear. I don't think it matters very much to this case, because the agent who got the positive hit on the hashes didn't then get a warrant. Instead, he immediately switched over to the EnCase "gallery view" function to see the images, which seems to me to be undoubtedly a search. Still, it's a really interesting question.

  Also, it seems that the Government failed to make the strongest argument that running the hash isn't a search: If the hash is for a known image of child pornography, then running a hash is a direct analog to a drug-sniffing dog in Illinois v. Caballes, 543 U.S. 405 (2005). Although Caballes is cited in the opinion for other reasons, it seems that the government didn't make the Caballes argument.

  It's possible that the argument wasn't raised because the agent made a hash of every file instead of running a search just for matches of known images. But I'm not sure that really makes a difference, and whether it does hinges on some interesting questions. Is the creation of the hash a search? Or is running a query that matches the hashes to known hashes and produces a positive hit a search? It might also break down based on how much the government saw of the machine while the hashes were being made: Perhaps the search occurred when the file structure was revealed to the officers (if it was in fact revealed). But if so, I'm not sure that the images themselves should be suppressed as compared to evidence more directly related to the revealing of the file structure.

  Either way, this is a fascinating computer crime law issue that gets debated from time to time without any case law; I believe this is the first case on the topic. Ah, more grist for the mill of the forthcoming second edition of my computer crime casebook. Thanks to FourthAmendment.com for the mention of the opinion, and Matt Caplan for the .pdf.

PatHMV (mail) (www):
How did they get access to the computer itself? From the quoted sections, it sounds like they had physical custody of the computer, that this wasn't just remote snooping.

In the most basic analysis, you're physically reading the bits of information which are stored in the physical properties of particular molecules of the machine. It would seem obvious to me that reading that data, in any form or manner, is a "search" of the data contained on the disk.
10.27.2008 11:18pm
OrinKerr:
PatHMV,

I have interpreted your comment as a motion for a more detailed post, and I have granted your motion.
10.27.2008 11:28pm
Paul Allen:

The Government argues that no search occurred in running the EnCase program because the agents “didn't look at any files, they simply accessed the computer.”


This argument strains at the mechanics of how computers work. Both to duplicate the files and to generate the hash, the agent's computer, under his direction, accessed every byte very exactly.

Second, a hash used this way is an acceleration of comparing files for exact match that allows the government to avoid archiving porn directly. With high probability (given a sufficiently limited reference pool) this is akin to directly comparing the files. I fail to see how this can be distinguished from looking at the files.

Third, the hash is a representation of what the file contains, just as displaying the image is a representation of what the file contains. The hash happens to be a lossy representation. Say there is an audio recording on the computer. The government compresses (or encodes) the recording to a low-quality mp3 format, then listens. Is this a search? Yes? Okay, now turn down the quality. Repeat. Repeat.

Fourth, many internet search engines work in an identical way. They reduce your keywords and the page text to a set of hashes that they compare against. This is what allows google, for instance, to match your search terms to similar but not identical words.
10.27.2008 11:30pm
_quodlibet_:
So basically the agent makes a copy (an "image") of the entire disk, and then searches for child porn on the disk image, with files being compared by their hash-code. The process, as a whole, definitely seems like a "search" to me.
10.27.2008 11:33pm
Soronel Haetir (mail):
I have yet to read the opinion but it seems that they missed something even more basic here. If the landlord had legitimate possession of the computer how come this was not allowed under the same theory that allows for the collection and inspection of trash or other discarded materials?
10.27.2008 11:33pm
Anonperson (mail):
I also agree that reading the data is a search. It's not possible to compute the hash without reading the data. Here's a simple hash algorithm that gets the idea across: just treat each byte as a number from 0 to 255, and add them all up. Once the number is greater than 1000, just wrap around back to 0 (modulo arithmetic). This is a simple hash algorithm.

There is no way to compute this without reading the data. In theory, you could read only every other byte, but would that make a legal difference?
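
For anyone who wants to see that toy algorithm concretely, here is a minimal Python sketch of it. The function name `toy_hash` is made up for illustration, and this is only the add-and-wrap scheme described above, not MD5 or anything a forensic tool would use.

```python
def toy_hash(data: bytes) -> int:
    """Add up every byte (each 0-255) and wrap around modulo 1000."""
    total = 0
    for byte in data:
        # Each byte has to be read to contribute to the result.
        total = (total + byte) % 1000
    return total


print(toy_hash(b"hello"))   # sum of the five byte values, modulo 1000
print(toy_hash(b"olleh"))   # same bytes in another order give the same value
```

The second call shows why this is only a teaching example: reordering the bytes leaves the result unchanged, so very different files can easily share a value.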

As an analogy to the physical world, imagine a very sensitive chemical sniffer that detects a THC "signature". Since it only detects a signature, it is possible that some non-marijuana plants also trigger the sniffer (with a very small probability). Now, if I walk around your house, with a blindfold on, applying this sniffer to various plants, does that count as a search?
10.27.2008 11:34pm
Bill Sommerfeld (www):
I write code using various cryptographic hash functions on an almost daily basis and have a good understanding of their uses and limitations.

First, anyone still using MD5 should have their head examined -- it's been pretty thoroughly broken. Use SHA256, please!

Note that OS and application software may compute hashes of files or parts of files as part of its normal operation.

I'd argue that the "search" occurred not when the hash values were computed but rather when a set of hashes of files on the computer was intersected with a set of hashes of known child porn.

I assume that there are cases where law enforcement can seize a sealed container to protect it from being modified or destroyed before they can get a search warrant.

By analogy, computing a set of hashes at the time the digital evidence was seized would be a really good idea as it would increase confidence that the evidence had not been modified in the course of a later search.
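
The two uses being distinguished here can be sketched in a few lines of Python. Everything in this sketch is an assumption for illustration: the file names, the byte strings standing in for file contents, the reference set, and the helper names `sha256_hex` and `unchanged`. It is not a description of how EnCase or any police tool works.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the contents of files on a seized drive.
files = {
    "a.txt": b"grocery list",
    "b.jpg": b"holiday photo",
    "c.bin": b"flagged content",
}

# Use 1: record hashes at seizure time, purely as an integrity record.
seizure_hashes = {name: sha256_hex(data) for name, data in files.items()}


def unchanged(current_files: dict, recorded: dict) -> bool:
    """True if every file still hashes to the value recorded at seizure."""
    return {n: sha256_hex(d) for n, d in current_files.items()} == recorded


# Use 2: intersect the recorded hashes with a reference set of known hashes.
known_hashes = {
    sha256_hex(b"flagged content"),
    sha256_hex(b"something not on this drive"),
}
matches = sorted(n for n, h in seizure_hashes.items() if h in known_hashes)

print(unchanged(files, seizure_hashes))
print(matches)
```

Both uses read the same bytes and produce the same digests. The difference is only in what is done with the digests afterward, which is where the argument above locates the search.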
10.27.2008 11:34pm
Soronel Haetir (mail):
(I have to agree with the conclusion that such an operation is a search, I would just have expected it to be allowed under the described circumstances).
10.27.2008 11:35pm
Melancton Smith:
Since they compared the hash values of each file to known hash values they performed a search. They searched the data for known hash signatures. This was a search.

I don't understand the chain of custody, however. Since I am not a lawyer, it is not surprising.

Isn't this analogous to a third party providing evidence? The landlord gave the computer away to another who gave the hard drive to the police. There certainly was no seizure. Isn't the report of illegal material by the third party sufficient cause for a search?
10.27.2008 11:37pm
OrinKerr:
For those of you who think this was a search: What do you make of the Caballes question?
10.27.2008 11:39pm
Melancton Smith:

First, anyone still using MD5 should have their head examined -- it's been pretty thoroughly broken. Use SHA256, please!


Why does that matter for their purposes? They aren't using the hash as a means to protect the data, but simply using the fact that the hash of a particular file is unique and repeatable.
10.27.2008 11:39pm
_quodlibet_:

Fourth, many internet search engines work in an identical way. They reduce your keywords and the page text to a set of hashes that they compare against. This is what allows google, for instance, to match your search terms to similar but not identical words.

No! Cryptographic hash functions like MD5 produce entirely different output for inputs that differ in even a single bit. AFAIK, Google does not use hashes for searching text.
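
The single-bit claim is easy to check with Python's standard `hashlib` module. This is a small sketch; the sample text and the helper name `differing_bits` are arbitrary choices for illustration, and on some locked-down Python builds MD5 may be disabled.

```python
import hashlib

original = b"The quick brown fox"
# Flip the lowest bit of the first byte; everything else is identical.
flipped = bytes([original[0] ^ 0x01]) + original[1:]


def differing_bits(x: bytes, y: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(p ^ q).count("1") for p, q in zip(x, y))


digest_a = hashlib.md5(original).digest()
digest_b = hashlib.md5(flipped).digest()

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(flipped).hexdigest())
print(differing_bits(original, flipped), "input bit differs")
print(differing_bits(digest_a, digest_b), "of 128 digest bits differ")
```

With a well-behaved hash one would expect roughly half of the 128 output bits to change, so the two digests carry no hint that the inputs were nearly identical.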
10.27.2008 11:40pm
mnarayan:
The only technical difference between this and doing a direct comparison bit by bit of all files against known child pornography is that this has a higher false positive rate. Not sure how the latter would be a search and the former not.
10.27.2008 11:45pm
Hoosier:
It sounds like a search to my untrained ears, in that I would consider it a search had it been my computer.

But can anyone come up with an illustrative analogy to hard-copy data searches? If I have a pile of magazines in the desk of my home office, and someone (Does what?) and this leads a police officer to find child porn, is it a search?

Is there anything comparable? Or is this Amend IV new ground?
10.27.2008 11:46pm
gasman (mail):
The government's case against the idea of a search seems to be also that they first duplicated the contents of the hard drive, then searched the duplicate.

This was an impossible concept not too long ago. In the future perhaps Star Trek-like transporter technology will be available, and by duplicating your home on a holodeck they could search the virtual home, find what they were after, then confirm the existence of the real item of interest in your real home with the holo equivalent of the hash value.
If it looks like a search and sounds like a search then it is probably a search.
10.27.2008 11:46pm
Anonperson (mail):
Interesting. I'm not a lawyer, but I read the Caballes decision. It seems that the government would have had a strong case. If the standard is that a search that does not “compromise any legitimate interest in privacy” is okay, then this is not a search. That's because the probability of any loss of privacy is negligible, especially if they are willing to use huge, good hashes. In those cases, the probability of a false positive is insignificant.

Note that using my lay judgement, I called it a search, but if we are to use the standard above, then I think it is not a search.
10.27.2008 11:51pm
_quodlibet_:
Orin Kerr asked:

For those of you who think this was a search: What do you make of the Caballes question?


I think it's inapplicable because the officer needed to actually peek inside the disk. In the Caballes case, the dog was able to sniff the marijuana from outside the trunk:

While Gillette was in the process of writing a warning ticket, Graham walked his dog around respondent’s car. The dog alerted at the trunk. Based on that alert, the officers searched the trunk, found marijuana, and arrested respondent. The entire incident lasted less than 10 minutes.


"Dog sniffs marijuana outside closed trunk" is analogous to "Officer detects child porn on disk by sensor readings of the electromagnetic field outside the computer"

"Officer detects child porn on disk by reading data from disk" is analogous to "Dog sniffs marijuana in trunk after officer opens trunk"
10.27.2008 11:52pm
OrinKerr:
_quodlibet_,

Perhaps, but then in United States v. Jacobsen, actually taking the substance and destroying it to test it for drugs was held not a search under the Caballes rationale.
10.27.2008 11:54pm
Anonperson (mail):
_quodlibet_, note that using a thermal imager is considered a search.
10.27.2008 11:54pm
Tummler:
I thought the fundamental premise of Caballes was that drug sniffing dogs alert only to the presence of illicit drugs. Thus, the sniff of a drug dog is not a "search" since it reveals information only about illegal activity.

With respect to hash files, the government can identify any file on someone's computer that it has already indexed, whether it be illegal child porn or a communist manifesto.

I suppose the government could argue that conducting an MD5 hash analysis of a hard drive is synonymous with scent detecting dogs because the dogs can be trained to alert to any scent, not just the scent emanating from illegal drugs. However, I don't think the government will be making that argument any time soon.
10.27.2008 11:55pm
Soronel Haetir (mail):
OK,

I actually wouldn't particularly distinguish Caballes from this case unless there is something I am missing about the chain of custody. SCOTUS did not rule that the use of a drug sniffing dog isn't a search, they ruled that it is not an unreasonable/illegal search.

An analogy to the infrared search used as an example of an illegal search would be a police created virus that performed the same hash function done here and then transmits that data to HQ. While not a search in and of itself I would say that operation would perform a data seizure.
10.27.2008 11:55pm
DrObviousSo:
Going only from the wikipedia summary of Caballes, "Caballes argued that it was wrong to assume that the barks of drug-sniffing dogs reveal only information regarding the presence or absence of narcotics. But the Court rejected this argument because there was no information before the state courts to support it, and because he did not point to anything else in which a person has a reasonable expectation of privacy that a dog bark might reveal."

As was described above, an MD5 is like a dog bark that signals either contraband, or stuff that looks like contraband on a bit level. This can include, well, anything. With access to an 'incriminating' hash and a hash collision program (google it), you could attach the incriminating hash to any number of files you would have a legitimate interest in keeping private.

This has been true since at least 2005
10.27.2008 11:56pm
OrinKerr:
Soronel Haetir: Sorry, but that's just wrong as a matter of Fourth Amendment law.

Everyone else: I have posted a copy of the opinion, via reader Matt Caplan.
10.28.2008 12:02am
Anonperson (mail):
DrObviousSo, I'm not sure how this differs from the physical world. Are you saying that the police could plant evidence? That's also true in the physical world. Or are you saying that you could intentionally create false hits? That's also true in the physical world, it seems.
10.28.2008 12:02am
DrObviousSo:
I guess I should also specify, natural hash collisions are possible as well. I don't know the likelihood of a random embarrassing pic colliding with an arbitrary hash, though.

Also, the clear message here is to make batch edits to your contraband files.
10.28.2008 12:03am
Soronel Haetir (mail):
I am also somewhat surprised that the police would perform this hashing operation on a file-by-file basis rather than sector by sector. There are, as has been mentioned, many ways to hide files within other files that would escape file hashing. Of course, if these people were actually smart they'd be using crypto that wouldn't even indicate whether real data exists; witness the border crossing case that Prof. Kerr has brought up before.
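
To make the file-versus-sector distinction concrete, here is a rough Python sketch of hashing a disk image in fixed-size blocks. The 512-byte sector size, the fabricated byte patterns, and the function name `sector_hashes` are all assumptions for illustration; real forensic tools and real file layouts are more complicated.

```python
import hashlib

SECTOR = 512  # assumed sector size in bytes


def sector_hashes(image: bytes, size: int = SECTOR) -> list:
    """Hash each fixed-size block of the image independently."""
    return [
        hashlib.sha256(image[i:i + size]).hexdigest()
        for i in range(0, len(image), size)
    ]


# A made-up "known" block, buried between two unrelated blocks.
known_block = b"\x42" * SECTOR
image = b"\x00" * SECTOR + known_block + b"\x01" * SECTOR
known = {hashlib.sha256(known_block).hexdigest()}

# Block-level comparison finds the buried block at index 1.
hits = [i for i, h in enumerate(sector_hashes(image)) if h in known]

# Hashing the whole image as one "file" does not match the known hash.
whole_file_match = hashlib.sha256(image).hexdigest() in known

# Shifting the data by a single byte defeats the aligned block comparison.
shifted_hits = [
    i for i, h in enumerate(sector_hashes(b"\x00" + image)) if h in known
]

print(hits, whole_file_match, shifted_hits)
```

The last step is the caveat: block hashing only catches hidden data that happens to line up with block boundaries, so it narrows the gap that file hashing leaves without closing it.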
10.28.2008 12:04am
Paul Allen:
_quodlibet_ writes (misleadingly):

No! Cryptographic hash functions like MD5 produce entirely different output for inputs that differ in even a single bit. AFAIK, Google does not use hashes for searching text.


I did not say that google used MD5 in searches. I said they used a hash as part of the process. Many hashes exist. What does google do exactly? I don't know. But in the meantime you should look into other famous hashes: Soundex, Metaphone.
10.28.2008 12:04am
xyzzy:
Hypothetically, suppose Crist had hired "Sell Shipping", a private shipper, to ship the computer somewhere. Sell's employee Hipple breaks open the package, discovers contraband, then "Hipple freaks out and calls the police."

In that hypothetical case, I can see why the question would be whether the subsequent warrantless search by police exceeded the scope of the third-party discovery.

But, under the actual facts of United States v. Crist, as you've presented them, I'm having a hard time understanding the fourth amendment question.

Please tell us in more detail how we got here.
10.28.2008 12:05am
Greg Q (mail) (www):
I think what the officer did should qualify as a search.

They had a third party who had looked at the computer, saw child porn, and called the cops. Why does this not qualify as "probable cause"? Why didn't they just get a search warrant?
10.28.2008 12:05am
wb (mail):
I would argue that if the search occurred by the act of comparing the hashes to hashes of known child pornography, then Caballes would apply. However, if the search occurs by the very act of reading the bits and transferring them for later examination, then Caballes does not apply, as that process could reveal information for which a legitimate expectation of privacy does hold. Or in other words, the procedures used have a broad competence for revealing information, in contrast with the singular competence of the dog.
10.28.2008 12:09am
Anonperson (mail):
I guess I should also specify, natural hash collisions are possible as well. I don't know the likelihood of a random embarrassing pic colliding with an arbitrary hash, though.


With a good hash function, the probability of a collision is vanishingly small. You can simply treat the hash value as a random number. So, if you have a given 128-bit hash value, the probability that another given picture matches a specific hash is 1/2^128. (Note that this is not the same probability as that of a collision, but both are very small.)
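
The two probabilities mentioned here can be worked out exactly with Python's `fractions` module. This is a back-of-the-envelope sketch that treats the hash as an ideal random 128-bit value, which is the assumption stated above; it says nothing about deliberately constructed MD5 collisions. The file count and reference-set size are hypothetical round numbers, and the function names are made up for illustration.

```python
from fractions import Fraction

BITS = 128
SPACE = 2 ** BITS

# Chance that one given file matches one specific hash value.
p_single = Fraction(1, SPACE)


def expected_false_matches(n_files: int, n_known: int) -> Fraction:
    """Expected number of chance matches between n_files and n_known hashes.

    Each (file, known hash) pair matches with probability 2^-128.
    """
    return Fraction(n_files * n_known, SPACE)


def birthday_bound(n: int) -> Fraction:
    """Upper bound on the chance that any two of n random hashes collide."""
    return Fraction(n * (n - 1), 2 * SPACE)


# Hypothetical sizes: 700,000 files checked against 1,000,000 known hashes.
print(float(p_single))
print(float(expected_false_matches(700_000, 1_000_000)))
print(float(birthday_bound(700_000)))
```

The first figure is the match-a-specific-hash probability, the second scales it up to a whole drive checked against a whole reference set, and the third is the separate "any two files collide" question. All three stay far below anything that would matter in practice under the ideal-hash assumption.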
10.28.2008 12:10am
jccamp (mail):
"For those of you who think this was a search: What do you make of the Caballes question?"

There are actually two questions in the Caballes facts: did the police need a warrant to allow the dog to walk around the outside of the automobile, and once the dog alerted to drugs, did they then lawfully search & seize the auto?

Isn't there an exception when considering motor vehicles which are not (yet) subject to seizure, and thus may be driven away? The citation escapes me, but it was an old bootlegging case, as I recall. So, if officers have PC to believe an automobile, which can leave, contains contraband/evidence/fruits of the crime, they can search the auto without a warrant.

If you mean whether the dog walking around the car constitutes a search, I would say that if the dog can lawfully be where he was and still detect contraband, then a warrant is unnecessary for the dog's presence, just as if an officer, while lawfully present in a location, can then see or sense contraband on or within private property, that would constitute PC for a warrant (unless one of the other exceptions applied).

In the cited case, it seems to me that the police, when shown evidence of child porno on the computer, still needed to obtain a warrant to search the hard drive. Given the circumstances, establishing the probable cause for a warrant would have been simple and straightforward. Why not be safe and get the stupid warrant? It's not brain surgery...

I'd like to add one thing off-post: when constructing a search warrant and supporting affidavit, generally one wants the items to be searched and the items to be searched for to be as broad as possible while still maintaining good-faith compliance with the warrant requirements. In the present case, I would not want to use something like hash values to narrow the files which are examined. In this case, the investigators would then review only the photo files which somehow met the hash value of known pornography (if I understand this process). If the hash values were not considered, then the investigators could reasonably look not only at every single image on the drive, but also check email, IM's, and even word processing files, looking for embedded or attached images. I would want to do this because, say, the suspect may have home-made photos, which don't match a hash value of known porno, but which may contain images which either constitute porno themselves, or even worse, might show something like child abuse. The same for files which are documents: one might find credit card charges for the porn images, or reference to the porn or other crimes in text. There's a line between complying with specificity requirements & fishing expeditions, but intelligent wording of a warrant might allow more of a search, which is usually, from law enforcement's perspective, a good thing.
10.28.2008 12:11am
John Burgess (mail) (www):
First, the computer could have been considered abandoned property as it was in the process of being discarded via the landlord and his agent. Is there privacy retained in abandoned property?

Second, the agent's friend, yet another step removed from the former owner, finds unlawful material and calls the cops. That is indeed probable cause and should have immunized the police agency that did the analysis. Because the discovery of unlawful material was made by a third party with no prompting from the police (i.e., the third party was not an agent of the state), there should have been further immunity.

Bad decision, IMO. I think it's going to be overturned on appeal.
10.28.2008 12:13am
Matt Caplan (mail):
Following up on what Greg Q said (10.27.2008, 11:05pm), why doesn't this fall under the "independent source" exception to the warrant requirement?

Wouldn't the testimony of the third party qualify as a source of probable cause independent of the actual search?
10.28.2008 12:16am
Paul Allen:
Caballes is the intellectual kin of 'search incident to arrest', the Leon good-faith exception, etc. If an officer happens to notice something while performing some other legal (and constitutional) activity, everything is okay.

So is this an indirect consequence of walking around with a police dog? Or is it a deliberate action?

Drugs emit volatile compounds which float through the air, thru fabric, etc. These emissions become public. Dogs smell them and react. A computer hard drive emits nothing coherent beyond a few hair-thicknesses. Access requires proactive measures.

The difference between smelling drugs and reading computer disks is the difference between letters in your home and a conversation overheard.
10.28.2008 12:18am
_quodlibet_:

_quodlibet_ writes (misleadingly):


No! Cryptographic hash functions like MD5 produce entirely different output for inputs that differ in even a single bit. AFAIK, Google does not use hashes for searching text.


I did not say that google used MD5 in searches. I said they used a hash as part of the process. Many hashes exist. What does google do exactly? I don't know. But in the meantime you should look into other famous hashes: Soundex, Metaphone.

Well, sure, if you're using the phrase "hash function" in the wider sense, then you're certainly correct. But the type of hash functions being discussed here, namely crypto hash functions such as MD5, don't map similar inputs to similar outputs.
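
The contrast between the two senses of "hash" can be shown side by side. Below is a compact Python sketch of the classic American Soundex rules, written from memory of the standard algorithm, so treat it as illustrative rather than a reference implementation. The names `CODES` and `soundex` are just labels chosen here.

```python
import hashlib

# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return a four-character phonetic code (first letter plus three digits)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels reset the previous code; H and W do not.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]


# Similar-sounding names collapse to one phonetic code...
print(soundex("Robert"), soundex("Rupert"))
# ...while a cryptographic hash keeps them completely apart.
print(hashlib.md5(b"Robert").hexdigest())
print(hashlib.md5(b"Rupert").hexdigest())
```

Soundex is designed so that near-identical inputs land on the same value, which is useful for fuzzy matching. MD5 and similar functions are designed for the opposite property, which is why only exact copies of a known file would match in the procedure under discussion.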
10.28.2008 12:21am
OrinKerr:
Caballes is the intellectual kin of 'search incident to arrest', the Leon good-faith exception, etc. If an officer happens to notice something while performing some other legal (and constitutional) activity, everything is okay.

I couldn't disagree more.
10.28.2008 12:22am
Anonperson (mail):
The difference between smelling drugs and reading computer disks is the difference between letters in your home and a conversation overheard.

Except that they read it in such a way that they could not inadvertently stumble on things that were not child porn (assuming that the police acted in good faith and followed procedure, etc.). How much of a legal difference does that make? I don't know.
10.28.2008 12:23am
DrObviousSo:
Anonperson - I guess that's a pretty good point. Do you know how many files are likely to be on a typical home computer? Or the size of a potential kiddie porn hash dataset?
10.28.2008 12:25am
Soronel Haetir (mail):
OK,

I would also say that this operation is not particularly dissimilar from having an officer examine every file on the drive. There is room, using this method, both to pick up non-contraband (items that match hashes but are nonetheless legal; with a good hash this is extremely unlikely under these circumstances and doesn't even matter that much really) and to miss actual contraband (contraband items that aren't already known, or contraband items that have been modified so that they have a new hash value).

I see this operation mostly as a quick filtering, a way to save having to do a bunch of tedious work that would be needed if each file needed to be examined one by one.

Would you count that eyeball examination of each file a search?
10.28.2008 12:25am
Bruce:
It looks to me like this hash was just to compare against later, to ensure that nothing changed on the drive in the meantime. I.e., it's like putting an "Evidence" sticker across the lid of a container and then initialing and dating it. Even if some hash operations could be searches, this does not appear to me to be a search. Sure, the program accessed every byte, but it didn't (and wasn't set up to) return any useful information to the human operator about the contents, other than whether they changed. If this is a search, the evidence sticker is a search.
10.28.2008 12:27am
Avatar (mail):
Presumably the issue of random natural collisions is a non-issue. It's not like the government is prosecuting this guy on the hash value alone - once they're alerted to files which have a hash value matching known circulating images/videos of child porn, you still need to have a human actually view the file to make sure that, yup, it's actual child porn. And that's what happened - the officer saw flags and started looking around with the mark I eyeball, finding a lot more CP than was actually flagged.

The crux of the case is that the court didn't accept a lame "it's not a search, it's a hashing" argument when the main argument of the prosecution (that the computer had been "abandoned" and that the search was thus okay) was shot down. Had the circumstances displayed an intent to abandon, this issue would not have arisen.

There's also chain of custody worries - the computer was outside the defendant's control and thus the presence of child pornography on the computer cannot be presumed to be possessed by the defendant. It's equivalent of stealing someone's luggage and reporting to the police that you found a brick of marijuana packed inside - the police have no way of determining if that brick was in there when the luggage was stolen. Even if it was wrapped in one of the owner's t-shirts, that's still not evidence that the owner placed said brick inside the luggage. (At this point you'd want to go on the attack, pointing out that the person who discovered the child porn had obtained the computer in, ahem, adversarial fashion...)

I'm not sure why a search warrant was not obtained, however. The entire point of retrieving an image of the hard drive is to allow data operations on the image without harming the original (or vice versa). Once that image had been taken, the police were free to wait for the necessary paperwork to clear - that image wasn't going to deteriorate or leave police possession. So long as they don't go snooping through it before the warrant comes in, they haven't performed a search.
10.28.2008 12:29am
Soronel Haetir (mail):
OK,

Here is the sentence from Caballes that I use to justify my earlier statement. Perhaps it does not fairly reflect 4th amendment law.

    Official conduct that does not “compromise any legitimate interest in privacy” is not a search subject to the Fourth Amendment.


I read this to mean exactly what it says, in that the use of the drug sniffing dog is not a search subject to the 4th amendment, not that it is not in fact a search.

I think the letter vs. overheard conversation is actually a pretty good comparison here.
10.28.2008 12:33am
jccamp (mail):
Bruce,

I would think that making the initial hash value (am I using the term correctly?) merely to establish and ensure the unchanged nature of the original data would not, in and of itself, be a search, as long as the hash values were not then cataloged and compared to known values of contraband. The search was actually the second step (or third), that is, comparing the mirror image of the original drive with known porn, with the intent to establish criminal possession. As long as the data contained within the hard drive was never indexed or looked at, then creating the hash value merely to verify integrity of the data does not seem like a search to me. Doing anything else with the resulting hash values, however, does seem like a search to me.
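The integrity-only use of a hash described above can be sketched roughly as follows. This is a minimal illustration, not the examiner's actual procedure: the byte strings are hypothetical stand-ins for a drive's contents, and `image_digest` is a name made up for this example. Note that nothing is compared against any list of known files; the digest is only compared with itself later.

```python
import hashlib

def image_digest(data: bytes) -> str:
    """Return the MD5 hex digest of a whole drive image (here just a bytes object)."""
    return hashlib.md5(data).hexdigest()

# Hypothetical stand-in for the raw contents of a seized drive.
original = b"partition table" + b"\x00" * 1024 + b"user files"
baseline = image_digest(original)

# A faithful copy yields the same digest, so the copy can be shown to be unchanged.
copy = bytes(original)
unchanged = image_digest(copy) == baseline

# A copy with even one flipped bit yields a different digest.
tampered = bytearray(original)
tampered[0] ^= 0x01
altered = image_digest(bytes(tampered)) != baseline

print("copy verified:", unchanged)
print("tampering detected:", altered)
```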
10.28.2008 12:35am
Hoosier:
Orin:

It looks like someone did in fact write such an article. And he even has your name. I'm really surprised that you didn't know this.

Thanks for the link: I(A) is just what I was looking for.
10.28.2008 12:35am
Michael Poole (mail):
DrObviousSo- My computer is not exactly typical -- it is running Linux -- but it currently uses just over 700,000 inodes (files plus directories), comprising roughly 443 GB of data.

More to the Caballes question, creating a hash value (as Paul Allen pointed out) requires that a software agent of the law enforcement officer read every single byte of data from the hard drive. This seems like a clear search to me -- it is not detecting odorous emanations from a container, but investigating and operating on every part of the contents. In this way hash computations are analogous to X-ray or thermal images rather than a sniffer, and I am not surprised that the government passed on making the argument.
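To make the "reads every single byte" point concrete, here is a small sketch of how a hash is typically computed over a stream. It is illustrative only: `md5_of_stream` and the in-memory buffer are assumptions for this example, and a forensic tool would read from the drive image instead. The counter shows that the digest is produced only after every byte has passed through the hashing code.

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size=4096):
    """Hash a binary stream chunk by chunk; return (hex digest, bytes read)."""
    h = hashlib.md5()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        h.update(chunk)
        total += len(chunk)
    return h.hexdigest(), total

# Hypothetical 10,000-byte "file" held in memory.
data = b"x" * 10_000
digest, bytes_read = md5_of_stream(io.BytesIO(data))
print(digest, bytes_read)
```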
10.28.2008 12:53am
billb:
Orin: The dog in Caballes did not examine the contents of the car's trunk. It arguably examined the residues left on the outside of the car by those who transferred the drugs to the inside of the trunk. At the very least, it examined the particles in the air outside the trunk created by insufficiently wrapped up drugs. These odors might have been discernible by a well-trained human nose as well.

In the instant case, the police took perfect copies of the defendant's files, handed them to an oracle, and asked it "Are any of these files contraband?" The oracle answered in the affirmative. The oracle being in this case a computer with a list of hash values paired with a list of known child porn images.

The police could not have shown the computer to the oracle and said "is there contraband in there?" They must disassemble the computer, remove the drive, surreptitiously (in the sense of bypassing the existing OS) copy every bit of data on the drive, and then ask the oracle for a sophisticated examination of that copy in order to determine if there's a problem with said data. When I turn off my computer I expect that no one will take it apart both physically and digitally in order to determine if it contains contraband. Fido didn't take the car in Caballes apart. It simply looked (with its nose) at things already on the outside.

Now, one might argue that since the cops don't learn anything from the hash values themselves, then their oracle only tells them when bad things are present. Thus we're nearly back at Caballes. But, since the cops did have to disassemble the computer and bypass the usual roadblocks to getting at Crist's data, they clearly searched it. They searched it by taking it apart, removing the drive, copying the data, and analyzing it. All of these things required bypassing things put in place to keep that data away from prying eyes: the case, the drive itself, the OS, etc., and so they all constitute elements of a search. Even with a witness, to my knowledge, the police cannot disassemble (without a proper warrant) my car (not at or within 100 miles of the border) to pass every piece of it past a dog's nose, bolt-by-bolt, to see if any of it is contraband. Neither can they do so to my computer. If their oracle can determine the existence of child porn by looking at the outside of my computer, more power to them.

Now, as a layman, here's the thing I don't understand: The police have the computer. They have or could have gotten an affidavit from Hipple stating that he saw what he believed to be contraband on the computer. I don't see how they could not have passed this information in front of a judge to get a warrant. Everything they need appears to be there. The machine is secure, and there's no danger of Crist deleting the files or destroying the machine. Why not dot all the Is and cross all the Ts?
10.28.2008 1:02am
Chem_geek:
Hoosier:
It looks like someone did in fact write such an article. And he even has your name. I'm really surprised that you didn't know this.

Thanks for the link: I(A) is just what I was looking for.


Clearly, we owe Orin Kerr a beer.
10.28.2008 1:06am
whit:
It is most definitely a search. Comparing hashes is a less intrusive search in that, the agent was only actually looking at the contents of the files himself WHEN he got a match.

In an old skool conventional search you go thumbnail to thumbnail looking at images. this is much more invasive of privacy because the agent will see the contents of every file.

the hash search means he will only look at the contents of the file if and when he gets a match to a known illegal file.

but I cannot see how anybody... even the govt lol... could argue it's not a search.

and i cannot understand why a warrant wasn't applied for. this isn't a "street" thing. the frigging thing is sitting on the agent's desk.
10.28.2008 1:08am
roy:
The hash adds a level of complexity, which I don't think is relevant to the law. The program which computes the hashes must look at every bit of every file — or at least every interesting file — to get the hash. Comparing hashes is a lot more efficient than comparing entire files, but I don't see how it is any more or less invasive.

Various people are correct to point out that two different files can have identical hashes. The math makes this necessary; files can be arbitrarily large and pictures are often several kilobytes, but the hash is usually only a few bytes and there's no point in using a hash as large as the files you're interested in. Fewer bytes means fewer possible values.

So your grocery list may have the same hash as an illegal picture, just because there aren't enough possible hashes for every file to get a unique one. BUT you can reduce the risk as much as you care to by increasing the size of the hash. A 32-bit hash is on the small side, but can give a less than 1-in-4 billion chance of accidentally mis-matching two files, depending on the algorithm used. I think that's better than fingerprints.

My math assumes random file contents, which is wrong, but good hashes have a way of acting randomish with non-random input, so the basic point stands anyway.
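The size-versus-risk trade-off above can be worked out directly. The sketch below assumes an idealized hash whose outputs are uniformly distributed, which is the same simplification as "random file contents"; the function name and the choice of 128 and 160 bits (the output sizes of MD5 and SHA-1) are picked for illustration.

```python
from fractions import Fraction

def false_match_probability(bits: int) -> Fraction:
    """Chance that one unrelated file matches one given hash value,
    assuming an ideal hash with uniformly distributed output."""
    return Fraction(1, 2 ** bits)

for bits in (32, 128, 160):
    p = false_match_probability(bits)
    print(f"{bits}-bit hash: 1 in {p.denominator:,}")
```

Each extra bit halves the chance of an accidental match, so going from 32 to 128 bits shrinks it by a factor of 2^96.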

(IANAL — but I am a software engineer)
10.28.2008 1:08am
A. Zarkov (mail):
An MD5 hash value is a unique alphanumeric representation of the data, a sort of “fingerprint” or “digital DNA.”

Two files could map into the same hash, so hashes are not like fingerprints unless you believe two people can have the same set of fingerprints. In other words, except for special cases, hash functions are not injective. As a practical matter it's unlikely any two given files would have the same hash, but it's not impossible. You could have a legal to possess file with the same hash as one that's illegal, but the chances are small.
10.28.2008 1:10am
gattsuru (mail) (www):
From a physical viewpoint, the point where the police had physical control of the device would seem like a "search", especially since they either had to access the computer or the hard drive itself physically to do the search. Both the copy and search involve bit-by-bit access of each file, and it's hard to think of a more complete search than that. You can argue that the programs involved only flag illegal things, but cryptographic collisions *do* exist, and not in a merely theoretical viewpoint.

On the other side, I'm pretty sure that Hipple's statements would be covered by the silver platter doctrine, and the search of the computer's files would be inevitable given that evidence.
10.28.2008 1:11am
Anonperson (mail):
...creating a hash value (as Paul Allen pointed out) requires that a software agent of the law enforcement officer read every single byte of data from the hard drive.

It would certainly be possible, however, to create a hash that only looked at every other byte of a file. If we assume that a typical child porn image is 100K, that's still more than enough samples to produce a very low probability of a false hit.
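A hash over every other byte is easy to sketch, and the sketch also shows the cost: changes confined to the skipped bytes go unnoticed. This is a hypothetical construction for illustration (`sampled_md5` is not a real forensic tool), and the 200-byte input is a stand-in for a file.

```python
import hashlib

def sampled_md5(data: bytes, step: int = 2) -> str:
    """MD5 over every `step`-th byte of the input, starting at offset 0."""
    return hashlib.md5(data[::step]).hexdigest()

original = bytes(range(200))

# Change a byte at an odd offset: the sampled hash never reads it.
unseen_change = bytearray(original)
unseen_change[1] ^= 0xFF

# Change a byte at an even offset: the sampled hash does read it.
seen_change = bytearray(original)
seen_change[2] ^= 0xFF

same_when_skipped = sampled_md5(original) == sampled_md5(bytes(unseen_change))
differs_when_sampled = sampled_md5(original) != sampled_md5(bytes(seen_change))
print(same_when_skipped, differs_when_sampled)
```

Even this variant still has to read half of the file's contents, so it narrows the point about byte-level access rather than removing it.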
10.28.2008 1:14am
whit:

Comparing hashes is a lot more efficient than comparing entire files, but I don't see how it is any more or less invasive.



I agree the hash is definitely a search. but it is less invasive. note: i am not saying he shouldn't have gotten a warrant. it's still a search. get a warrant.

it is clearly less invasive because the agent isn't actually looking at what's IN files, certainly not in their screen rendered glory unless and until a hash match is found.

iow, assume you have 10,000 image/video files on your computer.


which is more invasive to your privacy
1) agent looks at each file (either looking at the jpg, etc. or the video file (mpg) etc.) to determine if it's contraband

OR

2) agent applies a formula (which is what a hash is) to each byte in your files and then only if a match to a known illegal file is made does he actually VIEW the contents of the file in their rendered glory.

see the difference?

they are both searches. one is clearly more invasive.
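option 2 above, sketched in code. this is only an illustration under assumptions: the file names and byte strings are harmless made-up stand-ins, and `flag_matches` is a hypothetical helper, not any real forensic program. the software touches every file's bytes, but only the names on the flagged list would ever be opened and viewed by a person.

```python
import hashlib

def flag_matches(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of files whose MD5 digest appears in known_hashes."""
    flagged = []
    for name, content in files.items():
        # Every file is hashed, but nothing is displayed to the examiner here.
        if hashlib.md5(content).hexdigest() in known_hashes:
            flagged.append(name)
    return flagged

# Hypothetical drive contents and a one-entry list of known hashes.
files = {
    "notes.txt": b"grocery list",
    "b.jpg": b"stand-in bytes for a known file",
    "c.jpg": b"holiday photo",
}
known_hashes = {hashlib.md5(b"stand-in bytes for a known file").hexdigest()}

flagged = flag_matches(files, known_hashes)
print(flagged)
```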
10.28.2008 1:14am
Tummler:
I think a lot of the posters here have grossly misconstrued the fundamental premise of Caballes. I thought the Court was rather clear that its decision turned wholly on a drug-sniffing dog alerting to the presence of illicit drugs. While the other facts were necessary for the court to be presented with the particular question at issue in Caballes--i.e., no physical intrusion into the car and "the residues left on the outside of the car"--the decision did not turn on those facts.

For example, if all drug dogs had the ability to somehow convey to the police the exact contents in an automobile trunk, both illicit and legal items, by sniffing the outside of the trunk, the Court would almost certainly hold that the sniff constituted a search. Of course, this is assuming that the public at large does not commonly use such dogs.

This case is different from Caballes in a number of ways. First, the government can determine what files its program alerts to on the fly. Second, once the government creates its index of MD5 sums from the suspect's hard drive, it will probably retain this index indefinitely. With drug-dogs, the government has a limited amount of time to do what it wants to do.
10.28.2008 1:25am
Anonperson (mail):
You can argue that the programs involved only flag illegal things, but cryptographic collisions *do* exist, and not in a merely theoretical viewpoint.

True, but the probability of an unintentional collision can be made as small as desired. Note that DNA evidence is admissible in court, and that also has some chance of false positives. What is the legal standard that must be applied for searches?
10.28.2008 1:28am
Anonperson (mail):
This case is different from Caballes in a number of ways. First, the government can determine what files its program alerts to on the fly. Second, once the government creates its index of MD5 sums from the suspect's hard drive, it will probably retain this index indefinitely.

True, but it cannot alert on all files that contain the word "bomb", for example. On the other hand, it could alert on any well-known files. For example, if a well-known PDF of a leaked document was circulating, it could alert on that.
10.28.2008 1:36am
roy:
whit,

I wasn't discussing the difference between visually checking each file, and visually checking only those files that software matched to known illegal files. I was discussing the difference between matching with a hash and matching without a hash.
10.28.2008 1:36am
xyzzy:
Thanks for the link to the opinion.

Personally, I think the key is on pp.3-4.

The court finds, in the main body of the opinion:
On September 30, 2005, the AG’s Office took custody of Crist’s computer in order to conduct a forensic examination. The AG’s Office was informed that the computer was “seized pursuant to consent from its owner”.

(Emphasis added.)

But in footnote 2, the court elucidates:

The record is not clear on when Detective Cotton became aware of Crist’s complaint regarding his computer. At various points in his testimony, it seemed that he was aware from the outset, see, e.g., Tr. 42, 47, but at other points, he claimed to have learned of the complaint only after the computer was brought in by Hipple, see Tr. 51. In any event, it is clear that Detective Cotton knew that Crist had reported his computer stolen before he contacted the AG’s Office.

(Emphasis added again.)

Putting those two highlighted facts together leads to an unsavory conclusion. Detective Cotton knew that the computer had been reported stolen, but nevertheless informed the AG's office that this was a search with consent of the owner. The clear inference is that Detective Cotton was less than honest.

When you add that little detail to the account, I agree that the evidence should be suppressed. There's a big chain of custody problem.
10.28.2008 1:53am
OrinKerr:
Soronel,

Read my article I link to: It answers your question. As for whether conduct amounts to a search in a sense not recognized by the Fourth Amendment, I don't really have any interest in that.
10.28.2008 2:03am
Graham Simms:
If the government rummaging through a hard drive to look at the hash values of the files on it is not a search because the government is only looking for "contraband," then what stops the government from searching everyone's computer as soon as it is hooked up to the internet?

I think we should take the Court's statement that drug dogs are sui generis at face value and let Caballes sit out there as an outlier of 4th Amendment jurisprudence because drug dogs are so unique. I'm still pretty sure that the police can not train a dog to smell what I have written on a notebook in my trunk, and I'm also sure that if police wanted to use hash values to search for legal content they could. I think that is what makes hash values different from drug dogs for purposes of searches under the 4th Amendment. The likelihood that they could be used to search for legal content.
10.28.2008 2:13am
devin chalmers (mail):
roy, I think you nailed it.

I have to admit, I very much enjoy reading judicial opinions that turn on intricacies of computer engineering. As a software person, it's a little like watching a nature documentary about oneself. Reading about familiar subjects spoken of in such an unfamiliar (almost, though I hesitate to use the word here, childlike) way is fascinating. It's admirable how often they seem to get it right, as in this instance.

A few side notes:

- The odds of hash collisions (though they can be architected in the case of MD5) are so astronomical that getting even one hit would seem to be pretty bulletproof probable cause. With 176 matches, they might as well lock you up without even bothering to check the original files.

Like this: if the odds of a collision were 50%, orders and orders of magnitude higher than is reasonable (though it's hard to put an exact number on it), then 1/2^176 gives approx. a 1 in 10^53 chance for all 176 to be false positives. Even if Agent Buckwash ran a check on a suspected pedophile's computer once every nanosecond, we wouldn't expect a false positive of that magnitude for roughly 10^36 years.

To put it clumsily, given the current age of the universe to work in (approx. 10^26 nanoseconds), Agent Buckwash wouldn't encounter a single such fluke occurrence unless he could do a whole universe age's worth of one-pedophile-per-nanosecond checks, every single nanosecond. (The odds are roughly the same as quantum fluctuations spontaneously teleporting you, bodily and unharmed, to the surface of Mars.*) Even with a 90% chance of collision, the odds of winning the lottery are significantly better than a false positive. The search required to generate the hashes is the problem, not hash collisions.

* God, I wish there was a link for this. Trust me.

- The best real-world analogy for what the agent did in this instance would be something like the police sending a hyperintelligent robot into your house to rummage through your things, comparing what it sees to suspicious materials and reporting back. If that's not different from a drug sniffing dog circling a car I'll eat my hat.

If the use of technology or not in the course of an invasion of privacy is the criterion for a search, RoboCop never needs a warrant. (Actually, that sounds about right, come to remember.) See also looking through walls for drugs.
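The back-of-envelope figures in the first side note can be checked with a few lines of arithmetic. This is a rough sketch under the same deliberately absurd assumption used there (each match has an independent 50% or 90% chance of being a false positive); the constant names are made up for this example.

```python
import math

MATCHES = 176
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9  # nanoseconds in a Julian year

# Probability that all 176 matches are false positives at 50% each.
p_all_false_50 = 0.5 ** MATCHES

# Expected number of one-per-nanosecond checks before seeing such a fluke,
# and the corresponding waiting time in years.
checks_needed = 1 / p_all_false_50
years_needed = checks_needed / NS_PER_YEAR

# Same calculation with a 90% false-positive chance per match.
p_all_false_90 = 0.9 ** MATCHES

print(f"50% case: about 1 in 10^{math.log10(checks_needed):.0f}")
print(f"waiting time: about 10^{math.log10(years_needed):.0f} years")
print(f"90% case: about 1 in {1 / p_all_false_90:,.0f}")
```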
10.28.2008 2:51am
gattsuru (mail) (www):
True, but the probability of an unintentional collision can be made as small as desired. Note that DNA evidence is admissible in court, and that also has some chance of false positives. What is the legal standard that must be applied for searches?

Well, I'm not a lawyer, but I was under the understanding that forcing someone to provide DNA evidence requires a warrant. The issue is not whether this could be good proof, the issue is whether it could be invasive on matters that are not illegal.
Like this: if the odds of a collision were 50%, orders and orders of magnitude higher than is reasonable (though it's hard to put an exact number on it), then 1/2^176 gives approx. a 1 in 10^53 chance for all 176 to be false positives. Even if Agent Buckwash ran a check on a suspected pedophile's computer once every nanosecond, we wouldn't expect a false positive of that magnitude for roughly 10^36 years.

Bad, bad software engineer. You've ignored both the birthday paradox, and assumed that purely statistical information is a good metric of applied data.

The number of matches in this case makes it clear that the odds are stupidly prohibitive, but the court has to place precedent that would apply for even one match.
10.28.2008 3:04am
Soronel Haetir (mail):
Prof. Kerr,

I would argue that even your exposure standard would actually call this action a search. The files may not be rendered as images, instead they are exposed and rendered as hashes. Perhaps not very titillating, but still exposed.
10.28.2008 3:11am
OrinKerr:
Soronel writes:
I would argue that even your exposure standard would actually call this action a search. The files may not be rendered as images, instead they are exposed and rendered as hashes. Perhaps not very titillating, but still exposed.
Yes, that's why the main post begins its analysis with the statement, "I think this is generally a correct result: See my article Searches and Seizures in a Digital World, 119 Harv. L. Rev. 531 (2005), for the details. "
10.28.2008 3:26am
einhverfr (mail) (www):

Why does [using MD5] matter for their purposes? They aren't using the hash as a means to protect the data, but simply using the fact that the hash of a particular file is unique and repeatable.


Actually it is repeatable, but not unique. All hashing functions will lack the uniqueness criteria you suggest.

What a hash search allows you to do is to generate a short list of files which are probable matches. Just because a particular file matches does not mean that it is a specific file.

Furthermore, there *is* a legitimate argument against using MD5 for this sort of activity. The issue is that one can essentially append data onto arbitrary files to create hashes of the values one wants. So if I had incriminating evidence on my computer, I could also create thousands of files with the same MD5 hash value and thus require a file-by-file check. "This, your honor, is a text file with the same md5 hash as a child pornography movie" isn't going to get very far in court.
10.28.2008 3:28am
Jay Ballou (mail):
Heck, the virus scan alone was an illegal search.

Third, the hash is a representation of what the file contains.

Wrong, unless you consider 0 to be a representation of half of all possible files and 1 to be a representation of the other half of all possible files.

Actually it is repeatable, but not unique. All hashing functions will lack the uniqueness criteria you suggest.

It's functionally unique; unless someone has intentionally cooked it, the odds of an incorrect match for a 128-bit hash are about 1 in 2^128, far too small to matter in practice.

Furthermore, there *is* a legitimate argument against using MD5 for this sort of activity. The issue is that one can essentially append data onto arbitrary files to create hashes of the values one wants.

Um, at best you could create files on your disk that aren't pornography but match files that are -- why would anyone want to do that? You couldn't even frame anyone that way, since conviction would require examination of the actual files.
10.28.2008 3:48am
Avatar (mail):
While it's theoretically possible to "salt" one's system with lots of false hits in order to protect against police search, that's not necessarily effective. For one thing, if the police changed their hashing function even slightly, then your decoys are all worthless. Also, that kind of hash spoofing attack assumes that you've got access to the suspect hashes; I trust that the police don't publish the list of hashes of known child porn images online, huh?

And even if you did something like that, it's just about the same thing as dumping a few images of child porn into a big folder of ordinary porn; a determined and methodical search would turn up the incriminating material. In fact, it'd be rather more obvious that you were up to something screwy.

But in practice, it's not worth the effort. If you're really worried about your files being accessed by the cops, encrypt them. Hell, encrypt the filenames and dump them someplace boring. If you're not doing that, what's the point of going to the effort of hash spoofing? If you are doing that, then the police aren't finding your files anyway.

Here's an interesting question... is a hash value of a media file considered a "derivative work" under copyright law? ;p
10.28.2008 3:56am
Jay Ballou (mail):
You've ignored both the birthday paradox

The birthday "paradox" (it's not nearly a paradox, just a surprise to people with poor intuitions about probability) only applies to finding some pair in a DB that matches; it doesn't apply to matching a predetermined value. But even if the birthday problem did apply, the odds of an incorrect match would be astronomical ... less than the chance of an incorrect match due to a misread or other hardware malfunction.
10.28.2008 3:59am
Jay Ballou (mail):
Hell, encrypt the filenames and dump them someplace boring.



Better yet, use steganography.
10.28.2008 4:03am
Jay Ballou (mail):
what stops the government from searching everyone's computer as soon as it is hooked up to the internet?


A properly configured firewall.

I think that is what makes hash values different from drug dogs for purposes of searches under the 4th Amendment. The likelihood that they could be used to search for legal content.

Um, dogs are quite capable of searching for legal content.
10.28.2008 4:21am
Malvolio:
But even if the birthday problem did apply, the odds of an incorrect match would be astronomical
They are considerably higher than astronomical.

If the police had only one file to check and only one sample of child-pornography, the odds of an innocent file matching with a 128-bit MD5 hash would be 340,282,366,920,938,463,463,374,607,431,768,211,456 to 1 against. Of course, they probably have thousands of each, but still, you'd have about the same chance of getting a single monkey to type out Hamlet.

Caballes seems exactly on point: no-one has a privacy interest that would be harmed by the police comparing the hashes of his files against a list consisting entirely of the hashes of child-pornography.

On the other hand, all that duplication and hardware copying rigmarole doesn't seem to change the Constitutional situation a bit.
10.28.2008 4:41am
devin chalmers (mail):
Bad, bad software engineer. You've ignored both the birthday paradox, and assumed that purely statistical information is a good metric of applied data.


I did note that it's extremely difficult to put even arguable numbers on the odds of hash collisions, and I confess that I didn't have any envelope backs handy for more detailed analysis. :) You're right that it depends greatly on the vagaries of the algorithm and the data itself.

Consideration of the birthday problem was implicit in any assumption of a probability for a collision in this case. The birthday paradox on its own gives a vanishingly small probability for this situation.

The number of matches in this case makes it clear that the odds are stupidly prohibitive, but the court has to place precedent that would apply for even one match.


If the government has a million hashes, and the defendant has a million files, let's ballpark the birthday effect as the odds of at least two of the two million total hashes being equal. In birthday terms, we've got a calendar 10^38 days long and only 10^6 people at our party... I'm sure your own envelope can put an upper bound of zero to twenty places on the odds. It's so far off the magnitude scale from the actual birthday problem to make 50% or 5% or 0.0005% for a collision laughably, impossibly high.
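For anyone without an envelope handy, here is the ballpark worked out. It is a sketch under stated assumptions: an ideal 128-bit hash with uniformly distributed output, two million hashes in total, and the standard union bound n(n-1)/(2N) on the chance that any two of them are equal. That bound counts collisions within each list as well as across the two lists, so it overstates the chance of a cross-list false match.

```python
from fractions import Fraction

HASH_BITS = 128
SPACE = 2 ** HASH_BITS   # number of possible hash values ("days in the calendar")
N_HASHES = 2_000_000     # a million known hashes plus a million files

def birthday_bound(n: int, space: int) -> Fraction:
    """Upper bound on the probability that any two of n uniformly random
    values drawn from `space` possibilities are equal."""
    return Fraction(n * (n - 1), 2 * space)

p = birthday_bound(N_HASHES, SPACE)
print(f"collision probability is at most about {float(p):.3e}")
```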
10.28.2008 4:45am
devin chalmers (mail):
I should note that I don't think my above argument supports the notion that taking a hash of a file is not a search or that we should actually lock people up based only on MD5 hashes. gattsuru was absolutely right to note that there are a lot of assumptions lurking behind such simplistic mathematical analyses. (For a variety of reasons, in general, even demonstrating the actual presence of objectionable data on a computer is not good proof of intent to possess said samizdata.)
10.28.2008 4:58am
Jay Ballou (mail):
Caballes seems exactly on point: no-one has a privacy interest that would be harmed by the police comparing the hashes of his files against a list consisting entirely of the hashes of child-pornography.

Yeah, and spy programs that promise to only send anonymous info to beneficial vendors don't violate your privacy, so why make a fuss about them?

The fact is scanning an image of your disk is a search, even if the police promise to do it in a way that will only get bad guys.
10.28.2008 5:00am
Avatar (mail):
Nobody's proposing that anyone be locked up because of a hash.

The real trick is that the police department in this case jumped the gun, didn't do their paperwork to get a warrant before investigating the contents of the drive, and quite possibly misrepresented the chain of custody in ways that would have invalidated the evidence therein anyway.

They tried to get around this by characterizing their electronic search, which compares hashes because that's more time-effective than having a detective search through tens or hundreds of thousands of files, as "not a search". Thus, the "probable cause" generated by the hash hits could justify a warrant-less manual search.

But like one of the other posters said, that's like sending your spy drone in, seeing something, and then entering on the evidence that the spy drone picked up. If you didn't have the necessary authorization to send that drone in there, then it can't itself generate the evidence necessary to justify its own search. Nor did the judge buy that argument.

It's also another case that demonstrates that the dumbest thing you can do is talk to an investigator. If this guy had the sangfroid to shut up and get a lawyer, the compromised evidence chain and lack of warrant for the search would mean he'd now be a free man. But because he made self-incriminating statements, he's still got to worry about beating a rap that the cops should have blown through bad procedure. So if the detectives come to ask you about something, don't talk to them!
10.28.2008 5:16am
whit:

no-one has a privacy interest that would be harmed by the police comparing the hashes of his files against a list consisting entirely of the hashes of child-pornography.


you have a right not to have your hard drive searched by govt. agents, whether or not that search causes "harm" to use your term.

it's a search, and it didn't (as far as i can see) meet any warrant exceptions. therefore, it required a warrant.

whether or not your privacy interest would be "harmed" is a given. if it was an unlawful search, that's the proof right there.
10.28.2008 5:37am
Jay Ballou (mail):
Two files could map into the same hash, so hashes are not like fingerprints unless you believe two people can have the same set of fingerprints. In other words, except for special cases, hash functions are not injective. As a practical matter it's unlikely any two given files would have the same hash, but it's not impossible.

You're applying a double standard. Certainly it's possible for two people to have the same set of fingerprints -- and the question isn't even that existential one, but rather whether the fingerprints of two different people can match according to some algorithm. These cases are different only in that there are so many more possible files than there are people. But not so when you restrict it to "files that are known or suspected to contain child pornography" -- then, the odds that the MD5 of some file on your disk matches one of the child porn MD5s are likely to be lower than the odds that your fingerprints will be flagged as matching those of some criminal on file (assuming you're not one).
10.28.2008 5:45am
Jay Ballou (mail):
you have a right not to have your hard drive searched by govt. agents, whether or not that search causes "harm" to use your term.

Yup. Just as the police cannot send a search robot into your home that only flags you if it detects a crime, they cannot search your disk with a program that only flags you if it detects a crime. Kind of obvious.
10.28.2008 5:56am
Jay Ballou (mail):
no-one has a privacy interest that would be harmed by the police comparing the hashes of his files against a list consisting entirely of the hashes of child-pornography.

Do you think that the 5th amendment cannot be violated because innocent people can't incriminate themselves and guilty people have no privacy interests?
10.28.2008 6:11am
Shane (mail):
There's a lot of interesting discussion here. I think it's pretty obviously a search - imaging a hard drive is about as intrusive of a search as possible. I don't care if nobody is watching the video; I don't want someone else's surveillance camera in my house.

I wonder how many child-porn owners are going to black out a single random pixel in each of their child porn photos now. Or just apply some imperceptible filter to their videos.

And seriously - salting text files to create MD5 collisions? That seems like entirely too much work to avoid conviction. You're better off stealing pubic hairs from your gym's shower drains and going on a child rape spree.
10.28.2008 8:43am
David Schwartz (mail):
What a hash search allows you to do is to generate a short list of files which are probable matches. Just because a particular file matches does not mean that it is a specific file.
With a properly designed hash (such as SHA-512) this can be made as likely as the possibility that hyper-intelligent aliens froze time and placed the contraband on your hard drive after the police seized it. If courts have to take this possibility into account, then they would have to take any number of equally improbable things into account and could never get anything done. This is so far beyond reasonable doubt -- it's beyond unreasonable doubt. This is beyond any possible legal standard.
10.28.2008 9:47am
cboldt (mail):
-- If the government rummaging through a hard drive to look at the hash values of the files on it is not a search because the government is only looking for "contraband," then what stops the government from searching everyone's computer as soon as it is hooked up to the internet? --
.
Just the mechanics prevent that. OTOH, whatever information transits the 'net can be copied. See mass collection of international calls, which is not a "search" until accessed in a way that determines content. I don't know if the Jabara case is still good law in that regard.
.
As a thought exercise, the same would be true if the government could somehow possess all the data on all our hard drives. Until it's looked at, there is no search. Cataloging it via MD5 numbers can't be termed a search, I don't think, because MD5 results aren't human-decipherable (it reads like gibberish).
.
There is a genuine issue here about how easily the government can move from "possession" or "observation" to "search," and it doesn't have a simple answer. It used to be that crossing that barrier required something called "suspicion," but when "suspicion" can be bootstrapped from compelled turning over (seizure) of duplicable material (IOW, you still haven't lost anything), the public is more exposed to government intrusion.
10.28.2008 9:51am
billb:
cboldt: Wha?!?

You'd have no problem then with a blind cop picking the lock on your house and then taking pictures of all its contents (assuming he puts everything back where he found it)? You're saying that's not a search until a sighted cop looks through those pictures? WTF? That can't be right.
10.28.2008 10:20am
David Schwartz (mail):
The key in Caballes was that the search couldn't have possibly compromised any privacy interest. If the dogs could have been trained to detect something other than contraband that a person did have a reasonable privacy interest in, the result of the case would likely have been different. In this case, taking a hash of every file on your hard drive does compromise many legitimate privacy interests. It's akin to taking an x-ray of the entire trunk. (You may not be able to identify things you've never seen before from an x-ray either.)
10.28.2008 10:26am
SeaDrive:
The difference between looking at the hard disk, and looking at a copy of the hard disk is like the difference between looking at a photo and looking at a scan of a photo.

If you want to be really technical, you can't actually see the picture on the hard drive at all. You have to read the data and display it. The reading process involves the unnecessary step of copying it to some other physical medium, it just means you are doing it the hard way.
10.28.2008 10:28am
whit:

It's also another case that demonstrates that the dumbest thing you can do is talk to an investigator. If this guy had the sangfroid to shut up and get a lawyer, the compromised evidence chain and lack of warrant for the search would mean he'd now be a free man. But because he made self-incriminating statements, he's still got to worry about beating a rap that the cops should have blown through bad procedure. So if the detectives come to ask you about something, don't talk to them!


i've debunked this rubbish before, and interestingly you fall into the same justification (selection bias) for this erroneous device that I mention as the most frequent reason for this error. congrats.
10.28.2008 10:31am
Bill Sommerfeld (www):

Why does that matter for their purposes? They aren't using the hash as a means to protect the data, but simply using the fact that the hash of a particular file is unique and repeatable.


If a hash is used for which it is feasible to find collisions, then there are opportunities for mischief.

From a cryptographic protocol design standpoint: an adversary who chooses what you feed to the hash function + non-collision-resistant hash = game over for the crypto protocol.

It's generally not necessary to analyze further, but here are two hypotheticals:

- the purveyors of child porn could create pairs of innocent files and child porn with the same hash, causing false positives for searches and undermining confidence in the technique.

- a corrupt official could create such a pair, arrange for the innocent half to find its way to his target, then use the hash collision as a pretext for causing the target further trouble.
10.28.2008 10:47am
David Schwartz (mail):
- the purveyors of child porn could create pairs of innocent files and child porn with the same hash, causing false positives for searches and undermining confidence in the technique.
Except that the whole point of a cryptographically-secure hash algorithm is that it's not possible to do that.

- a corrupt official could create such a pair, arrange for the innocent half to find its way to his target, then use the hash collision as a pretext for causing the target further trouble.
That's just nonsensical. It's trillions of times easier to obtain someone's DNA and leave it at a crime scene.

I have to think that someone who would make these kinds of arguments just doesn't understand modern hash protocols (such as SHA-512), what they do, and how they work.

How many times easier would it be for the corrupt official to just put the contraband on your hard drive?
10.28.2008 11:46am
einhverfr (mail) (www):
On the question as to whether this was an illegal search, the question becomes one that relates to the specifics of the case. If indeed the laptop was reported stolen, then it seems unlikely that this would be reasonable.
10.28.2008 11:49am
D.A.:
Orin,
Caballes is distinguishable because in that case the incriminating evidence of illegality was freely available outside the container in which it was traveling, and the dog could sniff it without invading the person's "legitimate expectation of privacy." It was key that the dog only alerted on contraband because the dog thus "[did] not expose noncontraband items that otherwise would remain hidden from public view." Here, the police had to physically invade the person's computer and "expose noncontraband items."

Jacobsen is likewise distinguishable because the person in Jacobsen had no "legitimate expectation of privacy" in contraband (i.e. the cocaine that was destroyed during the test and revealed to be cocaine). Here, the owner has a legitimate privacy interest in every single file that isn't contraband and which was hashed along with the contraband.
10.28.2008 12:09pm
Elliot123 (mail):
"The Government argues that no search occurred in running the EnCase program because the agents 'didn't look at any files, they simply accessed the computer.'"

Few of us ever look at a file. We look at a representation of some aspect of a file, and that representation is produced by another program. That program accesses the computer.

If I'm debugging a program, I might look at the bits in a file through a file dump program. Someone else may use the same file to look at a representation of video pictures. Both of us are using the same file, and using different programs to generate different representations.

Likewise, someone else may use the same file with a hash program to look at a hash ID. So, three of us are all looking at a representation of a file, and each is using a different program to generate that representation. But, we're all looking at the generated representation we choose for our purpose.
10.28.2008 12:11pm
random_computer_geek (mail):
whit,

I think that Prof. James Duane and his police officer guest made a very effective case for never talking with any officer.

http://www.regent.edu/admin/media/schlaw/LawPreview/

rcgeek
10.28.2008 12:28pm
Adam J:
billb- you presume that the method by which the government collected the data was as invasive as a blind police officer walking around your house... I doubt that's accurate. My main concern with privacy is having police officers probing through our (figuratively speaking) underwear drawers to find evidence of a crime. Almost every search entails this to some degree, which is why we want police to get warrants & have probable cause.

However, when police are able to find evidence of a crime without such probing, why is this a bad thing? If the police can find evidence of a crime without non-criminal private items being looked through, I say go for it. Crist only has a legitimate privacy interest in the non-criminal files on his computers, not the ones that relate to child pornography- if police can locate child-porn files without perusing legitimate files then what's the problem? If they did it to my or your computer the police would be viewing exactly 0 files- no problem there. Frankly, I have no problem with privacy being violated when it's strictly limited to illegal acts, even without a warrant. My problem is the computer appears to have been illegally physically seized- there was no consent by the owner, nor was there abandonment.
10.28.2008 12:31pm
Nick (www):
"It's possible that the argument wasn't raised because the agent made a hash of every file instead of running a search just for matches of known images."

You can't run a search just for matches of known images using a hash. That's the point. Hashes are completely one way. In other words, I can create a hash using a known image, but if I have a hash, I can't know what the image looked like which created it.

Therefore, if you have a listing of hashes for known pornographic images, the only way to compare that hash with the contents of a hard drive is to create a hash of every file, and see which hashes match. Operating systems do not create hashes of all the files on the computer during their normal operation. So the only way to generate one is on purpose after the fact, usually through the forensic examination process (though there are other reasons for generating hashes as well).
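To make the point concrete, here is a minimal sketch of what "hash every file, then compare" looks like. It is not how EnCase works (the function names `md5_of_file` and `find_matches` are hypothetical, and a real forensic tool reads the disk image directly rather than going through the file system); it only illustrates that every file has to be read in full before any match can be found.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(root, known_hashes):
    """Hash every regular file under root and return the paths whose
    digest appears in known_hashes. Note that every file is read,
    whether or not it ends up matching."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and md5_of_file(path) in known_hashes:
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Toy demonstration with made-up file contents.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "letter.txt").write_bytes(b"dear landlord")
        Path(tmp, "flagged.bin").write_bytes(b"known file contents")
        known = {hashlib.md5(b"known file contents").hexdigest()}
        for hit in find_matches(tmp, known):
            print("match:", hit.name)
```

The list of known hashes tells you nothing about where to look; the only way to use it is to compute a digest for each file on the drive and check membership.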
10.28.2008 12:52pm
cboldt (mail):
-- You'd have no problem then with a blind cop picking the lock on your house and then taking pictures of all its contents (assuming he puts everything back where he found it)? You're saying that's not a search until a sighted cop looks through those pictures? --
.
ROTFL. I'd have a huge problem with it. I have a huge problem with the rationale of the Jabara decision, but AFAIK, it's "the law" of the Circuit.
.
As a thought exercise, the instant case indeed aims to draw a line called "search" that is analogous to the government having the pictures, but not be deemed to have searched until they look - and disregarding the circumstances of how the pictures were obtained. By disregarding, I don't mean that the circumstance of obtaining the pictures is irrelevant to the complete decision, it just isn't part of deciding when "search" occurs, given that the government has the images in its possession.
10.28.2008 1:05pm
Joseph N. Wilson (mail):
Comparing running a hash to a drug sniffing dog is not valid.
The drug sniffing dog identifies chemicals released into the atmosphere and thus, no longer in control or possession of the owner of the goods being sniffed. These chemicals are left in the environment even after the object that generated them is long gone. The disk, however, is a sealed object and it must be directly manipulated (electronically controlled) in order to extract data from it so that the hash can be performed. No passive sniffing occurs in such a scenario.
10.28.2008 1:30pm
bob (www):
In the article cited on hashing, the author compares hashing to the use of infrared imaging, and points out that imaging was ruled illegal not because it looked inside the house, but because it could detect things other than illegal activity. By the same argument, hash-based inspections of any computer (whether in physical custody or not) would not be unreasonable searches. There is no reason to copy the data or sequester the computer, of course. The inspection could be done by a program installed on the computer itself. There might be arguments against installing such a program (it might be illegal for other reasons), but the legal theory that would allow infrared searches if only they were more selective would seem to allow such searches.

"Contraband," of course, is a rather broad term these days. If I have copyrighted material on my system without permission, I have contraband.
10.28.2008 1:48pm
Soronel Haetir (mail):
Actually, I would use the example of copyrighted material to illustrate how a hash search would actually fail any sort of drug dog test. Unlike the dog which has a reasonably high threshold for alerting on substances that are actually illicit, a hash provides no such information.

A hash simply indicates presence of a particular file (ignoring hash salting because this does not seem to be an area where it would actually be useful) but provides no information on whether that file is actually a violation.

Unlike the child porn issue, the exact same file could be legitimate for one person to possess and not for another.
10.28.2008 2:01pm
Wurzel:
Clearly the obvious searches are:

1) Performing a Virus Scan - that's a search by definition
2) Matching MD5 hashes against a database

Other potential searches:

1) Opening up the computer case
2) Generating the MD5 hashes
3) Duplicating the hard drive

This is all besides the other issue of ownership of the PC, and especially regarding the fact that other people had access to the PC and could have tampered with it. I don't know whether or not the police could get a warrant for searching the PC when the PC was stolen and the porn was reported by someone handling stolen goods. However even with a valid warrant the chain of custody would have been enough to cast doubt as long as the owner had a consistent story disavowing any knowledge (and the police errors in this area just make this entire case a total cock up).

The best we can hope for here is that the guy stops looking at child porn because of the entire situation (and counts his lucky stars), and that the police sort out their act in the future, and don't let the emotion of child porn override due process.
10.28.2008 2:17pm
A. Zarkov (mail):
I am not comfortable with making the mere possession of images on your computer a crime. In the 1380s the mere possession of a bible in the vernacular was enough to have you burned alive as a heretic. I understand the motivation behind making possession of certain pornographic images illegal. The legislature believes this will curtail demand and thus cut down on abusive acts against children. Except it doesn't seem to work. It's like laws against drug possession-- they don't seem to work very well either.

Don't be surprised if one day your computer gets scanned for illegal "hate speech."
10.28.2008 2:18pm
zippypinhead:
I'm coming late to this party (a bad side effect of sleeping at night and working in the morning I guess). Professor Kerr is generally correct that the District Court reached the right result in suppressing the computer evidence.

However, but for the (exceedingly unusual) filing of the police report on the computer, this case could have come out the other way: the court would have had to directly address the "abandoned property" question it skirted in footnote 8. Without the known police report, an objective officer could have concluded that Hipple lawfully possessed the computer and consented to the search. Game over at that point.

But that's not this case. As noted in footnote 2, the detective who referred the computer to the AG's office for forensic analysis knew it had been reported stolen. Thus, the police were on notice that the computer clearly had not been abandoned by its prior owner and "found" by Hipple. The stolen property report converts a decent argument for the consent exception to the warrant requirement into the category of police conduct one judge I know calls "No! Bad cop!" (using his best naughty doggie corrective voice). At the point the officer became aware the computer might be stolen, he should have realized there was a problem with Hipple's ability to consent. He should have immediately put the computer in the evidence vault for safekeeping and visited a magistrate to obtain a search warrant that permitted forensic examination of the possibly-stolen computer (using Hipple's detailed statement about possible contraband he saw on the machine as the basis for probable cause). But he didn't. And there can be no good faith exception available where the specific officer was on notice of potentially invalid consent prior to the warrantless search.

But I also think the court didn't get the "private search" analysis quite right. If we assume, as the court did, that the private search doctrine applies to Hipple's actions as trespasser to the contents of the computer, then there's an argument the police were entitled to hash, image, and recover AT LEAST the specific videos Hipple had viewed and deleted (and it's trivial to determine which ones they were, based on the deletion records created by the OS combined with Hipple's own statements). Depending on the contents of those videos, there might be enough evidence to sustain a conviction in this case.

I generally think the court's attempt to distinguish cases applying the private search doctrine to floppy disks and other removable media from this search of a HDD -- based on things like the number of physical platters in the box -- just doesn't cut it. That part of the memorandum opinion reminded me of some discussions I had in the past with magistrates who insisted on analogizing computers to file cabinets, and couldn't understand why the agents' warrant application sought permission to take the whole "file cabinet" away rather than just look through it on-site to find selected "papers" relevant to the investigation...
10.28.2008 3:07pm
billb:
Adam J: Wha?!? (bold and italics to emphasize that this is a bigger reaction than I had to cboldt!)

So you'd have no problem with a government-mandated program running in the background on your computers which periodically sends the hash of every file therein to the police to be compared against their DB of known childporn hashes? I bet you're just itching to sign up!

And, btw, I think my analogy is pretty dead on. Beef it up by saying that the blind cop is also deaf and mute and has no sense of touch except for on his camera shutter release finger. Or, hell, say that it's a robot cop. If it only comes in when I'm not there, and I can't tell that it's been there, I'm probably not harmed in the usual sense of the word. But we still feel that our privacy has been invaded by the blind, senseless, deaf mute robot cop with the magic camera. Backing down a bit, we'd be similarly upset at said cop going through our home filing cabinets photocopying everything. How is my computer any different?

The thing I'd like explained to me by a lawyer in the know (hint. hint. Orin.) is given that we pretty much all seem to agree here that a warrant would have been issued on Hipple's testimony for a fully-authorized search of the computer which would have turned up undoubtedly admissible evidence, how does waving the facts under a judge's nose make the search reasonable? Is this where our distrust of the system comes into play?
10.28.2008 3:22pm
Flyon (mail):
I think we're splitting hairs over whether generating a hash constitutes a search, or whether the search comes with the matching against known hashes. They are all part of the same process - as has been pointed out earlier, you glean no direct information from hashes, so there is no conceivable reason to generate hashes except to speed and simplify the search (match); or maybe store them for future such tests (searches).

IANAL, but the first thing that struck me was - the chain of possession. Why, in a crime with such a severe possible sentence, would the court rely solely on a computer with a compromised chain of custody?

yeah, likely the preponderance of evidence - dates of files, browser log, etc - is too complex to be faked convincingly (unless the entire chain were copied