District Court Holds that Running Hash Values on Computer Is A Search:
The case is United States v. Crist, 2008 WL 4682806 (M.D.Pa. October 22, 2008) (Kane, C.J.). It's a child pornography case involving a warrantless search that raises a very interesting and important question of first impression: Is running a hash a Fourth Amendment search? (For background on what a "hash" is and why it matters, see here).
First, the facts. Crist is behind on his rent payments, and his landlord starts to evict him by hiring Sell to remove Crist's belongings and throw them away. Sell comes across Crist's computer, and he hands over the computer to his friend Hipple, who he knows is looking for a computer. Hipple starts to look through the files, and he comes across child pornography: Hipple freaks out and calls the police. The police then conduct a warrantless forensic examination of the computer:
In the forensic examination, Agent Buckwash used the following procedure. First, Agent Buckwash created an “MD5 hash value” of Crist's hard drive. An MD5 hash value is a unique alphanumeric representation of the data, a sort of “fingerprint” or “digital DNA.” When creating the hash value, Agent Buckwash used a “software write protect” in order to ensure that “nothing can be written to that hard drive.” Supp. Tr. 88. Next, he ran a virus scan, during which he identified three relatively innocuous viruses. After that, he created an “image,” or exact copy, of all the data on Crist's hard drive.
Agent Buckwash then opened up the image (not the actual hard drive) in a software program called EnCase, which is the principal tool in the analysis. He explained that EnCase does not access the hard drive in the traditional manner, i.e., through the computer's operating system. Rather, EnCase “reads the hard drive itself.” Supp. Tr. 102. In other words, it reads every file-bit by bit, cluster by cluster-and creates a index of the files contained on the hard drive. EnCase can, therefore, bypass user-defined passwords, “break[ ] down complex file structures for examination,” and recover “deleted” files as long as those files have not been written over. Supp. Tr. 102-03.
Once in EnCase, Agent Buckwash ran a “hash value and signature analysis on all of the files on the hard drive.” Supp. Tr. 89. In doing so, he was able to “fingerprint” each file in the computer. Once he generated hash values of the files, he compared those hash values to the hash values of files that are known or suspected to contain child pornography. Agent Buckwash discovered five videos containing known child pornography. Attachment 5. He discovered 171 videos containing suspected child pornography.
One of the interesting questions here is whether the search that resulted was within the scope of Hipple's private search; different courts have approached this question differently. But for now the most interesting question is whether running the hash was a Fourth Amendment search. The Court concluded that it was, and that the evidence of child pornography discovered had to be suppressed:
The Government argues that no search occurred in running the EnCase program because the agents “didn't look at any files, they simply accessed the computer.” 2d Supp. Tr. 16. The Court rejects this view and finds that the “running of hash values” is a search protected by the Fourth Amendment.
Computers are composed of many compartments, among them a “hard drive,” which in turn is composed of many “platters,” or disks. To derive the hash values of Crist's computer, the Government physically removed the hard drive from the computer, created a duplicate image of the hard drive without physically invading it, and applied the EnCase program to each compartment, disk, file, folder, and bit. 2d Supp. Tr. 18-19. By subjecting the entire computer to a hash value analysis-every file, internet history, picture, and “buddy list” became available for Government review. Such examination constitutes a search.
I think this is generally a correct result: See my article Searches and Seizures in a Digital World, 119 Harv. L. Rev. 531 (2005), for the details. Still, given the lack of analysis here it's somewhat hard to know what to make of the decision. Which stage was the search: the creation of the duplicate? The running of the hash? It's not really clear. I don't think it matters very much to this case, because the agent who got the positive hit on the hashes didn't then get a warrant. Instead, he immediately switched over to the EnCase "gallery view" function to see the images, which seems to me to be undoubtedly a search. Still, it's a really interesting question.
Also, it seems that the Government failed to make the strongest argument that running the hash isn't a search: If the hash is for a known image of child pornography, then running a hash is a direct analog to a drug-sniffing dog in Illinois v. Caballes, 543 U.S. 405 (2005). Although Caballes is cited in the opinion for other reasons, it seems that the government didn't make the Caballes argument.
It's possible that the argument wasn't raised because the agent made a hash of every file instead of running a search just for matches of known images. But I'm not sure that really makes a difference, and whether it does hinges on some interesting questions. Is the creation of the hash a search? Or is running a query that matches the hashes to known hashes and produces a positive hit a search? It might also break down based on how much the government saw of the machine while the hashes were being made: Perhaps the search occurred when the file structure was revealed to the officers (if it was in fact revealed). But if so, I'm not sure that the images themselves should be suppressed as compared to evidence more directly related to the revealing of the file structure.
Either way, this is a fascinating computer crime law issue that gets debated from time to time without any case law; I believe this is the first case on the topic. Ah, more grist for the mill of the forthcoming second edition of my computer crime casebook. Thanks to FourthAmendment.com for the mention of the opinion, and Matt Caplan for the .pdf.
In the most basic analysis, you're physically reading the bits of information which are stored in the physical properties of particular molecules of the machine. It would seem obvious to me that reading that data, in any form or manner, is a "search" of the data contained on the disk.
I have interpreted your comment as a motion for a more detailed post, and I have granted your motion.
This argument strains at the mechanics of how computers work. Both to duplicate the files and to generate the hash, the agent's computer, under his direction, accessed every byte very exactly.
Second, a hash used this way is an acceleration of comparing files for an exact match that allows the government to avoid archiving porn directly. With high probability (given a sufficiently limited reference pool) this is akin to directly comparing the files. I fail to see how this can be distinguished from looking at the files.
Third, the hash is a representation of what the file contains, just as displaying the image is a representation of what the file contains. The hash happens to be a lossy representation. Say there is an audio recording on the computer. The government compresses (or encodes) the recording to a low-quality mp3 format, then listens. Is this a search? Yes? Okay, now turn down the quality. Repeat. Repeat.
Fourth, many internet search engines work in an identical way. They reduce your keywords and the page text to a set of hashes that they compare against. This is what allows Google, for instance, to match your search terms to similar but not identical words.
There is no way to compute this without reading the data. In theory, you could read only every other byte, but would that make a legal difference?
As an analogy to the physical world, imagine a very sensitive chemical sniffer that detects a THC "signature". Since it only detects a signature, it is possible that some non-marijuana plants also trigger the sniffer (with a very small probability). Now, if I walk around your house, with a blindfold on, applying this sniffer to various plants, does that count as a search?
First, anyone still using MD5 should have their head examined -- it's been pretty thoroughly broken. Use SHA256, please!
Note that OS and application software may compute hashes of files or parts of files as part of its normal operation.
I'd argue that the "search" occurred not when the hash values were computed but rather when a set of hashes of files on the computer was intersected with a set of hashes of known child porn.
I assume that there are cases where law enforcement can seize a sealed container to protect it from being modified or destroyed before they can get a search warrant.
By analogy, computing a set of hashes at the time the digital evidence was seized would be a really good idea as it would increase confidence that the evidence had not been modified in the course of a later search.
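To make the distinction in this comment concrete, here is a minimal Python sketch that keeps the two steps apart: computing a digest for every file, and intersecting those digests with a reference list. The function names, the directory walk, and the `known_hashes` set are illustrative assumptions. This is not a description of how EnCase works internally, and the sketch uses SHA-256 rather than the MD5 described in the opinion.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Step 1: map every regular file under root to its digest."""
    return {str(p): hash_file(p) for p in Path(root).rglob("*") if p.is_file()}


def match_known(file_hashes, known_hashes):
    """Step 2: return the paths whose digest appears in the reference set."""
    return sorted(path for path, h in file_hashes.items() if h in known_hashes)
```

Running `hash_tree` again later and comparing the result with the first run is the integrity check; that comparison needs no reference list at all. Only `match_known` says anything about what the files are.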
I don't understand the chain of custody, however. Since I am not a lawyer, it is not surprising.
Isn't this analogous to a third party providing evidence? The landlord gave the computer away to another who gave the hard drive to the police. There certainly was no seizure. Isn't the report of illegal material by the third party sufficient cause for a search?
Why does that matter for their purposes? They aren't using the hash as a means to protect the data, but simply using the fact that the hash of a particular file is unique and repeatable.
No! Cryptographic hash functions like MD5 produce entirely different output for inputs that differ in even a single bit. AFAIK, Google does not use hashes for searching text.
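The single-bit claim is easy to check. The following is a small sketch using Python's standard `hashlib`; the sample text and the helper names are arbitrary choices for illustration, not anything taken from the case.

```python
import hashlib


def md5_bits(data):
    """Return the MD5 digest of data as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def differing_bits(a, b):
    """Count the digest bits that differ between two inputs."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")


original = b"The quick brown fox jumps over the lazy dog"
# Flip the lowest bit of the first byte; everything else is unchanged.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(flipped).hexdigest())
print(differing_bits(original, flipped), "of 128 digest bits differ")
```

The two printed digests share no visible resemblance, which is why a cryptographic hash cannot be used to find "similar" files, only identical ones.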
But can anyone come up with an illustrative analogy to hard-copy data searches? If I have a pile of magazines in the desk of my home office, and someone (Does what?) and this leads a police officer to find child porn, is it a search?
Is there anything comparable? Or is this Amend IV new ground?
This was an impossible concept not too long ago. In the future perhaps star trek like transporter technology will be available and by duplicating your home on a holodeck they could search the virtual home, find what they were after, then confirm the existence of the real item of interest in your real home with the holo equivalent of the hash value.
If it looks like a search and sounds like a search then it is probably a search.
If only someone had written an article on these interesting questions!
Note that using my lay judgement, I called it a search, but if we are to use the standard above, then I think it is not a search.
I think it's inapplicable because the officer needed to actually peek inside the disk. In the Caballes case, the dog was able to sniff the marijuana from outside the trunk:
"Dog sniffs marijuana outside closed trunk" is analogous to "Officer detects child porn on disk by sensor readings of the electromagnetic field outside the computer"
"Officer detects child porn on disk by reading data from disk" is analogous to "Dog sniffs marijuana in trunk after officer opens trunk"
Perhaps, but then in United States v. Jacobsen, actually taking the substance and destroying it to test it for drugs was held not a search under the Caballes rationale.
With respect to hash files, the government can identify any file on someone's computer that it has already indexed, whether it be illegal child porn or a communist manifesto.
I suppose the government could argue that conducting an MD5 hash analysis of a hard drive is synonymous with scent detecting dogs because the dogs can be trained to alert to any scent, not just the scent emanating from illegal drugs. However, I don't think the government will be making that argument any time soon.
I actually wouldn't particularly distinguish Caballes from this case unless there is something I am missing about the chain of custody. SCOTUS did not rule that the use of a drug sniffing dog isn't a search, they ruled that it is not an unreasonable/illegal search.
An analogy to the infrared search used as an example of an illegal search would be a police created virus that performed the same hash function done here and then transmits that data to HQ. While not a search in and of itself I would say that operation would perform a data seizure.
As was described above, an MD5 is like a dog bark that signals either contraband, or stuff that looks like contraband on a bit level. This can include, well, anything. With access to an 'incriminating' hash and a hash collision program (google it), you could attach the incriminating hash to any number of files you would have a legitimate interest in keeping private.
This has been true since at least 2005
Everyone else: I have posted a copy of the opinion, via reader Matt Caplan.
Also, the clear message here is to make batch edits to your contraband files.
I did not say that Google used MD5 in searches. I said they used a hash as part of the process. Many hashes exist. What does Google do exactly? I don't know. But in the meantime you should look into other famous hashes: Soundex and Metaphone.
In that hypothetical case, I can see why the question would be whether the subsequent warrantless search by police exceeded the scope of the third-party discovery.
But, under the actual facts of United States v. Crist, as you've presented them, I'm having a hard time understanding the fourth amendment question.
Please tell us in more detail how we got here.
They had a third party who had looked at the computer, saw child porn, and called the cops. Why does this not qualify as "probable cause"? Why didn't they just get a search warrant?
With a good hash function, the probability of a collision is vanishingly small. You can simply treat the hash value as a random number. So, if you have a given 128-bit hash value, the probability that another given picture matches a specific hash is 1/2^128. (Note that this is not the same probability as that of a collision, but both are very small.)
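The two probabilities in that parenthetical can be put side by side. This is a back-of-the-envelope Python sketch that assumes an ideal hash whose outputs behave like uniform random numbers and that nobody is constructing collisions on purpose; the one-million-file figure is a made-up example.

```python
from fractions import Fraction


def p_specific_match(bits):
    """Chance that one given file hits one given hash value."""
    return Fraction(1, 2 ** bits)


def p_any_collision(n_files, bits):
    """Upper bound on the chance that any two of n_files share a hash.

    This is the union ("birthday") bound: number of pairs / 2^bits.
    """
    pairs = n_files * (n_files - 1) // 2
    return Fraction(pairs, 2 ** bits)


print(float(p_specific_match(128)))            # roughly 2.9e-39
print(float(p_any_collision(1_000_000, 128)))  # roughly 1.5e-27
```

Both numbers are tiny, but they differ by about twelve orders of magnitude for a million files, which is the distinction the comment is drawing. Neither figure says anything about deliberately engineered collisions, which other commenters raise for MD5.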
There are actually two questions on the Caballes facts: did the police need a warrant to allow the dog to walk around the outside of the automobile, and once the dog alerted to drugs, did they then lawfully search and seize the auto?
Isn't there an exception when considering motor vehicles which are not (yet) subject to seizure, and thus may be driven away? The citation escapes me, but it was an old bootlegging case, as I recall. So, if officers have PC to believe an automobile, which can leave, contains contraband/evidence/fruits of the crime, they can search the auto without a warrant.
If you mean whether the dog walking around the car constitutes a search, I would say that if the dog can lawfully be where he was and still detect contraband, then a warrant is unnecessary for the dog's presence, just as if an officer, while lawfully present in a location, can then see or sense contraband on or within private property, that would constitute PC for a warrant (unless one of the other exceptions applied).
In the cited case, it seems to me that the police, when shown evidence of child porno on the computer, still needed to obtain a warrant to search the hard drive. Given the circumstances, establishing the probable cause for a warrant would have been simple and straightforward. Why not be safe and get the stupid warrant? It's not brain surgery...
I'd like to add one thing off-post: when constructing a search warrant and supporting affidavit, generally one wants the items to be searched and the items to be searched for to be as broad as possible while still maintaining good-faith compliance with the warrant requirements. In the present case, I would not want to use something like hash values to narrow the files which are examined. In this case, the investigators would then review only the photo files which somehow met the hash value of known pornography (if I understand this process). If the hash values were not considered, then the investigators could reasonably look not only at every single image on the drive, but also check email, IMs, and even word processing files, looking for embedded or attached images. I would want to do this because, say, the suspect may have home-made photos, which don't match a hash value of known porno, but which may contain images which either constitute porno themselves, or even worse, might show something like child abuse. The same goes for files which are documents: one might find credit card charges for the porn images, or reference to the porn or other crimes in text. There's a line between complying with specificity requirements and fishing expeditions, but intelligent wording of a warrant might allow more of a search, which is usually, from law enforcement's perspective, a good thing.
Second, the agent's friend, yet another step removed from the former owner, finds unlawful material and calls the cops. That is indeed probable cause and should have immunized the police agency that did the analysis. Because the discovery of unlawful material was made by a third party with no prompting from the police (i.e., the third party was not an agent of the state), there should have been further immunity.
Bad decision, IMO. I think it's going to be overturned on appeal.
Wouldn't the testimony of the third party qualify as a source of probable cause independent of the actual search?
So is this an indirect consequence of walking around with a police dog? Or is it a deliberate action?
Drugs emit volatile compounds which float through the air, thru fabric, etc. These emissions become public. Dogs smell them and react. A computer hard drive emits nothing coherent beyond a few hair-thicknesses. Access requires proactive measures.
The difference between smelling drugs and reading computer disks is the difference between letters in your home and a conversation overheard.
Well, sure, if you're using the phrase "hash function" in the wider sense, then you're certainly correct. But the type of hash functions being discussed here, namely crypto hash functions such as MD5, don't map similar inputs to similar outputs.
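The two senses of "hash" in this exchange can be shown next to each other. Below is a compact sketch of American Soundex written from the commonly published rules (it is an illustration, not a reference implementation, and it says nothing about what any search engine actually does), compared with MD5 on the same pair of names.

```python
import hashlib

# Letter-to-digit table for Soundex; vowels, h, w, and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code for word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


for name in ("Robert", "Rupert"):
    print(name, soundex(name), hashlib.md5(name.encode()).hexdigest())
```

Soundex deliberately maps similar-sounding names to the same code, while the MD5 digests of the same two names are unrelated. The forensic comparison in the case depends on the second property.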
I couldn't disagree more.
Except that they read it in such a way that they could not inadvertently stumble on things that were not child porn (assuming that the police acted in good faith and followed procedure, etc.). How much of a legal difference does that make? I don't know.
I would also say that this operation is not particularly dissimilar from having an officer examine every file on the drive. There is room, using this method, to both pick up non-contraband (items that match hashes but are nonetheless legal; with a good hash this is extremely unlikely under these circumstances and doesn't even matter that much really) and miss actual contraband (contraband items that aren't already known, or contraband items that have been modified so that they have a new hash value).
I see this operation mostly as a quick filtering, a way to save having to do a bunch of tedious work that would be needed if each file needed to be examined one by one.
Would you count that eyeball examination of each file a search?
The crux of the case is that the court didn't accept a lame "it's not a search, it's a hashing" argument when the main argument of the prosecution (that the computer had been "abandoned" and that the search was thus okay) was shot down. Had the circumstances displayed an intent to abandon, this issue would not have arisen.
There's also chain of custody worries - the computer was outside the defendant's control and thus the presence of child pornography on the computer cannot be presumed to be possessed by the defendant. It's equivalent of stealing someone's luggage and reporting to the police that you found a brick of marijuana packed inside - the police have no way of determining if that brick was in there when the luggage was stolen. Even if it was wrapped in one of the owner's t-shirts, that's still not evidence that the owner placed said brick inside the luggage. (At this point you'd want to go on the attack, pointing out that the person who discovered the child porn had obtained the computer in, ahem, adversarial fashion...)
I'm not sure why a search warrant was not obtained, however. The entire point of retrieving an image of the hard drive is to allow data operations on the image without harming the original (or vice versa). Once that image had been taken, the police were free to wait for the necessary paperwork to clear - that image wasn't going to deteriorate or leave police possession. So long as they don't go snooping through it before the warrant comes in, they haven't performed a search.
Here is the sentence from Caballes that I use to justify my earlier statement. Perhaps it does not fairly reflect 4th amendment law.
Official conduct that does not “compromise any legitimate interest in privacy” is not a search subject to the Fourth Amendment.
I read this to mean exactly what it says, in that the use of the drug sniffing dog is not a search subject to the 4th amendment, not that it is not in fact a search.
I think the letter vs. overheard conversation is actually a pretty good comparison here.
I would think that making the initial hash value (am I using the term correctly?) merely to establish and ensure the unchanged nature of the original data would not, in and of itself, be a search, as long as the hash values were not then cataloged and compared to known values of contraband. The search was actually the second step (or third), that is, comparing the mirror image of the original drive with known porn, with the intent to establish criminal possession. As long as the data contained within the hard drive was never indexed or looked at, then creating the hash value merely to verify integrity of the data does not seem like a search to me. Doing anything else with the resulting hash values, however, does seem like a search to me.
It looks like someone did in fact write such an article. And he even has your name. I'm really surprised that you didn't know this.
Thanks for the link: I(A) is just what I was looking for.
More to the Caballes question, creating a hash value (as Paul Allen pointed out) requires that a software agent of the law enforcement officer read every single byte of data from the hard drive. This seems like a clear search to me -- it is not detecting odorous emanations from a container, but investigating and operating on every part of the contents. In this way hash computations are analogous to X-ray or thermal images rather than a sniffer, and I am not surprised that the government passed on making the argument.
In the instant case, the police took perfect copies of the defendant's files, handed them to an oracle, and asked it "Are any of these files contraband?" The oracle answered in the affirmative. The oracle in this case was a computer with a list of hash values paired with a list of known child porn images.
The police could not have shown the computer to the oracle and said "is there contraband in there?" They must disassemble the computer, remove the drive, surreptitiously (in the sense of bypassing the existing OS) copy every bit of data on the drive, and then ask the oracle for a sophisticated examination of that copy in order to determine if there's a problem with said data. When I turn off my computer I expect that no one will take it apart both physically and digitally in order to determine if it contains contraband. Fido didn't take the car in Caballes apart. It simply looked (with its nose) at things already on the outside.
Now, one might argue that since the cops don't learn anything from the hash values themselves, then their oracle only tells them when bad things are present. Thus we're nearly back at Caballes. But, since the cops did have to disassemble the computer and bypass the usual roadblocks to getting at Crist's data, they clearly searched it. They searched it by taking it apart, removing the drive, copying the data, and analyzing it. All of these things required bypassing things put in place to keep that data away from prying eyes: the case, the drive itself, the OS, etc., and so they all constitute elements of a search. Even with a witness, to my knowledge, the police cannot disassemble (without a proper warrant) my car (not at or within 100 miles of the border) to pass every piece of it past a dog's nose, bolt-by-bolt, to see if any of it is contraband. Neither can they do so to my computer. If their oracle can determine the existence of child porn by looking at the outside of my computer, more power to them.
Now, as a layman, here's the thing I don't understand: The police have the computer. They have or could have gotten an affidavit from Hipple stating that he saw what he believed to be contraband on the computer. I don't see how they could not have passed this information in front of a judge to get a warrant. Everything they need appears to be there. The machine is secure, and there's no danger of Crist deleting the files or destroying the machine. Why not dot all the Is and cross all the Ts?
Clearly, we owe Orin Kerr a beer.
In an old skool conventional search you go thumbnail to thumbnail looking at images. this is much more invasive of privacy because the agent will see the contents of every file.
the hash search means he will only look at the contents of the file if and when he gets a match to a known illegal file.
but I cannot see how anybody... even the govt lol... could argue it's not a search.
and i cannot understand why a warrant wasn't applied for. this isn't a "street" thing. the frigging thing is sitting on the agent's desk.
Various people are correct to point out that two different files can have identical hashes. The math makes this necessary; files can be arbitrarily large and pictures are often several kilobytes, but the hash is usually only a few bytes and there's no point in using a hash as large as the files you're interested in. Fewer bytes means fewer possible values.
So your grocery list may have the same hash as an illegal picture, just because there aren't enough possible hashes for every file to get a unique one. BUT you can reduce the risk as much as you care to by increasing the size of the hash. A 32-bit hash is on the small side, but can give a less than 1-in-4 billion chance of accidentally mis-matching two files, depending on the algorithm used. I think that's better than fingerprints.
My math assumes random file contents, which is wrong, but good hashes have a way of acting randomish with non-random input, so the basic point stands anyway.
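To put rough numbers on how hash size interacts with the number of comparisons, here is a short Python sketch under the same random-output assumption. The 10,000-file drive and the 100,000-entry reference list are hypothetical figures chosen only for illustration.

```python
def expected_false_matches(n_files, n_known, bits):
    """Expected number of accidental file/reference matches for an ideal hash.

    Every file is compared against every reference hash, and each
    comparison matches by chance with probability 1 / 2**bits.
    """
    return n_files * n_known / 2 ** bits


for bits in (32, 64, 128):
    print(bits, expected_false_matches(10_000, 100_000, bits))
```

At 32 bits the expected number of accidental matches is about 0.23, so a single comparison is a 1-in-4-billion event but a whole-drive scan is not. At 128 bits the expected count is on the order of 10^-30, which is why widening the hash is the practical answer.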
(IANAL — but I am a software engineer)
Two files could map into the same hash, so hashes are not like fingerprints unless you believe two people can have the same set of fingerprints. In other words, except for special cases, hash functions are not injective. As a practical matter it's unlikely any two given files would have the same hash, but it's not impossible. You could have a legal to possess file with the same hash as one that's illegal, but the chances are small.
On the other side, I'm pretty sure that Hipple's statements would be covered by the silver platter doctrine, and the search of the computer's files would be inevitable given that evidence.
It would certainly be possible, however, to create a hash that only looked at every other byte of a file. If we assume that a typical child porn image is 100K, that's still more than enough samples to produce a very low probability of a false hit.
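A hash over every other byte is a one-line change. The sketch below is a hypothetical construction for the sake of the argument, not something any forensic tool is known to do; `sampled_md5` is a name invented here.

```python
import hashlib


def sampled_md5(data, step=2):
    """MD5 over every step-th byte of data (bytes 0, step, 2*step, ...)."""
    return hashlib.md5(data[::step]).hexdigest()


data = bytes(range(256)) * 4

# Changing a skipped byte (index 1) leaves the sampled hash unchanged.
skipped = bytearray(data)
skipped[1] ^= 0xFF

# Changing a sampled byte (index 0) changes the sampled hash.
sampled = bytearray(data)
sampled[0] ^= 0xFF

print(sampled_md5(data) == sampled_md5(bytes(skipped)))  # True
print(sampled_md5(data) == sampled_md5(bytes(sampled)))  # False
```

The false-hit rate stays low, as the comment says, but the sketch also shows the cost: any edit confined to the skipped bytes goes unnoticed, and the program still has to read the sampled half of the file.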
I agree the hash is definitely a search. but it is less invasive. note: i am not saying this does not mean he shouldn't have gotten a warrant. it's still a search. get a warrant.
it is clearly less invasive because the agent isn't actually looking at what's IN files, certainly not in their screen rendered glory unless and until a hash match is found.
iow, assume you have 10,000 image/video files on your computer.
which is more invasive to your privacy
1) agent looks at each file (either looking at the jpg, etc. or the video file (mpg), etc.) to determine if it's contraband
OR
2) agent applies a formula (which is what a hash is) to each byte in your files and then only if a match to a known illegal file is made does he actually VIEW the contents of the file in their rendered glory.
see the difference?
they are both searches. one is clearly more invasive.
For example, if all drug dogs had the ability to somehow convey to the police the exact contents in an automobile trunk, both illicit and legal items, by sniffing the outside of the trunk, the Court would almost certainly hold that the sniff constituted a search. Of course, this is assuming that the public at large does not commonly use such dogs.
This case is different from Caballes in a number of ways. First, the government can determine what files its program alerts to on the fly. Second, once the government creates its index of MD5 sums from the suspect's hard drive, it will probably retain this index indefinitely. With drug-dogs, the government has a limited amount of time to do what it wants to do.
True, but the probability of an unintentional collision can be made as small as desired. Note that DNA evidence is admissible in court, and that also has some chance of false positives. What is the legal standard that must be applied for searches?
True, but it cannot alert on all files that contain the word "bomb", for example. On the other hand, it could alert on any well-known files. For example, if a well-known PDF of a leaked document was circulating, it could alert on that.
I wasn't discussing the difference between visually checking each file, and visually checking only those files that software matched to known illegal files. I was discussing the difference between matching with a hash and matching without a hash.
Personally, I think the key is on pp.3-4.
The court finds, in the main body of the opinion:
(Emphasis added.)
But in footnote 2, the court elucidates:
(Emphasis added again.)
Putting those two highlighted facts together leads to an unsavory conclusion. Detective Cotton knew that the computer had been reported stolen, but nevertheless informed the AG's office that this was a search with consent of the owner. The clear inference is that Detective Cotton was less than honest.
When you add that little detail to the account, I agree that the evidence should be suppressed. There's a big chain of custody problem.
Read my article I link to: It answers your question. As for whether conduct amounts to a search in a sense not recognized by the Fourth Amendment, I don't really have any interest in that.
I think we should take the Court's statement that drug dogs are sui generis at face value and let Caballes sit out there as an outlier of 4th Amendment jurisprudence because drug dogs are so unique. I'm still pretty sure that the police can not train a dog to smell what I have written on a notebook in my trunk, and I'm also sure that if police wanted to use hash values to search for legal content they could. I think that is what makes hash values different from drug dogs for purposes of searches under the 4th Amendment. The likelihood that they could be used to search for legal content.
I have to admit, I very much enjoy reading judicial opinions that turn on intricacies of computer engineering. As a software person, it's a little like watching a nature documentary about oneself. Reading about familiar subjects spoken of in such an unfamiliar (almost, though I hesitate to use the word here, childlike) way is fascinating. It's admirable how often they seem to get it right, as in this instance.
A few side notes:
- The odds of hash collisions (though they can be architected in the case of MD5) are so astronomical that getting even one hit would seem to be pretty bulletproof probable cause. With 176 matches, they might as well lock you up without even bothering to check the original files.
Like this: if the odds of a collision were 50%, orders and orders of magnitude higher than is reasonable (though it's hard to put an exact number on it), then 1/2^176 gives approx. a 1 in 10^53 chance for all 176 to be false positives. Even if Agent Buckwash ran a check on a suspected pedophile's computer once every nanosecond, we wouldn't expect a false positive of that magnitude for roughly 10^36 years.
To put it clumsily, given the current age of the universe to work in (approx. 10^26 nanoseconds), Agent Buckwash wouldn't encounter a single such fluke occurrence unless he could do a whole universe age's worth of one-pedophile-per-nanosecond checks, every single nanosecond. (The odds are roughly the same as quantum fluctuations spontaneously teleporting you, bodily and unharmed, to the surface of Mars.*) Even with a 90% chance of collision, the odds of winning the lottery are significantly better than a false positive. The search required to generate the hashes is the problem, not hash collisions.
* God, I wish there was a link for this. Trust me.
- The best real-world analogy for what the agent did in this instance would be something like the police sending a hyperintelligent robot into your house to rummage through your things, comparing what it sees to suspicious materials and reporting back. If that's not different from a drug sniffing dog circling a car I'll eat my hat.
If the use of technology or not in the course of an invasion of privacy is the criterion for a search, RoboCop never needs a warrant. (Actually, that sounds about right, come to remember.) See also looking through walls for drugs.
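The back-of-the-envelope figures in the first side note can be rechecked with a few lines of Python. This is only a sketch: the 50% per-file collision chance is the deliberately absurd assumption from the comment, and the 13.8-billion-year age of the universe is an assumed round figure.

```python
import math

MATCHES = 176      # hash hits mentioned in the comment
P_SINGLE = 0.5     # absurdly high per-file collision chance (assumption)

# log10 of the chance that all 176 hits are false positives
log10_all_false = MATCHES * math.log10(P_SINGLE)

# Age of the universe in nanoseconds, assuming about 13.8 billion years
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9
universe_age_ns = 13.8e9 * NS_PER_YEAR

# Expected wait in years for one such fluke at one check per nanosecond
wait_years = (2 ** MATCHES) / NS_PER_YEAR

print(f"all-false odds: about 1 in 10^{-log10_all_false:.0f}")
print(f"universe age: about 10^{math.log10(universe_age_ns):.1f} ns")
print(f"expected wait: about 10^{math.log10(wait_years):.1f} years")
```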
Well, I'm not a lawyer, but I was under the understanding that forcing someone to provide DNA evidence requires a warrant. The issue is not whether this could be good proof, the issue is whether it could be invasive on matters that are not illegal.
Bad, bad software engineer. You've both ignored the birthday paradox and assumed that purely statistical information is a good metric of applied data.
The number of matches in this case makes it clear that the odds are stupidly prohibitive, but the court has to set precedent that would apply for even one match.
I would argue that even your exposure standard would actually call this action a search. The files may not be rendered as images, instead they are exposed and rendered as hashes. Perhaps not very titillating, but still exposed.
Actually it is repeatable, but not unique. All hashing functions will lack the uniqueness criteria you suggest.
What a hash search allows you to do is to generate a short list of files which are probable matches. Just because a particular file matches does not mean that it is a specific file.
Furthermore, there *is* a legitimate argument against using MD5 for this sort of activity. The issue is that one can essentially append data onto arbitrary files to create hashes of the values one wants. So if I had incriminating evidence on my computer, I could also create thousands of files with the same MD5 hash value and thus require a file-by-file check. "This, your honor, is a text file with the same md5 hash as a child pornography movie" isn't going to get very far in court.
Third, the hash is a representation of what the file contains.
Wrong, unless you consider 0 to be a representation of half of all possible files and 1 to be a representation of the other half of all possible files.
Actually it is repeatable, but not unique. All hashing functions will lack the uniqueness criteria you suggest.
It's functionally unique; unless someone has intentionally cooked it, the odds of an incorrect match are much smaller than 1/number of atoms in the universe.
Furthermore, there *is* a legitimate argument against using MD5 for this sort of activity. The issue is that one can essentially append data onto arbitrary files to create hashes of the values one wants.
Um, at best you could create files on your disk that aren't pornography but match files that are -- why would anyone want to do that? You couldn't even frame anyone that way, since conviction would require examination of the actual files.
And even if you did something like that, it's just about the same thing as dumping a few images of child porn into a big folder of ordinary porn; a determined and methodical search would turn up the incriminating material. In fact, it'd be rather more obvious that you were up to something screwy.
But in practice, it's not worth the effort. If you're really worried about your files being accessed by the cops, encrypt them. Hell, encrypt the filenames and dump them someplace boring. If you're not doing that, what's the point of going to the effort of hash spoofing? If you are doing that, then the police aren't finding your files anyway.
Here's an interesting question... is a hash value of a media file considered a "derivative work" under copyright law? ;p
The birthday "paradox" (it's not nearly a paradox, just a surprise to people with poor intuitions about probability) only applies to finding some pair in a DB that matches; it doesn't apply to matching a predetermined value. But even if the birthday problem did apply, the odds of an incorrect match would be astronomical ... less than the chance of an incorrect match due to a misread or other hardware malfunction.
Better yet, use steganography.
A properly configured firewall.
I think that is what makes hash values different from drug dogs for purposes of searches under the 4th Amendment. The likelihood that they could be used to search for legal content.
Um, dogs are quite capable of searching for legal content.
If the police had only one file to check and only one sample of child-pornography, the odds of an innocent file matching with a 128-bit MD5 hash would be 340,282,366,920,938,463,463,374,607,431,768,211,456 to 1 against. Of course, they probably have thousands of each, but still, you'd have about the same chance of getting a single monkey to type out Hamlet.
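For anyone who wants to check that number, here is a small sketch. It assumes MD5 outputs behave like uniformly random 128-bit values for files nobody has deliberately tampered with; the function name and the 10,000 x 10,000 figures are made up for illustration.

```python
from fractions import Fraction

HASH_BITS = 128
space = 2 ** HASH_BITS   # number of distinct 128-bit hash values
print(f"{space:,}")

def cross_match_bound(files: int, known: int) -> Fraction:
    """Union bound on the chance that any of `files` innocent files matches
    any of `known` listed hashes, assuming uniformly random hash values."""
    return Fraction(files * known, space)

# Hypothetical: 10,000 innocent files checked against 10,000 listed hashes
bound = cross_match_bound(10_000, 10_000)
print(f"chance of any accidental match: at most about {float(bound):.1e}")
```

Even with thousands of files on each side, the bound only grows by the product of the two counts, which barely dents a 39-digit denominator.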
Caballes seems exactly on point: no-one has a privacy interest that would be harmed by the police comparing the hashes of his files against a list consisting entirely of the hashes of child-pornography.
On the other hand, all that duplication and hardware copying rigmarole doesn't seem to change the Constitutional situation a bit.
I did note that it's extremely difficult to put even arguable numbers on the odds of hash collisions, and I confess that I didn't have any envelope backs handy for more detailed analysis. :) You're right that it depends greatly on the vagaries of the algorithm and the data itself.
Consideration of the birthday problem was implicit in any assumption of a probability for a collision in this case. The birthday paradox on its own gives a vanishingly small probability for this situation.
If the government has a million hashes, and the defendant has a million files, let's ballpark the birthday effect as the odds of at least two of the two million total hashes being equal. In birthday terms, we've got a calendar 10^38 days long and only 10^6 people at our party... I'm sure your own envelope can put an upper bound of zero to twenty places on the odds. It's so far off the magnitude scale from the actual birthday problem as to make 50% or 5% or 0.0005% for a collision laughably, impossibly high.
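The birthday ballpark above can be sketched with the standard approximation 1 - exp(-n(n-1)/(2N)). This assumes hash values are uniformly random, which does not hold if someone has deliberately engineered collisions; the function name is just for illustration.

```python
import math

def birthday_collision_prob(n: int, space: int) -> float:
    """Approximate chance that at least two of n uniformly random values
    drawn from `space` possibilities coincide."""
    exponent = n * (n - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
    return -math.expm1(-exponent)

# The classic party: 23 people, 365 days
p_classic = birthday_collision_prob(23, 365)

# The party in the comment: two million hashes, a 2^128-day calendar
p_md5 = birthday_collision_prob(2_000_000, 2 ** 128)

print(f"classic birthday problem: {p_classic:.3f}")
print(f"two million MD5 hashes:   {p_md5:.1e}")
```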
Yeah, and spy programs that promise to only send anonymous info to beneficial vendors don't violate your privacy, so why make a fuss about them?
The fact is scanning an image of your disk is a search, even if the police promise to do it in a way that will only get bad guys.
The real trick is that the police department in this case jumped the gun, didn't do their paperwork to get a warrant before investigating the contents of the drive, and quite possibly misrepresented the chain of custody in ways that would have invalidated the evidence therein anyway.
They tried to get around this by characterizing their electronic search, which compares hashes because that's more time-effective than having a detective search through tens or hundreds of thousands of files, as "not a search". Thus, the "probable cause" generated by the hash hits could justify a warrant-less manual search.
But like one of the other posters said, that's like sending your spy drone in, seeing something, and then entering on the evidence that the spy drone picked up. If you didn't have the necessary authorization to send that drone in there, then it can't itself generate the evidence necessary to justify its own search. Nor did the judge buy that argument.
It's also another case that demonstrates that the dumbest thing you can do is talk to an investigator. If this guy had the sangfroid to shut up and get a lawyer, the compromised evidence chain and lack of warrant for the search would mean he'd now be a free man. But because he made self-incriminating statements, he's still got to worry about beating a rap that the cops should have blown through bad procedure. So if the detectives come to ask you about something, don't talk to them!
you have a right not to have your hard drive searched by govt. agents, whether or not that search causes "harm" to use your term.
it's a search, and it didn't (as far as i can see) meet any warrant exceptions. therefore, it required a warrant.
whether or not your privacy interest would be "harmed" is a given. if it was an unlawful search, that's the proof right there.
You're applying a double standard. Certainly it's possible for two people to have the same set of fingerprints -- and the question isn't even that existential one, but rather whether the fingerprints of two different people can match according to some algorithm. These cases are different only in that there are so many more possible files than there are people. But not so when you restrict it to "files that are known or suspected to contain child pornography" -- then, the odds that the MD5 of some file on your disk matches one of the child porn MD5s is likely to be less than the odds that your fingerprints will be flagged as matching those of some criminal on file (assuming you're not one).
Yup. Just as the police cannot send a search robot into your home that only flags you if it detects a crime, they cannot search your disk with a program that only flags you if it detects a crime. Kind of obvious.
Do you think that the 5th amendment cannot be violated because innocent people can't incriminate themselves and guilty people have no privacy interests?
I wonder how many child-porn owners are going to black out a single random pixel in each of their child porn photos now. Or just apply some imperceptible filter to their videos.
And seriously - salting text files to create MD5 collisions? That seems like entirely too much work to avoid conviction. You're better off stealing pubic hairs from your gym's shower drains and going on a child rape spree.
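The single-pixel idea is easy to demonstrate: changing even one bit of a file gives a completely different MD5 value, so an exact-hash list no longer matches it. A minimal sketch, where the byte string is a made-up stand-in for image data:

```python
import hashlib

original = bytes(range(256)) * 64   # stand-in for image data (hypothetical)

altered = bytearray(original)
altered[1000] ^= 0x01               # flip a single bit in one byte
altered = bytes(altered)

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(altered).hexdigest()

print(h1)
print(h2)
print("match" if h1 == h2 else "no match")
```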
.
Just the mechanics prevent that. OTOH, whatever information transits the 'net can be copied. See mass collection of international calls, which is not a "search" until accessed in a way that determines content. I don't know if the Jabara case is still good law in that regard.
.
As a thought exercise, the same would be true if the government could somehow possess all the data on all our hard drives. Until it's looked at, there is no search. Cataloging it via MD5 numbers can't be termed a search, I don't think, because MD5 results aren't human-decipherable (it reads like gibberish).
.
There is a genuine issue here about how easily the government can move from "possession" or "observation" to "search," and it doesn't have a simple answer. It used to be that crossing that barrier required something called "suspicion," but when "suspicion" can be bootstrapped from compelled turning over (seizure) of duplicable material (IOW, you still haven't lost anything), the public is more exposed to government intrusion.
You'd have no problem then with a blind cop picking the lock on your house and then taking pictures of all its contents (assuming he puts everything back where he found it)? You're saying that's not a search until a sighted cop looks through those pictures? WTF? That can't be right.
If you want to be really technical, you can't actually see the picture on the hard drive at all. You have to read the data and display it. The reading process involves the unnecessary step of copying it to some other physical medium, it just means you are doing it the hard way.
i've debunked this rubbish before, and interestingly you fall into the same justification (selection bias) for this erroneous device that I mention as the most frequent reason for this error. congrats.
If a hash is used for which it is feasible to find collisions, then there are opportunities for mischief.
From a cryptographic protocol design standpoint: an adversary who chooses what you feed to the hash function + non-collision-resistant hash = game over for the crypto protocol.
It's generally not necessary to analyze further, but here are two hypotheticals:
- the purveyors of child porn could create pairs of innocent files and child porn with the same hash, causing false positives for searches and undermining confidence in the technique.
- a corrupt official could create such a pair, arrange for the innocent half to find its way to his target, then use the hash collision as a pretext for causing the target further trouble.
That's just nonsensical. It's trillions of times easier to obtain someone's DNA and leave it at a crime scene.
I have to think that someone who would make these kinds of arguments just doesn't understand modern hash protocols (such as SHA-512), what they do, and how they work.
How many times easier would it be for the corrupt official to just put the contraband on your hard drive?
Caballes is distinguishable because in that case the incriminating evidence of illegality was freely available outside the container in which it was traveling, and the dog could sniff it without invading the person's "legitimate expectation of privacy." It was key that the dog only alerted on contraband because the dog thus "[did] not expose noncontraband items that otherwise would remain hidden from public view." Here, the police had to physically invade the person's computer and "expose noncontraband items."
Jacobsen is likewise distinguishable because the person in Jacobsen had no "legitimate expectation of privacy" in contraband (i.e. the cocaine that was destroyed during the test and revealed to be cocaine). Here, the owner has a legitimate privacy interest in every single file that isn't contraband and which was hashed along with the contraband.
Few of us ever look at a file. We look at a representation of some aspect of a file, and that representation is produced by another program. That program accesses the computer.
If I'm debugging a program, I might look at the bits in a file through a file dump program. Someone else may use the same file to look at a representation of video pictures. Both of us are using the same file, and using different programs to generate different representations.
Likewise, someone else may use the same file with a hash program to look at a hash ID. So, three of us are all looking at a representation of a file, and each is using a different program to generate that representation. But, we're all looking at the generated representation we choose for our purpose.
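A rough sketch of that point: the same bytes pushed through three different "viewer" programs. The sample contents are made up for illustration.

```python
import hashlib

data = b"Hello, hash!\n"   # stand-in file contents (hypothetical)

hex_dump = data.hex(" ")                  # what a dump program shows
as_text = data.decode("ascii")            # what a text viewer shows
hash_id = hashlib.md5(data).hexdigest()   # what a hash program shows

print(hex_dump)
print(as_text, end="")
print(hash_id)

# The dump can be turned back into the file; the fixed-length hash cannot.
assert bytes.fromhex(hex_dump) == data
```

One difference between the representations: the dump and the text view carry the whole file, while the hash is always 32 hex characters no matter how large the file is.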
I think that Prof. James Duane and his police officer guest made a very effective case for never talking with any officer.
http://www.regent.edu/admin/media/schlaw/LawPreview/
rcgeek
However, when police are able to find evidence of a crime without such probing, why is this a bad thing? If the police can find evidence of a crime without non-criminal private items being looked through, I say go for it. Crist only has a legitimate privacy interest in the non-criminal files on his computers, not the ones that relate to child pornography- if police can locate child-porn files without perusing legitimate files then what's the problem? If they did it to my or your computer the police would be viewing exactly 0 files- no problem there. Frankly, I have no problem with privacy being violated when it's strictly limited to illegal acts, even without a warrant. My problem is the computer appears to have been illegally physically seized- there was no consent by the owner, nor was there abandonment.
You can't run a search just for matches of known images using a hash. That's the point. Hashes are completely one way. In other words, I can create a hash using a known image, but if I have a hash, I can't know what the image looked like which created it.
Therefore, if you have a listing of hashes for known pornographic images, the only way to compare that hash with the contents of a hard drive is to create a hash of every file, and see which hashes match. Operating systems do not create hashes of all the files on the computer during their normal operation. So the only way to generate one is on purpose after the fact, usually through the forensic examination process (though there are other reasons for generating hashes as well).
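The process described above can be sketched roughly as follows. This is not how any particular forensic tool works; the function names are made up, and the point is only that every file under the root gets read and hashed, whether or not it ends up matching.

```python
import hashlib
import os

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_matches(root: str, known_hashes: set[str]) -> list[str]:
    """Hash every file under `root`; return paths whose hash is on the list."""
    matches = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Every file is opened and read here, matching or not.
            if md5_of_file(path) in known_hashes:
                matches.append(path)
    return sorted(matches)
```

A call like `find_matches("/mnt/image", known)` (hypothetical path) would return only the matching paths, but only after touching everything else on the way.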
.
ROTFL. I'd have a huge problem with it. I have a huge problem with the rationale of the Jabara decision, but AFAIK, it's "the law" of the Circuit.
.
As a thought exercise, the instant case indeed aims to draw a line called "search" that is analogous to the government having the pictures, but not be deemed to have searched until they look - and disregarding the circumstances of how the pictures were obtained. By disregarding, I don't mean that the circumstance of obtaining the pictures is irrelevant to the complete decision, it just isn't part of deciding when "search" occurs, given that the government has the images in its possession.
The drug sniffing dog identifies chemicals released into the atmosphere and thus, no longer in control or possession of the owner of the goods being sniffed. These chemicals are left in the environment even after the object that generated them is long gone. The disk, however, is a sealed object and it must be directly manipulated (electronically controlled) in order to extract data from it so that the hash can be performed. No passive sniffing occurs in such a scenario.
"Contraband," of course, is a rather broad term these days. If I have copyrighted material on my system without permission, I have contraband.
A hash simply indicates presence of a particular file (ignoring hash salting because this does not seem to be an area where it would actually be useful) but provides no information on whether that file is actually a violation.
Unlike the child porn issue, the exact same file could be legitimate for one person to possess and not for another.
1) Performing a Virus Scan - that's a search by definition
2) Matching MD5 hashes against a database
Other potential searches:
1) Opening up the computer case
2) Generating the MD5 hashes
3) Duplicating the hard drive
This is all besides the other issue of ownership of the PC, and especially regarding the fact that other people had access to the PC and could have tampered with it. I don't know whether or not the police could get a warrant for searching the PC when the PC was stolen and the porn was reported by someone handling stolen goods. However even with a valid warrant the chain of custody would have been enough to cast doubt as long as the owner had a consistent story disavowing any knowledge (and the police errors in this area just make this entire case a total cock up).
The best we can hope for here is that the guy stops looking at child porn because of the entire situation (and counts his lucky stars), and that the police sort out their act in the future, and don't let the emotion of child porn override due process.
Don't be surprised if one day your computer gets scanned for illegal "hate speech."
However, but for the (exceedingly unusual) filing of the police report on the computer, this case could have come out the other way: the court would have had to directly address the "abandoned property" question it skirted in footnote 8. Without the known police report, an objective officer could have concluded that Hipple lawfully possessed the computer and consented to the search. Game over at that point.
But that's not this case. As noted in footnote 2, the detective who referred the computer to the AG's office for forensic analysis knew it had been reported stolen. Thus, the police were on notice that the computer clearly had not been abandoned by its prior owner and "found" by Hipple. The stolen property report converts a decent argument for the consent exception to the warrant requirement into the category of police conduct one judge I know calls "No! Bad cop!" (using his best naughty doggie corrective voice). At the point the officer became aware the computer might be stolen, he should have realized there was a problem with Hipple's ability to consent. He should have immediately put the computer in the evidence vault for safekeeping and visited a magistrate to obtain a search warrant that permitted forensic examination of the possibly-stolen computer (using Hipple's detailed statement about possible contraband he saw on the machine as the basis for probable cause). But he didn't. And there can be no good faith exception available where the specific officer was on notice of potentially invalid consent prior to the warrantless search.
But I also think the court didn't get the "private search" analysis quite right. If we assume, as the court did, that the private search doctrine applies to Hipple's actions as trespasser to the contents of the computer, then there's an argument the police were entitled to hash, image, and recover AT LEAST the specific videos Hipple had viewed and deleted (and it's trivial to determine which ones they were, based on the deletion records created by the OS combined with Hipple's own statements). Depending on the contents of those videos, there might be enough evidence to sustain a conviction in this case.
I generally think the court's attempt to distinguish cases applying the private search doctrine to floppy disks and other removable media from this search of a HDD -- based on things like the number of physical platters in the box -- just doesn't cut it. That part of the memorandum opinion reminded me of some discussions I had in the past with magistrates who insisted on analogizing computers to file cabinets, and couldn't understand why the agents' warrant application sought permission to take the whole "file cabinet" away rather than just look through it on-site to find selected "papers" relevant to the investigation...
So you'd have no problem with a government-mandated program running in the background on your computers which periodically sends the hash of every file therein to the police to be compared against their DB of known childporn hashes? I bet you're just itching to sign up!
And, btw, I think my analogy is pretty dead on. Beef it up by saying that the blind cop is also deaf and mute and has no sense of touch except for on his camera shutter release finger. Or, hell, say that it's a robot cop. If it only comes in when I'm not there, and I can't tell that it's been there, I'm probably not harmed in the usual sense of the word. But we still feel that our privacy has been invaded by the blind, senseless, deaf mute robot cop with the magic camera. Backing down a bit, we'd be similarly upset at said cop going through our home filing cabinets photocopying everything. How is my computer any different?
The thing I'd like explained to me by a lawyer in the know (hint. hint. Orin.) is given that we pretty much all seem to agree here that a warrant would have been issued on Hipple's testimony for a fully-authorized search of the computer which would have turned up undoubtedly admissible evidence, how does waving the facts under a judge's nose make the search reasonable? Is this where our distrust of the system comes into play?
IANAL, but the first thing that struck me was - the chain of possession. Why, in a crime with such a severe possible sentence, would the court rely solely on a computer with a compromised chain of custody?
yeah, likely the preponderance of evidence - dates of files, browser log, etc - is too complex to be faked convincingly (unless the entire chain were copied directly from another perp's computer) but when the consequences of conviction are a life sentence of at least harassment and humiliation, at worst torture and death from a prison system of "justice" that is a laughed-at perversion of the constitution, I think I'd want an air-tight case.
Secondly, the "don't talk to police". In the vindictive, publicity-over-justice attitude that prosecutors seem to have taken in the last decade or two, anyone who "should be guilty" and manages to avoid having enough evidence against them - Bill Clinton, Martha Stewart, Skipper(?)Whasisname, heck even Whittaker Chambers - is instead prosecuted for lying to investigators. I recall reading one case where a man was charged with obstruction of justice for lying to his own lawyers. The prosecution's argument went like this - "you told them you didn't do X, knowing they would tell us, so therefore you were lying to us indirectly and so obstructing justice".
Obstruction used to be a charge against accessories aiding a perp, not a blanket lever to make the main perp plead out. It simply adds to the reasons to "never talk to the police", to plead the fifth because anything you misspeak may be twisted into an obstruction charge if the prosecutor is so inclined.
If the only files a cop is able to see are child porn, then the only privacy being violated is one's privacy in illegal conduct. We don't protect privacy so that it can be abused and to aid in the commission of illegal acts, we protect privacy for the legitimate things we'd like to keep private- it's an unfortunate consequence that this can assist illegal behavior. This is of course reaching hypothetical grounds, I doubt the hash comparison has a zero error rate where cops will never accidentally view legal files as well as child porn files, and that it can be done in a completely unintrusive manner (a blind, dumb, deaf officer would be a little intrusive).
-----
Completely incorrect. I can make any file's hash match their file's hash if they are using MD5. That's the whole point in not using MD5 anymore. Do you think that the people profiting from child pornography aren't smart enough to package files with matching hashes in viruses to "pollute" the internet? Just check out Eastern Europe and the number of botnets syndicates have there.
Secondly, having a list of hashes does leak information (even if they are using a faulty hash). It's one way virus scanners can quickly compare with known bad viruses. The second you calculate hashes of files is the second you violate privacy (and the 4th amendment).
Really? I consider my laptop to be the single most sensitive item I own, with regards to my privacy. It has my medical records, personal photos containing legal but embarrassing subject matter, documents that show my religious and political convictions, and my banking statements.
I'm confident I've taken the steps to prevent network intrusions to my computer just like I'm confident I've taken steps to prevent intrusion into my home. More, actually. My home just has a locking front door and glass windows. My laptop has firewalls, encryption on sensitive files, etc.
As for if hashing is a search, I'd think (as an EnCase user) that it is, when hashing on the file level. One can also hash the clone or image of the drive as a whole, to help determine that the contents are unaltered since the image was taken. This is more akin to the evidence tape example. I'd think that form of hashing would only be a search if the preceding necessary steps (imaging the computer and/or removing the hard disk drive) constituted a search.
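The whole-image use of hashing mentioned above can be sketched like this. It is not EnCase's actual procedure, just the general idea under simple assumptions: one digest is recorded when the image is taken and recomputed later, and it says nothing about which individual files are inside. The all-zero "image" is a made-up stand-in.

```python
import hashlib

def image_digest(image: bytes) -> str:
    """One digest over the entire drive image, not per file."""
    return hashlib.md5(image).hexdigest()

acquired = bytes(4096)                # stand-in for a drive image (hypothetical)
recorded = image_digest(acquired)     # value noted at acquisition time

later_copy = bytes(acquired)          # an untouched working copy
tampered = b"\x01" + acquired[1:]     # a copy with one byte changed

unchanged_ok = image_digest(later_copy) == recorded
tampered_ok = image_digest(tampered) == recorded

print("working copy verified:", unchanged_ok)
print("tampered copy verified:", tampered_ok)
```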
I too would agree that the correct decision (in regards to the hashing) was made - but I'm not a lawyer. It does also seem to me that the other factors (such as if the computer is "trash", 3rd party access) are more relevant to this case.
My question is - what if I toss my computer in the trash, and the government retrieves it from the dumpster and does a hash search. Would this still require a warrant? (IANAL)
I'm asking because the details of the case seem to suggest that the computer was "discarded" by the owner.
I would say that it's more like the computer was stolen from the owner. (And the owner did report it stolen to the police.)
If you voluntarily discard your computer or a hard disk, then I believe (but IANAL) that the gov't can search it w/o a warrant. And as a practical matter, it would be easy for anybody to go dumpster diving and read your private information off the disk. So always erase (i.e., overwrite with zeros or random bits) your data before discarding it! The best way to do this is by a sector-by-sector wipe of your disk. DBAN and CMRR Secure Erase can do this for you.
From the court's opinion, on p.9:
All the familiarity I have with Pennsylvania residential landlord-tenant law is that which I picked up just a few minutes ago. Notwithstanding that, it appears undisputed that there was no written notice, no court filing, no hearing, no judgment for possession, no order for possession, and no constable.
I believe Professor Kerr mischaracterized this state of affairs when he wrote, “Crist's landlord starts to evict him by hiring Sell to remove Crist's belongings and throw them away.” That does not appear to me to be a permissible eviction procedure under Pennsylvania law.
Instead, this looks to me like burglary.
On your general question about the Fourth Amendment status of stuff you throw away: The law is well-settled that you lose any expectation of privacy in anything you abandon to the trashman. Anybody -- ranging from garbage men, to nosy neighbors, to dumpster-diving homeless people, to the annoying local wildlife who rip apart garbage bags looking for food and cause all your non-food trash to go blowing down the street -- can look through your trash once it's out for pick-up.
In fact, a standard covert investigatory tactic used by law enforcement is to institute a "trash cover" that basically involves coming by the subject's house and taking his garbage away after it's been put out by the curb. Sometimes the agents are driving a garbage truck and dress the part; other times they just swing by very early in the morning in their government Crown Victoria and load/stink up the trunk while the neighborhood is still asleep. Either method is Constitutional. This is an especially favorite trick in white collar or fraud investigations right after the agents have served a document subpoena on a subject and are curious to see what the subpoena recipient might decide to get rid of on the next trash pickup day...
Moral of the story: Before putting out the trash, shred your sensitive papers. Wipe your hard drive or other electronic media. And don't put any other contraband out front in your garbage can, expecting it to just forever disappear into some big, anonymous landfill in the sky.
If you think I'm incorrect, the MD5 checksum of a 19,638 file I randomly selected is ab0a6520870fa261f678b1678f0c0a5a. Produce a file with the same checksum. You cannot do it.
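For a sense of scale, here is a back-of-the-envelope Python sketch of a brute-force search for a file matching a given MD5 checksum. The guess rate is an assumed, purely illustrative figure, not a measurement. Note that the published MD5 collision attacks let an attacker craft two files that collide with each other; they do not produce a file matching an arbitrary pre-existing checksum, which is the challenge posed here.

```python
# Assumed guess rate; purely illustrative, not a measured figure.
GUESSES_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

digest_bits = 128  # length of an MD5 digest in bits

# Each random guess matches with probability 2**-128, so the expected
# number of guesses for a brute-force preimage is 2**128.
expected_guesses = 2 ** digest_bits

years = expected_guesses / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
print(f"{years:.2e} years")
```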
Huh? What portion of section II.B in the opinion, on pp.8-10, is specifically declining to address the abandonment theory?
The court very specifically addressed the government's argument “that Crist retained no reasonable expectation of privacy in his computer because he abandoned it.” And, simply, “The Court disagrees.”
Not trying to be rude, but perhaps you had a brain-fart.
two points
1) the 4th amendment limits govt. actors, not "neighbors, homeless people" etc. Generally speaking, the 4th amendment is not an issue when a private actor invades (or doesn't invade) privacy. In these cases (private actors), it's not a "search" as referenced by the 4th. It may result in criminal or civil liability for the private actor, but it's still not a 4th amendment issue. While a private actor may commit a crime (or a civilly actionable wrong) while invading your privacy, generally speaking, whatever they find is admissible against you criminally, unlike a 4th amendment violation.
2) your analysis of the principle regarding garbage is correct under the 4th. as an aside, note that in some states (WA, for example), under independent grounds readings of the state const. state (county, city, etc.) actors still cannot search trash on the curb. in my state, we have to wait UNTIL the garbage man dumps it into the garbage truck, THEN the person has no expectation of privacy. but that's an aside. it's not a 4th amendment thang, it's a state const. thang.
Second, I think it is a question of what is a search exactly. When CSI collects stuff, is that a search? Or does it only become a search when they get back to the lab and examine the stuff? I say taking the copy of the drive was the search, and computing the hash was just a convenient way to plow through a huge amount of information. But if CSI is only searching when they examine the stuff, not when they collect it, the hash was the search.
As for the dog. As far as I am concerned it is certainly a search, but it is a search of the air around a car. I don't see how anyone can claim a right to the air.
Finally, I think it is pernicious to speak of a difference in privacy rights depending on criminality. I don't mean to say evidence of criminal acts should be private, but that one cannot know ahead of time what will turn up in a search.
Adam J. says he wouldn't mind a search if none but illegal images got through to the police, but what is legal depends to at least some degree on the prosecutor who reviews the evidence. I believe I follow the law, but I don't want to discuss the matter in detail in front of a judge.
As you can clearly see, I am certainly not a lawyer.
compare to:
EFF brief opposing FISA Amendments Act motion to dismiss
Soronel Haetir:
Says who?
I have personal experience with a "drug dog" apparently (imperceptibly) alerting to "contraband" on my person - while in our nation's Constitution-unencumbered airport customs zone - which was absolutely a false positive, unless some chemical fragrance from another wafted on to me in the process of traveling, thereby removing my purported Constitutional expectation of privacy against invasive search, thanks to an unimpeachable, "trained" Golden Retriever.
Imagine if you will, the police being able to send an "agent" into your home to examine your belongings. The agent "reports" to the police what he/she/it finds, but the police themselves have not "seen" anything inside your home.
Is that a "search"? OF COURSE IT IS.
Conceptually, there is no difference at all from doing MD5 hashes on your individual files. That produces a signature that is extremely likely (though not guaranteed) to be unique, thus identifying the "property" on that hard drive. Does it matter that the police have not actually "seen" the files? No. They have thoroughly "examined" them using other means.
If you do not find that persuasive, then think about this: What they are doing with the MD5 hashes is "pattern matching". What is vision and recognition of objects but "pattern matching"? The fact that they used a computer to match the pattern is, again, conceptually no different from using their own eyes to do the same. The specific mechanism is different, but the result is exactly the same. I don't think the Constitution was very much concerned with what specific mechanism is used.
I understand the need to dumb things down, but there needs to be an honest assessment before doing so. It is true that water and air are both elements and swimming and flying are both motions through their respective elements, but no sane person would therefore conclude that flying and swimming are the same activity. Abstractions can be taken so far they no longer correspond to the factual uniqueness of the situations. Caballes has nothing at all to do with computer searches, period.
I am not a lawyer, so I feel I should first provide my definition of search and seizure: Fundamental to the act of searching is the interpretation of what is seen, and the subsequent comparison of the interpretation to 'things of interest'. Compare this to seizure, which is the simple collection and securing of potential evidence without further inspection.
So treating the blogpost as a brief, my expert opinion would be along the following lines[1].
The automated inspection of the harddrive required to create the duplicate is surely incidental as there is no interpretative act. This would be analogous to the incidental copying of a web-proxy in the transmission of a copyrighted work. The creation of the duplicate should be seen as a seizure, but not as a search.
The action that is interpretive, whether it be of data or (as in this case) metadata either contained or implicit in the harddrive constitutes the search. In the case of running the hash, we are discussing the derivation of implicit metadata, which in itself is not interpretive and therefore should not be considered a search but rather securing the data prior to performing a search.
The act of comparing the results of the hash with the list of known hashes is an interpretive act on implicit metadata extracted from the harddrive. So it is here that the search occurred.[2]
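The two steps distinguished above, deriving the hashes and then comparing them against a list of known hashes, can be sketched in Python roughly as follows. The function names and the in-memory `files` mapping are hypothetical stand-ins for illustration; a real forensic tool would read from a disk image and a curated hash database.

```python
import hashlib


def derive_hashes(files):
    """Step 1: derive implicit metadata (an MD5 digest) for each file.

    `files` maps a file name to its raw bytes.
    """
    return {name: hashlib.md5(data).hexdigest() for name, data in files.items()}


def compare_to_known(hashes, known_hashes):
    """Step 2: the interpretive act, reporting only names whose digest is known."""
    known = {h.lower() for h in known_hashes}
    return sorted(name for name, digest in hashes.items() if digest in known)
```

On this framing, the output of `derive_hashes` says nothing about whether any file is of interest. Only `compare_to_known` produces a hit or a non-hit.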
So given it is a search, it would remain to determine if the search is protected by the 4th amendment.
I read in your post a reference to Illinois v. Caballes, 543 U.S. 405 (2005); so I followed the link and read the Stevens opinion. According to the opinion:
Official conduct that does not “compromise any legitimate interest in
privacy” is not a search subject to the Fourth Amendment. Jacobsen,
466 U.S., at 123. We have held that any interest in possessing
contraband cannot be deemed “legitimate,” and thus, governmental
conduct that only reveals the possession of contraband “compromises no
legitimate privacy interest.” Ibid.
Not being a lawyer I cannot say if there are other standards that should be applied in this case, however as an expert I can attest to the fact that a search that consisted only of hash comparisons meets the requirement that it be "conduct that only reveals the possession of contraband".[3]
On the other hand, given that the police had already been provided with the physical harddrive by the legitimate owner, along with testimony that it contained child pornography, the burden of obtaining a warrant prior to further examination of the harddrive cannot be considered onerous, and the risk of harm to the investigation from the delay was minimal. If I was a US citizen I would be concerned to hear of the cavalier approach to the 4th amendment evidenced by this case.
[1] Be aware that in Australia, an expert witness' sole duty is to the court, not to the prosecutor/defendant, and is required to swear that they have not withheld relevant opinion that might be damaging to the side that retained them.
[2] I also note that the virus scan, being interpretive, would also qualify as a search using this standard.
[3] Note that unless viruses are considered contraband, the virus scan is going to fail this standard and would have to be defended some other way.
Your conclusion that the police took possession of the drive with the consent of the "legitimate owner" does not square with the facts of this case.
However, the fourth amendment protects people, not property qua property. So the question of the legitimate ownership of the harddrive should have been somewhat beside the point here. It isn't though, thanks to Detective Cotton.
Glittering generalities, like Jacobsen's generality about expectation of privacy in contraband, must be read carefully within the context of the fourth amendment. There are two good generalities that have been repeated time and time again, in opinion after opinion: First, warrantless searches are presumptively unreasonable unless they fall within one of the narrow and well-defined classes of exceptions to the warrant requirement. Second, fourth amendment inquiries are especially fact-intensive.
Crist had a reasonable expectation of privacy in the movable goods and chattels kept within his home. He had a reasonable expectation of privacy in the electronic analogue of "papers" stored on the computer hardware kept within his home.
Society, and the law, respects that expectation of privacy. However, Crist's landlord did not. Crist's landlord might have had a remedy in Pennsylvania's legal process for eviction. Crist's landlord chose instead to hire someone to unlawfully enter the residence for the express purpose of carrying Crist's movable goods and chattels across the threshold. Carrying goods and chattels across the threshold is generally considered an exertion of dominion over those goods or chattels. Further, during the course of the job, one of the persons involved made an unlawful agreement —subsequently executed— for someone else to take possession of one of the items: Crist's computer and the papers stored in it.
Crist reported the crime to police.
At this point there are all sorts of events which might have ensued. Most of those hypothetical situations would not have created any serious fourth amendment questions. After all, neither Crist's landlord, his hirelings, nor the person in receipt of the stolen computer were state actors.
But as it turned out: enter Detective Cotton.
With actual knowledge of facts surrounding title to the computer, Detective Cotton chose to ignore and omit material facts in his application to the AG's Office for a forensic examination. Detective Cotton concealed Hipple's receipt of stolen property in order to claim a benefit: the consent to the warrantless search.
That was unreasonable.
The third party opens the outer one and sees child pornography. He calls the police. The police come and take the Russian dolls and put them in a special machine that takes them *all* apart, looking for child pornography on each.
It finds some, I get charged.
Clearly a 4th Amendment violation.
What spooks me about this case is the portion of the holding, which I'd rather see in dicta:
This court had clearly taken the time to educate itself on the technical side - perhaps why it mentioned the writeblocker was "software", yet another little slap at the quality of the forensics? - but this kind of distinction is not necessary and could easily lead to needlessly tortuous jurisprudence.
festivus
At that point, you've got probable cause, the evidence isn't going anywhere, so why not just get the warrant? Any warrantless search is going to be held up to immense scrutiny, and rightly so; so if you've got probable cause (they did), and they've got the time (they did), then get the stinking warrant!
First, I do agree that at least some of Hipple's acts may be attributed at law to Sell, and vice versa, based on their agreement to perform those acts. Nonetheless, let's do try to keep the actors and their actual actions straight.
From p.3 of the opinion:
(Cites to the record omitted.)
Because Hipple reported that he deleted all the allegedly contraband files that he found, the police had no reason to believe that they could replicate Hipple's search.
Thus, this set of facts doesn't permit any search under the Jacobsen exception to the warrant requirement.
I do agree, though, that the police should have applied for a warrant at this point. Given all that the police knew then, the determination of whether or not probable cause existed was reserved for a neutral and detached magistrate.
One can only speculate why Detective Cotton chose an unreasonable course of action. The record provided yields few clues into Detective Cotton's underlying motivation.
Perhaps Detective Cotton thought that submitting false facts in an application to the AG's office would have reduced consequences compared to submitting false facts in a sworn affidavit attached to an application for a warrant. But that speculation leaves muddy Detective Cotton's motivation for submitting false facts to anyone at all.
Detective Cotton obtained the forensic examination by the AG's office by pretending that “the computer was ‘seized pursuant to consent from its owner’”. He got something out of the falsified application. But it seems rather likely that he could have obtained the same forensic examination by swearing to the true facts as he knew them, and applying for a warrant.
Why did Detective Cotton do what he did? What was his reason?
Admittedly I've been reading Orin's article in the Harvard Review and haven't gotten to the actual case yet, so I missed the part about the files being deleted..
Here's a question for you though: If Hipple tells the police that he deleted the files (presumably by moving them to the Recycle Bin and then emptying it), would it not still be within the scope of the original search for the investigators to recover the deleted file? The file is still exactly the same after being deleted, just lacking a complete entry in the FAT (depending on what else he may have done on the computer after deleting the files)
I don't know of anything in the record that indicates whether the disk was formatted VFAT or NTFS (or perhaps some other filesystem, although that's unlikely.) But in general, I think a search for data contained in the free space on a disk amounts to a search of the disk. From a technical pov, it's hard to see that as not exceeding the scope of the original third-party search.
Beyond that, I think the Jacobsen exception should be looked at from less technical and more user-centric viewpoint. If Hipple could tell the police what steps he took to make the discovery, and if the police could follow roughly those same steps to make the same discovery, then it's within scope. Otherwise, not.
IPOF, it's pretty clear on the facts that the good faith exception shouldn't apply here.
It's one a' them tricky law thingies... probably one of the reasons they pay public defenders and assistant D.A.'s the big bucks...
OK: You generally think this court reached the right result in suppressing this evidence (presumably because a search occurred when the hash comparison software reported a result to the officer), but then you reference Caballes and ask whether it applies? Are you really asking this question or do you already know an answer and are wondering if anyone else is going to figure it out?
Simply put, I don't like Caballes. I think it leads to a dark future. Stevens wrote: “Critical to that [Kyllo] decision was the fact that the device was capable of detecting lawful activity."
So, based on your analysis that the search only occurs when it provides a result to a human, and the holding in Caballes that any result provided to a human that could only pertain to unlawful activity cannot be called a search, it seems this is not a search (for 4th amendment purposes).
Edge cases about hash collisions would likely be dismissed as easily as edge cases about the police misinterpreting the dog or the dog barking at the wrong time.
So, the bottom line becomes, any technology that we can develop to collect information about crimes is A-OK so long as it never provides any information to a human being unless an actual crime has been committed.
Let your fantasies about Orwellian high-tech dystopias fly! Hash checks of internet communications at ISP's? Check! Compulsory installation of face recognition cameras in all private buildings? No problem!
Artificial intelligences that read email correspondence or analyze search engine queries for patterns indicating criminal behavior? Well, they would have to be highly accurate, which is a bit far fetched by present technological standards, but if they were, then that might be alright as well...
Eventually in the distant future, you reach a point that has been mentioned by previous posters, where you've replaced your human police officers with robots... These robots are artificially intelligent and never report the results of their investigations to humans unless a crime has been committed.
Under this analysis I cannot see how the Constitution would prohibit these robots from doing all of the tyrannical things that the 4th amendment was intended to prevent the police from doing, and I don't see how this state of affairs would be materially different from not having any 4th amendment at all.
Therefore, if the 4th amendment is to have any meaning at all, there must be some reason that this kind of automated search is not reasonable.
Scalia offered the following in reference to Caballes: "This is not a new technology. This is a dog." I find that explanation extremely unsatisfying.
I also happen to be a computer expert with 27 years of experience. The mere reading of the drive itself requires interpretation from one form to another.
Likewise, photography is interpretive. It takes its input in the form of light from a particular perspective, passing it through filters and optics to render yet another form of "copy".
I don't think any lawyer would find it permissible for the police to rummage through a person's entire house taking photographs of everything without actually looking at the pictures. I can certainly think of ways of accomplishing this without a human having a look around. The police agents could, for instance, enter the house with blindfolds on and feel around using only the camera to record the information obtained.
One might then object that the police could "feel" the objects. Well likewise the police were able to feel the hard drive. The police might for instance search for child porn in someone's premises by feeling around for magazines, photographs, books, etc. and copying them using the camera.
The very act of copying requires a search. The police must send commands to the read head of the disk to seek to particular locations on the disk to find the magnetic stripes. The read head must interpret that data in order to find the beginning of the track. So on and so forth.
Copying is not some magical act. It occurs via a means and that means must always be interpretive. If it weren't then it couldn't make an accurate copy. Even the interpretation "1 to 1" is a kind of interpretation.
Thanks for the explanation.
I concur. It is not possible to determine the contents of a hard drive via entirely passive means with the current technology. In order to find out what is on a hard drive, it is necessary to access the data inside the drive. That is a search for 'data on the drive', and it occurred when the technician made the image of the hard drive. That this can be done without physically opening the casing of the drive and touching the platters is not relevant; hard drives are specifically designed to be used without being physically opened.
With that said, the Judge may have made an error regarding the technology, specifically-
By subjecting the entire computer to a hash value analysis-every file, internet history, picture, and “buddy list” became available for Government review.
I think this is only true if it is possible to reverse the hash to reconstruct the original data, which is not a given. There is a significant difference between having a 'fingerprint' derived from a file, and having the original file.
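A small Python sketch of why a hash cannot simply be reversed into the original file: an MD5 digest is always 16 bytes, whatever the size of the input, so enormously many different inputs must share each possible digest. The inputs below are arbitrary test data, not anything from the case.

```python
import hashlib

small = b"a"                 # a 1-byte input
large = b"a" * 1_000_000     # a 1,000,000-byte input

for data in (small, large):
    digest = hashlib.md5(data).digest()
    # The digest length does not grow with the input.
    print(len(data), "bytes in ->", len(digest), "bytes out")
```

What a digest does allow is confirmation: if it equals the digest of a file the examiner already holds, the two files are almost certainly identical.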
I think it's important to keep in mind that the fourth amendment isn't supposed to protect guilty people from being found out but it isn't just supposed to protect innocent people from the inconvenience of a search or seizure. It's also a substantive limitation to the government's ability to enforce bad laws.
And the test is objective reasonableness. One can at least argue that walking a drug dog around a car one has already stopped is objectively reasonable. It is much harder to make a similar argument about what happened in this case.
In this case, the police seized the drive and read every byte off of it. That they then threw away some of that information doesn't change anything. Every analysis process (including human vision) throws away a large quantity of the information gathered. What matters is what we could focus on, not what we do focus on.
Again, rational people would object to a camera in their bedroom that records everything that happens 24x7, even if the police promise to only review the tapes if there's a crime committed and it's needed for evidence. We would object because this makes everything that happens in our bedroom subject to police review, despite their promise not to do so and claim that they have not and will not do so.
What matters is that they could.
The NSA made this same argument in the interception cases. They argued that there was only a search when they alerted and the things they didn't alert on were not searched. If you accept this, then the NSA has the right to intercept every communication and it's still reasonable.
- The suspect was not given notice by the landlord, his possessions were removed illegally, the computer was reported stolen, therefore there was no basis for the argument that this was abandoned property. Hipple had no right to give consent, and the officer could reasonably be expected to have known this.
- Hashes of known contraband files were run against the contents of the computer. The intent was clearly to identify the presence of contraband material on the computer. Whether there are technical issues that affect the overall effectiveness of this approach (using MD5 hashes) or not, the intent was clearly to see if there were contraband files on the drive.
(Sidenote: while I am firmly in the "MD5 collisions are a non-issue for this application" party, a very effective and increasingly common practice is to run BOTH a SHA-1 and MD5 hash to eliminate known files, find matches, and to validate that an image is both a true copy of the original and the original was not altered. Nobody has made a case that any two logical objects, even with padding, can produce both the same MD5 and SHA-1 hash. But as I said that's really neither here nor there).
One might run a drug sniffing dog around the perimeter of a car or house, and argue they weren't actually searching the car or the house. To physically open the door to a home or a car and allow the dog to enter the inside (without a warrant or consent) is an entirely different matter.
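The dual-hash practice described in the sidenote can be sketched in Python along these lines, computing both digests in a single pass over the data. The function names `md5_and_sha1` and `same_content` are hypothetical, and this is only an illustration of the idea, not any forensic tool's actual verification routine.

```python
import hashlib


def md5_and_sha1(path, chunk_size=1 << 20):
    """Return (MD5, SHA-1) hex digests of a file, reading it only once."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def same_content(path_a, path_b):
    """Treat two files as identical only if BOTH digests agree."""
    return md5_and_sha1(path_a) == md5_and_sha1(path_b)
```

Run against an original drive image and its working copy, `same_content` reports a match only when the MD5 pair and the SHA-1 pair both agree.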
The examiner did not run a file-sniffing dog around the perimeter of the computer. The examiner looked at the contents of the media specifically to determine whether or not contraband files were present on that media.
The investigators in this case had a solid basis, based on the information provided by Hipple, to have gotten a warrant. They chose not to do it, when there was no rational reason NOT to get a warrant.
(Please note that any opinions expressed are my own, and not those of any organization I belong to or am associated with)
Very, very true, as EFF has made clear in its recent briefs, where EFF vividly exposed the secretly-redefined meaning of words the administration has used to parse their public statements on the warrantless spying, and as evidenced by the NSA's infatuation with a "reasonableness" premise for the computerized, yet-to-be-read-by-a-human acquisition of our private communications.
Which is why, among other reasons, the NSA, in particular, must love the potential contained in this phrase about the pending case Pearson v. Callahan:
From an article about the case by Scott Street of Akin Gump posted at Scotusblog.com:
The Fourth Amendment Question in Pearson v. Callahan