
The theory strikes me as quite similar to that explained by Mark Seecof in his comments.

There's also some more material pointed to on SearchEngineWatch.

JLR (mail) (www):
Let me repost my comment made in Orin Kerr's thread about Ferber and effectiveness of Google searches (with the three corrections already fixed so I don't have to post 4 times), since I have a feeling I came a day too late to the thread.
-----
Thank you Mr. Seecof, Professor Felten, and Professor Hailperin for your excellent comments [in Professor Kerr's thread].

I agree with Professor Hailperin on the importance of correlating queries with the age of the searcher, given the state of obscenity law.

Cf. New York v. Ferber, 458 U.S. 747 (1982) (link to my comment with Findlaw link) -- Justice White's opinion for the Court states that governments are allowed "greater leeway" in the regulation of pornographic depictions of children, and cites the Broadrick substantial overbreadth rule as applying: " '[Whatever] overbreadth may exist should be cured through case-by-case analysis of the fact situations to which [the statute's] sanctions, assertedly, may not be applied.' Broadrick v. Oklahoma, 413 U.S. 601 (1973)." Obviously, Miller v. California and Paris Adult Theatre are controlling in the realm of the states' police power reaching pornography generally. Also, "child pornography" is obviously different from children's ability to access legal pornography.

I agree with Professor Felten that, from a statistical standpoint, it is unclear why the DOJ is requesting all search terms in a given week. But from a legal standpoint it is possible/probable that, as Professor Felten surmises, the DOJ wants to ask for the maximum and then work with Google to negotiate an agreement.

I agree with Mr. Seecof that Google should be commended for often giving searchers what they desire despite ineffective search queries. Every Googler should always know to put phrases in quotation marks; it is unfortunate that that isn't widely known. For example, " 'New York Times' " (i.e., with quotation marks in the query) gets 197,000,000 hits, while "New York Times" (i.e., without quotation marks in the query) gets 348,000,000 hits. On the first page of results on the Google query without quotation marks is a web cam of Times Square. That Google at the top of their results page offers a Google page "News results for New York Times" resulting from the query without quotation marks (as well as with quotation marks) is a great feat of technology (one that Yahoo has as well; it's possible Yahoo copied that from Google -- I don't know). Of course, "news results FOR New York Times" is different than "news results FROM New York Times" -- but nytimes.com is first on the results page for both the with-quotations and without-quotations results pages. Also, this is leaving aside the "sponsored links," which I believe is the main revenue stream for Google.

Here are the links to the three salient comments. Thanks again.

Mr. Seecof's comment

Professor Felten's comment

Professor Hailperin's comment
1.21.2006 2:43pm
Mahan Atma (mail):
Hmmm... I took a class from Prof. Stark in nonparametric methods when I was at Berkeley. I'm surprised to see him involved with this -- hope he's getting paid well...
1.21.2006 2:45pm
Dave:
It seems to me that the most relevant issue from Google's perspective isn't mentioned--Google doesn't want to publish its trade secrets and doesn't want to set a precedent that the government can obtain such information.

Personally, from the little bit I've read, I'm feeling pretty sympathetic towards Google. I don't think the government should be able to obtain that kind of information coercively, especially when the only potential benefit is a CYA operation for the government in terms of the porn law's legality.

Dave
1.21.2006 2:46pm
Eugene Volokh (www):
The government has offered to agree to a protective order barring the redisclosure of Google trade secrets; naturally, that's just a second-best solution from any trade secret owner's perspective, but as I understand it such revelation of trade secrets under a protective order is pretty routine in discovery.
1.21.2006 3:48pm
Max Hailperin (mail) (www):
I agree that Prof. Stark's declaration is consistent with the analyses by Seecof et al. One consequence of that consistency is that the declaration does not clear up the mystery about why search queries are relevant without knowing the age of the searcher. I think I may need to clarify that point a bit; in my prior comment, I merely wrote that "for the constitutional analysis, the distinction between [minors and adults] is crucial." JLR's reference to Ferber makes me think I need to be more explicit, as Ferber is not the relevant case: Ginsberg is. (Ginsberg v. New York, 390 U.S. 629 (1968)) In particular, to understand the two Reno v. ACLU cases (regarding CDA and COPA) you need to understand how Ginsberg and Miller work together. (Miller is the key obscenity case, Miller v. California, 413 U.S. 15 (1973)).

Essentially, there are three levels of material: (1) stuff so tame and/or valuable that it is OK even for kids, (2) stuff that is tame enough and/or valuable enough for adults, but which for kids would be too raunchy and not valuable enough, and (3) stuff so raunchy and valueless that even adults can have their access to it impeded.

COPA has no constitutional flaw with regard to category (3) material. By Miller, obscenity has no First Amendment protection. (In fact, even before COPA, CDA was OK with regard to obscenity.)

COPA has typically been assumed to also be in the clear with regard to category (1), because it does not purport to restrict the distribution of category (1) material. Now, that might be somewhat questionable, because of lack of clarity, chilling effects, and whatnot. Even a regulation that does not purport to impede category (1) might. But that hasn't been the focus of the Court's analysis, because there is a more glaring problem.

That more glaring problem is category (2) material. By Ginsberg, distribution to minors can be obstructed. By Miller, however, distribution to adults is activity protected by the First Amendment. So, if COPA achieves the permissible objective of blocking distribution of category (2) material to minors, but along the way causes adults to also suffer difficulties accessing the same material, how should this be sorted out? The Court's answer has been "strict scrutiny," in which the government needs to show that no other means that would do an equally good job of shielding the minors would do less harm to the adults' access. In principle, the test could also take into account harm to minors' access to category (1), but again, that hasn't been the focus.

As such, there is no relevant question of "the search behavior of current web users" in general (Prof. Stark's phrase). What is relevant is the behavior of minors (so we know how often a technique would succeed in shielding them) and the behavior of adults (so we know how often a technique would hinder their access).
1.21.2006 4:36pm
Fishbane (mail):
Does anyone have an opinion on the second issue raised by Dave? Instinctively, I too am uncomfortable with this sort of data being routinely (that is, in the course of normal legal proceedings) disclosed. For somewhat obvious reasons, I'm going to avoid commenting on slippery slopes, but it does strike me that there's a ratchet here: accepting the government's argument means that at least "this much" somewhat sensitive data is OK for the state to request *from third parties otherwise not involved*. I find that troubling.

Others have raised the issue, but I still haven't seen a firm reason why such third party compulsions might be acceptable for the state and not, say, for a corporation. (I'm sure there's something in civproc barring it, but what if there were a constitutional question raised in, say, a criminal SOX case?)
1.21.2006 4:57pm
Max Hailperin (mail) (www):
I hate to correct myself, but then again, I'd even less like to remain in error. In my earlier comment, I wrote "In particular, to understand the two Reno v. ACLU cases (regarding CDA and COPA) you need to understand how Ginsberg and Miller work together."

There's only one problem here: by the time COPA came up, Reno wasn't the Attorney General any more, Ashcroft was. So rather than referring to two Reno v. ACLU cases, I should have referred to one Reno and two Ashcrofts. (Two Ashcrofts because COPA has made two trips to the Supreme Court.)
1.21.2006 5:27pm
Cal Lanier (mail) (www):
"I don't think the government should be able to obtain that kind of information coercively, especially when the only potential benefit is a CYA operation for the government in terms of the porn law's legality. "

This bothers me, too.

Back in the area where I have some knowledge, as opposed to an opinion:

From Stark's declaration:

"Reviewing URLS available through search engines will help us understand what sites users can find using search engines, to estimate the prevalence of [HTM] materials among such sites, to characterize such sites, and to measure the effectiveness of content filters in screening HTM materials from those sites."

Max Hailperin has already reiterated the big "so what" about the first three objectives, but the one that really bothers me is the last objective--measure effectiveness of content filters.

My understanding of content filters and blockers is fairly decent, and they have nothing to do with search engines. Unless these filters actually eliminate results from a results list, the government does not need search engine results to measure effectiveness of content filters.

So if new filters are indeed eliminating results as opposed to blocking access or blocking queries, then I grovel for my ignorance. Otherwise, this is a troubling inaccuracy.
1.21.2006 5:35pm
Justin (mail):
Everyone here who's worked for or against the government knows just how useful it is to trust the government when it is forced to explain its motivation for something, right?
1.21.2006 6:12pm
Pete Guither (mail) (www):
It's been so long since I was following COPA. There are two points I don't remember.

1. Has the government's authority to restrict a category of speech called "Material Harmful to Minors" been constitutionally approved by the courts and has that term been clearly defined?

2. Have the courts recognized a difference between older and younger minors in this regard?
1.21.2006 6:29pm
Max Hailperin (mail) (www):
Cal Lanier quotes from Stark's declaration:

"Reviewing URLS available through search engines will help us ... to measure the effectiveness of content filters in screening HTM materials from those sites."

Lanier then goes on to say:

"Unless these filters actually eliminate results from a results list, the government does not need search engine results to measure effectiveness of content filters."

But Stark's declaration is not talking about search engine results. In fact, search engine results do not figure anywhere in the subpoena. (Search engine queries figure in elsewhere, which was what I was commenting on. But the quoted text isn't about them either.) The quoted text is with regard to a demand essentially of "show us what URLs you have indexed." This is totally independent of any particular searches Google's users might do -- it reflects the first part of Google's process, where they use "spiders" or "crawlers" to traverse the web, looking to see what is out there.
1.21.2006 7:02pm
Cal Lanier (mail) (www):
Max--Ultimately, the objective appears to be discovery of the sites available to search engine users. So the difference between "URLs available through search engines" and "search engine results" doesn't seem significant. However, I'll restate:

"Unless these filters actually eliminate results from a results list, the government does not need URLS available to search engine users to measure effectiveness of content filters."

Filters and blockers, to my knowledge, have nothing to do with search engine URLs or results. So why is the DoJ using search engine data to test filter effectiveness?
1.21.2006 7:19pm
Max Hailperin (mail) (www):
I'll try to answer Pete Guither's three questions:

Q 1a: Has the government's authority to restrict a category of speech called "Material Harmful to Minors" been constitutionally approved by the courts?

A 1a: Yes, see my earlier citation of Ginsberg.

Q 1b: Has that term been clearly defined?

A 1b: Yes, or at least, a quite good effort was made. One of the big differences between COPA and CDA is that COPA contains a detailed definition, largely cribbed from earlier Supreme Court opinions (especially Miller, which I cited earlier). The actual definition is mildly interesting; for example, one can debate whether the "post-pubescent female breast" really belongs. However, most of what's in there, whether it belongs or not, is at least clear. See 47 USC 231(e)(6) if you want to read the whole definition.

Q 2: Have the courts recognized a difference between older and younger minors in this regard?

A 2: The closest I am aware of -- and I am not a lawyer and not expert on the subject -- is a comment suggesting that the difference between a 16 year old and a 17 year old may matter. In the Supreme Court's opinion in Reno v. ACLU, CDA is distinguished from the law at issue in Ginsberg: "Fourth, the New York statute defined a minor as a person under the age of 17, whereas the CDA, in applying to all those under 18 years, includes an additional year of those nearest majority." Taking the hint, Congress defined "minor" in COPA as "any person under 17 years of age."
1.21.2006 7:35pm
TruthInAdvertising:
"Filters and blockers, to my knowledge, have nothing to do with search engine URLs or results. So why is the DoJ using search engine data to test filter effectiveness?"

Most filters primarily rely on blacklists of URLs and domains to block access to material that meets the various criteria of the filter companies. Some filters also employ keyword filtering to catch content that resides on an unknown or unreviewed site or which resides on a site that is largely acceptable but might include blockable content. But keyword filtering tends to overblock, so it's considered less reliable than blocking by URL. Presumably, the DOJ will feed a list of a million searches through the various filtering products to see how effectively they screen out the sites that lead to content the DOJ considers HTM.
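
To make the two mechanisms concrete, here is a rough sketch in Python of a blacklist check combined with a keyword check, plus a way to count how often it overblocks or underblocks on a hand-labeled sample. This is not any vendor's actual product: the domain list, the keyword list, and the function names are all invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical lists, invented for illustration only.
BLOCKED_DOMAINS = {"blocked.example"}
BLOCKED_KEYWORDS = {"blockedword"}


def is_blocked(url, page_text=""):
    """Return True if the URL is blacklisted or the text hits a keyword."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Blacklist check: match the host itself or any parent domain.
    for i in range(len(parts)):
        if ".".join(parts[i:]) in BLOCKED_DOMAINS:
            return True
    # Keyword check: catches unlisted sites, but can hit harmless pages too.
    words = set(page_text.lower().split())
    return bool(words & BLOCKED_KEYWORDS)


def error_rates(samples):
    """samples: list of (url, page_text, should_block) tuples.

    Returns (overblock_rate, underblock_rate): the share of acceptable
    pages that were blocked, and the share of blockable pages let through.
    """
    ok_total = sum(1 for _, _, bad in samples if not bad)
    bad_total = sum(1 for _, _, bad in samples if bad)
    over = sum(1 for u, t, bad in samples if not bad and is_blocked(u, t))
    under = sum(1 for u, t, bad in samples if bad and not is_blocked(u, t))
    return (
        over / ok_total if ok_total else 0.0,
        under / bad_total if bad_total else 0.0,
    )
```

Measuring a real filter this way requires a labeled sample of pages, which is presumably where a list of indexed URLs would come in.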
1.23.2006 2:17pm