A terafic milestone, according to Google's report:
[O]ur systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!
How do we find all those pages? We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links. In fact, we found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
So how many unique pages does the web really contain? We don't know; we don't have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite — for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page. We're not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer.
We don't index every one of those trillion pages — many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers....
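For readers curious about the mechanics, the link-following process Google describes (start from seed pages, follow links outward, set aside exact duplicates, and avoid chasing endless chains like a calendar's "next day" link) can be illustrated with a small breadth-first crawl. This is a minimal sketch, not Google's implementation. The `crawl` and `calendar` functions, the `TOY_WEB` link graph, the `max_pages` budget, and the use of SHA-256 content hashes to spot exact duplicates are all hypothetical choices made for illustration.

```python
import hashlib
from collections import deque


def crawl(seeds, fetch, max_pages=1000):
    """Breadth-first crawl from seed URLs, setting aside exact-duplicate content.

    `fetch(url)` is a hypothetical stand-in for downloading a page: it returns
    (content, outgoing_links), or None if the URL cannot be fetched.
    Returns (urls_seen, unique_pages).
    """
    queue = deque(seeds)
    urls_seen = set(seeds)   # every distinct URL discovered so far
    content_hashes = set()   # fingerprints of page content already kept
    unique_pages = []        # URLs whose content was not an exact copy
    fetched = 0
    # The page budget stops the loop on endless link chains such as a
    # calendar's "next day" link.
    while queue and fetched < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # dangling link: nothing to fetch
        content, links = page
        fetched += 1
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in content_hashes:
            content_hashes.add(digest)
            unique_pages.append(url)
        # Links are still followed from duplicate pages, so new URLs
        # reachable only through a copy are not lost.
        for link in links:
            if link not in urls_seen:
                urls_seen.add(link)
                queue.append(link)
    return urls_seen, unique_pages


# Hypothetical toy web: b.example/ is an exact copy of a.example/.
TOY_WEB = {
    "a.example/": ("home", ["a.example/about", "b.example/"]),
    "a.example/about": ("about us", ["a.example/"]),
    "b.example/": ("home", ["b.example/news"]),
    "b.example/news": ("news", []),
}


def calendar(url):
    """Hypothetical calendar whose 'next day' link never ends."""
    day = int(url.rsplit("/", 1)[1])
    return (f"events for day {day}", [f"cal.example/day/{day + 1}"])


if __name__ == "__main__":
    urls, pages = crawl(["a.example/"], TOY_WEB.get)
    print(len(urls), pages)  # 4 ['a.example/', 'a.example/about', 'b.example/news']
    urls, pages = crawl(["cal.example/day/0"], calendar, max_pages=50)
    print(len(pages))  # 50
```

On the toy web, the crawl discovers four URLs but keeps only three unique pages, mirroring the gap Google notes between links found and unique pages. On the calendar, only the page budget ends the crawl, which is the sense in which the web's "size" depends on where one chooses to stop.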
Thanks to Dan Friedman for the pointer.
Well played.