The linguists at Language Log have been poking fun at a BBC story suggesting that British teens have poor vocabularies and that Britain is becoming a nation of "Vicky Pollards." The main posts on the subject are here and here; an extra post is here, and a (very partial) retraction of their original mockery is here (the mockery was substantially fair, but in the retraction they go into greater theoretical detail).
By the way, who is Vicky Pollard? The Language Loggers suggest looking here, here, here, and here. I've only looked at the fourth of those links, but it's pretty funny.
In any event, the basic moral is that the BBC doesn't know what it's talking about. For one thing:
The Vicky character — a broad satire of the accent, dress and manners of British lumpen-teen females — is portrayed as hyper-verbal. One of the basic Vicky bits is her jabbering rapidly on automatic pilot, saying far more than she should. Yet the BBC sees her as someone who is unable to communicate due to an inadequate word stock, not someone who over-communicates with socially inappropriate content, accent, word choice and sentence structure. This is another piece of evidence that journalists these days are incapable of elementary observation and common-sense description, at least when it comes to speech and language.
For another thing, the story generated the assertion that "the top 20 words used [by British teens] . . . account for around a third of all words." Now, you're supposed to read that and imagine "um," "like," "y'know" . . . but it turns out that everyone does the same thing. Having the top 20 words account for a third of all your words is a normal distribution. (That's "normal" in the "ordinary" sense, not the "Gaussian" sense.) Take a look at Zipf's Law, and then read this lovely article about the Oxford English Corpus, where you can find the 100 commonest English "words" (where "words" basically means "lemmas," if you find that helpful).
Funniest of all, the Language Log folks analyzed a text by the professor responsible for the statistic, and found that he, too, followed the same 20/one-third law! Not that the professor is really to blame, of course; his research was badly mangled by the media.
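For anyone who wants to check the one-third figure on a text of their own, here is a minimal Python sketch. It is not the method Language Log or the professor used; the function names `top_k_share` and `zipf_top_k_share` and the crude tokenizer are my own assumptions. The first function measures how much of a text its 20 commonest words account for. The second computes what an idealized Zipf distribution (frequency proportional to 1/rank) would predict for a given vocabulary size, which real text follows only approximately.

```python
import re
from collections import Counter


def top_k_share(text: str, k: int = 20) -> float:
    """Fraction of all word tokens accounted for by the k most frequent words."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    # It counts surface forms, not lemmas, so "word" and "words" stay separate.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    top_total = sum(n for _, n in counts.most_common(k))
    return top_total / len(tokens)


def zipf_top_k_share(k: int, vocab_size: int) -> float:
    """Share of tokens the top k words would get if frequency were exactly 1/rank."""
    def harmonic(n: int) -> float:
        return sum(1.0 / r for r in range(1, n + 1))

    return harmonic(k) / harmonic(vocab_size)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(top_k_share(sample, k=2))          # share held by "the" and "and"
    print(zipf_top_k_share(20, 80_000))      # idealized prediction, 80,000-word vocabulary
```

With an illustrative vocabulary of 80,000 words, the idealized Zipf share for the top 20 comes out at about 0.30, which is in the same neighborhood as "around a third." To try it on a real text, read the file into a string and pass it to `top_k_share`.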
UPDATE: A commenter quibbles with my use of the word "commonest." In the comments, I quote the Oxford English Corpus guys using the word, and also cite uses of it by Byron and Jonathan Swift.
Sasha, is "commonest" a valid superlative? I think you would need to say "most common".
Something to do with Fibonacci Numbers, the Golden Mean, etc., lurking here...
"What is the commonest word? Based on the evidence of the billion-word Oxford English Corpus, the 100 commonest English words found in writing around the world are as follows . . ."
This is the OED guys talking.
Jonathan Swift says, in Tale of a Tub (1704): "It was necessary that corruption should have some allegory as well as the rest; and the author invented the properest he could, without inquiring what other people had writ; and the commonest reader will find, there is not the least resemblance between the two stories."
Byron wrote, in The Irish Avatar (1821):
Is it madness or meanness which clings to thee now?
Were he God, as he is but the commonest Clay
With scarce fewer wrinkles than sins on his brow
Such servile devotion might shame him away.
"Commonest" is listed in The Free Dictionary. And a Google search yields 2,790,000 hits for the word "commonest."
And even without any of the above, I maintain that "commonest" would still be O.K., as it's unambiguous and not difficult to say. Awkwardness or clunkiness is in the eye of the beholder, but I say it's neither awkward nor clunky.
Now I realize there are "rules" floating around to the effect that you don't add "-est" after long adjectives, which includes 2-syllable adjectives not ending in "-y." Even on that web site I just linked to, they recognize exceptions like "quiet," "clever," "narrow," and "simple." ("Simple" might be a special case because an "-est" superlative still only takes two syllables.) So that rule arguably doesn't exclude "commonest." To the extent it does, I reject the rule.
given your response above, the "quibble" note is... superfluous.
It ranks more than 80,000 English words by frequency of use.
As far as word frequencies are concerned... I love them to pieces. I'm getting a new Sunday School class (turning 8 years old this year) and we're going to learn to read half the KJV in a month (the hardest words out of the 47 which are necessary are "thou," "Israel," and "against"). I also use the Russian frequency lists for vocabulary study, though I have to say that a lot of this stuff is less useful than it appears to be -- if I tell you that "к" and "в" are two of the most commonly used words in Russian, you still have a long way to go to reach actual understanding (since these are "function" words used in many different contexts, sometimes with very different meanings from an English-speaking point of view). The top 20 words used in any given text probably get used in 30 or 40 different senses within that text: that's a big part of why they're used so much.
Aren't you referring to syllables with a schwa?
This rule allows "biggerer" or similar constructions comparing comparisons. ("In gorillas the male is biggerer than the female than in humans." Isn't that more clear [why don't I like clearer here?] than "The male gorilla is bigger than the female gorilla to a greater degree than the male human is bigger than the female human"?)
?
Shouldn't, like, "like", be number one, you know?
I don't see how "the rule" (which one?) allows double forms like "biggerer". Those are impossible as far as I know, and they would only be generated if the morphology provided the opportunity to add the suffix twice, which it doesn't.
It's a great site, but it has a severe flaw: it lists plurals of words separately and in addition to singulars. This is simply not right for straight frequency counting.
Like, that sounds, like, you know, "bush schwa", you know?