Friday, April 27, 2012

Sweepstakes Stats

(This is the nineteenth in what will eventually be a series of 26 posts, should I live so long: each post centers on a topic suggested by the next letter of the alphabet from the previous post. The posts all have to do, directly or indirectly (in this case, very indirectly indeed), with teaching and learning.)

Doing this alphabetical challenge has led me into some mental landscapes that I would probably not have thought to explore otherwise. For example, back at the letter K, I was struck by the quirkiness of K words generally and the smaller available stock of even those. Which led me to wonder which letters of the alphabet had the largest pool of words. The small ponds were clear from the start: there aren't many X and Z words out there, as anyone who has ever done an elementary school abecedary can attest. But who's the boss daddy of letters? I took it upon myself to conduct a little bit of informal investigative research (counting the number of pages for each letter in the Official Scrabble Players Dictionary). The results are in. Let's hear it for our winner, today's honoree: the letter S.

The race wasn't even close. S came in with 78 pages, C came in second with 54, followed closely by P with 50. The next group arrived in a tightly knit pack some distance behind: A and B at 40, T at 38, D and M at 36. X, no surprise, was dead last, with one page. (Z and Y tied at 4.)

Summary of the top seven: S, C, P, A, B, T, D (with M tied with D at 36)
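For anyone who would rather let a computer do the sorting, here is a minimal sketch of that ranking. It uses only the page counts quoted in this post; the letters I didn't report are simply left out, and the name rank_letters and the alphabetical tie-breaking rule are my own assumptions for illustration.

```python
# Page counts from the Scrabble dictionary, as quoted in this post.
# Letters not mentioned in the post are omitted rather than guessed.
pages = {
    "S": 78, "C": 54, "P": 50, "A": 40, "B": 40,
    "T": 38, "D": 36, "M": 36, "Y": 4, "Z": 4, "X": 1,
}

def rank_letters(counts):
    """Order letters by page count, largest first.

    Ties are broken alphabetically (an assumption, just to make
    the order deterministic).
    """
    return sorted(counts, key=lambda letter: (-counts[letter], letter))

ranking = rank_letters(pages)
print(ranking[:8])
```

Asking for the first eight rather than seven keeps both D and M, which tied, in view.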

I just took a break from writing this post to see what stats I could track down on the internet and I found my way to The Phrontistery, which is precisely the kind of site that you would presume would have to exist somewhere simply on the grounds that if you can think of something, someone else has probably already thought of it and done it. The letters listed there for the first seven are nearly the same (M shows up in place of B), but the order is slightly different: P, S, C, A, T, M, D.

I don't know how to account for the discrepancy between S, which finished miles ahead in the Scrabble dictionary, and P, the winner by a nose on the web site, other than to speculate that the Scrabble dictionary does not include words of more than ten letters, and I can think of a ton of pseudo- and psycho- and philo- and pneumo-based words that would not appear there. 

A check of the AHED seems to confirm my original estimate and raise further questions about the Phrontistery dictionary: in the AHED, P comes in at 168 pages, whereas S comes in at 229, which gives S more pages by a factor of 1.36, in the same neighborhood as the Scrabble dictionary ratio of 1.56 (78 pages to 50).
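If you want to check the arithmetic, here is a minimal sketch using only the page counts quoted in this post. The helper name page_ratio is hypothetical, and rounding to two places is just my choice.

```python
# Page counts quoted in this post.
scrabble = {"S": 78, "C": 54, "P": 50}
ahed = {"S": 229, "P": 168}

def page_ratio(counts, letter_a, letter_b):
    """Return pages for letter_a divided by pages for letter_b, to two places."""
    return round(counts[letter_a] / counts[letter_b], 2)

print("AHED S/P:", page_ratio(ahed, "S", "P"))
print("Scrabble S/P:", page_ratio(scrabble, "S", "P"))
print("Scrabble S/C:", page_ratio(scrabble, "S", "C"))
```

With these counts, S over P works out to about 1.36 in the AHED and 1.56 in the Scrabble dictionary; S over C in the Scrabble dictionary is about 1.44.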

So, back to the Phrontistery site, which has an explanation of sorts: the dictionary that Stephen Chrisomalis, the creator of the site, is using is one he compiled himself, following various interesting but somewhat idiosyncratic rules:

Welcome to the International House of Logorrhea, a free online dictionary of weird and unusual words to help enhance your vocabulary. The IHL is a component of The Phrontistery, which has many other free word lists and unusual word related resources.

Did you ever have an English teacher who told you 'Whenever you read something, and find a word you don't know, look it up in the dictionary and write it down'? Well, I took that advice to heart. Of course, once you have a few hundred words down on your list, you think to yourself (if you are as obsessive as I am), 'Wouldn't it be a lot easier if I just read the whole dictionary, so that I could just do this word writing thing once and be done with it?' The result, after nearly a decade of conscientious word-collecting, is the International House of Logorrhea.

I have compiled a list of 15,500 English words, ranging from the merely uncommon to the extremely rare, nearly obsolete and just plain nutty! Each word is listed along with a brief, one-line definition. You should be able to get the general sense of most words, without having to read through pages of dictionary definitions. Having said that, don't go out and discard your dictionary. 
I have omitted the following word categories from the IHL:
  • extremely obsolete words (with some latitude, particularly for very interesting or useful terms)
  • words which are of strictly dialectal usage today
  • jargon, including medical, legal, biological, and other terms rarely found in non-specialist writing
  • foreign terms which, in writing, always require italicization
  • inflected forms of words (a single form is included for each word)

There is a ton of other interesting language-related stuff on the site, which I recommend to your attention. 

Process Reflection:

I had not really intended to get into all of this, but I started over there and wound up over here and in so doing managed NOT to address any of the other perhaps more substantive subjects that I had considered, including but not limited to standards, surprise, Scrabble, systems thinking, serpentine, succotash, stress, satisfaction, and symbolism. And now the evening is slipping away and if I'm going to get any sleep I'm going to have to stop. Sigh. So many subjects, so little time.
