The Great Apostrophe Survey

We are all too used to seeing missing apostrophes, as in sentences like "Dont press that button, you moron, its linked to the detonator!", as well as extra apostrophes in phrases like "Who'se pantie's are those in your pocket?!". The question I've long been pondering is this: might there be a fixed number of apostrophes in the universe? Could there be some law of nature such that whenever somebody misses an apostrophe in a sentence like "I dont like figs", somebody else is obliged to use the missing apostrophe in a phrase like "Fig's - £2 per kilo"? In fact, might apostrophes operate something like entangled quantum particles, such that adding an extra one in your writing immediately causes one to disappear somewhere else in the universe, or vice versa?

Thanks to the power of the Interweb I have at last been able to tackle this question. Using the MRC Psycholinguistic database I was able to pull out a list of the 200 or so most common words in everyday English, according to figures from Gordon Brown (who, incidentally, was an examiner for my PhD). I just used the most common words in our language as (a) it seemed sensible, and (b) I was buggered if I was going to waste more time doing this silly test than necessary. I was pleased to note that many of the most common words involved apostrophes, such as "we're" (116 appearances per million words) and "you're" (271 per million), which would make the whole business easier. Then, buoyed by the intrepid work of Blair, Urland & Ma (2002), who demonstrated that the number of hits produced by a search engine is a good approximation of how common a word is in the language, I turned to that source of all knowledge, Google.

For each word, if it could possibly be written with an apostrophe, I noted the number of hits produced when punctuated correctly and the number when incorrect. It was necessary to enclose the search words in double-quotes to make sure only the punctuation forms I wanted were returned, as Google tries to be clever and correct bad punctuators' grammar (which frankly seems like a level of support they don't deserve). Many of the top words couldn't be used in the test, as the "incorrect" punctuation was also a legitimate use. For example, although many instances of "back's" and "cant" will be incorrect, as in sentences like "I cant sell my house because it back's onto a minefield", it was impossible to estimate how many such uses there were, thanks to "Adam Back's home page" and the fact that "cant" is a word.
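For anyone wanting to repeat the exercise, the query-building step can be sketched in a few lines of Python. This is only an illustration of the procedure described above: the function names and the AMBIGUOUS_FORMS set are hypothetical, the word pairs are just examples, and nothing here actually talks to a search engine.

```python
# Forms whose "incorrect" spelling is also legitimate English, so their hit
# counts cannot be trusted. "cant" and "back's" are the examples from the text;
# a real list would be longer.
AMBIGUOUS_FORMS = {"cant", "back's"}


def quoted_query(form: str) -> str:
    """Wrap a word in double quotes so the engine matches it literally."""
    return f'"{form}"'


def usable_pairs(pairs):
    """Keep (correct, incorrect) pairs whose incorrect form is unambiguous."""
    return [(c, i) for c, i in pairs if i.lower() not in AMBIGUOUS_FORMS]


# Illustrative (correct, incorrect) pairs, not the full word list.
pairs = [
    ("don't", "dont"),
    ("can't", "cant"),
    ("back", "back's"),
    ("didn't", "didnt"),
]

queries = [(quoted_query(c), quoted_query(i)) for c, i in usable_pairs(pairs)]
for correct_q, incorrect_q in queries:
    print(correct_q, "vs", incorrect_q)
```

The "can't"/"cant" and "back"/"back's" pairs are dropped before any searching happens, which mirrors the exclusion rule above.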

I measured how often words had an apostrophe added and how often a word had an apostrophe removed. If the "steady-state" theory of apostrophization is correct, these two values should be exactly the same; if, on the other hand, the universe is a wild and lawless place in which there is no governance on how apostrophes are used, the numbers will be different from each other.
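The arithmetic behind such a comparison is simple enough to sketch in Python. The post does not say exactly how the per-word figures were combined, so the pooled rate below is an assumption, and the hit counts are invented placeholders chosen to make the sums easy, not real search results.

```python
def error_rate(correct_hits: int, incorrect_hits: int) -> float:
    """Percentage of occurrences of a word that use the incorrect form."""
    total = correct_hits + incorrect_hits
    if total == 0:
        return 0.0
    return 100.0 * incorrect_hits / total


def pooled_rate(counts) -> float:
    """Error rate over all words at once (one assumed way of aggregating)."""
    correct = sum(c for c, _ in counts.values())
    incorrect = sum(i for _, i in counts.values())
    return error_rate(correct, incorrect)


# Placeholder hit counts as (correct, incorrect); NOT real data.
omitted = {"don't": (780, 220), "didn't": (880, 120)}

for word, (correct, incorrect) in omitted.items():
    print(f"{word}: {error_rate(correct, incorrect):.2f}% misspelt")
print(f"pooled: {pooled_rate(omitted):.2f}%")
```

Running the same two functions over the words with wrongly added apostrophes gives the second figure, and the "steady-state" theory predicts the two pooled rates should match.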

Apostrophes incorrectly added    Apostrophes incorrectly omitted
0.25% of words                   7.05% of words

Shit! There are vastly more apostrophes missed out of words than there are apostrophes inserted into them. The universe is indeed a terrible, lonely place devoid of order and frankly I wouldn't blame you for having a little existential crisis round about now.

You may be interested to know the worst culprits were "didn't" (spelt "didnt" on 12.1% of occasions), "don't" (spelt "dont" on a disgusting 22.06% of occasions) and "haven't" ("havent" on 10.74% of occasions). Extra apostrophes were most often seen in "put's", "think's" and "feel's", which frankly make's me sick. A special award must go to Lancaster University, which came out as the top hit for the word "universitys", and which makes me chuckle as they are the arch-rival of my alma mater.

The big outstanding question is where do all the missing apostrophes go? Many millions must be missed out of words every day, and we know they aren't going into non-apostrophed words in any large numbers, so somewhere they're piling up. Physicists thought they had their hands full explaining missing matter, but the problem of having mislaid 90% of the universe is clearly trivial in comparison to the one now faced by grammaticians.

Reference
Blair, I.V., Urland, G.R. & Ma, J.E. (2002). Using internet search engines to estimate word frequency. Behavior Research Methods, Instruments, & Computers, 34, 286-290.