Friday, January 8, 2016

Data Ain't What They Used to Be. Or Is They?

"Hmm... that makes sense," said Ben.

"Uh, you sound surprised," was my response.

"Well, usually when people start out by saying 'Here's my take on that,' it tends to not be that helpful." Well, Ben's a smart guy, and if he hints that it might be helpful, that is enough to get me to write this post, which has been rattling around in my head probably since about the time blogs were invented.

My "take" had been prompted by another iteration of a typically tedious discussion that you may also have been involved in periodically: whether "data" is singular or plural. For those of you too young to know, or too smart to care, "data" is the Latin plural of "datum," meaning "a piece of information." So when a hapless person would say "There isn't enough data," grammar snoots would correct them and try to get them to say "There aren't enough data."  It's been a losing battle.  (David Foster Wallace fans will know that he actually called these people SNOOTS, all caps; an executive summary is here.)

So why do all of us troglodytes (we troglodytes? no, I guess it is "us troglodytes") insist on treating "data" as singular?  But wait, we don't!  "Car" is singular, but we don't say "There isn't enough car," at least not routinely.  The answer is that we are treating the word "data" as a "mass noun," like "rice," "water," or "sand," as opposed to a "count noun" like "car."  We say "there isn't enough rice" without any problem. (Incidentally, I had originally thought that the word for this is "collective noun," but that refers to words like "baggage" [a number of bags], "library" [a number of books], or phrases like "a pride of lions" and other terms of venery.)

The conclusion I had come to is that before the "information age," there just weren't "that many data," and you could count them one by one.  Since the arrival of the computer, we are, to paraphrase Torricelli, "swimming at the bottom of a sea of data," and it is no more practical to enumerate data one-by-one than it is to count grains of sand on a beach. So it seemed to me that this was a shift in usage prompted by a technological change, not simply by ignorance.

I was very satisfied with this explanation until, in the process of writing this post, I went to check out what I thought might be an early use of the word, recalling Sherlock Holmes saying "Not enough data" as he mulled over a difficult case.  So I fired up the OED online (so awesome not to need the magnifying glass!), and eventually found (after clicking on "full entry") the following usage note under "data":
The use of data as a mass noun became increasingly common from the middle of the 20th cent., probably partly popularized by its use in computing contexts, in which it is now generally considered standard (compare sense 2b and the recent uses cited at datum n. 1b, some of which are ambiguous as to grammatical number). However, in general and scientific contexts it is still sometimes regarded as objectionable. Compare the plural uses cited at datum n. and the following:
1949   Nature 19 Nov. 890/1   ‘Data’ was a plural noun; for literate English writers it still is, and I contend that it always should be.
1978   P. Howard Weasel Words xiii. 63   Data stubbornly persists in trying to become an English singular.
1990   Psychologist 13 31/1   A staggeringly large number of psychologists fail to appreciate that ‘data’ should be followed by the plural form of the verb.

Snoots on parade!  And pretty much what I expected.  But when you read the entry on data, you find some shockers:
1645   T. Urquhart Trissotetras 53   The verticall Angles, according to the diversity of the three Cases being by the foresaid Datas thus obtained.
1764   Gentleman's Mag. Nov. 509,   I collected the datas chiefly from those excellent coin notes.
1807   Salmagundi 24 Nov. 366   My grandfather..took a data from his own excellent heart.
1910   Oologist 27 20/1   To make the markings on the eggs gibe with the datas is something of a chore.
2006   Cancer Causes & Control 17 1055/2   These datas were likely not missing at random.
This is "data," used as a singular count noun, with "datas" as plural.  WTF!?

Perhaps more interesting, the use of "data" as a mass noun has a long history:
1702   R. Morden Introd. Astron. i. 103   And by this Data there are twelve Problems resolved.
1826   Edinb. New Philos. Jrnl. 1 340   Inconsistent data sometimes produces a correct result. This, however, only happens..when part of the data is allowed to lie dormant.
1888   Pump Court 5 May 56/2   In the Northampton [table] the data is taken from the actual deaths of a floating population.
1902   A. S. Tompkins Hist. Rec. Rock Co., N.Y. 46   There is but little data to estimate Indian populations.

In fact, to find unambiguously plural usages of the word "data" you need to look at the OED entry on "datum." The earliest one is:
1691   Philos. Trans. (Royal Soc.) 16 498   From these data..the time of this Invasion will be determined to a day.

Merriam-Webster online has what seems to me a very sane usage note:
Data leads a life of its own quite independent of datum, of which it was originally the plural. It occurs in two constructions: as a plural noun (like earnings), taking a plural verb and plural modifiers (as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns (as they, them); and as an abstract mass noun (like information), taking a singular verb and singular modifiers (as this, much, little), and being referred to by a singular pronoun (it). Both constructions are standard. The plural construction is more common in print, evidently because the house style of several publishers mandates it.

Of course this sent me scurrying to the Chicago Manual of Style; in 5.220 of the 16th edition ("Good usage versus common usage"), we see:
data. Though originally this word was a plural of datum, it is now commonly treated as a mass noun and coupled with a singular verb. In formal writing (and always in the sciences), use data as a plural.
Uh, so I guess "computing contexts" are not considered "science." Oh well.

The usage examples from the OED suggest to me that if you are pining for the days when an educated person would know beyond question that "data" is the plural of "datum," your nostalgia is for first-century Latin, not eighteenth-century English.  Condemning "data is" has a bit more historical basis than some other snoot shibboleths, such as the proscription on ending a sentence with a preposition, but that historical basis is not from English. In fact, many of these "rules" may arise from the same group of Latin-obsessed 17th-century introverts.  There seems to be a rabbit-hole the size of the internet to slide down here, in which even cherished stories about Winston Churchill are demolished.

But back up and out the rabbit-hole: "data is"-ers, take heart! Not only does it make more sense, for the last half-century or so, to use data as a mass noun—and hence say "data is"—but you have over three centuries of English usage to back you up!

PS With apologies to Duke Ellington for the title, and to you for having to put up with an ad on that youtube video.

PPS Of course, my own inner snoot was interested to find this in the OED under datum:
The plural form data reflects the Latin plural; within English, this has given rise to a new singular and collective noun: see data n. and discussion at that entry.
Wait, don't you mean "mass noun"? I felt better about not having previously understood the difference.

PPPS I was delighted to find our old friend Cal Mooers in the OED with the first "mass noun" reference under "Computing" (2b).
1946   C. N. Mooers in Moore School Lect. (1985) 524   The data is stored in the memory in a systematic fashion with the points numbered in sequence.

PPPPS The dialog with Ben at the start was reconstructed from (my poor) memory, and might get edited if I can manage to communicate with him.


  1. Am I a snoot if I point out that the first letter of the word "is" in your title ought to be capitalized? (It's a verb, not an article, preposition, or conjunction.) - Margy

    1. I can't tell you whether you are a snoot or not, but you are right (16th ed, CMOS 8.157) and I am changing it Right Now!

  2. Is you is or is you ain't my brother? The data sez yes...geek out, bro! Love, Sis