A Play on Words
Erez Aiden and Jean-Baptiste Michel
Uncharted: Big Data as a Lens on Human Culture
Riverhead Books, New York, 2013, 288 pp., $27.95 (cloth).
There are many things that are uncharted in this book. But Erez Aiden and Jean-Baptiste Michel don’t mean by “uncharted” that things are left out—in fact, a more appropriate title might be “Charted.”
The book tells the story of collecting the billions of words in all the world’s books, words that were previously lost in the meaning of the text (“uncharted,” as it were) but can now be charted to our heart’s content. The authors hope that the charting process will lead us to discover interesting aspects of our culture, an approach they call “culturomics.” Aiden and Michel collaborated with Google to make a powerful Web tool, the Ngram Viewer, but their claims about its usefulness are perhaps extravagant.
This is not to say that the book isn’t fun. That’s what you’d expect from the acknowledgments, in which Aiden thanks his three children and includes the middle name of a daughter: Banana. (At least he’s quirkily consistent; his son is Galileo.)
Now I’m all for fun. But Aiden and Michel are doing important scientific work, and they don’t do themselves any favors by giving “fun” examples. It doesn’t take big data to convince us that the word chupacabra (a blood-drinking creature reportedly sighted in Puerto Rico in 1995) is much rarer than Sasquatch or the Loch Ness Monster. It also seems silly to chart the changing usage of “argh” and “aargh” in books published sometime between the 1940s (it’s hard to tell the starting date from the chart reproduced in the book) and 2000. There’s a quote on the book jacket from Mother Jones that calls the Ngram Viewer “the greatest timewaster in the history of the Internet.” It was bold of the publisher to include that.
To document a cultural history by getting robots to read every word of every book ever published is ambitious. So what do I mean by saying there’s a lot left out of this effort? Aiden and Michel acknowledge that they are searching through a tiny sample of words, and although they say that Google has so far scanned some 30 million books (probably more by now), there are still some 100 million to go.
Further, if a word’s usage is a clue to our cultural history, many sources are ignored in this book: newspaper and magazine articles, letters, movies, TV and radio interviews, transcripts, lectures—in fact, everything written or spoken, but not published in a book. Besides, after books are written, they are often edited and revised for grammar and spelling, not to mention translated into other languages. Every author knows that editors change the text according to their publisher’s house style. I wonder if the language in books, even 30 million of them, is a reliable source of changing language usage.
I expect Aiden and Michel would argue that the books Google has scanned are all they have to work with, but given what’s missing, their “lens on human culture” theory may be too bold a claim.
About the charts: there are many, and they are stripped down; printed in black, white, and gray; and generated directly from the data. There is nothing wrong with simple charts, but the relative lack of labels and grid lines, and the chart lines themselves (sometimes as many as six) in minimally differentiated shades of gray, make for difficult reading. The authors point us to the Web, where all these problems are taken care of: colors differentiate the lines, and clicking on them at any point reveals a label and date. It’s an example of the distance between print and Web-based graphics.
But let’s be positive. The authors bravely dodge neither the copyright questions raised when books are scanned nor the seemingly unethical “shadow” ways of getting around them. There is lovely detail about a 2002 experiment by Larry Page and Marissa Mayer, who worked out how long it might take to scan all the world’s books. Apparently it would take “millennia, even eons.” So how did the authors get around that problem? Read the book; you’ll have fun!
Principal, Explanation Graphics, and author, most recently, of Wordless Diagrams and The Book of Everything
The Price of Everything and the Value of Nothing
Diane Coyle
GDP: A Brief but Affectionate History
Princeton University Press, Princeton, New Jersey, 2014, 168 pp., $19.95 (cloth).
Why didn’t a smart guy like Aristotle come up with the concept of gross domestic product (GDP) 2,000 years ago, since the word “economy” derives from the Greek word for household, oikos? It was because Aristotle focused on things that moved, like moons and planets.
For about 50,000 years human GDP did not budge. In Aristotle’s time there was no CNBC to shake the markets by announcing GDP gyrations. And nobody expected to live better than his parents. Of course, people cared about money and debt. In the 1500s, Henry VIII asked his treasurers to keep an eye on the tab he ran up at the pub and during the wars with France, but it would not have occurred to Henry to ask whether per capita GDP had climbed over the past year.
Diane Coyle’s smart and lucid new book, GDP: A Brief but Affectionate History, tells the story of this twentieth-century numerical creation, which every three months threatens to topple prime ministers. Coyle begins by reminding us of the stakes, not in ancient Greece but in modern Athens, where the head of that country’s statistical agency calls his job “a combat sport.” She shares the story of her economist friend Paola Subacchi of Chatham House, who visited the Greek agency expecting to see supercomputers, or at least an abacus. Instead, she walked up the stairs of a 1950s residential building to find “a dusty room with a handful of people” and no computers.
But national statistical agencies must come up with something, and they often finagle data in their quest to sell bonds and wheedle aid from others. Coyle suggests that Chinese officials sometimes boast of their powerful GDP and at other times understate it to qualify for handouts. After the Soviet Union collapsed, I visited St. Petersburg, Russia. My old economics textbooks suggested that the USSR had enjoyed strong growth under communism. Even a Nobel laureate like Paul Samuelson published such dubious numbers. Yet all I had to do was sniff the acrid air inside the decaying Hermitage museum to realize: the problem with communism was not that it couldn’t keep up with the West; the problem was that it couldn’t keep up with the standards of 1917!
Coyle performs an important task by reminding us that the very calculation of GDP (C + I + G + [X – M], or consumption plus investment plus government spending plus net exports) gives government leaders an incentive to spend more money. Why? Because stronger government spending tautologically increases that sum. All a leader must do is turn on the spending spigot, and he can count on his bean-counters to add more to GDP. Moreover, the value of government spending is calculated based on the salaries of government workers, not the value of their output. One of my Harvard students once suggested that, given this tautology, leaders who are willing to consider occasional budgetary austerity deserve special ribbons.
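To see why the sum rises mechanically, here is a minimal sketch of the expenditure identity in Python. The function name and every figure are hypothetical, chosen only for illustration, and the sketch assumes the other components stay fixed when government spending changes, which a real economy would not guarantee.

```python
def gdp(consumption, investment, government, exports, imports):
    """Expenditure identity: C + I + G + (X - M)."""
    return consumption + investment + government + (exports - imports)


# Hypothetical figures (say, in billions); not drawn from any real country.
before = gdp(consumption=700, investment=200, government=250,
             exports=150, imports=180)

# Raise only government spending by 50 and hold everything else fixed.
after = gdp(consumption=700, investment=200, government=300,
            exports=150, imports=180)

print(before)          # 1120
print(after - before)  # 50: measured GDP rises one-for-one with G
```

The accounting records the extra 50 whether or not the spending produced anything of value, which is the tautology described above.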
Coyle also does a fine job picking apart other problems with GDP, including such paradoxes as the widower who marries his housekeeper and thereby lowers GDP because he doesn’t pay her wages anymore. It’s especially hard to properly value services in the information economy. In a current, real-life example, I’ve devised a new matrix of numbers to help kids learn arithmetic. When children learn addition through this matrix called the Math Arrow, they increase their earning potential by, let’s say, a hundred thousand dollars. Yet the app costs just $4.99. Is each download of this matrix creating a hundred thousand dollars of value, or just a few?
After dissecting the problems with GDP, Coyle asks whether we can do better and runs through the list of competitors, including the Human Development Index (HDI), Measure of Economic Welfare (MEW), and assorted happiness indices. She is right to be skeptical, especially of those dispensed by happiness gurus and demagogues. Hugo Chavez called GDP a “capitalist conspiracy.” But alternatives are even more easily twisted like taffy. In 2009, the Happy Planet Index, for example, ranked Costa Rica highest among nations, with Cuba not far behind at number 7. It also found that people living under the Palestinian Authority are happier and healthier than Israelis. If a pro-Zionist spokesman argued that Palestinians were better off, he’d be laughed at or stoned. Oh, the United States showed up 114th on the list. Funny, I’ve never seen a raft leaving Miami for Cuba. So Coyle is correct both to dissect GDP’s flaws and to raise warning flags over its threatened demise.
I found only one omission in her otherwise short but masterful tract. When I wonder about a country’s standard of living, I often ask this simple question: How many hours does a typical worker have to work in order to buy a chicken? In the 1920s, President Herbert Hoover’s campaign promised “a chicken in every pot.” In those days, it took about two and one-half hours to earn a chicken. Today, it takes less than 15 minutes. Sounds like progress to me. Unless, of course, you’re poultry.
Todd G. Buchholz
former White House director of economic policy, author of New Ideas from Dead Economists, and CEO of Sproglit, LLC, an educational software firm