This article on a new area of research, dubbed culturomics, is fascinating. Long before massive digital infrastructure and the concept of ‘big data’ existed, humans were creating a record. What we thought culturally important enough to write down tells our story, not just our tales. Above all, we need to take the measure of both the big and the small, so that history can’t simply be “(re)written by the victors”.
If alien anthropologists wanted to learn about human behaviour, they would likely examine our literary works. Among the embarrassing flotsam, they would also discover Thomas Paine’s The Age of Reason, Toni Morrison’s Song of Solomon and the lyrics to Bob Dylan’s ‘Blowin’ in the Wind’. The aliens might conclude that we are a troubled species, plagued by a mix of arrogance and ignorance, but with an overall trajectory that is progressive and promising.
The point is that a treasure trove of information about our nature can be extracted from the collective works of humanity. According to Google, there are approximately 130 million books in the world, and the company intends to scan all of them. So far it has scanned about 30 million, and this impressive database is already being mined by scientists seeking answers to questions about our historical behaviour.
Thirty million books contain an extraordinary amount of information, easily enough to qualify as ‘big data’. As our computational ability to manage and probe large datasets increases, researchers are poised to answer questions we couldn’t have dreamt of addressing just a few years ago. Today, big data is routinely used in science laboratories. For example, geneticists compare DNA sequences across tens of thousands of individuals to find correlations between gene variants and traits, including behaviour (these are known as GWAS, or genome-wide association studies).
In much the same way that geneticists can analyse millions of genetic variants to learn about human physiology, scientists can scan millions of books to learn about human culture. By analogy with ‘genomics’, the pioneers of this new science have dubbed it ‘culturomics’.