The start of a new year seems to prompt an awful lot of writing about how the data revolution will change everything—especially in the developing world. It will be bigger than the industrial revolution. It is already disruptive. And the applications and devices that humans can design to use this data are projected to reduce poverty, liberate people, halt the spread of disease, and alter the state-centric nature of the international system. The more disruptive the better! Vive la Révolution!
It’s easy to get caught up in this (full disclosure: I am). The availability of machine-readable, comparable information is already changing people’s lives in very practical ways. Data has even become less nerdy and more exciting to talk about: we can refer to “a disruptive future,” and plenty of people think that future looks a lot like an iPhone. Technical terms are becoming a normal part of everyday professional conversation. But beneath all the comfortable arm-waving about this bright new future, there are some quiet places that have not seen this change.
At a time when people are waxing eloquent about the power of big data to make consumer goods and services ever more tailored and ever faster, much of the developing world still lacks reliable, comparable country statistics on basic economic, governance and human development outcomes. UNICEF estimates that one in three children has not been registered at birth, and so these children simply do not exist in statistical terms. Education outcomes are often estimated using models built on data that are five to ten years old. Budget transparency data, a common proxy for accountable governance, covers only about half of the more than 190 countries in the world.
And the closer you look, the more you find that even the data we have considered reliable has internal flaws that can make it hard to trust (see Morten Jerven's controversial book Poor Numbers). With “big data,” the law of large numbers more or less evens out the errors in any individual data point. Cross-country comparisons are different: the datasets are typically small enough that even a handful of inaccurate data points can change the result.
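To make the small-sample point concrete, here is a toy illustration with made-up numbers, not real country data. It assumes every observation has the same true value (50) and that a single observation is mis-recorded as 150, then compares how far that one error moves the average in a 20-country dataset versus a 10,000-record dataset.

```python
# Toy illustration with hypothetical numbers: how far does ONE mis-recorded
# value move an average, depending on how many observations there are?

def mean(values):
    return sum(values) / len(values)

def shift_from_one_error(n, true_value=50.0, bad_value=150.0):
    """Difference between the mean of n correct values and the mean of the
    same sample after one value has been replaced by an erroneous one."""
    clean = [true_value] * n
    dirty = clean[:-1] + [bad_value]  # one data point is wrong
    return mean(dirty) - mean(clean)

# 20 observations, roughly the size of a regional cross-country comparison,
# versus 10,000 observations, more like a large administrative dataset.
for n in (20, 10_000):
    print(n, round(shift_from_one_error(n), 3))
```

In this sketch the single error shifts the 20-observation average by 5 points but the 10,000-observation average by only 0.01. Real data errors are rarely this tidy, but the arithmetic shows why a few bad numbers matter so much more in small cross-country datasets.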
The first challenge here is obvious. If we want to realize the potential of the data revolution in the world’s poorest countries, we need more and better data. Period. And people are already both demanding it and trying to create it.
But there is a second, less visible challenge: ensuring that data is used responsibly. Foreign aid and foreign assistance are fields where much of the data we want to use is either only beginning to be collected or fraught with problems. Yet while development professionals grapple with how to work appropriately around serious data gaps, we are surrounded by popular examples from other fields of how reliable big data can be: Nate Silver's 2012 election predictions, Target's marketing algorithms that can tell you're pregnant before you tell your friends, and even a Brad Pitt movie about data (Moneyball, seriously!). It can be tempting to think our world is the same, but it isn’t yet.
So if we are using development data for policy making and aid allocation, how do we know we are using it responsibly? It's not a question that gets asked often, but I think it should be. Are there cross-checking metrics? What would those even look like?! Is transparency the answer? When someone corrects a data error, how should decision makers react (as in the Reinhart and Rogoff data controversy)?
Over the coming year, the responsible use of data is a theme I'll return to again and again: things worth watching and learning from, the characteristics of responsible (and irresponsible!) use of development data, and efforts to fill data gaps to improve aid effectiveness. I hope others will too.