The availability of big datasets offers great potential to shape and influence policy outcomes, as well as the means by which policy-making is undertaken. But it remains unclear how government might make best use of this rich source of information, or with what practical and ethical implications. Victoria Nash (OII) discusses a recent OII workshop that explored how policy-makers, analysts and researchers should respond to the threats and promises offered by big data to public policy making and government services.
It it is unclear exactly how government might make best use of ‘big data’ about the world. Image: the 2013 Kumbh Mela (the largest gathering of humans in a single place: est. 100 million) by Daniele Colombo.
Last week the OII went to Harvard. Against the backdrop of a gathering storm of interest around the potential of computational social science to contribute to the public good, we sought to bring together leading social science academics with senior government agency staff to discuss its public policy potential. Supported by the OII-edited journal Policy and Internet and its owners, the Washington-based Policy Studies Organization (PSO), this one-day workshop facilitated a thought-provoking conversation between leading big data researchers such as David Lazer, Brooke Foucault-Welles and Sandra Gonzalez-Bailon, e-government experts such as Cary Coglianese, Helen Margetts and Jane Fountain, and senior agency staff from US federal bureaus including Labor Statistics, Census, and the Office for the Management of the Budget.
It’s often difficult to appreciate the impact of research beyond the ivory tower, but what this productive workshop demonstrated is that policy-makers and academics share many similar hopes and challenges in relation to the exploitation of ‘big data’. Our motivations and approaches may differ, but insofar as the youth of the ‘big data’ concept explains the lack of common language and understanding, there is value in mutual exploration of the issues. Although it’s impossible to do justice to the richness of the day’s interactions, some of the most pertinent and interesting conversations arose around the following four issues.
Managing a diversity of data sources. In a world where our capacity to ask important questions often exceeds the availability of data to answer them, many participants spoke of the difficulties of managing a diversity of data sources. For agency staff this issue comes into sharp focus when available administrative data that is supposed to inform policy formulation is either incomplete or inadequate. Consider, for example, the challenge of regulating an economy in a situation of fundamental data asymmetry, where private sector institutions track, record and analyse every transaction, whilst the state only has access to far more basic performance metrics and accounts. Such asymmetric data practices also affect academic research, where once again private sector tech companies such as Google, Facebook and Twitter often offer access only to portions of their data. In both cases participants gave examples of creative solutions using merged or blended data sources, which raise significant methodological and also ethical difficulties which merit further attention. The Berkman Center’s Rob Faris also noted the challenges of combining ‘intentional’ and ‘found’ data, where the former allow far greater certainty about the circumstances of their collection.
Data dictating the questions. If participants expressed the need to expend more effort on getting the most out of available but diverse data sources, several also canvassed against the dangers of letting data availability dictate the questions that could be asked. As we’ve experienced at the OII, for example, the availability of Wikipedia or Twitter data means that questions of unequal digital access (to political resources, knowledge production etc.) can often be addressed through the lens of these applications or platforms. But these data can provide only a snapshot, and large questions of great social or political importance may not easily be answered through such proxy measurements. Similarly, big data may be very helpful in providing insights into policy-relevant patterns or correlations, such as identifying early indicators of seasonal diseases or neighbourhood decline, but seem ill-suited to answer difficult questions regarding say, the efficacy of small-scale family interventions. Just because the latter are harder to answer using currently vogue-ish tools doesn’t mean we should cease to ask these questions.
Ethics. Concerns about privacy are frequently raised as a significant limitation of the usefulness of big data. Given that with two or more data sets even supposedly anonymous data subjects may be identified, the general consensus seems to be that ‘privacy is dead’. Whilst all participants recognised the importance of public debate around this issue, several academics and policy-makers expressed a desire to get beyond this discussion to a more nuanced consideration of appropriate ethical standards. Accountability and transparency are often held up as more realistic means of protecting citizens’ interests, but one workshop participant also suggested it would be helpful to encourage more public debate about acceptable and unacceptable uses of our data, to determine whether some uses might simply be deemed ‘off-limits’, whilst other uses could be accepted as offering few risks.
Accountability. Following on from this debate about the ethical limits of our uses of big data, discussion exposed the starkly differing standards to which government and academics (to say nothing of industry) are held accountable. As agency officials noted on several occasions it matters less what they actually do with citizens’ data, than what they are perceived to do with it, or even what it’s feared they might do. One of the greatest hurdles to be overcome here concerns the fundamental complexity of big data research, and the sheer difficulty of communicating to the public how it informs policy decisions. Quite apart from the opacity of the algorithms underlying big data analysis, the explicit focus on correlation rather than causation or explanation presents a new challenge for the justification of policy decisions, and consequently, for public acceptance of their legitimacy. As Greg Elin of Gitmachines emphasised, policy decisions are still the result of explicitly normative political discussion, but the justifiability of such decisions may be rendered more difficult given the nature of the evidence employed.
We could not resolve all these issues over the course of the day, but they served as pivot points for honest and productive discussion amongst the group. If nothing else, they demonstrate the value of interaction between academics and policy-makers in a research field where the stakes are set very high. We plan to reconvene in Washington in the spring.
*We are very grateful to the Policy Studies Organization (PSO) and the American Public University for their generous support of this workshop. The workshop “Responsible Research Agendas for Public Policy in the Era of Big Data” was held at the Harvard Faculty Club on 13 September 2013.
Also read: Big Data and Public Policy Workshop by Eric Meyer, workshop attendee and PI of the OII project Accessing and Using Big Data to Advance Social Science Knowledge.