What frameworks are available to permit data reuse? How can legal and technical systems be structured to allow people to donate their data to science? What are appropriate methods to repurpose traditional consent forms so that user-donated data can be gathered, de-identified, and syndicated for use in computational research environments?
This webinar will examine how traditional frameworks to permit data reuse have been left behind by the mix of advanced techniques for re-identification and cheap technologies for the creation of data about individuals. Existing systems typically depend on the idea that de-identification is robust and stable, despite a multitude of studies demonstrating that re-identification is nearly always possible on at least some portion of a de-identified cohort.
At issue here is a real risk to scientific progress. If privacy concerns block the redistribution of the data on which scientific and policy conclusions are based, reproducing the conditions that led to those conclusions will be impossible.
Approaches and frameworks emerging to deal with this reality tend to fall along two contours. One uses technological and organizational systems to “create” privacy where it has been eroded, while still allowing data reuse; this approach draws on encryption and boundary organizations to manage privacy on behalf of individuals. The second applies “radical honesty” to data contribution, acknowledging up front the tension between anonymization and utility and the difficulty of true de-identification. It draws on the traditions of beneficence and utility, as well as autonomy, in informed consent to create reusable and redistributable open data, and leverages cloud-based systems to facilitate storage, collaborative reuse, and analysis of data.
About the Seminar Series
The rapidly declining cost of genomic sequencing promises many breakthroughs in our understanding of genetic predisposition to disease and in the development of medical treatments more precisely tailored to the individual patient. Much of this genomic data will end up in databases maintained by research and healthcare organisations (and increasingly by commercial “personal genomics” companies), which will have the ethical and legal responsibilities for preserving the privacy of such sensitive information. Unfortunately, recent research suggests that it is much more difficult than was first imagined to preserve the privacy of such information. Many existing methods for “de-identifying” or “anonymising” such data have been shown to be fragile: correlation of information from genomic databases, electronic health records and public sources such as genealogy and residence databases can often lead to surprisingly accurate inferences about the identities of individuals. If such information were to become widely available, it might compromise the ability of individuals to obtain health and life insurance, and might influence employment and even personal relationship decisions. Such information leakage might well have a significant chilling effect on the public’s willingness to participate in research and clinical studies.
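The correlation attack described above can be made concrete with a small sketch. The example below is not from the seminar materials; it is a minimal, hypothetical illustration of a classic “linkage attack”, in which a de-identified dataset is joined to a public directory on shared quasi-identifiers (ZIP code, birth date, sex). All names and records are fabricated for illustration.

```python
# Fabricated "de-identified" health records: names removed, but
# quasi-identifiers (zip, dob, sex) retained.
deidentified_records = [
    {"zip": "02138", "dob": "1945-07-01", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1980-03-15", "sex": "M", "diagnosis": "asthma"},
]

# Fabricated public directory (e.g. a voter roll) containing names
# alongside the same quasi-identifiers.
public_directory = [
    {"name": "Alice Example", "zip": "02138", "dob": "1945-07-01", "sex": "F"},
    {"name": "Bob Example",   "zip": "02140", "dob": "1972-11-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

def link(records, directory):
    """Return (name, diagnosis) pairs where the quasi-identifiers
    match exactly one directory entry, i.e. a re-identification."""
    matches = []
    for record in records:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        candidates = [p for p in directory
                      if tuple(p[q] for q in QUASI_IDENTIFIERS) == key]
        if len(candidates) == 1:  # unique match: the record is re-identified
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches

print(link(deidentified_records, public_directory))
# → [('Alice Example', 'hypertension')]
```

Even in this toy setting, one record links uniquely and its sensitive attribute is exposed; at population scale, combinations of ZIP code, birth date and sex are unique for a large fraction of individuals, which is why such joins succeed so often in practice.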
We are organising a series of seminars, funded by the Balliol Interdisciplinary Institute, to examine the current state of information privacy in this domain, and to look in particular at several questions:
To what extent can technology keep up with the arms race between “hackers” and data curators? Will recent advances in cryptography, database security architectures and “privacy preserving” data mining methods mitigate the risks, now and in the future?
What is the current state of legislation and regulation in this domain, and how is it likely to evolve in the face of developing attacks on privacy? Who actually owns and has control over genomic (and related health) data and its uses? Are there significant national and cultural differences which need to be taken into account (especially when data storage may transcend jurisdictional boundaries e.g. when data are stored in commercial “clouds”)?
To what extent does the appearance of patient-centric disease management portals such as PatientsLikeMe mitigate the concerns about privacy? Will patients’ altruistic urge to share information about themselves, their disease and their interactions with the healthcare system outweigh their concerns about their personal privacy? What is the appropriate balance between the public good which results from data sharing and the potential private loss?
What changes need to be made to informed consent protocols to ensure that both researchers and donors fully understand and accept the risks associated with data collection and use?
If, as Scott McNealy (former CEO of Sun Microsystems) once said, “Privacy is dead – get used to it,” and privacy is doomed to lose the arms race, what is the impact likely to be on public attitudes towards, and expectations of, personal genomic privacy? In a world where people are willing to commit intimate personal information to Facebook, should we even worry about the consequences of loss of genomic privacy? Or should we rather be addressing the issues inherent in completely open sharing of such information?
Answers to some or all of the above questions would have a profound impact on the practice of scientific research and medicine. A clear analysis of the risks, of methods for mitigating those risks, and, alternatively, of the consequences of a deliberate policy of transparency will help policy makers develop realistic approaches to public education about, and guidelines for future research on and exploitation of, personal genomic information.
About the Speaker
- Name: John Wilbanks
- Affiliation: SAGE Bionetworks and Consent to Research
- Bio: John Wilbanks is the Chief Commons Officer at Sage Bionetworks and a Senior Fellow in Entrepreneurship at the Ewing Marion Kauffman Foundation. He has worked at Harvard’s Berkman Center for Internet & Society, the World Wide Web Consortium, the US House of Representatives, and Creative Commons. John is a past affiliate of MIT’s Project on Mathematics and Computation and also started a bioinformatics company called Incellico, which is now part of Selventa. He sits on the Advisory Boards for Boundless Learning, Genomic Arts, Curious, GenoSpace, Patients Like Me, and Genomera, and is Special Advisor on the Research Commons to the University of California San Francisco’s Clinical Translational Science Institute. John holds a degree in Philosophy from Tulane and studied modern letters at the Sorbonne.