The Human Genome Project was one of the first large-scale data projects in biology to show the benefits to research of early and open data sharing. This model of open data release has been applied to other human genome data, including the 1000 Genomes Project. However, as sequencing has begun to be applied to cohorts with substantial phenotype data, more restrictive data sharing mechanisms have been adopted, which have in practice greatly limited access. As the Caldicott2 report proposes wide use of Safe Havens for health data, I will discuss possible technical mechanisms to facilitate research without dataset distribution.
About the Seminar Series
The rapidly declining cost of genomic sequencing promises many breakthroughs in our understanding of genetic predisposition to disease and in the development of medical treatments more precisely tailored to the individual patient. Much of this genomic data will end up in databases maintained by research and healthcare organisations (and increasingly by commercial “personal genomics” companies), which will have the ethical and legal responsibility for preserving the privacy of such sensitive information. Unfortunately, recent research suggests that preserving the privacy of such information is much more difficult than was first imagined. Many existing methods for “de-identifying” or “anonymising” such data have been shown to be fragile: correlation of information from genomic databases, electronic health records and public sources such as genealogy and residence databases can often lead to surprisingly accurate inferences about the identities of individuals. If such information were to become widely available, it might compromise the ability of individuals to obtain health and life insurance, and might influence employment and even personal relationship decisions. Such information leakage might well also have a significant chilling effect on the public’s willingness to participate in research and clinical studies.
We are organising a series of seminars, funded by the Balliol Interdisciplinary Institute, to examine the current state of information privacy in this domain, and to look in particular at several questions:
To what extent can technology keep up with the arms race between “hackers” and data curators? Will recent advances in cryptography, database security architectures and “privacy preserving” data mining methods mitigate the risks, now and in the future?
What is the current state of legislation and regulation in this domain, and how is it likely to evolve in the face of developing attacks on privacy? Who actually owns and has control over genomic (and related health) data and its uses? Are there significant national and cultural differences which need to be taken into account (especially when data storage may transcend jurisdictional boundaries e.g. when data are stored in commercial “clouds”)?
To what extent does the appearance of patient-centric disease management portals such as PatientsLikeMe mitigate the concerns about privacy? Will patients’ altruistic urge to share information about themselves, their disease and their interactions with the healthcare system outweigh their concerns about their personal privacy? What is the appropriate balance between the public good which results from data sharing and the potential private loss?
What changes need to be made to informed consent protocols to ensure that both researchers and donors fully understand and accept the risks associated with data collection and use?
If, as Scott McNealy (former CEO of Sun Microsystems) once said, “Privacy is dead, get used to it,” and privacy is doomed to lose the arms race, what is the impact likely to be on public attitudes towards, and expectations of, personal genomic privacy? In a world where people are willing to commit intimate personal information to Facebook, should we even worry about the consequences of loss of genomic privacy? Or should we rather be addressing the issues inherent in completely open sharing of such information?
Answers to some or all of the above questions would have a profound impact on the practice of scientific research and medicine. A clear analysis of the risks, methods for mitigating those risks, and, alternatively, of the consequences of a deliberate policy of transparency, will help policy makers to develop realistic approaches to public education about, and the setting of guidelines for future research on, and exploitation of, personal genomic information.
About the speakers
Professor Tim Hubbard is Director of Bioinformatics for King’s Health Partners and Head of the Department of Medical & Molecular Genetics at King’s College London. He is also Head of Bioinformatics at Genomics England, the company set up by the Department of Health to execute the 100,000 Genomes Project. Previously he was Head of Informatics at the Wellcome Trust Sanger Institute, where he remains Honorary Faculty. While at Sanger he was one of the organisers of the sequencing of the human genome. In 1999 he co-founded the Ensembl project to analyse, organise and provide access to the human genome, and he currently leads the ENCODE project that creates the GENCODE human and mouse gene sets. He is actively involved in efforts to improve data sharing in science, develop open access publishing resources and plan for the adoption of genomic medicine. He is a member of the cross-funding-agency Expert Advisory Group on Data Access (EAGDA) and is chair of the advisory board of Europe PubMed Central. He was a member of the OSCHR e-health board (2007-9), which advised on the use of patient record data for research, supporting the work to create the Clinical Practice Research Datalink (CPRD) and the creation of the UK Farr Institute, and of working groups of the Human Genomics Strategy Group (HGSG) (2010-11).