Data anonymity methods and privacy safeguards unfit for modern data, says Oxford data scientist

Published on
17 Jul 2024
Written by
Andrea Gadotti, Luc Rocher, Florimond Houssiau, Ana-Maria Cretu and Yves-Alexandre de Montjoye
Leading experts in privacy at the Oxford Internet Institute, University of Oxford, and Imperial College London believe traditional approaches to anonymizing data for scientific and societal research are outdated and lack sufficient privacy safeguards in the age of big data.

In a new article published in the leading US journal Science Advances, ‘Anonymization: The imperfect science of using data while preserving privacy’, the authors argue that traditional de-identification techniques used by researchers and scientists to retain the confidentiality of individual records have significant limitations when applied to modern data. They argue that an alternative approach is needed.

Dr Luc Rocher, Departmental Research Lecturer at the Oxford Internet Institute, part of the University of Oxford, and one of the co-authors of the new article, explains, “The ability to safely share and analyse data is key for scientific and societal progress. Anonymization is one of the main ways scientists share data while protecting individuals’ privacy. At the heart of anonymization is the privacy-utility trade-off: achieving a high degree of anonymity whilst ensuring the data remains suitable and accurate, and protecting against the risk of malicious attacks. In our new article, we look at this issue and consider whether traditional record-level de-identification techniques provide a good privacy-utility trade-off to anonymize modern data”.

To address this issue, the authors have carried out a comprehensive review of current approaches to data anonymization and offer a modern perspective on this evolving field. They explore the strengths and weaknesses of different anonymization techniques, such as pseudonymization and de-identification, and focus on the challenges that arise when implementing them in practice.
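As a purely illustrative example (not taken from the article), the short Python sketch below shows what record-level pseudonymization typically looks like: direct identifiers such as names are replaced with random tokens, while quasi-identifiers such as date of birth and postcode remain in the released records. All field names and values are hypothetical.

```python
import secrets

# Toy records: one direct identifier plus quasi-identifiers (hypothetical data).
records = [
    {"name": "Alice Smith", "dob": "1984-03-12", "postcode": "OX1 3JS", "diagnosis": "asthma"},
    {"name": "Bob Jones",   "dob": "1991-07-30", "postcode": "SW7 2AZ", "diagnosis": "diabetes"},
]

def pseudonymize(rows, direct_identifiers=("name",)):
    """Replace direct identifiers with random tokens; keep all other fields."""
    token_map = {}   # kept by the data controller, never released
    released = []
    for row in rows:
        out = dict(row)
        for field in direct_identifiers:
            value = out.pop(field)
            out["pseudonym"] = token_map.setdefault(value, secrets.token_hex(8))
        released.append(out)
    return released, token_map

released, token_map = pseudonymize(records)
print(released)
# Note: date of birth and postcode survive pseudonymization, which is why
# record-level releases remain exposed to linkage attacks.
```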

The researchers also focus on the challenges that high-dimensional modern data poses for traditional anonymization approaches and reflect on the opportunities that state-of-the-art paradigms and techniques can offer to tackle such challenges.

Commenting on the findings, Dr Andrea Gadotti, Imperial College London, said, “Record-level de-identification presents inherent vulnerabilities and there is no reason to believe that it will ever provide an acceptable privacy-utility trade-off for modern, high-dimensional data. The weaknesses of pseudonymization and record-level de-identification techniques have been established beyond reasonable doubt thanks to the wide range of attacks that can be carried out by a realistic adversary. The picture for aggregate data is more nuanced.”
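The kind of attack a realistic adversary might mount against such a release can be sketched in a few lines. In this hypothetical example, the attacker already knows a target’s date of birth and postcode from an outside source and uses them to link back to the pseudonymized record and learn the sensitive attribute. The data is illustrative only and does not come from the article.

```python
# Toy linkage attack: join a pseudonymized release with auxiliary knowledge
# about a target's quasi-identifiers (hypothetical data).

released = [
    {"pseudonym": "a1f3", "dob": "1984-03-12", "postcode": "OX1 3JS", "diagnosis": "asthma"},
    {"pseudonym": "9c2e", "dob": "1991-07-30", "postcode": "SW7 2AZ", "diagnosis": "diabetes"},
]

# What the adversary already knows (e.g. from a public profile).
auxiliary = {"dob": "1984-03-12", "postcode": "OX1 3JS"}

matches = [
    row for row in released
    if all(row[key] == value for key, value in auxiliary.items())
]

if len(matches) == 1:
    # A unique match re-identifies the target and reveals the sensitive attribute.
    print("Re-identified:", matches[0]["diagnosis"])
```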

The researchers also consider the literature on ‘differential privacy’, a formal definition of privacy and framework for anonymization mechanisms with mathematical privacy guarantees—an approach that has many points in common with cryptography. The team finds that whilst differential privacy is an important framework to address privacy risks, it is not a magic bullet that can fix every anonymization problem.

Dr Rocher adds, “Differential privacy has attracted most efforts from privacy researchers since it was proposed in 2006, and it is seen by many as the most promising solution for robust anonymization. The pledge of differential privacy—i.e. context-independent provable guarantees of anonymity that hold against present and future attacks—is an attractive one for researchers, policymakers, and practitioners alike. However, its adoption in practice has been more challenging than expected”.
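For readers unfamiliar with the framework, the sketch below shows a standard differentially private mechanism: the Laplace mechanism applied to a counting query. This is a textbook construction rather than code from the article, and the dataset and epsilon value are hypothetical.

```python
import random

def dp_count(values, predicate, epsilon):
    """Return a differentially private count: the true count plus Laplace noise.

    A counting query changes by at most 1 when one record is added or removed,
    so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two exponential draws with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Hypothetical usage: count records over 40 under a privacy budget of epsilon = 0.5.
ages = [34, 27, 61, 45, 52, 38]
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))
```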

The researchers also find that aggregate data—in its various forms, including machine learning models, data query systems and synthetic data—can offer a better trade-off between privacy and utility, but doesn’t inherently protect against potential privacy attacks and privacy breaches. Instead, they argue that formal methods combined with assessing privacy safeguards and consideration of the context of how the data is being used is the best approach to balancing data risks.
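A classic illustration of why aggregate statistics alone do not guarantee privacy is the differencing attack: two legitimate aggregate queries can be subtracted to reveal one individual’s value. The sketch below uses a hypothetical dataset and query interface, purely to illustrate the point.

```python
# Toy differencing attack on an "aggregate-only" query interface (hypothetical data).

salaries = {"alice": 52000, "bob": 61000, "carol": 48000}

def total_salary(exclude=None):
    """Returns only a sum, never an individual record."""
    return sum(v for name, v in salaries.items() if name != exclude)

# Two aggregate answers are enough to reconstruct one individual's value.
everyone = total_salary()
everyone_but_alice = total_salary(exclude="alice")
print("Alice's salary:", everyone - everyone_but_alice)
```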

Dr de Montjoye, Associate Professor, Imperial College London, explains, “In our analysis of current data anonymization techniques, we show that the best approach to successfully balance privacy and utility in practice is to carefully combine formal methods and an empirical evaluation of robustness against malicious attacks, taking into account the context where data is used or shared.”

Looking ahead, the team is confident that their latest review of anonymization practices will help shape and influence future thinking in this evolving field.

Dr Gadotti concludes, “We hope that the review will provide a useful reference for policy-makers, regulators, and practitioners who work in the field of privacy and data protection.”

Notes for editors:

Media information

For more information or to request an interview with the authors, call +44 (0)1865 287 210 or contact press@oii.ox.ac.uk.

About the article

The full article, ‘Anonymization: The imperfect science of using data while preserving privacy’ by Andrea Gadotti, Luc Rocher, Florimond Houssiau, Ana-Maria Creţu, Yves-Alexandre de Montjoye is published in the journal Science Advances.

About the OII

The Oxford Internet Institute (OII) is a multidisciplinary research and teaching department of the University of Oxford, dedicated to the social science of the Internet. Drawing from many different disciplines, the OII works to understand how individual and collective behaviour online shapes our social, economic and political world. Since its founding in 2001, research from the OII has had a significant impact on policy debate, formulation and implementation around the globe, as well as a secondary impact on people’s wellbeing, safety and understanding. Drawing on many different disciplines, the OII takes a combined approach to tackling society’s big questions, with the aim of positively shaping the development of the digital world for the public good. https://www.oii.ox.ac.uk/
