Numerous applications of ‘Big Data analytics’ drawing potentially troubling inferences about individuals and groups have emerged in recent years. Major internet platforms are behind many of the highest-profile examples: Facebook may be able to infer protected attributes such as sexual orientation and race, as well as political opinions and imminent suicide attempts, while third parties have used Facebook data to decide on eligibility for loans and to infer political stances on abortion. Susceptibility to depression can similarly be inferred from Facebook and Twitter usage data. Google has attempted to predict flu outbreaks as well as other diseases and their outcomes. Microsoft can likewise predict Parkinson’s disease and Alzheimer’s disease from search engine interactions. Other recent invasive applications include Target’s prediction of pregnancy, the assessment of users’ satisfaction based on mouse tracking, and China’s far-reaching Social Credit Scoring system.

Inferences in the form of assumptions or predictions about future behaviour are often privacy-invasive, sometimes counterintuitive and, in any case, cannot be verified at the time of decision-making. While we are often unable to predict, understand or refute these inferences, they nonetheless impact on our private lives, identity, reputation, and self-determination.

These facts suggest that the greatest risks of Big Data analytics do not stem solely from how input data (name, age, email address) is used. Rather, the greatest risk comes from the inferences drawn about us from the collected data, because these determine how we, as data subjects, are viewed and evaluated by third parties. It follows that protections designed to provide oversight and control over how data is collected and processed are not enough: individuals require meaningful protection not only over the inputs but also over the outputs of data processing.

Unfortunately, European data protection law and jurisprudence currently fail in this regard.

In May 2018 the General Data Protection Regulation (GDPR) took effect, with the aim of updating data protection standards across the EU. While laudable on many fronts, the new framework and the case law of the European Court of Justice nonetheless appear to provide little protection against the novel risks of inferential analytics. Compared to other types of personal data, inferences are effectively ‘economy class’ personal data. Ironically, inferences receive the least protection of all the types of data addressed in data protection law and relevant jurisprudence, and yet they now pose perhaps the greatest risks in terms of privacy and discrimination.

In a recent paper, we assessed whether inferences or derived data constitute personal data according to the Article 29 Working Party’s three-step model and the jurisprudence of the European Court of Justice. If inferences are classified as personal data within the scope of the GDPR, individual data protection rights should apply. The Article 29 Working Party views both verifiable inferences (e.g. the results of a medical analysis) and unverifiable inferences as personal data, but leaves open whether the reasoning and processes that led to an inference are similarly classified. The European Court of Justice, meanwhile, is still finding its voice on this topic, as its recent judgments have proved inconsistent.

Unfortunately, as we show in our paper, even if inferences are considered personal data, data subjects’ rights to know about (Art 13-15), rectify (Art 16), delete (Art 17), object to (Art 21), or port (Art 20) them are significantly curtailed, often requiring more balancing against the controller’s interests (e.g. trade secrets, intellectual property) than would otherwise be the case. Similarly, the GDPR provides insufficient protection against sensitive inferences (Art 9) and insufficient remedies to challenge inferences or important decisions based on them (Art 22(3)).

In standing jurisprudence the European Court of Justice (Case C-28/08 P Commission v Bavarian Lager, C-141/12 YS and Others, C-403/16 Nowak) and the Advocates General (YS and Others, Nowak) have consistently restricted the remit of data protection law to assessing the legitimacy of input personal data undergoing processing, and to rectifying, blocking, or erasing it. Critically, the ECJ has likewise made clear that data protection law is not intended to ensure the accuracy of decisions and decision-making processes involving personal data, or to make these processes fully transparent. Rather, individuals need to consult the sectoral laws and governing bodies applicable to their specific case to seek possible recourse.

This is potentially problematic. Generally applicable decision-making standards grounded in democratic legitimacy may exist in the public sector, but comparable broadly applicable standards are less likely to govern the private sector. Even though the decision-making autonomy of private entities is bound by certain laws (e.g. anti-discrimination law), companies are less likely than public bodies to have legally binding procedures or rules they must follow when making decisions. As a result, a party who feels unfairly treated will have limited access to meaningful recourse against problematic inferences or the decisions based on them.

More generally, the ECJ views data protection law as a tool for data subjects to assess whether the (input) data undergoing processing was lawfully obtained, and whether the purpose of processing is lawful. To ensure this, data protection law grants individuals various rights, for example the rights of access, rectification and deletion.

The GDPR, the draft e-Privacy Regulation, the Digital Content Directive, and legal scholars grant data subjects only limited rights over inferences. At the same time, new frameworks such as the EU Copyright Directive, as well as provisions in the GDPR, seek to facilitate data mining, knowledge discovery and Big Data analytics by limiting data subjects’ rights over their data. The new Trade Secrets Directive also poses a barrier to accountability, as models, algorithms and inferences may very well fall within its remit, allowing companies to limit access to and rights over them on the grounds that they are commercially sensitive.

Going forward, data protection law needs to focus more on how, why and for what purpose data is processed. Broadly applicable standards for socially acceptable inferential analytics similarly require further development. Such standards need to address, for example, under which circumstances inferring political opinions, sexual orientation or (mental) health from unintuitive sources, such as clicking behaviour, is socially acceptable or overly privacy-invasive. Similarly, standards need to be set to establish the reliability of methods used to draw high-risk inferences, including minimum thresholds for testing both pre- and post-deployment.

To ensure individuals can meaningfully exercise their rights to privacy, identity, reputation, self-presentation, and autonomy in the age of Big Data, we argue that a new data protection right, the ‘right to reasonable inferences’, is now required. Such a right can help close the accountability gap currently posed by ‘high risk inferences’, meaning inferences that are privacy-invasive or damaging to reputation, and that have low verifiability in the sense of being predictive or opinion-based.

In cases where algorithms draw ‘high risk inferences’ about individuals, this right would require the data controller to provide an ex-ante justification establishing whether an inference is reasonable. This disclosure would address (1) why certain data is a relevant basis for drawing inferences; (2) why these inferences are relevant for the chosen processing purpose or type of automated decision; and (3) whether the data and methods used to draw the inferences are accurate and statistically reliable. The ex-ante justification would be bolstered by an additional ex-post mechanism enabling unreasonable inferences to be challenged. Such a right must, however, be reconciled with EU jurisprudence and counterbalanced against IP and trade secrets law, as well as freedom of expression and Article 16 of the EU Charter of Fundamental Rights: the freedom to conduct a business.

Just as it was necessary to create a ‘right to be forgotten’, we believe it is now necessary to create a ‘right of how to be seen’ in the age of Big Data and AI. This will help us seize the full potential of these technologies while providing sufficient legal protection for the fundamental rights and interests of individuals.

Dr. Sandra Wachter is a lawyer and Research Fellow in Data Ethics, AI, robotics and Internet Regulation/cyber-security at the University of Oxford, a Fellow at the Alan Turing Institute and a Fellow of the World Economic Forum. Twitter: @SandraWachter5

Dr. Brent Mittelstadt is a Research Fellow and British Academy Postdoctoral Fellow in data ethics at the University of Oxford, a Fellow at the Alan Turing Institute, and a member of the UK National Statistician’s Data Ethics Advisory Committee. Twitter: @b_mittelstadt

This post was originally published on the Oxford Business Law Blog.