Diyi Liu
DPhil Student
Diyi is a DPhil candidate and Clarendon scholar at the OII. Her doctoral research examines algorithmic content moderation and the legitimacy of platformised speech governance in Asian contexts.
The need for robust regulatory frameworks on data access
Recent developments, such as the widespread shutdown of platform APIs and research tools, create significant obstacles to independent investigation of platform governance and its societal impact. These actions severely limit researchers’ ability to study systemic risks arising from digital platforms, underscoring the urgent need for robust regulatory frameworks that ensure protected platform data access for research.
The European Union’s Digital Services Act (DSA) addresses these challenges by establishing comprehensive rules to create a safer digital space. Its primary goals are to tackle the dissemination of illegal content, prevent the spread of disinformation, and protect users’ fundamental rights online. Within this framework, Article 40 specifically facilitates data access for vetted researchers, enabling them to study systemic risks and assess the effectiveness of risk mitigation measures implemented by very large online platforms (VLOPs) and very large online search engines (VLOSEs). This provision not only enhances platform transparency and accountability within the European Union but also sets a potential global standard for researcher access to platform data.
To implement Article 40, the EU has proposed the Draft Delegated Regulation on Data Access under the Digital Services Act. In response, doctoral researchers from the Oxford Internet Institute specialising in platform and data governance have identified several aspects of the Draft Act which they believe require clarification before finalisation.
Their analysis focuses on three key areas: the appropriateness of data access modalities, the scope and context of accessible data, and the underlying enforcement and coordination mechanisms. Drawing on their empirical research experience, the Oxford researchers offer recommendations for strengthening these provisions to ensure meaningful transparency and effective implementation.
Data Access and Research Independence
The current framework governing how researchers access data has significant implications for researcher autonomy. The researchers highlight that the methods and conditions under which data is made accessible, as outlined in Article 9 of the Draft Act, require substantial clarification. Furthermore, the current framework may inadvertently encourage researchers to accept whatever default access arrangements appear in the inventory, which could include Terms of Service from data providers or third-party providers that impose additional restrictions on researchers.
Recommendations: The researchers suggest that the Draft Act should set clear baseline standards for how data access should work. These standards should be developed through expert consultation as the Draft Act is finalised, to help Digital Service Coordinators (DSCs) assess proposed access arrangements. The Act should also ensure that any agreements imposing post-access restrictions on researchers – especially limits on publishing their findings – are addressed. Finally, data protection concerns should be handled through the procedures already outlined in Article 13, rather than allowing data providers to add extra rules.
Context of Data Access and Meaningful Transparency
The Draft Act rightly focuses on making sure data access requests are necessary and reasonable, but it does not include key measures to ensure meaningful transparency. Based on their review of transparency reports from VLOPs, OII researchers have found major gaps in data about the size of moderator teams and the effectiveness of moderation enforcement. One key issue is the lack of proper metrics to compare data across countries and languages, which makes it hard to analyse whether platforms have enough moderators to handle the volume of content they need to review. Another is that platforms do not explain differences in the number of moderators working across languages. There is also little clarity on moderation enforcement: platforms report the volume of harmful content they have moderated, such as hate speech or child sexual abuse material (CSAM), but they do not provide a baseline on the total volume of harmful content that would help establish the share of all harmful content that is effectively moderated.
Recommendations: To ensure meaningful transparency while maintaining proportionality, the researchers suggest that the Draft Act should: (a) clarify the burden of proof for data access requests by establishing clear criteria for what constitutes an adequate alternative data source, (b) require DSCs of establishment to maintain an updated registry of available data sources, and (c) explicitly address the scope of accessible data to include the contextual information needed to interpret raw numbers. For instance, necessary contextual data could include moderator counts per language, as well as both the total volume of harmful content (prevalence) and the volume that is actually detected and moderated (enforcement), which would strengthen assessments of moderation effectiveness.
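The contextual metrics above can be made concrete with a short sketch. All languages and figures below are hypothetical, chosen purely to illustrate how an enforcement rate (moderated volume over estimated prevalence) and a per-moderator load would be computed if platforms disclosed the underlying numbers:

```python
# Hypothetical illustration of the contextual metrics described above.
# All figures and languages are invented for illustration only.

transparency_report = {
    # language: (moderator_count, estimated_prevalence, moderated_volume)
    "English": (1500, 2_000_000, 1_600_000),
    "German": (200, 300_000, 210_000),
    "Tagalog": (10, 250_000, 50_000),
}

def enforcement_rate(prevalence: int, moderated: int) -> float:
    """Share of the total harmful content that was actually moderated."""
    return moderated / prevalence

def moderator_load(moderators: int, prevalence: int) -> float:
    """Estimated number of harmful items each moderator must handle."""
    return prevalence / moderators

for lang, (mods, prevalence, moderated) in transparency_report.items():
    print(f"{lang}: enforcement {enforcement_rate(prevalence, moderated):.0%}, "
          f"load {moderator_load(mods, prevalence):,.0f} items per moderator")
```

Without the prevalence and per-language moderator counts, neither ratio can be computed, which is precisely the gap the researchers identify in current transparency reports.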
Balancing Data Protection with Research Integrity
Public interest
The Draft Act’s current emphasis on data protection, while important, lacks explicit consideration of research needs and public interest. If this balance is not made explicit, there is a risk of undermining the social value of the proposed research.
Recommendations: The Oxford researchers suggest strengthening the Draft Act by amending Article 9(2) to clarify that Digital Service Coordinators should balance the risks and benefits of data access in deciding on appropriate access modalities. This should include not only consideration of data sensitivity and the interests of data providers but also the value of the proposed research and how it serves the public interest.
Transparency theatre
Previous research has documented instances where platform-provided data or research tools proved inconsistent or misleading. The risk of “transparency theatre” – where platforms provide data that satisfies technical requirements without enabling meaningful oversight – therefore remains significant. While the granularity and quality of the data itself are decisive in establishing meaningful access, of particular concern are unilateral decisions by data providers on data anonymisation, as privacy-enhancing technologies such as differential privacy and synthetic data generation can significantly alter the data, undermining the reliability and validity of research findings.
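To make the anonymisation concern concrete, the sketch below shows – under assumed parameters, not any platform’s actual pipeline – how the Laplace noise used in a standard differential-privacy release distorts small counts proportionally far more than large ones. Counts for low-resource languages or rare harms are exactly where researchers most need precision:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_release(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace(1/epsilon) noise (sensitivity 1).

    Smaller epsilon means stronger privacy and a noisier release.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
# A small count (e.g. flagged posts in a low-resource language) suffers a
# much larger relative distortion than a large one at the same epsilon.
for true_count in (40, 40_000):
    released = dp_release(true_count, epsilon=0.1, rng=rng)
    rel_error = abs(released - true_count) / true_count
    print(f"true={true_count}, released={released:.1f}, "
          f"relative error={rel_error:.1%}")
```

The noise is unbiased on average, but for any single released figure a researcher cannot tell signal from noise unless the provider discloses the mechanism and its parameters – hence the concern about unilateral anonymisation decisions.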
Recommendations: To address these concerns, the researchers suggest that the Draft Act should:
Coordination Mechanisms and Dispute Resolution
The Draft Act’s provisions for harmonisation across jurisdictions require strengthening, say the Oxford researchers. Its emphasis on consistency across DSCs of establishment raises concerns regarding implementation and oversight. Moreover, the current framework leaves critical aspects of the mediation process undefined, raising questions about its effectiveness and fairness.
Recommendations: They suggest that the Draft Act should include detailed criteria and guidelines for applying exemptions to ensure consistent implementation across Digital Service Coordinators. It should also clarify the process for appointing mediators, the scope of mediators’ authority, explain whether mediator decisions are binding, and specify what happens when mediation closes without agreement. The framework must also explicitly define circumstances requiring researcher participation and establish clear protocols for keeping researchers informed throughout the mediation process.
Vetting Process and Potential Impacts on Fair Representation
Lastly, while the Draft Act advances crucial data access provisions, it requires careful consideration of its potential downstream impacts on academic research and knowledge production. The vetting process and institutional requirements could disproportionately advantage well-resourced research institutions. This is particularly concerning given that platform impacts and risks often transcend geographical boundaries, requiring diverse global perspectives for a comprehensive understanding of systemic risks.
Recommendations: To promote more equitable access across institutions, the researchers propose the Draft Act should consider incorporating provisions to:
The Delegated Regulation is planned to be adopted by the European Commission in the first quarter of 2025. Download the full feedback on the draft delegated regulation on data access provided for in the Digital Services Act, submitted in December 2024: Feedback-Draft-Act-DSA40-OII
Find out more about the authors:
Diyi Liu is a DPhil candidate and Clarendon scholar at the OII. Her doctoral research examines algorithmic content moderation and the legitimacy of platformised speech governance in Asian contexts.
Manuel Tonneau is a DPhil candidate at the OII. He studies the extent to which harmful online content moderation may treat users unequally across geographies.
Juliette Zaccour is a DPhil candidate and a Clarendon Scholar at the OII. She is conducting research on data access, privacy-enhancing technologies, and algorithm auditing.