Diyi Liu
DPhil Student
Diyi is a DPhil candidate and Clarendon scholar at the OII. Her doctoral research examines algorithmic content moderation and the legitimacy of platformised speech governance in Asian contexts.
The need for robust regulatory frameworks on data access
Recent developments, such as the widespread shutdown of platform APIs and research tools, create significant obstacles to independent investigation of platform governance and its societal impact. These actions severely limit researchers’ ability to study systemic risks arising from digital platforms, underscoring the urgent need for robust regulatory frameworks that ensure protected platform data access for research.
The European Union’s Digital Services Act (DSA) addresses these challenges by establishing comprehensive rules to create a safer digital space. Its primary goals are to tackle the dissemination of illegal content, prevent the spread of disinformation, and protect users’ fundamental rights online. Within this framework, Article 40 specifically facilitates data access for vetted researchers, enabling them to study systemic risks and assess the effectiveness of risk mitigation measures implemented by very large online platforms (VLOPs) and very large online search engines (VLOSEs). This provision not only enhances platform transparency and accountability within the European Union but also sets a potential global standard for researcher access to platform data.
To implement Article 40, the EU has proposed the Draft Delegated Regulation on Data Access under the Digital Services Act. In response, doctoral researchers from the Oxford Internet Institute specialising in platform and data governance have identified several aspects of the Draft Act which they believe require clarification before finalisation.
Their analysis focuses on three key areas: the appropriateness of data access modalities, the scope and context of accessible data, and the underlying enforcement and coordination mechanisms. Drawing on their empirical research experience, the Oxford researchers offer recommendations for strengthening these provisions to ensure meaningful transparency and effective implementation.
Data Access and Research Independence
The current framework governing how researchers access data has significant implications for researcher autonomy. The researchers highlight that the methods and conditions under which data is made accessible, as outlined in Article 9 of the Draft Act, require substantial clarification. Furthermore, the current framework may inadvertently encourage researchers to accept whatever default access arrangements appear in the inventory, which could include Terms of Service from data providers or third-party providers that impose additional restrictions on researchers.
Recommendations: The researchers suggest that the Draft Act should set clear baseline standards for how data access should work. These standards should be developed through expert consultation as the Draft Act is finalised, to help Digital Service Coordinators (DSCs) assess proposed access arrangements. The Act should also ensure that any agreements imposing post-access restrictions on researchers – especially limits on publishing their findings – are addressed. Finally, data protection concerns should be handled through the procedures already outlined in Article 13, rather than allowing data providers to add extra rules.
Context of Data Access and Meaningful Transparency
The Draft Act rightly focuses on making sure data access requests are necessary and reasonable, but it does not include key measures to ensure meaningful transparency. Based on their review of transparency reports from VLOPs, OII researchers have found major gaps in data about the size of moderator teams and the effectiveness of moderation enforcement. One key issue is the lack of proper metrics to compare data across countries and languages, which makes it hard to analyse whether platforms have enough moderators to handle the volume of content they need to review. Another is that platforms do not explain differences in the number of moderators working across languages. There is also little clarity on moderation enforcement: platforms report the volume of harmful content they have moderated, such as hate speech or child sexual abuse material (CSAM), but they do not provide a baseline on the total volume of harmful content that would help establish the share of all harmful content that is effectively moderated.
Recommendations: To ensure meaningful transparency while maintaining proportionality, the researchers suggest that the Draft Act should: (a) clarify the burden of proof for data access requests by establishing clear criteria for what constitutes an adequate alternative data source, (b) require DSCs of establishment to maintain an updated registry of available data sources, and (c) explicitly address the scope of accessible data to include the contextual information needed to interpret raw numbers. For instance, necessary contextual data could include moderator counts per language, as well as both the total volume of harmful content (prevalence) and the volume that is actually detected and moderated (enforcement), which would strengthen assessments of moderation effectiveness.
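The contextual metrics above can be made concrete with a short sketch. All languages and figures below are hypothetical, chosen purely to illustrate how an enforcement rate (moderated volume over estimated prevalence) and a per-moderator load would be computed if platforms disclosed the underlying numbers:

```python
# Hypothetical illustration of the contextual metrics described above.
# All figures and languages are invented for illustration only.

transparency_report = {
    # language: (moderator_count, estimated_prevalence, moderated_volume)
    "English": (1500, 2_000_000, 1_600_000),
    "German": (200, 300_000, 210_000),
    "Tagalog": (10, 250_000, 50_000),
}

def enforcement_rate(prevalence: int, moderated: int) -> float:
    """Share of the total harmful content that was actually moderated."""
    return moderated / prevalence

def moderator_load(moderators: int, prevalence: int) -> float:
    """Estimated number of harmful items each moderator must handle."""
    return prevalence / moderators

for lang, (mods, prevalence, moderated) in transparency_report.items():
    print(f"{lang}: enforcement {enforcement_rate(prevalence, moderated):.0%}, "
          f"load {moderator_load(mods, prevalence):,.0f} items per moderator")
```

Without the prevalence and per-language moderator counts, neither ratio can be computed, which is precisely the gap the researchers identify in current transparency reports.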
Balancing Data Protection with Research Integrity
Public interest
The Draft Act’s current emphasis on data protection, while important, lacks explicit consideration of research needs and public interest. If this balance is not made explicit, there is a risk of undermining the social value of the proposed research.
Recommendations: The Oxford researchers suggest strengthening the Draft Act by amending Article 9(2) to clarify that Digital Service Coordinators should balance the risks and benefits of data access in deciding on appropriate access modalities. This should include not only consideration of data sensitivity and the interests of data providers but also the value of the proposed research and how it serves the public interest.
Transparency theatre
Previous research has documented instances where platform-provided data or research tools proved inconsistent or misleading. The risk of “transparency theatre” – where platforms provide data that satisfies technical requirements without enabling meaningful oversight – therefore remains significant. While the granularity and quality of the data itself are decisive in establishing meaningful access, of particular concern are unilateral decisions by data providers on data anonymisation, as privacy-enhancing technologies such as differential privacy and synthetic data generation can significantly alter the data, undermining the reliability and validity of research findings.
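To make the anonymisation concern concrete, the sketch below shows – under assumed parameters, not any platform’s actual pipeline – how the Laplace noise used in a standard differential-privacy release distorts small counts proportionally far more than large ones. Counts for low-resource languages or rare harms are exactly where researchers most need precision:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_release(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace(1/epsilon) noise (sensitivity 1).

    Smaller epsilon means stronger privacy and a noisier release.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
# A small count (e.g. flagged posts in a low-resource language) suffers a
# much larger relative distortion than a large one at the same epsilon.
for true_count in (40, 40_000):
    released = dp_release(true_count, epsilon=0.1, rng=rng)
    rel_error = abs(released - true_count) / true_count
    print(f"true={true_count}, released={released:.1f}, "
          f"relative error={rel_error:.1%}")
```

The noise is unbiased on average, but for any single released figure a researcher cannot tell signal from noise unless the provider discloses the mechanism and its parameters – hence the concern about unilateral anonymisation decisions.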
Recommendations: To address these concerns, the researchers suggest that the Draft Act should:
Coordination Mechanisms and Dispute Resolution
The Draft Act’s provisions for harmonisation across jurisdictions require strengthening, say the Oxford researchers. Its emphasis on consistency across DSCs of establishment raises concerns regarding implementation and oversight. Moreover, the current framework leaves critical aspects of the mediation process undefined, raising questions about its effectiveness and fairness.
Recommendations: They suggest that the Draft Act should include detailed criteria and guidelines for applying exemptions to ensure consistent implementation across Digital Service Coordinators. It should also clarify the process for appointing mediators, the scope of mediators’ authority, explain whether mediator decisions are binding, and specify what happens when mediation closes without agreement. The framework must also explicitly define circumstances requiring researcher participation and establish clear protocols for keeping researchers informed throughout the mediation process.
Vetting Process and Potential Impacts on Fair Representation
Lastly, while the Draft Act advances crucial data access provisions, it requires careful consideration of its potential downstream impacts on academic research and knowledge production. The vetting process and institutional requirements could disproportionately advantage well-resourced research institutions. This is particularly concerning given that platform impacts and risks often transcend geographical boundaries, requiring diverse global perspectives for a comprehensive understanding of systemic risks.
Recommendations: To promote more equitable access across institutions, the researchers propose the Draft Act should consider incorporating provisions to:
The Delegated Regulation is planned to be adopted by the European Commission in the first quarter of 2025. Download the full feedback on the draft delegated regulation on data access provided for in the Digital Services Act, submitted in December 2024: Feedback-Draft-Act-DSA40-OII
Find out more about the authors:
Diyi Liu is a DPhil candidate and Clarendon scholar at the OII. Her doctoral research examines algorithmic content moderation and the legitimacy of platformised speech governance in Asian contexts.
Manuel Tonneau is a DPhil candidate at the OII. He studies the extent to which harmful online content moderation may treat users unequally across geographies.
Juliette Zaccour is a DPhil candidate and a Clarendon Scholar at the OII. She is conducting research on data access, privacy-enhancing technologies, and algorithm auditing.