AI code of practice: first draft and first copyright meeting
On 1 August 2024, the EU’s Artificial Intelligence Act entered into force. This regulation carries significant implications for the cultural and creative sectors as AI increasingly transforms artistic processes and leverages cultural data. Among other provisions, the AI Act introduces obligations for providers of general-purpose AI (GPAI) models, including obligations related to transparency and copyright.
To define the technical measures and policies GPAI providers must implement to meet these obligations, the European Commission’s AI Office is facilitating the drafting of a Code of Practice. By May 2025, this document will outline best practices and measures to support providers in complying with their legal requirements. The Code is being developed through a multistakeholder process involving nearly 1000 participants from industry, academia, civil society, and rightsholder organisations. Culture Action Europe also participates in this working group. The process is led by Chairs—renowned experts—who consolidate stakeholder input to draft successive iterations of the document.
A first draft
Last week, the AI Office published the first draft of the Code. Below is an overview of the main measures related to transparency and copyright.
Measure 3: Internal Copyright Policy
- Providers of GPAI models must implement an internal policy ensuring compliance with EU copyright laws across the entire lifecycle of their models. They should also assign clear responsibilities within their organisations to oversee this policy.
- Providers of GPAI models must perform copyright due diligence on upstream parties before contracting them and ensure that these entities have respected rights reservations. In the context of AI model development, ‘upstream’ refers to the process of collecting and preparing the datasets used to train the model.
- Providers of GPAI models should take steps to mitigate the risk that downstream systems produce copyright-infringing outputs. ‘Downstream’ refers to later stages where the AI model, being essentially a statistical model, is integrated into tools or applications for real-world use. Providers are urged to avoid overfitting their models (when the model learns the training data too closely, including its noise or specific details) and should require downstream entities to prevent repeated generation of outputs identical or recognisably similar to protected works. This measure does not apply to SMEs.
Measure 4: Identifying and Complying with Rights Reservations
- Providers should only use crawlers that respect the robots.txt protocol.
- Providers should ensure that a rights reservation expressed through robots.txt does not negatively affect the findability of that content in their search engines.
- Providers should respect other appropriate machine-readable means to express a rights reservation at the source and/or work level according to widely used industry standards.
- Providers, excluding SMEs, should collaborate to develop and adopt interoperable machine-readable standards for expressing rights reservations.
- Crawling activities must exclude pirated sources, such as those listed on the European Commission’s Counterfeit and Piracy Watch List or national equivalents.
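The robots.txt protocol referenced above works by grouping rules under a crawler’s user-agent token. A minimal illustrative sketch of how a website could reserve rights against AI-training crawlers while remaining indexable for search (the crawler tokens shown are examples of commonly used AI-training user agents, not an exhaustive or authoritative list):

```text
# Reserve rights against AI-training crawlers (example tokens)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers, including search indexers, remain allowed
User-agent: *
Allow: /
```

As the meeting discussion below notes, this mechanism operates at the level of a whole site or path, which is one reason rightsholders argue it cannot express work-level reservations on its own.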
Measure 5: Transparency
- Providers will publish information on their websites about the measures they adopt to identify and comply with rights reservations, written in clear and understandable language.
- This information should include the names of all crawlers used for GPAI model training and their relevant robots.txt features.
- Providers are encouraged to designate a single point of contact to allow rightsholders to communicate directly and promptly lodge complaints regarding the use of protected works in GPAI model development.
- Providers will draw up, keep up to date, and provide to the AI Office, upon its request, information about the data sources used for training, testing and validation, and about authorisations to access and use protected content for the development of a GPAI model.
Transparency and Copyright
On 21 November, the first meeting of the Working Group on Transparency and Copyright, co-chaired by Nuria Oliver and Alexander Peukert, took place. Pre-selected participants, representing both rightsholders and tech companies, briefly presented their positions on the first draft of the Code of Practice. Below, Culture Action Europe shares generalised feedback from the meeting (in line with the Chatham House Rule, the names of organisations are not disclosed).
- Providers’ copyright policies should go beyond merely respecting opt-outs, even though this is a crucial aspect. They should also incorporate measures to establish robust licensing frameworks and encourage collaboration with Collective Management Organisations and key rightsholders.
- Many rightsholders argued that relying solely on the robots.txt protocol for opting out is insufficient and risks being misapplied to AI training permissions. Rightsholders should be able to use other machine-readable mechanisms, such as opting out via terms and conditions on a website, public repositories of rights reservations, public declarations, or Automated Content Recognition (ACR) technology to remove protected content from datasets.
- Some participants suggested establishing an official public registry to explicitly record rights reservations. This registry would provide legal certainty for all stakeholders and enable tracking the dates of rights reservations, facilitating the removal of protected data from datasets as needed. However, one participant opposed the proposal, arguing that it could place an undue burden on rightsholders.
- Regarding upstream copyright compliance, rightsholders argued that it should not be limited to a simple pre-check of datasets—GPAI model providers should require third parties to provide full traceability of the data they supply and details about their collection methods. The concept of ‘reasonable due diligence’ needs further elaboration.
- Ensuring downstream copyright compliance requires GPAI model providers to share detailed information about the data used for training with the AI Office and downstream entities. This is the only way to ensure that AI outputs are not generated using illegal or infringing content.
However, others noted that downstream providers are often the only entities capable of properly assessing and managing copyright compliance within their specific operational context: they may handle their own protected content or hold licences that fall outside the control of GPAI providers.
- Authors and rightsholders must be compensated for the prior unauthorised and illegal use of copyrighted works by GPAI providers. The Code of Practice should include a provision requiring AI providers to commit, through their copyright policies, to compensating for such unauthorised use. The Code should also establish a framework for sanctions and measures to address non-compliance.
At the same time, tech company representatives stressed the need to stay within the scope of the AI Act, avoiding additional obligations: ‘We’re here to finish the rules under the AI Act, nothing more, nothing less.’ They questioned the AI Office’s role, arguing it is ‘not a copyright enforcer’ and that its responsibilities in verifying copyright compliance are unclear.
They also pointed to technical challenges, including the unfeasibility of work-level rights reservations and the difficulty of downstream compliance. Predicting infringing outputs, they argued, is nearly impossible with current technology, and imposing copyright compliance on downstream providers lies outside the AI Act’s scope.
Both the next meeting and the publication of the second iteration of the Code of Practice are expected to take place in January 2025.
Culture Action Europe, together with the Michael Culture Association, has prepared considerations regarding the implementation of the AI Act, developed through our Action Group on AI & Digital. This paper forms the basis of the feedback we are providing in the Code of Practice drafting process.