AI companies would be required to disclose copyrighted training data under new bill

The AI Foundation Model Transparency Act aims to make it clear if AI models used copyright data for training.

Illustration by Alex Castro / The Verge

Two lawmakers filed a bill that would require creators of foundation models to disclose the sources of their training data, so copyright holders can know whether their information was taken. The AI Foundation Model Transparency Act, filed by Reps. Anna Eshoo (D-CA) and Don Beyer (D-VA), would direct the Federal Trade Commission (FTC) to work with the National Institute of Standards and Technology (NIST) to establish rules for reporting training data transparency.

Companies that make foundation models would be required to report the sources of their training data and how that data is retained during the inference process; describe the limitations or risks of the model and how the model aligns with NIST's planned AI Risk Management Framework and any other federal standards that might be established; and provide information on the computational power used to train and run the model. The bill also says AI developers must report efforts to "red team" the model to prevent it from providing "inaccurate or harmful information" around medical or health-related questions, biological synthesis, cybersecurity, elections, policing, financial loan decisions, education, employment decisions, public services, and vulnerable populations such as children.

The bill stresses the importance of training data transparency around copyright, as several lawsuits alleging copyright infringement have been filed against AI companies. It specifically mentions the case brought by artists against Stability AI, Midjourney, and DeviantArt (which was largely dismissed in October, according to VentureBeat), as well as Getty Images' complaint against Stability AI.

“With the increase in public access to artificial intelligence, there has been an increase in lawsuits and public concerns about copyright infringement,” the bill states. “Public use of foundation models has led to countless instances of the public being presented with inaccurate, imprecise, or biased information.”

The bill still needs to be assigned to a committee and discussed, and it’s unclear if that will happen before the busy election campaign season starts.

Eshoo and Beyer's bill complements the Biden administration's AI executive order, which helps establish reporting standards for AI models. The executive order, however, is not law, so if the AI Foundation Model Transparency Act passes, it would make transparency requirements for training data a federal rule.