FSF on Anthropic Copyright: What AI Builders Need to Know
The intersection of AI and copyright law just got more interesting. The Free Software Foundation (FSF) issued a statement responding to the Bartz v. Anthropic settlement—a class action lawsuit claiming Anthropic infringed copyright by downloading works from the Library Genesis and Pirate Library Mirror datasets to train its large language models.
This case matters for anyone building with AI. The implications touch on training data, fair use, and the future of open AI development.
What Happened
The district court ruled that using books to train LLMs was fair use—a significant win for AI companies. But that ruling didn't cover how the books were acquired: downloading them from pirate libraries remained a live claim headed for trial. Rather than litigate that issue, the parties settled.
The settlement offers compensation to copyright holders whose works were used. The FSF received a settlement notice because one of its books—Free as in Freedom: Richard Stallman's Crusade for Free Software by Sam Williams and Richard Stallman—was found in the datasets.
The twist: the FSF published this book under the GNU Free Documentation License (GNU FDL), a free license that permits anyone to copy and reuse the work, without payment, as long as the license's terms are followed. So Anthropic could likely have used it legally all along.
The FSF's Position
Instead of taking the settlement money, the FSF used this moment to advocate for something bigger: AI transparency and user freedom.
As the FSF put it: "Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code."
Their request? If you train AI on freely licensed content, users should get:
- Complete training inputs — know what data trained your model
- The complete model — full access, not just an API
- Training configuration — how it was built
- Source code — the software running it
It's a call for true AI openness—models that respect user freedom the way free software does.
Why This Matters for Indie Builders
If you're building AI products, this case sends important signals:
- Fair use for training is real. The district court held that training on copyrighted works can be fair use, which gives AI developers some breathing room—though a settled case leaves no binding precedent behind.
- Downloading is still murky. How you acquire training data matters. Scraping vs. using datasets others scraped vs. licensed access—the legal landscape varies.
- Transparency is gaining traction. The FSF isn't alone in calling for training data disclosure. Regulations like the EU AI Act are moving in this direction too.
- Free licenses offer protection. If your content is under free licenses, you sidestep some of these battles entirely.
The Bigger Picture
The FSF is a small organization with limited resources, but they're signaling a philosophy: if your AI training touches copyrighted works, the ethical response isn't just paying off rights holders—it's giving users freedom.
For indie hackers building AI tools, this is a reminder that:
- Training data choices have legal and ethical implications
- Open models and datasets reduce your legal exposure
- The community is watching how companies handle these issues
The Bartz v. Anthropic settlement won't be the last word on AI copyright. More cases are coming. More precedents will be set. But the FSF's stance adds a principled voice to the conversation—one that prioritizes user freedom over settlement checks.