By Sean Punska, author of “Trust Is a Scarred Commodity” and creator of the Seanistotlean project.
A poet in 1985 prints a small collection of verse. He assumes his readers will be human — maybe a teacher, a friend, a stranger on the subway. Forty years later, that same book is scanned, scraped, and silently consumed by a large language model. The poet never consented. He never imagined this future. And he certainly never gave anyone permission to synthesize his voice into a chatbot.
But the AI companies call it fair use — as though this is just another form of reading.
Fair use — a legal principle allowing limited use of copyrighted material for commentary, scholarship, and education — was never meant to underwrite the industrial extraction of culture. It was designed to protect quotation and critique, not to license the wholesale ingestion of expressive work into commercial AI systems.
The Real Problem
In early cases like Authors Guild v. Google, courts ruled that scanning books to build a searchable index was fair use — because the system didn’t reproduce the works, only helped users find them. But today’s AI systems do something entirely different.
They don’t just index. They extract. Modern AI models generate outputs that reflect the styles and patterns derived from authors’ works — often without consent, and often trained on material scraped from behind paywalls, subscriptions, or private servers.
In 2023, an indie novelist discovered her e-book — a labor of years, sold only through a subscription platform — had been scraped to train a chatbot, her paywall ignored. Her voice now speaks through a machine she never authorized.
This isn’t passive reading. It’s automated appropriation at scale.
And it's not casual, incidental copying; it's bot-driven, intentional crawling that often circumvents clear opt-out signals like robots.txt. The process is opaque, the permissions are absent, and the benefits flow almost entirely one way.
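For readers unfamiliar with the mechanism: robots.txt is a plain-text file a site publishes to tell crawlers which pages they may fetch, and a well-behaved crawler checks it before downloading anything. Here is a minimal sketch, using Python's standard library, of what honoring that signal looks like (the site URL and bot name are hypothetical):

```python
# Minimal sketch: a crawler that honors robots.txt before fetching a page.
# The site URL and user-agent string below are hypothetical illustrations.
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleTrainingBot/1.0"  # hypothetical crawler identity

def may_fetch(page_url: str, robots_url: str) -> bool:
    """Return True only if the site's robots.txt permits this agent to fetch."""
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the site's robots.txt
    return parser.can_fetch(USER_AGENT, page_url)

if may_fetch("https://example.com/novel-chapter-1.html",
             "https://example.com/robots.txt"):
    print("Allowed: fetch the page.")
else:
    print("Disallowed: respect the opt-out and move on.")
```

Circumventing the signal is as simple as never running that check, which is exactly the point: the opt-out only works when crawlers choose to honor it.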
AI companies argue that their outputs are “transformative” — and that they don’t directly reproduce copyrighted material. But transformation doesn’t erase the ethical debt owed to the authors whose creative essence is repurposed without permission.
A Reasonable Expectation
When authors publish a book, they enter a social contract. That contract assumes their work will be read by people — maybe slowly, maybe even poorly — but always directly. They imagine libraries, classrooms, readers with pens in hand.
What they don’t imagine is that their work will be silently absorbed by a statistical model, filed into a latent space, and later reproduced in uncanny variations, stripped of context and connection.
This is the crux of the issue:
If an author could not have reasonably anticipated that their work would be used to train a machine, then that use should fall outside the bounds of fair use.
Expectation-Based Fair Use
We need a new framework — one that ties fair use to effort, scale, and authorial expectation. I call this:
Expectation-Based Fair Use
It works like this (a minimal sketch in code follows the list):
If the work was published before the internet era, and the author had no reason to foresee machine-readable repurposing, then use in AI training should require consent or license.
If the work was born digital but protected behind paywalls or opt-out protocols, those boundaries must be respected.
If the content was created for humans, not machines, its use by machines should not be presumed fair.
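Stated as a decision procedure, the framework is simple. The sketch below is illustrative only: it assumes we can know each work's publication year, paywall status, opt-out signals, and intended audience. The field names are mine, and the year-2000 cutoff is taken from the enforcement proposal below, not from any existing system:

```python
# Illustrative sketch of Expectation-Based Fair Use as a decision procedure.
# Field names and the cutoff year are assumptions for demonstration only.
from dataclasses import dataclass

INTERNET_ERA_CUTOFF = 2000  # when web scraping and ML ingestion became foreseeable

@dataclass
class Work:
    year_published: int
    behind_paywall: bool
    opted_out: bool            # e.g., a robots.txt disallow or similar protocol
    intended_for_humans: bool

def presumed_fair_for_training(work: Work) -> bool:
    """Apply the three expectation-based rules from the list above."""
    # Rule 1: pre-internet works; machine repurposing was unforeseeable.
    if work.year_published < INTERNET_ERA_CUTOFF:
        return False  # consent or license required
    # Rule 2: paywalls and opt-out protocols are boundaries to be respected.
    if work.behind_paywall or work.opted_out:
        return False
    # Rule 3: content created for humans is not presumed fair for machines.
    if work.intended_for_humans:
        return False
    return True

# The 1985 poet from the opening: clearly outside presumed fair use.
poem = Work(year_published=1985, behind_paywall=False,
            opted_out=False, intended_for_humans=True)
print(presumed_fair_for_training(poem))  # False
```

Real works resist such clean flags, of course; the point is only that these rules compose into something auditable.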
To enforce this, AI companies should be required to disclose their training data sources and obtain licenses for post-2000 digital works — a threshold when web scraping and ML ingestion became foreseeable. Independent audits could verify compliance, ensuring transparency without exposing proprietary details.
This is not a radical shift. It’s a return to fairness.
A Threshold of Effort
There is a difference between:
A researcher scanning a 1970s manual to train a small, nonprofit model, and
A corporation ingesting tens of thousands of born-digital novels without permission.
Fair use should favor effortful, human-scale reading over frictionless machine-scale extraction.
If access required scanning, effort, or labor, then the use is bounded and arguably fair. But if content was harvested en masse, without dialogue, compensation, or context — then that’s not reading. It’s stripmining.
AI’s Benefits Don’t Justify Exploitation
None of this is an attack on AI itself.
AI can democratize access, accelerate discovery, and serve people with disabilities. These are real and worthy goals. But they don’t justify a system where authors are excluded from the use and value of their own work.
The same technology that enables synthesis can also track attribution. The same systems that generate new text could also honor the sources that shaped it.
The question is not can we use authors’ works — it’s how, with what terms, and with what respect.
Closing
Fair use was meant to share knowledge — not seize it.
If we don’t draw a new line, machines will redraw it for us — one scrape at a time. Authors, creators, and readers deserve a future where creativity is respected, not stripmined.
And the poet? He’ll never know his verses trained a machine to mimic his soul — without ever asking his name.
This essay is part of the Seanistotlean project — a platform advocating for ethical AI, author rights, and political literacy in the algorithmic age.
Subscribe to follow our upcoming white paper and model policy brief on Expectation-Based Fair Use.
“The Author Avatar” and “The Training Artifact”
© 2025 Sean Punska. All rights reserved.
These original artworks were created in collaboration with generative AI and are protected under U.S. copyright law. Permission is granted for non-commercial sharing only when accompanied by full credit and a link to the original publication. For licensing, adaptation, or reprint inquiries, contact the author directly.