AI Companies Want It Both Ways on Copyright — And African Content Creators Are Caught in the Middle
Yesterday, Anthropic published a detailed post exposing what it called "industrial-scale" theft by three Chinese AI laboratories: DeepSeek, Moonshot, and MiniMax. The companies, Anthropic alleged, had generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, systematically extracting Claude's capabilities to train their own competing models.

The post was well-written, the evidence was specific, and the concern was legitimate. But for anyone who has been following the AI industry's relationship with copyright over the past two years, there was an uncomfortable question sitting just beneath the surface of every paragraph.

Who exactly gets to decide what counts as stealing someone else's work to train an AI?

First, What Is Distillation?

Before we get into the contradiction, it helps to understand what distillation actually is, because the term gets used in AI circles without much explanation for everyone else.

Think of it this way. Imagine you have a brilliant, experienced doctor who has spent 30 years studying medicine and can diagnose almost any condition accurately. Now imagine a medical student who, instead of spending 30 years in training, simply follows that doctor around for a year, records every answer they give to every patient, and uses those recordings to train themselves to give the same answers. The student did not acquire the underlying knowledge the hard way. They extracted the output of someone else's expertise and used it as a shortcut.

That is essentially what model distillation does in AI. You take a powerful, expensive model (in this case, Claude) and feed it carefully crafted questions at enormous scale. The answers it gives become training data for a smaller, cheaper model you are building. The smaller model learns to behave like the larger one without the years of research, the billions of dollars in compute, or the original training work that made the larger model capable in the first place.
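The workflow above can be sketched in a few lines of toy code. This is a deliberately simplified illustration, not how production distillation works: the `teacher` function stands in for an expensive model's API, and the `Student` class stands in for a cheap model that learns only by recording the teacher's outputs. All names here are invented for illustration.

```python
def teacher(prompt: str) -> str:
    # Stand-in for a large, expensive model: the result of years of
    # research and training (represented here by a lookup table).
    knowledge = {
        "capital of France": "Paris",
        "2 + 2": "4",
    }
    return knowledge.get(prompt, "I don't know")


def collect_training_data(prompts):
    # Step 1: query the teacher at scale and record every answer.
    return [(p, teacher(p)) for p in prompts]


class Student:
    # Step 2: a small, cheap model trained only on the teacher's
    # recorded outputs, never on the underlying knowledge itself.
    def __init__(self):
        self.memory = {}

    def train(self, pairs):
        for prompt, answer in pairs:
            self.memory[prompt] = answer

    def answer(self, prompt: str) -> str:
        return self.memory.get(prompt, "I don't know")


prompts = ["capital of France", "2 + 2"]
student = Student()
student.train(collect_training_data(prompts))
print(student.answer("capital of France"))  # mimics the teacher's answer
```

In real distillation the student is a neural network trained to match the teacher's output distribution rather than a lookup table, but the economics are the same: the student inherits capability without paying the teacher's training cost.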

Distillation itself is not inherently wrong. AI companies do it legitimately all the time. Anthropic uses it to create smaller, faster versions of Claude for different use cases. The problem Anthropic identified is doing it covertly, at industrial scale, through fraudulent accounts, in violation of terms of service, specifically to replicate a competitor's capabilities without doing the underlying work.

That is a reasonable complaint. The question is whether the industry making that complaint has standing to make it.

The Complaint and the Contradiction

Anthropic's core argument is that DeepSeek, Moonshot, and MiniMax took something that did not belong to them: Claude's outputs, which represent an enormous investment in research and training, and used them to build competing products without permission or payment.

Now consider what has been happening in American courts for the past two years.

Over 70 lawsuits have been filed against AI companies — OpenAI, Anthropic, Meta, Google, Microsoft, Stability AI, Perplexity, and others — by authors, journalists, publishers, musicians, and visual artists. The core allegation in virtually every case is the same: these companies took something that did not belong to them — copyrighted books, articles, songs, and images — and used it to build products without permission or payment.

Anthropic itself settled one of those cases, Bartz v. Anthropic, for $1.5 billion, after facing allegations that it had downloaded millions of pirated copies of books — sourced from shadow libraries like LibGen and PiLiMi — to train Claude. In January 2026, just weeks ago, Universal Music Publishing Group filed a $3.1 billion lawsuit against Anthropic alleging that Claude was built on a foundation of torrented piracy.

And yet Anthropic's defence in these cases, like every other AI company's defence, is essentially that training on copyrighted works is transformative, that it constitutes fair use, that you cannot build competitive AI if you have to license everything you learn from.

President Trump made the same argument at an AI event in July 2025, framing it in terms that cut right to the contradiction: "You can't be expected to have a successful AI program when every single article, book, or anything else that you've read or studied, you're supposed to pay for. When a person reads a book or an article, you've gained great knowledge. That does not mean that you're violating copyright laws or have to make deals with every content provider." He added, for good measure: "China's not doing it."

So the position the industry has collectively staked out is this: our outputs are protected intellectual property that competitors cannot extract without permission. Your inputs — the books, journalism, and creative work we trained on — are fair game under fair use doctrine.

That is not a legal argument. It is a convenience argument dressed in legal language.

The Courts Are Not Settled on This

To be fair, the legal picture is genuinely complicated and the courts have not reached a consensus. Three judges have now ruled on fair use in AI training cases, and they have not agreed with each other.

One judge sided with AI companies, comparing training on books to "training schoolchildren to write well." Another cautioned that generative AI could "flood the market" with content and erode incentives for human creators, a concern he described as central to the purpose of copyright law. A third took a nuanced middle position. No definitive ruling is expected until summer 2026 at the earliest, and the appeals process will likely push final clarity further still.

What is clear is that the industry is not operating from a position of settled legal legitimacy. It is operating from a position of legal uncertainty while simultaneously claiming clear moral authority when competitors do to their outputs what they have been doing to everyone else's inputs.

Where African and Kenyan Content Creators Fit In

Here is the part of this story that never gets told loudly enough.

The training datasets used by every major AI company — Claude, ChatGPT, Gemini, Llama — contain content from across the internet. That includes journalism from African publications. Academic work from African researchers. Creative writing from African authors. Commentary, analysis, and reporting produced by people across the continent, including Kenya, who have no idea their work was used, received no compensation, and have no seat at any of the tables where these legal arguments are being made.

The $1.5 billion Anthropic settlement and the $3.1 billion music lawsuit are American cases fought by American plaintiffs with American lawyers in American courts. The New York Times has the resources to sue OpenAI. The Authors Guild has the organisational capacity to file class actions. Universal Music has leverage.

A Kenyan journalist whose investigative reporting ended up in a training dataset has none of those things. There is no Kenyan equivalent of the Authors Guild negotiating licensing terms with AI companies. There is no East African Publishers Alliance filing class action suits in San Francisco. The legal frameworks being developed right now — what counts as fair use, what compensation looks like, what rights creators retain — are being decided entirely without African voices in the room.

This matters in a specific, practical way. The AI industry's fair use argument rests partly on the claim that training is transformative, that the model does not reproduce your work, it learns from it in the same way a human reader learns from a book. If that argument succeeds in court, it sets a global precedent that any content accessible on the internet is available for AI training without compensation, regardless of where the creator is located or whether they could ever realistically participate in legal proceedings to dispute it.

If it fails — if courts ultimately decide that licensing is required — the licensing frameworks that emerge will almost certainly be negotiated between large institutions in wealthy countries. The question of whether African creators receive any share of that value will depend entirely on whether anyone with power in those negotiations chooses to include them.

The Irony Worth Sitting With

None of this means Anthropic's distillation complaint is wrong. DeepSeek and others running coordinated extraction campaigns through tens of thousands of fraudulent accounts is a different category of behaviour from training on publicly accessible internet content. The national security concerns Anthropic raises about safety guardrails being stripped out of distilled models are a legitimate separate issue.

But the moral authority with which the complaint is made deserves scrutiny. An industry that has spent two years arguing in court that using other people's creative work without permission is acceptable learning, and that settled a $1.5 billion case over pirated books just last year, is on uncomfortable ground when it declares that using its outputs without permission is theft.

The underlying principle being contested is the same in both cases: when does learning from someone else's work cross the line into taking something that is not yours?

The AI industry's answer, it turns out, depends entirely on which side of that line the AI industry is on.

What Needs to Happen

The copyright battles playing out in American courts will eventually produce a legal framework, probably in 2027 or 2028 once appeals work through the system. That framework will shape how AI companies operate globally, including in Kenya.

African governments, publishers, and creative industry organisations need to be paying attention to this now, not after the framework is set. Kenya's Copyright Board, the Kenya Publishers Association, and bodies representing Kenyan journalists and authors should be engaging with international copyright discussions and making clear that African creators have interests that deserve representation in whatever licensing regime ultimately emerges.

The conversation about who owns the inputs to AI, and what fair compensation looks like, is one of the most consequential intellectual property debates of the decade. Right now, it is being held in San Francisco courtrooms and Washington policy offices, by and for a very specific set of stakeholders.

African creators helped build the internet that trained these models. The question is whether they will have any say in what happens next.
