From DeepSeek to Stargate: How dramatic AI developments will re-shape the copyright landscape

Walled Culture started covering generative AI relatively early, back in October 2022. That’s little more than two years ago, and yet the technological progress since then has been extraordinary, whatever you think of the other issues gen AI raises. The latest breakthrough is particularly interesting because it has come from China. The site Live Science explains why the new reasoning model DeepSeek-R1 is potentially so important:

R1 has also surpassed ChatGPT’s latest o1 model in many of the same tests. This impressive performance at a fraction of the cost of other models, its semi-open-source nature, and its training on significantly less graphics processing units (GPUs) has wowed AI experts and raised the specter of China’s AI models surpassing their U.S. counterparts.

DeepSeek claims that the model was trained for the very low figure of $5.6 million, and without using state-of-the-art hardware. Those (unconfirmed) numbers are particularly striking against the background of the announcement by OpenAI of the Stargate Project in the US:

The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately. This infrastructure will secure American leadership in AI, create hundreds of thousands of American jobs, and generate massive economic benefit for the entire world. This project will not only support the re-industrialization of the United States but also provide a strategic capability to protect the national security of America and its allies.

Barely a week earlier, OpenAI released its “Economic Blueprint”, which “lays out our policy proposals for extending America’s global leadership in AI innovation, ensuring equitable access to AI, and driving economic growth across communities nationwide.” That was clearly in the knowledge that the Stargate announcement was coming, and provides some context for it. The following section is key:

There’s an estimated $175 billion sitting in global funds awaiting investment in AI projects, and if the US doesn’t attract those funds, they will flow to China-backed projects—strengthening the Chinese Communist Party’s global influence.

There’s also an interesting call not to allow restrictive regulations – such as copyright laws – to get in the way:

Freedom for developers and users to work with and direct our tools as they see fit, in exchange for following clear, common-sense standards that help keep AI safe for everyone, and being accountable when they don’t

As Politico rightly puts it, this $500 billion AI project – assuming it is realised – is something of a “slap in the face” for Europe, with its own rather paltry plans. It also puts the UK’s aspirations to become an “AI superpower” in a humbling context. Moreover, both the EU and the UK face a particular problem that the US doesn’t: a strong copyright industry that wants to restrict how AI companies can use copyright material to train their models. OpenAI’s comment about “Freedom for developers and users to work with and direct our tools as they see fit” is partly about that pushback from the copyright world. With Donald Trump back in the White House and keen to gain US technological supremacy, especially with regard to China, it seems unlikely that he will stop AI companies from using copyright material to train their systems, or force them to enter into onerous licensing agreements first.

Enshrining the future primacy of copyright over AI technology may be a lost cause in the US. But historically the EU has always been more in thrall to publishers and the music industry, as evidenced by the EU Copyright Directive, whose history and flaws are discussed at length in Walled Culture the book (free digital versions available). This fact is probably behind a last-ditch attempt by European publishers to stop the EU taking the same route as the US when it comes to the use of copyright material by AI companies. The European Publishers Council has just published what it calls a “vision paper”. It’s mostly about AI, with the key recommendation as follows:

We emphasise the enormous opportunities AI brings to publishers but also highlights the risks, particularly the ingestion of publishers’ copyrighted content by AI models without permission or compensation. We call for rigorous enforcement of copyright laws and a balanced relationship with large online platforms and AI companies to incentivise licensing.

Once again, the publishing industry wants to impose a licensing framework on a new use of copyright material. As Walled Culture the book explains, this has been an obsession of the copyright industry for decades. Licensing may have worked before the Internet, when copyright material was analogue and hard to reproduce, but is inappropriate for a digital world whose power derives from the new possibilities opened up by moving data across the Internet without friction.

The old, outdated licensing model is particularly inappropriate for gen AI. During training, material is analysed to extract the knowledge it contains about the world, not simply copied; licensing is therefore a tax on a new kind of knowledge. Taxing knowledge is something the copyright industry has always favoured, but it’s time that the benefits of freely sharing knowledge were recognised, rather than dismissed as some absurd form of “infringement”. The new report even goes so far as to speak of “the way in which Large Language Models have been trained using the largest, unauthorised and unremunerated theft of copyright protected material and personal data in the history of the internet”.

In contrast with this hyperbolic claim of wrongdoing, the European Publishers Council’s “vision paper” presents the news media and publishing sector as “diverse, transparent, accountable and highly ethical”. The argument seems to be that copyright must take precedence over AI technology because publishers are the saints, while AI companies are the sinners. That neat contrast is rather undermined by the vision paper’s other main preoccupation: online advertising. Here’s what the new paper says:

Publishers operate in an advertising market which is dominated and distorted by Google and Meta. Through their ability to collect and combine data across their many user-facing platforms and apps as well as across most internet domains, the Very large search engines (VLSEs), very large online platforms (VLOPs) and social networks have built large advertising ecosystems and data lakes which attract most of the advertisers’ campaign spend in order to benefit from their capacity to personalise and target very precisely their users, with detailed campaign performance data points.

The publishers are absolutely right that Google and Meta dominate the adtech sector, and abuse their power there. They do this through the use of surveillance advertising and real-time bidding (RTB). The former means that everything people do online is tracked and recorded. RTB refers to the fact that the ads people see online are sold in an automated auction that takes place in the few hundredths of a second it takes a page to load. The ads are personalised on the basis of the huge stores of tracking information that the surveillance advertising model has harvested – the “data lakes”. In other words, the whole basis of today’s targeted online advertising is the abuse of highly personal data for commercial purposes. However, the publishers are not against this abuse; on the contrary, they want it strengthened, but only in their favour:
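To make the RTB mechanism just described concrete: ad exchanges commonly clear these split-second auctions with a second-price rule, where the highest bidder wins but pays the runner-up’s price (or a floor). The following minimal Python sketch illustrates that clearing rule only; the bidder names, prices and floor are invented for illustration, not taken from any real exchange:

```python
from dataclasses import dataclass

@dataclass
class Bid:
    bidder: str       # hypothetical demand-side platform bidding for an advertiser
    price_cpm: float  # bid in dollars per thousand impressions (CPM)

def run_auction(bids, floor_cpm=0.10):
    """Clear a second-price auction with a price floor.

    The highest bidder wins but pays the second-highest bid
    (or the floor if there is no runner-up) - a common RTB clearing rule.
    """
    eligible = [b for b in bids if b.price_cpm >= floor_cpm]
    if not eligible:
        return None, 0.0
    ranked = sorted(eligible, key=lambda b: b.price_cpm, reverse=True)
    winner = ranked[0]
    clearing_price = ranked[1].price_cpm if len(ranked) > 1 else floor_cpm
    return winner, clearing_price

# Hypothetical bids, each priced using profile data from the "data lakes"
bids = [Bid("dsp-a", 2.40), Bid("dsp-b", 3.10), Bid("dsp-c", 1.75)]
winner, price = run_auction(bids)
print(winner.bidder, price)  # dsp-b wins, paying the runner-up price of 2.40
```

The point of the sketch is how little time this leaves for consent: the whole ranking happens programmatically, per impression, before the page finishes loading.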

It is important that the European legal framework enables publishers to develop business models which sustain their economic viability, including through personalised advertising. This balance cannot be reached if an emphasis is placed only on privacy, without consideration of economic realities and societal needs as this will hamper the economic model of free or low-cost access to news and information thereby affecting democratic processes. We can see this clash in the ongoing discussions about the so-called ‘consent, or pay’ business model deployed by publishers, as well as political calls for a general ban on tracking and/or targeting consumers for advertising purposes, both of which illustrate this privacy-dominant approach, while ignoring the need to balance interests of different types of businesses and their consumers.

What this criticism of the “privacy-dominant approach” ignores is that there is an alternative form of advertising that does not track everything people do online. Contextual advertising simply uses information about what people are seeing on a page to display targeted advertising. Research suggests it actually performs better than the current surveillance-based model. And yet the publishers want to retain the latter, because it gives them useful marketing information for free, albeit at the cost of the public’s privacy.
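The contextual approach really can be this simple: match the words on the page against ad categories, with no user profile involved at all. Here is a toy Python sketch of that idea; the categories and keywords are invented for illustration, and real contextual systems use far richer content analysis:

```python
# Toy contextual matcher: picks an ad category from the page's own
# words, using no tracking data about the reader whatsoever.
AD_CATEGORIES = {
    "travel":  {"flight", "hotel", "beach", "itinerary"},
    "cooking": {"recipe", "oven", "ingredients", "simmer"},
    "finance": {"stocks", "bond", "interest", "portfolio"},
}

def pick_category(page_text):
    """Return the ad category whose keywords best overlap the page, or None."""
    words = set(page_text.lower().split())
    # Score each category by how many of its keywords appear on the page
    scores = {cat: len(words & keywords) for cat, keywords in AD_CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(pick_category("Compare flight and hotel deals for a beach holiday"))
# -> travel
```

Note what is absent: no cookie, no identifier, no behavioural history. The only input is the page the reader is already looking at.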

The fact that the publishers have the temerity to call for privacy to be weakened even further, purely to support their creaking business models, undercuts their argument that they are the heroes here, and should be privileged over the abusive AI companies. Against a background where governments are desperate to boost national AI industries, the publishers’ attempt to cling on to a broken copyright system looks doomed. Even if the AI bubble shrinks considerably, it seems unlikely that the US, the EU or the UK will risk deflating it further by acceding to the unreasonable and hypocritical requests of the copyright world to rein in AI companies because of the harm they cause to the public. The copyright landscape will never be the same again.

Follow me @glynmoody on Mastodon and on Bluesky.
