Everybody’s talking about a scary and vivid manifesto called AI 2027; even Vice President Vance claims to have read it … As a work of fiction, it’s very effective. Although it’s written like a thriller, it is crafted to look like science, complete with clickable interactive graphs, and numerous citations to the literature … I mostly salute AI 2027’s intention, which is to stir up fear about AI so that people will get off of their couches and act … I honestly wish, though, that it wasn’t being taken so seriously. It’s a work of fiction, not a work of science.
Gary Marcus, “The ‘AI 2027’ Scenario: How realistic is it?”
1. Introduction
This is Part 18 of my series Exaggerating the risks. In this series, I look at some places where leading estimates of existential risk look to have been exaggerated.
Part 1 introduced the series. Parts 2-5 (sub-series: “Climate risk”) looked at climate risk. Parts 6-8 (sub-series: “AI risk”) looked at the Carlsmith report on power-seeking AI. Parts 9-17 (sub-series: “Biorisk”) looked at biorisk.
Today’s post continues my sub-series on AI risk by examining the recent AI 2027 report.
2. AI 2027 team
Daniel Kokotajlo is director of the AI Futures Project, which authored the AI 2027 report. Previously, Kokotajlo worked on AI governance at OpenAI before quitting in protest (at significant risk to his equity) over the company’s approach to safety. Before that, Kokotajlo was a philosophy PhD student at UNC.
Scott Alexander is a blogger, best known for his current blog Astral Codex Ten and its previous iteration Slate Star Codex. The New Yorker calls Alexander’s blog “perhaps the premier public-facing venue of the ‘rationalist’ community.”
Thomas Larsen is a researcher at the AI Futures Project and a cofounder of the Center for AI Policy, as well as a previous employee of the Machine Intelligence Research Institute and the Stanford Existential Risks Initiative.
Eli Lifland is a researcher at the AI Futures Project, a co-founder of Sage, and a guest fund manager at the Long Term Future Fund. Previously, Lifland worked at Ought and SaferAI.
Romeo Dean is a master’s student in computer science at Harvard and a researcher at the AI Futures Project. Previously, Dean was an AI Policy Fellow at the Institute for AI Policy and Strategy, an Astra Fellow at Constellation, and a Data Science Intern at the World Data Lab.
Together, Kokotajlo, Alexander, Larsen, Lifland and Dean are the authors of the AI 2027 report.
3. AI 2027: Scenario
The most conspicuous component of AI 2027 is a 71-page scenario. The scenario was written by Daniel Kokotajlo, Eli Lifland, Thomas Larsen, and Romeo Dean and then rewritten by Scott Alexander “in an engaging style.”
The report was informed by more than 30 run-throughs of a tabletop exercise. The exercise begins at the scenario’s April 2027 mark, by which point the United States has developed an artificial intelligence with superhuman coding abilities and China has stolen its weights.
Here is a very brief summary of the scenario. 2025 is a comparatively tame year, with rudimentary AI agents introduced. Towards the end of 2025, things heat up as the leading US AI company (nicknamed ‘OpenBrain’) develops a highly capable model, agent-1, which performs excellently on a number of tasks, including AI research and hacking.
This leads in early 2026 to rapid advances, facilitated by the growing automation of coding. In mid-2026, China wakes up and joins the race towards AGI in earnest. By late 2026, AI has begun taking jobs, though many remain occupied by human workers.
In the titular year of 2027, all hell breaks loose. In January, OpenBrain develops a new agent-2 model capable of continuous learning, beginning the march towards a software-driven singularity. In February, China steals agent-2 for its own project, ‘DeepCent’. In March, algorithmic breakthroughs lead to an improved agent-3 system, and coding tasks are now fully automated, with agent-3 replacing human coders at OpenBrain and elsewhere. April and May see increasing concerns about alignment as well as the national security implications of AI technology.
By June, agent-3 has rendered most staff at OpenBrain superfluous; all they can do is watch as it rapidly upgrades its own capacities. In July, copies of agent-3 are used by many companies as remote workers that similarly replace human employees.
In August, geopolitical concerns take center stage again and politicians largely throw caution to the wind, prioritizing the goal of winning the race towards superintelligence over concerns about safety and alignment.
In September, OpenBrain develops a superhuman system, agent-4, so powerful that often neither humans nor even agent-3 can understand it well enough to supervise it effectively. Agent-4 is misaligned and eventually gets caught. OpenBrain mounts a coverup, only to have agent-4’s misalignment leaked by a whistleblower in October.
From there, the scenario bifurcates. In the “race” ending, caution is again thrown to the wind and use of agent-4 continues. Its successor, agent-5, is fully superintelligent and is given extensive access to information and government systems. Agent-5 gradually accumulates power until, by mid-2028, it has near-total control over the economy, including, crucially, its own compute resources. For a few years, it tolerates the shrinking human presence, but in mid-2030 it has had enough and murders us all.
In the alternative “slowdown” ending, agent-4 is crippled, eventually shut down, and replaced with agent-3. A chain of safer models follows, but in the end an arms race leads both the United States and China to develop powerful models. The models collude and together orchestrate a coup around 2030 on terms favorable to the United States. From there, humanity begins its cosmic expansion throughout the stars. Humans remain very much alive, though our intelligence and control over our own destiny have both been largely superseded by artificial systems.
4. Takeaways
The authors suggest eight takeaways from the AI 2027 scenario:
- By 2027, we may automate AI R&D leading to vastly superhuman AIs (“artificial superintelligence” or ASI). In AI 2027, AI companies create expert-human-level AI systems in early 2027 which automate AI research, leading to ASI by the end of 2027. See our timelines forecast and takeoff forecast for reasoning.
- ASIs will dictate humanity’s future. Millions of ASIs will rapidly execute tasks beyond human comprehension. Because they’re so useful, they’ll be widely deployed. With superhuman strategy, hacking, weapons development, and more, the goals of these AIs will determine the future.
- ASIs might develop unintended, adversarial “misaligned” goals, leading to human disempowerment. In our AI goals forecast we discuss how the difficulty of supervising ASIs might lead to their goals being incompatible with human flourishing. In AI 2027, humans voluntarily give autonomy to seemingly aligned AIs. Everything looks to be going great until ASIs have enough hard power to disempower humanity.
- An actor with total control over ASIs could seize total power. If an individual or small group aligns ASIs to their goals, this could grant them control over humanity’s future. In AI 2027, a small committee has power over the project developing ASI. They could attempt to use the ASIs to cement this concentration of power. After seizing control, the new ruler(s) could rely on fully loyal ASIs to maintain their power, without having to listen to the law, the public, or even their previous allies.
- An international race toward ASI will lead to cutting corners on safety. In AI 2027, China is just a few months behind the US as ASI approaches, which pressures the US to press forward despite warning signs of misalignment.
- Geopolitically, the race to ASI will end in war, a deal, or effective surrender. The leading country will by default accumulate a decisive technological and military advantage, prompting others to push for an international agreement (a “deal”) to prevent this. Absent a deal, they may go to war rather than “effectively surrender”.
- No US AI project is on track to be secure against nation-state actors stealing AI models by 2027. In AI 2027 China steals the US’s top AI model in early 2027, which worsens competitive pressures by reducing the US’ lead time. See our security forecast for reasoning.
- As ASI approaches, the public will likely be unaware of the best AI capabilities. The public is months behind internal capabilities today, and once AIs are automating AI R&D, a few months’ time will translate to a huge capabilities gap. Increased secrecy may further increase the gap. This will lead to little oversight over pivotal decisions made by a small group of AI company leadership and government officials.
I hope to revisit these takeaways later in the series.
5. Forecasts
While the scenario certainly provides a vivid illustration of the concerns raised by the AI 2027 team, the written scenario itself is not the team’s primary claimed research contribution. The scenario is, after all, informed primarily by a tabletop exercise, and that exercise begins halfway through the final year of the scenario.
The primary research contribution is meant to be carried by five forecasts which combine forecasting and computational modeling to inform key components of the scenario.
1. The compute forecast assesses future growth in AI-relevant compute. The forecast predicts that AI-relevant compute will grow tenfold by the end of 2027; that AI-relevant compute will be increasingly concentrated among leading AI firms in the United States and China; and that an increasing fraction of AI-relevant compute will shift from pre-training to post-training, research and development, and other tasks.
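As a quick gloss on the headline number: tenfold growth over three years corresponds to compound growth of a bit over 2x per year. A minimal sketch, assuming a December 2024 baseline and smooth exponential growth (my assumptions for illustration, not figures taken from the forecast's own model):

```python
# Back-of-the-envelope check on the compute forecast's headline number:
# what annual growth rate does a tenfold increase in AI-relevant compute
# by end of 2027 imply? Assumes a December 2024 baseline and smooth
# compound growth (my assumptions, not the forecast's own model).
years = 3
total_growth = 10.0
annual_rate = total_growth ** (1 / years)
print(f"Implied growth: {annual_rate:.2f}x per year")  # ~2.15x per year
```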
2. The timelines forecast predicts the date at which a superhuman coding system will be developed. A superhuman coder is understood as an AI system that the leading AI company could deploy, with 5% of their compute budget, to run thirty times as many agents as they have human research engineers, with each AI agent accomplishing AI research tasks at thirty times the speed of the company’s best engineer (naively, a combined multiplier of roughly 900 on research-engineering labor).
The timelines forecast draws on two models. The first is a time-horizon-extension model extrapolating trends in the length of tasks that AI agents can accomplish, drawn from a recent report by METR (Model Evaluation and Threat Research).
The second is a benchmarks-and-gaps model, which predicts the time needed for AI agents to saturate a benchmark of AI R&D tasks (RE-Bench), then extends this forecast by predicting the time to cross the remaining milestones from benchmark saturation to superhuman coding. The timelines forecast also provides an all-things-considered forecast which is not based on a comparably detailed model. A schematic sketch of the first model’s style of extrapolation follows below.
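The sketch assumes a pure exponential trend with a fixed doubling time; the starting horizon, target horizon, and doubling time are illustrative placeholders of mine, not the report’s fitted parameters (the report also considers variants in which the doubling time itself shrinks):

```python
from math import log2

# A minimal sketch of time-horizon extrapolation: assume the length of
# task an AI agent can complete doubles at a fixed rate, and ask when it
# crosses a target horizon. All numbers are illustrative placeholders,
# not AI 2027's fitted parameters.

def months_until_horizon(current_horizon_hours: float,
                         target_horizon_hours: float,
                         doubling_time_months: float) -> float:
    """Months until the horizon reaches the target, assuming pure
    exponential growth (a strong modeling assumption)."""
    doublings_needed = log2(target_horizon_hours / current_horizon_hours)
    return doublings_needed * doubling_time_months

# E.g., from 1-hour tasks to 1-month (~160 work-hour) tasks with a
# 6-month doubling time: log2(160) ≈ 7.3 doublings, i.e. ≈ 44 months.
print(months_until_horizon(1.0, 160.0, 6.0))
```

On these placeholder numbers the crossing takes about 44 months; headline dates of this kind depend heavily on the chosen doubling time and on whether the trend is allowed to accelerate.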
3. The takeoff forecast picks up where the timelines forecast left off, asking how quickly after the arrival of a superhuman coder we should expect the emergence of artificial superintelligence, understood as an AI system that is much better than the best human at every cognitive task. The forecast models a software-driven intelligence explosion by setting out the milestones such an explosion would need to cross in order to move from superhuman coders to artificial superintelligence; timelines for each milestone are estimated and combined to produce a final takeoff forecast. A schematic illustration of this sort of combination follows below.
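Here, a duration is sampled for each gap between milestones from a wide distribution, and the samples are summed. The gap names echo the report’s milestone ladder, but the distributions and parameters are placeholders of mine, not the team’s estimates:

```python
import math
import random

# Schematic Monte Carlo for combining milestone-gap estimates: sample a
# duration (in months) for each gap from a lognormal and sum. The gap
# names and parameters are illustrative placeholders, not the AI 2027
# team's actual numbers.

GAPS = {  # name: (median months, sigma of the underlying normal)
    "superhuman coder -> superhuman AI researcher": (4.0, 0.8),
    "superhuman AI researcher -> superintelligent AI researcher": (6.0, 1.0),
    "superintelligent AI researcher -> artificial superintelligence": (9.0, 1.2),
}

def sample_takeoff_months(rng: random.Random) -> float:
    # lognormvariate(mu, sigma) draws exp(N(mu, sigma)); setting
    # mu = log(median) makes the draw's median equal to `median`.
    return sum(rng.lognormvariate(math.log(median), sigma)
               for median, sigma in GAPS.values())

rng = random.Random(0)
samples = sorted(sample_takeoff_months(rng) for _ in range(10_000))
print("10th/50th/90th percentile (months):",
      round(samples[999]), round(samples[4999]), round(samples[8999]))
```

A virtue of this Monte Carlo framing is that uncertainty compounds visibly: wide distributions on each gap produce a long right tail in the total, so the median and the 90th percentile can differ by years.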
4. The AI goals forecast discusses the various goals that AI systems might have. It gives considerations for and against a series of possible goals: the written goals specified by developers; the goals intended by developers, which may deviate from explicitly written goals; an unintended version of written or intended goals; the goals shaped by a process of reinforcement learning; proxy or instrumentally convergent goals which perform well during training but have poor consequences during deployment; or other goals. Three forecasters, as well as three AI systems, rate the likelihood of each goal category as follows:
| Goal category | Daniel | Thomas | Eli | 4o | Claude | Gemini |
|---|---|---|---|---|---|---|
| Specified goals | 25% | 5% | 40% | 30% | 40% | 30% |
| Intended goals | 15% | 30% | 40% | 25% | 25% | 20% |
| Unintended version of the above | 70% | 40% | 50% | 50% | 65% | 40% |
| Reinforcement | 50% | 5% | 20% | 20% | 55% | 60% |
| Proxies/ICGs | 50% | 80% | 50% | 40% | 70% | 70% |
| Other | 50% | 90% | 50% | 15% | 35% | 10% |
| If-else compromises of the above | 80% | 90% | 80% | 80% | 75% | |
| Weighted compromises of the above | 40% | 90% | 80% | 50% | 80% | |
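To read the table at a glance, one can average each row across the six raters, skipping the blank cells. This simple aggregation is my own summary device, not an analysis the authors perform (note too that the categories overlap, so rows need not sum to 100%):

```python
# Per-row mean of the goals table across the six raters, skipping blank
# cells. This is my own summary device, not an analysis from the AI
# goals forecast; the categories overlap, so rows need not sum to 100%.

ratings = {
    "Specified goals": [25, 5, 40, 30, 40, 30],
    "Intended goals": [15, 30, 40, 25, 25, 20],
    "Unintended version of the above": [70, 40, 50, 50, 65, 40],
    "Reinforcement": [50, 5, 20, 20, 55, 60],
    "Proxies/ICGs": [50, 80, 50, 40, 70, 70],
    "Other": [50, 90, 50, 15, 35, 10],
    "If-else compromises of the above": [80, 90, 80, 80, 75],   # Gemini blank
    "Weighted compromises of the above": [40, 90, 80, 50, 80],  # Gemini blank
}

for goal, values in ratings.items():
    print(f"{goal}: mean {sum(values) / len(values):.0f}%")
```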
5. The security forecast justifies the main scenario’s concern over theft of AI model weights and algorithmic secrets, AI project sabotage, and subversion of safety mechanisms by AI systems. The forecast predicts likely security levels of leading model weights in the United States and China over time, using a taxonomy from the RAND report on securing model weights. The forecast estimates the growing hacking ability of AI systems themselves using the Cybench benchmark of capture-the-flag tasks, as well as the impact of AI-driven hacking on competing AI projects.
The forecast concludes with a brief discussion of the potential for AI systems to subvert human attempts to control them.
6. Load-bearing assumptions
Some of the AI 2027 forecasts are relatively modest. For example, the compute forecast predicts a tenfold growth in AI-relevant compute by the end of December 2027, and the security forecast predicts that leading AI models will be stolen by China. There is certainly room to quibble about the details of such predictions, but they are not the primary assumptions driving the suggested takeaways of vastly superhuman artificial intelligence disempowering humanity and dictating humanity’s future.
Those takeaways are driven instead by (a) the timelines forecast, which predicts the arrival of superhuman coders by 2027; (b) the takeoff forecast, which predicts radical superintelligence soon thereafter; and (c) the goals forecast, which suggests that radically superintelligent systems may aim to disempower humanity.
I think that these forecasts are substantially under-evidenced. As a result, I do not think that they provide a rational basis for significant shifts of opinion in favor of the report’s takeaways. The next several posts in this series will discuss my concerns about the AI 2027 forecasts.
The next post will discuss the timelines forecast, building on a recently popular critique by the computational physicist titotal.