A survey of 2,778 AI researchers conducted in October 2023, seven months after the release of GPT-4 in March 2023, used two differently worded definitions of artificial general intelligence (AGI). The first definition, High-Level Machine Intelligence, asked when an AI system would be able to do any “task” a human can do. The second definition, Full Automation of Labour, asked when AI could do any “occupation”. (Logically, the former definition would seem to imply the latter, but this is apparently not how the survey respondents interpreted it.)

The forecast for High-Level Machine Intelligence was as follows:

  • 10% probability by 2027
  • 50% probability by 2047

For Full Automation of Labour, the forecast was:

  • 10% probability by 2037
  • 50% probability by 2116

Another survey result of interest: 76% of AI experts surveyed in 2025 thought it was unlikely or very unlikely that current AI methods, such as large language models (LLMs), could be scaled up to achieve AGI.


Also of interest: there have been at least two surveys of superforecasters about AGI, but, unfortunately, both were conducted in 2022 before the launch of ChatGPT in November 2022. These surveys used different definitions of AGI. Check the sources for details. 

The Good Judgment superforecasters gave the following forecast: 

  • 12% probability of AGI by 2043
  • 40% probability of AGI by 2070
  • 60% probability of AGI by 2100

The XPT superforecasters forecast:

  • 1% probability of AGI by 2030
  • 21% probability of AGI by 2050
  • 75% probability of AGI by 2100
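
To make the four sets of numbers above easier to compare at a common year, here is a minimal sketch in Python. It assumes linear interpolation (and, beyond the last published quantile, extrapolation) between the year/probability pairs listed above; the surveys do not publish full distributions, so this is only a rough illustration of how the forecasts line up, not anything the forecasters themselves endorsed.

```python
# A rough comparison of the four forecasts above at a common year (2050).
# ASSUMPTION: linear interpolation/extrapolation between published quantiles;
# the surveys do not provide full distributions, so this is only illustrative.

def interp_probability(quantiles, year):
    """Linearly interpolate (or extrapolate) a cumulative probability for
    `year` from a sorted list of (year, probability) pairs, clamped to [0, 1]."""
    # default to the first and last points (used when `year` falls outside them)
    (y1, p1), (y2, p2) = quantiles[0], quantiles[-1]
    # use the bracketing pair if `year` falls between two published quantiles
    for (a, pa), (b, pb) in zip(quantiles, quantiles[1:]):
        if a <= year <= b:
            (y1, p1), (y2, p2) = (a, pa), (b, pb)
            break
    p = p1 + (p2 - p1) * (year - y1) / (y2 - y1)
    return max(0.0, min(1.0, p))

# Quantiles taken from the forecasts listed above.
forecasts = {
    "AI researchers, High-Level Machine Intelligence": [(2027, 0.10), (2047, 0.50)],
    "AI researchers, Full Automation of Labour": [(2037, 0.10), (2116, 0.50)],
    "Good Judgment superforecasters": [(2043, 0.12), (2070, 0.40), (2100, 0.60)],
    "XPT superforecasters": [(2030, 0.01), (2050, 0.21), (2100, 0.75)],
}

for name, quantiles in forecasts.items():
    print(f"{name}: ~{interp_probability(quantiles, 2050):.0%} by 2050")
```

Run as written, this puts the four forecasts anywhere from roughly 17% to roughly 56% by 2050, which mostly illustrates how sensitive such comparisons are to the interpolation assumption and how far apart the groups are.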

My opinion: forecasts of AGI are not rigorous and can’t actually tell us much about when AGI will be invented. We should be extremely skeptical of all these numbers because they’re all just guesses.

That said, if you are basing your view of when AGI will be invented largely on the guesses of other people, you should get a clear picture of what those guesses are. If you just rely on the guesses you happen to hear, what you hear will be biased. For example, people are more likely to repeat guesses that AGI will be invented shockingly soon because that’s much more interesting than a guess that it will be invented sometime in the 2100s. You might also be subject to a filter bubble or echo chamber bias, where you hear a skewed sample of guesses from your social networks rather than a representative sample of AI experts or expert forecasters.

I have never heard a good, principled argument for why people in effective altruism should believe guesses that put AGI much sooner than the guesses above from the AI researchers and the superforecasters. You should worry about selectively accepting some evidence and rejecting other evidence due to confirmation bias.

My personal guess — which is as unrigorous as anyone else’s — is that the probability of AGI before January 1, 2033 is significantly less than 0.1%. One reason I have for thinking this is that the development of AGI will require progress in fundamental science that currently isn’t getting much funding or attention, and science usually takes a long time to move from a pre-paradigmatic stage to a stage where engineers have mastered building technology using the new scientific paradigm. As far as I know, that transition has never in history taken only the seven or so years that remain before 2033.

The overall purpose of this post is to expose people in effective altruism to differing viewpoints from what you might have heard so far, and to encourage you to worry about confirmation bias and about filter bubble/echo chamber bias. I strongly believe that, in time, many people in EA will come to regret how the movement’s focus has shifted toward AGI and will come to see it as a mistake. People will wonder why they ever trusted a minority of experts over the majority, or why they trusted non-expert bloggers, tweeters, and forum posters over people with expertise in AI or forecasting. I’m not saying that the expert majority forecasts are right; I’m saying that all AGI forecasts are completely unrigorous and worthy of extreme skepticism. But if you already put your trust in forecasting, then by at least exposing yourself to the actual diversity of opinion, you might begin to question things you accepted too readily.

A few other relevant ideas to consider:

  • LLM scaling may be running out of steam, both in terms of compute and data
  • A recent study found that experienced developers using AI coding assistants completed their tasks 19% more slowly than those working without them
  • There is growing concern about an AI financial bubble because LLMs are not turning out to be practically useful for as many things as was hoped, e.g., the vast majority of businesses report no financial benefit from using LLMs and many companies have abandoned their efforts to use them
  • Despite many years of effort from top AI talent and many billions of dollars of investment, self-driving cars remain at roughly the same level of deployment as 5 years ago or even 10 years ago, nowhere near substituting for human drivers on a large scale[1]

[1] Andrej Karpathy, an AI researcher formerly at OpenAI who led Tesla’s autonomous driving AI from 2017 to 2022, recently made the following remarks on a podcast:

    …self-driving cars are nowhere near done still. The deployments are pretty minimal. Even Waymo and so on has very few cars. … Also, when you look at these cars and there’s no one driving, I actually think it’s a little bit deceiving because there are very elaborate teleoperation centers of people kind of in a loop with these cars. I don’t have the full extent of it, but there’s more human-in-the-loop than you might expect. There are people somewhere out there beaming in from the sky. I don’t know if they’re fully in the loop with the driving. Some of the time they are, but they’re certainly involved and there are people. In some sense, we haven’t actually removed the person, we’ve moved them to somewhere where you can’t see them.
