The Forecasting Research Institute conducted a survey asking different kinds of experts (including technical and non-technical) many questions about AI progress. The report, which was just published, is here. I've only looked at the report briefly and there is a lot that could be examined and discussed. 

The major flaw I want to point out is in the framing of a question where survey respondents are presented with three different scenarios for AI progress: 1) the slow progress scenario, 2) the moderate progress scenario, and 3) the rapid progress scenario. 

All three scenarios describe what will happen by the end of 2030. Respondents have to choose one of the three scenarios; there are only three options, with no option to choose none.

First, two important qualifications. Here’s qualification #1, outlined on page 104:

In the following scenarios, we consider the development of AI capabilities, not adoption. Regulation, social norms, or extended integration processes could all prevent the application of AI to all tasks of which it is capable.

Qualification #2, also on page 104:

We consider a capability to have been achieved if there exists an AI system that can do it:

  • Inexpensively: with a computational cost not exceeding the salary of an appropriate 2025 human professional using the same amount of time to attempt the task.
  • Reliably: what this means is context-dependent, but typically we mean as reliably as, or more reliably than, a human or humans who do the same tasks professionally in 2025.

With that said, here is the scenario that stipulates the least amount of AI progress, the slow progress scenario (on page 105):

Slow Progress

By the end of 2030 in this slower-progress future, AI is a capable assisting technology for humans; it can automate basic research tasks, generate mediocre creative content, assist in vacation planning, and conduct relatively standard tasks that are currently (2025) performed by humans in homes and factories.

Researchers can benefit from literature reviews on almost any topic, written at the level of a capable PhD student, yet AI systems rarely produce novel and feasible solutions to difficult problems. As a result, genuine scientific breakthroughs remain almost entirely the result of human-run labs and grant cycles. Nevertheless, AI tools can support other research tasks (e.g., copy editing and data cleaning and analysis), freeing up time for researchers to focus on higher-impact tasks. AI can handle roughly half of all freelance software-engineering jobs that would take an experienced human approximately 8 hours to complete in 2025, and if a company augments its customer service team with AI, it can expect the model to be able to resolve most complaints.

Writers enjoy a small productivity boost; models can turn out respectable short stories, but full-length novels still need heavy human rewriting to avoid plot holes or stylistic drift. AI can make a 3-minute song that humans would blindly judge to be of equal quality to a song released by a current (2025) major record label. At home, an AI system can draft emails, top up your online grocery cart, or collate news articles, and—so long as the task would take a human an hour or less and is well-scoped—it performs on par with a competent human assistant. With a few prompts, AI can create an itinerary and make bookings for a weeklong family vacation that feels curated by a discerning travel agent.

Self-driving car capabilities have advanced, but none have achieved true level-5 autonomy. Meanwhile, household robots can make a cup of coffee and unload and load a dishwasher in some modern homes—but they can’t do it as fast as most humans and they require a consistent environment and occasional human guidance. In advanced factories, autonomous systems can perform specific, repetitive tasks that require precision but little adaptability (e.g., wafer handling in semiconductor fabrication facilities).

So, in the slowest progress scenario, by the end of 2030, respondents are to imagine what is either nearly AGI or AGI outright. 

This is a really strange way to frame the question. The slowest scenario is extremely aggressive and the moderate and rapid scenarios are even more aggressive. What was the Forecasting Research Institute hoping to learn here?


Edited on Friday, November 14, 2025 at 9:30 AM Eastern to add the following. 

Here’s the question respondents are asked with regard to these scenarios (on page 104):

At the end of 2030, what percent of LEAP panelists will choose “slow progress,” “moderate progress,” or “rapid progress” as best matching the general level of AI progress?

The respondents’ forecast of this percentage is then stated in the report as the probability the respondents assign to each scenario (e.g. on page 38).
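To make the distinction concrete, here is a toy sketch with numbers I’ve invented (they are not from the report). If a respondent thinks progress will most likely fall short of even the slow scenario, the forced three-way "best matching" question still funnels that belief into the "slow" bucket, so the forecasted share of panelists choosing "slow" can be much higher than the respondent’s own probability for the slow scenario as described:

```python
# Toy sketch with invented numbers: one respondent's own credences over
# possible end-of-2030 outcomes, including an outcome below the "slow" scenario.
own_credence = {
    "below_slow": 0.50,  # progress falls short of even the slow scenario
    "slow":       0.30,
    "moderate":   0.15,
    "rapid":      0.05,
}

# Under the forced three-way "best matching" question, an outcome below the
# slow scenario would still be reported as "slow" by the 2030 panel, so the
# respondent's forecast of the share of panelists choosing each option is:
forecasted_share = {
    "slow":     own_credence["below_slow"] + own_credence["slow"],  # 0.80
    "moderate": own_credence["moderate"],                           # 0.15
    "rapid":    own_credence["rapid"],                              # 0.05
}

# Reading the forecasted share for "slow" (0.80) as the probability of the
# slow scenario overstates this respondent's actual credence in the slow
# scenario as described (0.30).
print(forecasted_share["slow"], own_credence["slow"])
```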
 


Edit #2 on Friday, November 14, 2025 at 6:25 PM Eastern. 

I just noticed that the Forecasting Research Institute made a post on the EA Forum a few days ago that presents the question results as probabilities:

By 2030, the average expert thinks there is a 23% chance of a “rapid” AI progress scenario, where AI writes Pulitzer Prize-worthy novels, collapses years-long research into days and weeks, outcompetes any human software engineer, and independently develops new cures for cancer. Conversely, they give a 28% chance of a slow-progress scenario, in which AI is a useful assisting technology but falls short of transformative impact.

If the results are going to be presented this way, it seems particularly important to consider the wording and framing of the question.

Comments

This post is getting some significant downvotes. I would be interested if someone who has downvoted could explain the reason for that.

There's plenty of room for disagreement on how serious a mistake this is, whether it has introduced a 'framing' bias into other results or not, and what it means for the report as a whole. But it just seems straightforwardly true that this particular question is phrased extremely poorly (it seems disingenuous to suggest that the question using the phrasing "best matching" covers you for not even attempting to include the full range of possibilities in your list).

I assume that people downvoting are objecting to the way that this post is using this mistake to call the entire report into question, with language like "major flaw". They may have a point there. But I think you should have a very high bar for downvoting someone who is politely highlighting a legitimate mistake in a piece of research.

'Disagree' react to the 'major flaw' language if you like, and certainly comment your disagreements, but silently downvoting someone for finding a legitimate methodological problem in some EA research seems like bad EA forum behaviour to me!

Thank you very much. I really appreciate your helpful and cooperative approach. 

The "best matching" wording of the question doesn’t, in my view, change the underlying problem of presenting these as the only three options. 

It’s also a problem, in my view, that the "best matching" wording is dropped on page 38 and the report simply talks about the probability respondents assign to the scenario. I looked at the report in the first place because a Forecasting Research Institute employee just said (on the EA Forum) what the probability assigned to a scenario was, and didn’t mention the "best matching" wording (or the three-scenario framing). If you include "best matching" in the question and then drop it when you present the results, what was the point of saying "best matching" in the first place?

I didn’t intend for the post to come across as more than a criticism of this specific question in the survey — I said that the report contains many questions and said "I've only looked at the report briefly and there is a lot that could be examined and discussed". I meant the title literally and factually. This is a major flaw that I came across in the report. 

I would be happy to change the title of the post or change the wording of the post if someone can suggest a better alternative. 

If people have qualms with either the tone or the substance of the post, I’d certainly like to hear them. So, I encourage people to comment.

I’m confused by this, for a few reasons:

  1. The question asks what scenario a future panel would believe is "best matching" the general level of AI progress in 2030, so if things fell short of the "slow" scenario, it would still be the best matching. This point is also reinforced by the instructions: "Reasonable people may disagree with our characterization of what constitutes slow, moderate, or rapid AI progress. Or they may expect to see slow progress observed with some AI capabilities and moderate or fast progress in others. Nevertheless, we ask you to select which scenario, in sum, you feel best represents your views" (p. 104).
  2. There are several other questions in the survey that allow responses indicating very low capability levels or low societal impact. If there is a huge framing effect in the scenario question, it would have to strongly affect answers to these other questions, too (which I think is implausible), or else you should be able to show a mismatch between these questions and the scenario question (which I don't think there is).
  3. The actual answers don’t seem to reflect the view that most respondents believe that the slow scenario represents a very high bar (unless, again, you believe the framing effect is extremely strong): "By 2030, the average expert thinks there is a 23% chance that the state of AI most closely mirrors a “rapid” progress scenario [...]. Conversely, they give a 28% chance of a slow-progress scenario, in which AI is a widely useful assisting technology but falls short of transformative impact".

Aside from these methodological points, I’m also surprised that you believe that the slow scenario constitutes AI that is "either nearly AGI or AGI outright". Out of curiosity, what capability mentioned in the "slow" scenario do you think is the most implausible by 2030? To me, most of these seem pretty close to what we already have in 2025.

[disclaimer: I recommended a major grant to FRI this year, and I’ve discussed LEAP with them several times]

Thanks for your comment. My response:

  1. Thank you for pointing this out. I forgot to include the text of the question in my post. I just added it now. The report (e.g. on page 38) treats respondents’ forecasts of the percentage of panelists who will say a scenario most closely matches reality as the respondents’ forecasted probability of that scenario. This seems incorrect to me. Do you disagree? In any case, I don’t know why the options for the scenarios are so limited. Would you have designed the question the same way? I would have either included more options showing a wider spectrum or, if I wanted to stick to just three, I would have made the slow progress scenario much slower.
  2. I have no comment on the other questions for now since it’s a long report and I haven’t had the time to examine the other questions carefully. I tried briefly to look for how responses to the other questions might match or mismatch the scenarios, but they seem hard to compare on a one-to-one basis. Do you have any ideas about how to compare them?
  3. I don’t know what the respondents would have said if the question were framed differently, but it seems like an odd decision to frame it like this. In a 2023 survey, AI researchers assigned a 50% probability to AGI (or at least a very powerful transformative AI) by either 2047 or 2116, a difference of 69 years, depending on whether the question was framed as AI automating “tasks” or “occupations”, a seemingly subtle difference in wording. This makes me think that how exactly questions are framed in surveys is really important.

There are several capabilities mentioned in the slow progress scenario that seem indicative of AGI or something close, such as the ability of AI systems to largely automate various kinds of labour (e.g. research assistant, software engineer, customer service, novelist, musician, personal assistant, travel agent) and “produce novel and feasible solutions to difficult problems”, albeit “rarely”. The wording is ambiguous as to whether “genuine scientific breakthroughs” will sometimes, if only rarely, be made by AI, in the same manner they would be made by a human scientist leading a research project, or if AI will only assist in such breakthroughs by automating “basic research tasks”.

The more I discuss these kinds of questions or scenarios, the more I realize how differently people interpret them. It’s a difficulty both for forecasting questions and for discussions of AI progress more broadly, since people tend to imagine quite different things based on the same description of a hypothetical future AI system, and it’s not always immediately clear when people are talking past each other.

Thanks for the replies! 

  1. I agree that the scenario question could have been phrased better in hindsight, or maybe could have included an option like "progress falling behind all of the three scenarios". I also agree that, given the way the question was asked, the summary on p. 38 is slightly inaccurate. (It doesn't seem like a big issue to me, but that's probably downstream of my disagreeing that the "slow" scenario describes AGI-like capabilities.)
  2. Fair.
  3. I'm not saying that the responses show that there's no framing effect. I'm saying that they seem to indicate that at least for most respondents, the description of the slow scenario didn't seem wildly off as a floor of what could be expected.  

I forgot to include the text of the question in my post. I just added it now.

I think it would also be fair to include the disclaimer in the question I quoted above.

There are several capabilities mentioned in the slow progress scenario that seem indicative of AGI or something close, such as the ability of AI systems to largely automate various kinds of labour (e.g. research assistant, software engineer, customer service, novelist, musician, personal assistant, travel agent)

I would read the scenario as AI being able to do some of the tasks required by these jobs, but not to fully replace humans doing them, which I would think is the defining characteristic of slow AI progress scenarios.  

Thank you for your follow-up.

1. How much it matters depends on how people interpret the report and how they use it as evidence or in argumentation. I wrote this post because a Forecasting Research Institute employee told me the percentage is the probability assigned to each scenario and that I, personally, should adjust my probability of AGI by 2030 as a result. People should not do this.

3. You can guess that it didn’t and I can guess that it did, but the point is that these survey questions should be well-framed in the first place. We shouldn’t have to guess how much a methodological problem impacted the results.

1. Are you referring to your exchange with David Mathers here?

3. I'm not sure what you're saying here. Just to clarify what my point is: you're arguing in the post that the slow scenario actually describes big improvements in AI capabilities. My counterpoint is that this scenario is not given a lot of weight by the respondents, suggesting that they mostly don't agree with you on this.

You are guessing that you know how the framing affected the results, which is your right, but it is my right to guess something different, and the whole point of doing surveys is not to guess but to know. If we wanted to rely on guesses, we could have saved the Forecasting Research Institute the trouble of running the whole survey in the first place!

I don't think this is an accurate summary of the disagreement, but I've tried to clarify my point twice already, so I'm going to leave it at that.

I don't mind if you don't respond — it's fair to leave a discussion whenever you like — but I want to try to make my point clear for you and for anyone else who might read this post. 

How do you reckon the responses to the survey question would be different if there were a significant question wording effect biasing the results? My position is: I simply don't know and can't say how the results would be different if the question were phrased in such a way as to better avoid a question wording effect. The reason to run surveys is to learn that and be surprised. If the question were worded and framed differently, maybe the results would be very different, maybe they would be a little different, maybe they would be exactly the same. I don't know. Do you know? Do you actually know for sure? Or are you just guessing?

What if we consider the alternative? Let's say the response was something like, I don't know, 95% in favour of the slow progress scenario, 4% for the moderate scenario, and 1% for the rapid scenario. Just to imagine something for the sake of illustration. Then you could also argue against a potential question wording effect biasing the results by appealing to the response data. You could say: well, clearly the respondents saw past the framing of the question and managed to accurately report their views anyway.

This should be troubling. If a high percentage can be used to argue against a question wording effect and a low percentage can be used to argue against a question wording effect, then no matter what the results are, you can argue that you don't need to worry about a potential methodological problem because the results show it isn't a big deal. If any results can be used to argue against a methodological problem, then surely no results should be used to argue against a methodological problem. Does that make sense?
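Here is a toy sketch of what I mean, with made-up numbers of my own (I've used the reported 28% slow and 23% rapid figures and assumed the remainder goes to moderate). Two quite different underlying belief distributions produce exactly the same forced-choice results, so the observed shares by themselves can't tell us how much the framing mattered:

```python
# Two hypothetical belief distributions over end-of-2030 outcomes, in
# percentage points (invented for illustration). Population A puts real
# weight on progress falling short of the slow scenario; population B puts none.
pop_a = {"below_slow": 20, "slow": 8, "moderate": 49, "rapid": 23}
pop_b = {"below_slow": 0, "slow": 28, "moderate": 49, "rapid": 23}

def forced_choice(beliefs):
    """Collapse beliefs into the three-way question: anything below the
    slow scenario still gets reported as "slow", since there is no
    "none of these" option."""
    return {
        "slow":     beliefs["below_slow"] + beliefs["slow"],
        "moderate": beliefs["moderate"],
        "rapid":    beliefs["rapid"],
    }

# Both populations yield the same observed results (28 / 49 / 23), so the
# observed shares alone can't distinguish them.
print(forced_choice(pop_a) == forced_choice(pop_b))  # True
```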

I don't feel like I'm reinventing the wheel here; I'm just talking about common concerns with how surveys are designed and worded. In general, you can't know whether a response was biased or not just by looking at the data and not the methodology.

For reference, see the results on page 141 of the report.
