The Forecasting Research Institute conducted a survey asking different kinds of experts (including technical and non-technical) many questions about AI progress. The report, which was just published, is here. I've only looked at the report briefly and there is a lot that could be examined and discussed.
The major flaw I want to point out is in the framing of a question where survey respondents are presented with three different scenarios for AI progress: 1) the slow progress scenario, 2) the moderate progress scenario, and 3) the rapid progress scenario.
All three scenarios describe what will happen by the end of 2030. Respondents have to choose among the three scenarios; there is no option to choose none of them.
First, two important qualifications. Here’s qualification #1, outlined on page 104:
In the following scenarios, we consider the development of AI capabilities, not adoption. Regulation, social norms, or extended integration processes could all prevent the application of AI to all tasks of which it is capable.
Qualification #2, also on page 104:
We consider a capability to have been achieved if there exists an AI system that can do it:
- Inexpensively: with a computational cost not exceeding the salary of an appropriate 2025 human professional using the same amount of time to attempt the task.
- Reliably: what this means is context-dependent, but typically we mean as reliably as, or more reliably than, a human or humans who do the same tasks professionally in 2025.
With that said, here is the scenario that stipulates the least amount of AI progress, the slow progress scenario (on page 105):
Slow Progress
By the end of 2030 in this slower-progress future, AI is a capable assisting technology for humans; it can automate basic research tasks, generate mediocre creative content, assist in vacation planning, and conduct relatively standard tasks that are currently (2025) performed by humans in homes and factories.
Researchers can benefit from literature reviews on almost any topic, written at the level of a capable PhD student, yet AI systems rarely produce novel and feasible solutions to difficult problems. As a result, genuine scientific breakthroughs remain almost entirely the result of human-run labs and grant cycles. Nevertheless, AI tools can support other research tasks (e.g., copy editing and data cleaning and analysis), freeing up time for researchers to focus on higher-impact tasks. AI can handle roughly half of all freelance software-engineering jobs that would take an experienced human approximately 8 hours to complete in 2025, and if a company augments its customer service team with AI, it can expect the model to be able to resolve most complaints.
Writers enjoy a small productivity boost; models can turn out respectable short stories, but full-length novels still need heavy human rewriting to avoid plot holes or stylistic drift. AI can make a 3-minute song that humans would blindly judge to be of equal quality to a song released by a current (2025) major record label. At home, an AI system can draft emails, top up your online grocery cart, or collate news articles, and—so long as the task would take a human an hour or less and is well-scoped—it performs on par with a competent human assistant. With a few prompts, AI can create an itinerary and make bookings for a weeklong family vacation that feels curated by a discerning travel agent.
Self-driving car capabilities have advanced, but none have achieved true level-5 autonomy. Meanwhile, household robots can make a cup of coffee and unload and load a dishwasher in some modern homes—but they can’t do it as fast as most humans and they require a consistent environment and occasional human guidance. In advanced factories, autonomous systems can perform specific, repetitive tasks that require precision but little adaptability (e.g., wafer handling in semiconductor fabrication facilities).
So, even in the slowest progress scenario, respondents are asked to imagine that, by the end of 2030, AI is either nearly AGI or AGI outright.
This is a really strange way to frame the question. The slowest scenario is already extremely aggressive, and the moderate and rapid scenarios are even more so. What was the Forecasting Research Institute hoping to learn here?
Edited on Friday, November 14, 2025 at 9:30 AM Eastern to add the following.
Here’s the question respondents are asked with regard to these scenarios (on page 104):
At the end of 2030, what percent of LEAP panelists will choose “slow progress,” “moderate progress,” or “rapid progress” as best matching the general level of AI progress?
The percentage of panelists forecast by respondents is then presented in the report as the probability respondents assign to each scenario (e.g. on page 38). For example, a forecast that 28% of panelists will choose "slow progress" is reported as a 28% probability of the slow-progress scenario.
Edit #2 on Friday, November 14, 2025 at 6:25 PM Eastern.
I just noticed that the Forecasting Research Institute made a post on the EA Forum a few days ago that presents the question results as probabilities:
By 2030, the average expert thinks there is a 23% chance of a “rapid” AI progress scenario, where AI writes Pulitzer Prize-worthy novels, collapses years-long research into days and weeks, outcompetes any human software engineer, and independently develops new cures for cancer. Conversely, they give a 28% chance of a slow-progress scenario, in which AI is a useful assisting technology but falls short of transformative impact.
If the results are going to be presented this way, it seems particularly important to consider the wording and framing of the question.
This post is getting some significant downvotes. I would be interested to hear from anyone who downvoted what their reason was.
There's plenty of room for disagreement on how serious a mistake this is, whether or not it has introduced a 'framing' bias into other results, and what it means for the report as a whole. But it just seems straightforwardly true that this particular question is phrased extremely poorly (it seems disingenuous to suggest that using the phrasing "best matching" covers you for not even attempting to include the full range of possibilities in your list).
I assume that people downvoting are objecting to the way that this post is using this mistake to call the entire report into question, with language like "major flaw". They may have a point there. But I think you should have a very high bar for downvoting someone who is politely highlighting a legitimate mistake in a piece of research.
'Disagree' react to the 'major flaw' language if you like, and certainly comment your disagreements, but silently downvoting someone for finding a legitimate methodological problem in some EA research seems like bad EA forum behaviour to me!
Thank you very much. I really appreciate your helpful and cooperative approach.
The "best matching" wording of the question doesn’t, in my view, change the underlying problem of presenting these as the only three options.
It’s also a problem, in my view, that the "best matching" wording is dropped on page 38 and the report simply talks about the probability respondents assign to the scenario. I looked at the report in the first place because a Forecasting Research Institute employee just said (on the EA Forum) what the probability assigned to a scenario was, and didn’t mention the "best matching" wording (or the three-scenario framing). If you include "best matching" in the question and then drop it when you present the results, what was the point of saying "best matching" in the first place?
I didn’t intend for the post to come across as more than a criticism of this specific survey question. I noted that the report contains many questions and said "I've only looked at the report briefly and there is a lot that could be examined and discussed". I meant the title literally and factually: this is a major flaw that I came across in the report.
I would be happy to change the title of the post or change the wording of the post if someone can suggest a better alternative.
If people have qualms with either the tone or the substance of the post, I’d certainly like to hear them. So, I encourage people to comment.
I’m confused by this, for a few reasons:
Aside from these methodological points, I’m also surprised that you believe that the slow scenario constitutes AI that is "either nearly AGI or AGI outright". Out of curiosity, which capability mentioned in the "slow" scenario do you think is the most implausible by 2030? To me, most of these seem pretty close to what we already have in 2025.
[disclaimer: I recommended a major grant to FRI this year, and I’ve discussed LEAP with them several times]
Thanks for your comment. My response:
There are several capabilities mentioned in the slow progress scenario that seem indicative of AGI or something close, such as the ability of AI systems to largely automate various kinds of labour (e.g. research assistant, software engineer, customer service, novelist, musician, personal assistant, travel agent) and "produce novel and feasible solutions to difficult problems", albeit "rarely". The wording is ambiguous as to whether "genuine scientific breakthroughs" will sometimes, if only rarely, be made by AI, in the same manner they would be made by a human scientist leading a research project, or if AI will only assist in such breakthroughs by automating "basic research tasks".
The more I discuss these kinds of questions or scenarios, the more I realize how differently people interpret them. It’s a difficulty both for forecasting questions and for discussions of AI progress more broadly, since people tend to imagine quite different things based on the same description of a hypothetical future AI system, and it’s not always immediately clear when people are talking past each other.
Thanks for the replies!
I think it would also be fair to include the disclaimer in the question I quoted above.
I would read the scenario as AI being able to do some of the tasks required by these jobs, but not to fully replace humans doing them, which I would think is the defining characteristic of slow AI progress scenarios.
Thank you for your follow-up.
1. How much it matters depends on how people interpret the report and how they use it as evidence or in argumentation. I wrote this post because a Forecasting Research Institute employee told me the percentage is the probability assigned to each scenario and that I, personally, should adjust my probability of AGI by 2030 as a result. People should not do this.
3. You can guess that it didn’t, and I can guess that it did, but the point is that these survey questions should be well framed in the first place. We shouldn’t have to guess how much a methodological problem affected the results.
1. Are you referring to your exchange with David Mathers here?
3. I'm not sure what you're saying here. Just to clarify what my point is: you're arguing in the post that the slow scenario actually describes big improvements in AI capabilities. My counterpoint is that this scenario is not given a lot of weight by the respondents, suggesting that they mostly don't agree with you on this.
You are guessing that you know how the framing affected the results, which is your right, and it is my right to guess something different, but the whole point of running surveys is not to guess but to know. If we wanted to rely on guesses, we could have saved the Forecasting Research Institute the trouble of running the survey in the first place!
I don't think this is an accurate summary of the disagreement, but I've tried to clarify my point twice already, so I'm going to leave it at that.
I don't mind if you don't respond — it's fair to leave a discussion whenever you like — but I want to try to make my point clear for you and for anyone else who might read this post.
How do you reckon the responses to the survey question would be different if there were a significant question wording effect biasing the results? My position is: I simply don't know and can't say how the results would be different if the question were phrased in such a way as to better avoid a question wording effect. The reason to run surveys is to learn that and be surprised. If the question were worded and framed differently, maybe the results would be very different, maybe they would be a little different, maybe they would be exactly the same. I don't know. Do you know? Do you actually know for sure? Or are you just guessing?
What if we consider the alternative? Suppose the response had been something like 95% in favour of the slow progress scenario, 4% for the moderate scenario, and 1% for the rapid scenario, just to imagine something for the sake of illustration. Then you could also argue against a potential question wording effect biasing the results by appealing to the response data. You could say: well, clearly the respondents saw past the framing of the question and managed to accurately report their views anyway.
This should be troubling. If a high percentage can be used to argue against a question wording effect and a low percentage can also be used to argue against one, then no matter what the results are, you can argue that you don't need to worry about a potential methodological problem because the results show it isn't a big deal. If any results can be used to argue against a methodological problem, then surely no results should be used to argue against a methodological problem. Does that make sense?
I don't feel like I'm reinventing the wheel here; I'm just talking about common concerns with how surveys are designed and worded. In general, you can't know whether a response was biased just by looking at the data and not the methodology.
For reference, here are the results on page 141 of the report: