Pronouns: she/her or they/them.
I got interested in effective altruism back before it was called effective altruism, back before Giving What We Can had a website. Later on, I got involved in my university EA group and helped run it for a few years. Now I’m trying to figure out where effective altruism can fit into my life these days and what it means to me.
Sure, François Chollet recently changed his prediction from AGI in maybe about 10 years to AGI in maybe about 5 years, but his very incisive arguments about the shortcomings of LLMs are logically and intellectually independent of his, in my opinion, extremely dubious prediction of AGI in about 5 years. I would like people who believe LLMs will scale to AGI to seriously engage with his arguments about why they won't. His prediction about when AGI will happen is kind of beside the point.
That's basically all that needs to be said about this, but I'll elaborate anyway because I think the details are interesting and intellectually helpful.
If I understand François Chollet's point of view correctly, his prediction of AGI in about 5 years depends on a) program synthesis being the key to solving AGI and b) everything required for program synthesis to take deep learning the rest of the way to AGI being solved within about 5 years. I have extreme doubts about both (a) and (b) and, for that matter, I would guess most people who think AGI will come within 7 years have strong doubts about at least (a). Thinking that LLMs will scale to AGI and believing that solving program synthesis is required to achieve AGI are incompatible views.
François Chollet, Yann LeCun of FAIR, Jeff Hawkins of Numenta, and Richard Sutton of Amii and Keen Technologies — LeCun and Sutton won Turing Awards for their pioneering work in deep learning and reinforcement learning, respectively — are four AI researchers who have strongly criticized LLMs and have proposed four different directions for AI research to get to AGI:
All four have also at times made statements indicating they think their preferred research roadmap to AGI can be executed on a relatively short timescale, although with some important caveats from the three researchers other than Chollet. (Chollet might have given a caveat somewhere that I missed; he seems to have said the least on this topic overall.) I already described Chollet's thoughts on this. As for the others:
So, Chollet, LeCun, Hawkins, and Sutton all think LLMs are insufficient to get to AGI. They all argue that AGI requires fundamentally different ideas than what the mainstream of AI research is focusing on — and they advocate for four completely different ideas.[1] And all four of them are optimistic that their preferred research roadmap has a realistic chance of getting to AGI within a relatively short time horizon. Isn't that interesting?
Either three out of the four have to be wrong about the key insights needed to unlock AGI, or all four of them have to be wrong. (I suppose the solution to AGI could also turn out to be some combination of their ideas, in which case they would each be partially right and partially wrong.) It's interesting that all four independently and simultaneously think their respective roadmaps are viable on a roughly similar timescale, when a) it's hard to imagine a strong theoretical or intellectual reason to think we will figure out the solution to AGI soon, regardless of what the solution actually is (e.g. whether it requires simulating cortical columns, using non-deep learning AI methods, or using novel deep learning methods), and b) there are psychological reasons to believe such things, like the appeal of having an ambitious but achievable goal on a timescale that's motivating.
I find what Yann LeCun has said about this to be the wisest and most self-aware. I can't find the interview where he said this, but he said something along the lines of (paraphrasing based on memory): over the last several decades, many people have come along and said they have the solution to AGI and they've all been wrong. So, if someone comes along and tells you they have the solution to AGI, you should not believe them. I'm another person who's coming along and telling you I have the solution to AGI, and you should not believe me. But I still think I'm right.
I think (or at least hope) that, described like this and in the context I just gave, that kind of skepticism will strike most people as logical and prudent.
Chollet believes that the major AI labs are in fact working on program synthesis, but as far as I know, this hasn't been confirmed by any lab and, if it is happening, it hasn't made its way into published research yet.
I very, very strongly believe there’s essentially no chance of AGI being developed within the next 7 years. I wrote a succinct list of reasons to doubt near-term AGI here. (For example, it might come as a surprise that around three quarters of AI experts don’t think scaling up LLMs will lead to AGI.) I also highly recommend this video by an AI researcher that makes a strong case for skepticism of near-term AGI and of LLMs as a path to AGI.
By essentially no chance, I mean less than the chance of Jill Stein running as the Green Party candidate in 2028 and winning the U.S. presidency. Or, if you like, less than the chance Jill Stein had of winning the presidency in 2024. I mean it’s an incredible long shot, significantly less than a 0.1% chance. (And if I had to put a number on my confidence in this, it would be upwards of 95%.)
By AGI, I mean a system that can think, plan, learn, and solve problems just like humans do, with at least an equal level of data efficiency (e.g. if a human can learn from one example, AGI must also be able to learn from one example, and not, say, one million), with at least an equal level of reliability (e.g. if humans do a task correctly 99.999% of the time, AGI must match or exceed that), with at least an equal level of fluidity or adaptability to novel problems and situations (e.g. if a human can solve a problem with zero training examples, AGI must be able to as well), and with at least an equal ability to generate novel and creative ideas. This is the only kind of AI system that could plausibly automate all human labour or cunningly take over the world. No existing AI system is anything like this.
As we go further out into the future, it becomes less and less clear what will happen. I think it could be significantly more than 100 years before AGI is developed. It could also be significantly less. Who knows? We don’t have a good understanding of fundamental ideas like what intelligence is, how it works, how to measure it, or how to measure progress toward AGI. It would be surprising if we could predict the invention of a technology — something that is generally not possible anyway — whose fundamental scientific nature we understand so poorly.
In March 2025, Dario Amodei, the CEO of Anthropic, predicted that 90% of code would be written by AI as early as June 2025 and no later than September 2025. This turned out to be dead wrong. In 2016, the renowned AI researcher Geoffrey Hinton predicted that by 2021, AI would automate away all radiology jobs and that turned out to be dead wrong. Even 9 years later, the trend has moved in the opposite direction and there is no indication radiology jobs will be automated away anytime soon. Many executives and engineers working in autonomous driving predicted we’d have widespread fully autonomous vehicles long ago; some of them have thrown in the towel. Cruise Automation, for example, is no more.
I expect that 7 years from now (in 2032), we will have many more examples of this kind of false prediction, except on a somewhat grander scale, since people were predicting a high chance of either utopia or doom and trying to marshal a commensurate amount of resources and attention. I somewhat dread people simply doubling down and saying they were right all along regardless of what happens. Possible responses I dread:
-But ChatGPT is AGI! (A few people are in fact already saying this or something close to it. For example, Tyler Cowen said that o3 is AGI. I don’t know what "very weak AGI" is supposed to mean, but I most likely disagree that GPT-4 is that.)
-We were never sure anything would happen by 2032! What we really said was…
-We successfully averted the crisis! Mission accomplished! Thanks to us, AI progress was slowed and the world is saved!
I honestly don’t know who is interested in having this debate. I don’t get the sense that many people are eager to have it. In theory, I’d like to see people engage more deeply with the substantive, informed criticism (e.g. anti-LLM arguments from people like Yann LeCun and François Chollet; data showing little economic or practical benefit to AI, or even negative benefit). But since not many people seem interested, I guess I’m fine with that; if we just wait 7 years, the debate will settle itself.
[Personal blog] I’m taking a long-term, indefinite hiatus from the EA Forum.
I’ve written enough in posts, quick takes, and comments over the last two months to explain the deep frustrations I have with the effective altruist movement/community as it exists today. (For one, I think the AGI discourse is completely broken and far off-base. For another, I think people fail to be kind to others in ordinary, important ways.)
But the strongest reason for me to step away is that participating in the EA Forum is just too unpleasant. I’ve had fun writing stuff on the EA Forum. I thank the people who have been warm to me, who have had good humour, and who have said interesting, constructive things.
But negativity bias being what it is (and maybe “bias” is too biased a word for it; maybe we should call it “negativity preference”), the few people who have been really nasty to me have ruined the whole experience. I find myself trying to remember names, to remember who’s who, so I can avoid clicking on reply notifications from the people who have been nasty. And this is a sign it’s time to stop.
Psychological safety is such a vital part of online discussion, or any discussion. Open, public forums can be a wonderful thing, but psychological safety is hard to provide on an open, public forum. I still have some faith in open, public forums, but I tend to think the best safety tool is giving authors the ability to determine who is and isn’t allowed to interact with their posts. There is some risk of people censoring disagreement, sure. But nastiness online is a major threat to everything good. It causes people to self-censor (e.g. by quitting the discussion platform or by withholding opinions) and it has terrible effects on discourse and on people’s minds.
And private discussions are important too. One of the most precious things you can find in this life is someone you can have good conversations with who will maintain psychological safety, keep your confidences, “yes, and” you, and be constructive. Those are the kind of conversations that loving relationships are built on. If you end up cooking something that the world needs to know about, you can turn it into a blog post or a paper or a podcast or a forum post. (I’ve done it before!) But you don’t have to do the whole process leading up to that end product in public.
The EA Forum is unusually good in some important respects, which is kind of sad, because it shows us a glimpse of what maybe could exist on the Internet, without itself realizing that promise.
If anyone wants to contact me for some reason, you can send me a message via the forum and I should get it as an email. Please put your email address in the message so I can respond to you by email without logging back into the forum.
Take care, everyone.
The economic data seems to depend on one's point of view. I'm no economist and I certainly can't prove to you that AI is having an economic impact. Its use grows quickly though: Statistics on AI market size
This is confusing two different concepts. Revenue generated by AI companies or by AI products and services is a different concept than AI’s ability to automate human labour or augment the productivity of human workers. By analogy, video games (another category of software) generate a lot of revenue, but automate no human labour and don’t augment the productivity of human workers.
LLMs haven’t automated any human jobs and the only scientific study I’ve seen on the topic found that LLMs slightly reduced worker productivity. (Mentioned in a footnote to the post I linked above.)
If AI is having an economic impact by automating software engineers' labour or augmenting their productivity, I'd like to see some economic data or firm-level financial data or a scientific study that shows this.
Your anecdotal experience is interesting, for sure, but the other people who write code for a living who I've heard from have said, more or less, AI tools save them the time it would take to copy and paste code from Stack Exchange, and that's about it.
I think AI's achievements on narrow tests are amazing. I think AlphaStar's success on competitive StarCraft II was amazing. But six years after AlphaStar and ten years after AlphaGo, have we seen any big real-world applications of deep reinforcement learning or imitation learning that produce economic value? Or do something else practically useful in a way we can measure? Not that I'm aware of.
Instead, we've had companies working on real-world applications of AI, such as Cruise, shutting down. The current hype about AGI reminds me a lot of the hype about self-driving cars that I heard over the last ten years, from around 2015 to 2025. In the five-year period from 2017 to 2022, the rhetoric on solving Level 4/5 autonomy was extremely aggressive and optimistic. In the last few years, there have been some signs that some people in the industry are giving up, such as Cruise closing up shop.
Similarly, some companies, including Tesla, Vicarious, Rethink Robotics, and several others have tried to automate factory work and failed.
Other companies, like Covariant, have had modest success on relatively narrow robotics problems, like sorting objects into boxes in a warehouse, but nothing revolutionary.
The situation is complicated and the truth is not obvious, but it's too simple to say that predictions about AI progress have overall been too pessimistic or too conservative. (I'm only thinking about recent predictions, but one of the first predictions about AI progress, made in 1956, was wildly overoptimistic.[1])
I wrote a post here and a quick take here where I give my other reasons for skepticism about near-term AGI. That might help fill in more information about where I'm coming from, if you're curious.
Quote:
An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
I feel exactly the same way: racism and sympathy toward far-right and authoritarian views within effective altruism are a reason for me to want to distance myself from the movement, as are the people who may not agree with those views but basically shrug and act like it's fine.
Here's a point I haven't seen many people discuss:
...many people could have felt betrayed by the fact that EA leadership was well aware of FTX sketchiness and didn't say anything (or weren't aware, but then maybe you'd be betrayed by their incompetence).
What did the EA leadership know and when did they know it? About a year ago, I asked in a comment here about a Time article that claims Will MacAskill, Holden Karnofsky, Nick Beckstead, and maybe some others were warned about FTX and/or Sam Bankman-Fried. I might have missed some responses to this, but I don't remember ever getting a clear answer on this.
If EA leaders heard credible warnings and ignored them, then maybe that shows poor judgment. Hard to say without knowing more information.
Forgive me for the very long reply. I’m sure that you and others on the EA Forum have heard the case for near-term AGI countless times, often at great depth, but the opposing case is rarely articulated in EA circles, so I wanted to do it the justice that a tweet-length reply could not.
Why does the information we have now indicate AGI within 7 years and not, say, 17 years or 70 years or 170 years? If progress in science and technology continues indefinitely, then eventually we will gain the knowledge required to build AGI. But when is eventually? And why would it be so incredibly soon? To say that some form of progress is being made is not the same as making an argument for AGI by 2032, as opposed to 2052 or 2132.
I wouldn’t say that LLM benchmarks accurately represent what real intellectual tasks are actually like. First, the benchmarks are designed to be solvable by LLMs because they are primarily intended to measure LLMs against each other and to measure improvements in subsequent versions of the same model line (e.g. GPT-5 vs. GPT-4o). There isn’t much incentive to create LLM benchmarks on which LLMs stagnate around 0%.[1]
Even ARC-AGI 1, 2, and 3, which are an exception in terms of their purpose and design, are still intended to sit in the sweet spot between too easy to be a real challenge and too hard to show progress on. If a benchmark is trivial to solve or impossible to solve, it won’t encourage AI researchers and engineers to try hard to solve it and make improvements to their models in the process. The intention of ARC-AGI is to give people working on AI a shared point of focus and a target to aim for. The purpose is not to make a philosophical or scientific point about what current AI systems can’t do. The benchmarks are designed to be solved by current AI systems.
It always bears repeating, since confusion is possible, that the ARC-AGI benchmarks are not intended to test whether a system is AGI or not, but are rather intended to test whether AI systems are making progress toward AGI. So, getting 95%+ on ARC-AGI-2 would not mean AGI is solved, but it would be a sign of progress — or at least that is the intention.
Second, for virtually all LLM tests or benchmarks, the definition of success or failure on the tasks has to be reduced to something simple enough that software can grade the task automatically. This is a big limitation.
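To make that constraint concrete, here is a minimal sketch of what automatic grading typically reduces to: comparing the model's output to a fixed answer key via exact match or multiple-choice labels. The questions and answer key here are hypothetical, not taken from any real benchmark's harness.

```python
# Minimal sketch of automatic benchmark grading (hypothetical data, not a real harness).
# The point: success has to be reducible to a string or label match that software can check.

answer_key = {
    "q1": "C",    # multiple-choice answer
    "q2": "42",   # short exact-match answer
}

def grade(model_outputs: dict[str, str]) -> float:
    """Return the fraction of questions where the model's normalized output matches the key."""
    correct = sum(
        model_outputs.get(qid, "").strip().lower() == expected.strip().lower()
        for qid, expected in answer_key.items()
    )
    return correct / len(answer_key)

print(grade({"q1": "C", "q2": "forty-two"}))  # 0.5 -- "forty-two" fails the exact match
```

Anything that can't be squeezed into this kind of comparison (an essay, a research plan, a novel argument) either gets left out of the benchmark or gets graded by another LLM, with all the problems that entails.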
When I think about the sort of intellectual tasks that humans do, not a lot of them can be graded automatically. Of course, there are written exams and tests with multiple choice answers, but these are primarily tests of memorization. Don’t get me wrong, it is impressive that LLMs can memorize essentially all text ever written, but memorization is only one aspect of intelligence. We want AI systems that go beyond just memorizing things from huge numbers of examples and can also solve completely novel problems that aren’t a close match for anything in their training dataset. That’s where LLMs are incredibly brittle and start just generating nonsense, saying plainly false (and often ridiculous) things, contradicting themselves, hallucinating, etc. Some great examples are here, and there's also an important discussion of how these holes in LLM reasoning get manually patched by paying large workforces to write new training examples specifically to fix them. This creates an impression of increased intelligence, but the improvement isn't from scaling in these cases, it's from large-scale special casing.
I think the most robust tests of AI capabilities are tasks that have real world value. If AI systems are actually doing the same intellectual tasks as human beings, then we should see AI systems either automating labour or increasing worker productivity. We don’t see that. In fact, I’m aware of two studies that looked at the impact of AI assistance on human productivity. One study on customer support workers found mixed results, including a negative impact on productivity for the most experienced employees. Another study, by METR, found a 19% reduction in productivity when coders used an AI coding assistant.
In industry, non-AI companies that have invested in applying AI to their work are not seeing much payoff. There might be modest benefits in some niches; I’m sure there are at least a few. But LLMs are not going to be transformational to the economy, let alone automate all office work.
Personally, I find ChatGPT to be extremely useful as an enhanced search engine. I call it SuperGoogle. If I want to find a news article or an academic paper or whatever about a certain very specific topic, I can ask GPT-5 Thinking or o3 to go look and see if anything like that exists. For example, I can say, “Find me any studies that have been published comparing the energy usage of biological systems to technological systems, excluding brains and computers.” It often gives me some stuff that isn’t helpful, but it digs up a genuinely useful link often enough that it’s still a valuable tool overall. I don’t know how much time this saves me over Googling, but it feels useful. (It’s possible that, like the AI coders in the METR study, I’m falling prey to the illusion that it’s saving me time when actually, on net, it wastes time, but who knows.)
This is a genuine innovation. Search engines are an important tool, and such a helpful improvement on the search engine is a meaningful accomplishment. But this is an innovation on the scale of Spotify allowing us to stream music, rather than something on the scale of electricity or the steam engine or the personal computer, let alone something as revolutionary as the evolution of the human prefrontal cortex.
If LLMs were genuinely replicating human intelligence, we would expect to see an economic impact, excluding the impact of investment. Investment is certainly having an impact, but, as John Maynard Keynes said, if you pay enough people to dig holes and then fill the same holes up with dirt again, that stimulus will impact the economy (and may even get you out of a recession). What economic impact is AI having over and above the impact that would have been had by using the same capital to pay people to dig holes and fill them back up? From the data I’ve seen, the impact is quite modest and a huge amount of capital has been wasted. I think within a few years many people will probably see the recent AI investment boom as a stunningly bad misallocation of capital.[2]
People draw analogies to the transcontinental railway boom and the dot-com bubble, but they also point out that railways and fibre-optic cable depreciate at a much slower rate than GPUs. Different companies calculate the depreciation of GPUs at different rates, typically ranging from 1 year to 6 years. Data centres have non-GPU components, like the buildings and the power connections, that are more durable, but the GPUs account for more than half of the costs. So, overbuilding capacity for demand that doesn’t ultimately materialize would be extremely wasteful. Watch this space.[3]
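As a rough illustration of why the assumed GPU lifespan matters so much, here is a back-of-the-envelope sketch with made-up numbers (a hypothetical $10 billion build-out, a 60% GPU cost share, a 20-year life for the rest), not any company's actual figures.

```python
# Rough illustration with made-up numbers (not any company's actual figures):
# how the assumed GPU lifespan changes the annual straight-line depreciation charge
# on a hypothetical data centre build-out.

capex_total = 10e9    # hypothetical $10B build-out
gpu_share = 0.6       # assume GPUs are a bit over half the cost, per the claim above
gpu_capex = capex_total * gpu_share
other_capex = capex_total - gpu_capex

for gpu_lifespan_years in (1, 3, 6):
    # buildings, power connections, etc. assumed to depreciate over ~20 years
    annual_depreciation = gpu_capex / gpu_lifespan_years + other_capex / 20
    print(f"GPU lifespan {gpu_lifespan_years} yr: ~${annual_depreciation / 1e9:.1f}B/year depreciation")
```

On these toy numbers, the annual charge swings from roughly $1.2 billion to $6.2 billion depending on whether the GPUs are written off over 6 years or 1 year, which is why the choice of schedule attracts so much scrutiny.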
If you think an LLM scoring more than 100 on an IQ test means it's AGI, then we've had AGI for several years. But clearly there's a problem with that inference, right? Memorizing the answers to IQ tests, or memorizing similar answers to similar questions that you can interpolate, doesn't mean a system actually has the kind of intelligence to solve completely novel problems that have never appeared on any test, or in any text. The same general critique applies to the inference that LLMs are intelligent from their results on virtually any LLM benchmark. Memorization is not intelligence.
If we instead look at performance on practical, economically valuable tasks as the test for AI's competence at intellectual tasks, then its competence appears quite poor. People who make the flawed inference from benchmarks just described say that LLMs can do basically anything. If they instead derived their assessment from LLMs' economic usefulness, it would be closer to the truth to say LLMs can do almost nothing.
There is also some research on non-real world tasks that supports the idea that LLMs are mass-scale memorizers with a modicum of interpolation or generalization to examples similar to what they've been trained on, rather than genuinely intelligent (in the sense that humans are intelligent). The Apple paper on "reasoning" models found surprisingly mixed results on common puzzles. The finding that sticks out most in my mind is that the LLM's performance on the Tower of Hanoi puzzle did not improve after being told the algorithm for solving the puzzle. Is that real intelligence?
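For context on how little should be required here, the algorithm in question is a textbook recursion that fits in a few lines. This is a standard version for illustration, not the exact procedure given to the models in the Apple paper.

```python
# Standard recursive Tower of Hanoi solution (textbook version, not the exact
# procedure the Apple paper supplied to the models).

def hanoi(n: int, source: str, spare: str, target: str, moves: list[tuple[str, str]]) -> None:
    """Append the sequence of moves needed to transfer n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, target, spare, moves)  # move the top n-1 disks out of the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, source, target, moves)  # move the n-1 disks back on top of it

moves: list[tuple[str, str]] = []
hanoi(3, "A", "B", "C", moves)
print(len(moves), moves)  # 7 moves for 3 disks (2**n - 1 in general)
```

A system that genuinely understood this procedure should be able to execute it for larger n once handed the recipe; the reported finding was that being handed the recipe didn't help.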
It's possible, at least in principle (not sure it often happens in practice), to acknowledge these flaws in LLMs and still believe in near-term AGI. If there's enough progress in AI fast enough, then we could have AGI within 7 years. This is true, but it was also true ten years ago. When AlphaGo beat Lee Sedol in 2016, you could have said we'll have AGI within 7 years — because, sure, being superhuman at go isn't that close to AGI, but look at how fast the progress has been, and imagine how fast the progress will be![4] If you think it's just a matter of scaling, then I could understand how you would see the improvement as predictable. But I think the flaws in LLMs are inherent to LLMs and can't be solved through scaling. The video from AI researcher Edan Meyer that I linked to in my original comment makes an eloquent case for this. As does the video with François Chollet.
There are other problems with the scaling story:
The benefits to scaling are diminishing and, at the same time, data scaling and compute scaling may have to slow down sometime soon (if this is not already happening).
If you expand the scope from LLM performance on written prompts and responses to "agentic" applications, I think LLMs' failures are more stark, and the models do not seem to be gaining mastery of these tasks particularly quickly. Journalists generally say that companies' demos of agentic AI don't work.
I don't expect that performance on agentic tasks will rapidly improve. To train on text-based tasks, AI labs can get data from millions of books and large-scale scrapes of the Internet. There aren't similarly sized datasets for agentic tasks. In principle, you can use pure reinforcement learning without bootstrapping from imitation learning, but while this approach has succeeded in domains with smaller spaces of possible actions like go, it has failed in domains with larger spaces of possible actions like StarCraft. I don't think agentic AI will get particularly better over the next few years. Also, the current discrepancy between LLM performance on text-based tasks and agentic tasks tells us something about whether LLMs are genuinely intelligent. What kind of PhD student can't use a computer?
So, to briefly summarize the core points of this very long comment:
Maybe it's worth mentioning the very confusing AI Impacts survey conducted in 2022. The surveyors gave 2,778 AI researchers two different descriptions of an AI system, each of which could be construed as AGI, and which could arguably be construed as equivalent to each other (I don't know why they designed the survey like this). Aggregating the researchers' replies, the survey found a 50% chance of AGI by 2047 (and a 10% chance by 2027) on one definition, and a 50% chance of AGI by 2116 (and a 10% chance by 2037) on the other.
In 2022, there was also a survey of superforecasters with a cleaner definition of AGI. They, in aggregate, assigned a 1% chance of AGI by 2030, a 21% chance by 2050, a 50% chance by 2081, and a 75% chance by 2100.
Both the AI Impacts survey and the superforecaster survey were conducted before the launch of ChatGPT. I would guess ChatGPT would probably have led the respondents to shorten their timelines, but if LLMs are more or less a dead end, as people like the Turing Award winners Yann LeCun and Richard Sutton have argued,[5] then this would be a mistake. (In a few years, if things go the way I expect, meaning that generative AI turns out to be completely disappointing and this is reflected in finance and the economy, I would guess the same people would lengthen their timelines.) In any case, it would be interesting to run the surveys again now.
I think these surveys might be useful to bring up just to disrupt the impression some people in EA might have that there is an expert consensus that near-term AGI is likely. I imagine that even if these surveys were re-run now, we would still see a small chance of AGI by 2032. Strongly held belief in near-term AGI exists in a bit of a bubble or echo chamber, and if you're in the minority on an issue among well-informed people, that can stimulate some curiosity about why so many people disagree with you.
In truth, I don't think we can predict when a technology will be invented, particularly when we don't understand the science behind it. I am highly skeptical that we can gain meaningful knowledge by just asking people to guess a year. So, it really is just to stimulate curiosity.
There are a lot of strong, substantive, well-informed arguments against near-term AGI and against the idea that LLMs will scale to AGI. I find it strange how rarely I see people in EA engage with these arguments or even show awareness of them. It's weird to me that a lot of people are willing to, in some sense, stake the reputation of EA and, to some degree, divert money away from GiveWell-recommended charities without, as far as I've seen, much consideration of opposing viewpoints. It seems like a lack of due diligence.
However, it would be easy to do so, especially if you're willing to do manual grading. Task an LLM with making stock picks that achieve alpha — you could grade that automatically (a minimal sketch of that kind of grading is below). Try to coax LLMs into coming up with a novel scientific discovery or theoretical insight; despite trillions of tokens generated, it hasn't happened yet. Tasks related to computer use and "agentic" use cases are also sure to lead to failures. For example, have the model play a video game it's never seen before (e.g. because the game just came out) or, if the game is slow-paced enough, simply give you instructions on how to play. You can abstract out the computer vision aspect of these tests if you want, although it's worth asking how we're going to have AGI if it can't see.
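Here is what I mean by grading stock picks automatically, as a toy sketch. It uses a simplified notion of alpha, just excess return over a benchmark index over the same period, rather than a full factor-model regression, and the returns are hypothetical.

```python
# Toy sketch of automatically grading hypothetical stock picks.
# "Alpha" here is simplified to excess return over a benchmark index,
# not a proper factor-model regression; all numbers are made up.

def excess_return(pick_returns: list[float], benchmark_return: float) -> float:
    """Equal-weighted average return of the picks minus the benchmark's return."""
    portfolio_return = sum(pick_returns) / len(pick_returns)
    return portfolio_return - benchmark_return

# Hypothetical one-year returns for three picks vs. a benchmark that returned 8%.
print(excess_return([0.05, 0.12, 0.02], 0.08))  # about -0.017, i.e. no alpha in this toy case
```

The grading is fully automatic, the task has obvious real-world value, and no amount of memorized text guarantees success, which is exactly why benchmarks like this don't get built.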
From a Reuters article published today:
However, you'd think if this accurately reflected the opinions of people in finance, the bubble would have popped already.
The FTX collapse caused a lot of reputational damage for EA. Depending on how you look at it, AI investments collapsing could cause an even greater amount of reputational damage for EA. So much of EA has gone all-in on near-term AGI and the popping of an AI financial bubble would be hard to square with that. Maybe this is melodramatic because the FTX situation was about concerns of immoral conduct on the part of people in EA and the AI financial bubble would just be about people in EA being epistemically misguided. I don't know anything and I can't predict the future.
Some people, like Elon Musk, have indeed said things similar to this in response to DeepMind's impressive results.
Sutton's reinforcement learning-oriented perspective, or something close to Sutton's perspective, anyway, is eloquently argued for in the video by the AI researcher Edan Meyer.