Thesis: Artificial general intelligence (AGI) is far away because general intelligence requires the ability to learn quickly and efficiently. General intelligence is not just a large set of skills learned inefficiently. Current AI systems learn incredibly slowly and inefficiently, and scaling them up won’t fix that.
Preamble: AGI is less than 0.1% likely by 2032
My current view is that there is significantly less than a 1 in 1,000 chance of artificial general intelligence (AGI) being developed before the end of 2032, with upwards of 95% confidence. Part of the reason I think AGI within 7 years[1] is so unlikely, and why my confidence is so high, is that the accounts that attempt to show how we get from current AI systems to AGI in such a short time stipulate things that are impossible based on current knowledge of how AI works, or else contradict themselves (which also makes it impossible for them to actually happen). Objections pointing out these difficulties have not received good answers. This leads me to conclude that my first impression that these accounts are impossible is indeed correct.
If I stopped to really think about it, my best guess of the probability of AGI by the end of 2032 might be less than even 1 in 10,000 and my confidence might be 99% or more. My impulse to make these numbers more cautious and conservative (higher probability, lower confidence) comes only from a desire to herd toward the predictions of other people, but a) this is a bad practice in the first place and b) I find the epistemic practices of people who believe very near-term AGI is very likely with high confidence tend to have alarming problems (e.g. being blithely unaware of opposing viewpoints, even those held by a large majority of experts — I’m not talking about disagreeing with experts, but not even knowing that experts disagree, let alone why), which in other contexts most reasonable people would find disqualifying. That makes me think I should disregard those predictions and think about the prediction I would make if those predictions didn’t exist.
Moreover, if I change the reference class from, say, people in the Effective Altruism Forum filter bubble to, say, AI experts or superforecasters, the median year for AGI gets pushed out past 2045, so my prediction starts to look like a lot less of an outlier. But I don’t want to herd toward those forecasts, either.
Humans learn much faster than AI
DeepMind’s AI agent AlphaStar attained a level of play at StarCraft II competitive with the game’s top-tier players. This is an impressive achievement, but it required a huge amount of training relative to what a human requires to attain the same level of skill or higher. AlphaStar was unable to learn StarCraft II from scratch via reinforcement learning (the game was too complex), so it first required a large dataset of human play (supplied to DeepMind by the game’s developer, Blizzard) to imitation learn from. After bootstrapping with imitation, AlphaStar did 60,000 years of reinforcement learning via self-play to reach Grandmaster level. How does this compare to how fast humans learn?
Most professional StarCraft II players are in their 20s or 30s. The age at which they first achieved professional status will also be less, on average, than their current ages. But to make my point very clear, I’ll just overestimate by a lot and assume that, on average, StarCraft II players reach professional status at age 35. I’ll also dramatically overestimate and say that, from birth until age 35, professional players have spent two-thirds of their time (16 hours a day, on average) playing StarCraft II. This comes out to 23 years of StarCraft II practice to reach professional status. Excluding the imitation learning and just accounting for the self-play, this means humans learn StarCraft II more than 2,500x faster than AlphaStar did.
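For anyone who wants to check the arithmetic, here it is as a quick Python sketch, using the same deliberate overestimates as above:
```python
# Back-of-envelope comparison from the paragraph above, written out.
human_age_at_pro = 35                  # deliberate overestimate from the post
awake_fraction = 2 / 3                 # 16 hours a day spent playing, also an overestimate
human_practice_years = human_age_at_pro * awake_fraction      # ~23.3 years

alphastar_selfplay_years = 60_000      # self-play only, excluding the imitation learning

speedup = alphastar_selfplay_years / human_practice_years
print(f"Human practice: ~{human_practice_years:.1f} years")           # ~23.3
print(f"AlphaStar self-play: {alphastar_selfplay_years:,} years")     # 60,000
print(f"Humans learn roughly {speedup:.0f}x faster")                  # ~2,571x, i.e. more than 2,500x
```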
The domain of StarCraft II also helps show why the speed of learning matters for what it means to have a skill. The strategy and tactics of StarCraft II are continually evolving. Unlike testing a skill against a frozen benchmark that never changes, opponents in StarCraft II respond to what you do and adapt.
Anecdotally, some top-tier StarCraft II players have conjectured that the reason AlphaStar’s win rate eventually plateaued at a certain point within the Grandmaster League is that there are few enough Grandmasters (only 200 per geographical region or 1,000 worldwide) that these players were able to face AlphaStar again and again and learn how to beat it.
It’s one thing for the AI to beat a professional player in a best of 5 matchup, as happened on two or three occasions. (Although the details are a bit complicated and only one of these represented fair, typical competitive play conditions.) A best of 5 matchup in one sitting favours the AI. A best of 100 matchup over a month would favour the human. AlphaStar is not continually learning and, even if it were, it learns far too slowly — more than 2,500x more slowly than a human — to keep up. Humans can learn how AlphaStar plays, exploit its weaknesses, and turn their win rate against AlphaStar around. This is a microcosm of how general intelligence works. General intelligence means learning fast.
Scaling can’t compensate for AI’s inefficiency
It is generally not disputed that AI learns far more slowly and less efficiently than humans. No one seems to claim that the speed or efficiency at which AI learns is improving fast enough to make up the gap anytime soon, either. Rather, the whole argument for the high likelihood of near-term AGI relies on that efficiency disadvantage being overcome by exponentially growing amounts of training data and training compute. So what if it takes AI more than 2,500x more data or experience than humans to learn the same skills? We’ll just give the AI that 2,500x more (or whatever it is) and then we’ll be even! But that is not how it works.
First, this is physically impossible. According to calculations by the philosopher (and effective altruism co-founder) Toby Ord, just scaling the reinforcement learning of large language models (LLMs) by as much again as it has already been scaled would require five times more electricity than the Earth currently generates in a year. It would also require the construction of 1 million data centres. What would you estimate is the probability of that happening before the end of 2032? More than 1 in 1,000?
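To get a feel for the scale, here is a rough back-of-envelope sketch in Python. The global electricity figure (roughly 30,000 TWh per year) is my own assumption for illustration, not a number taken from Toby Ord’s calculation; the five-times multiplier and the 1 million data centres are the figures cited above.
```python
# Back-of-envelope only. The 30,000 TWh/year global generation figure is an
# assumption for illustration, not taken from Toby Ord's calculation.
TWH_TO_JOULES = 3.6e15                     # 1 TWh expressed in joules

global_generation_twh = 30_000             # assumed annual global electricity generation, TWh
required_twh = 5 * global_generation_twh   # five times the Earth's annual generation

data_centres = 1_000_000                   # figure cited above
per_dc_gwh = required_twh * 1_000 / data_centres    # GWh per data centre per year
avg_power_mw = per_dc_gwh * 1e9 / 8_760 / 1e6       # average draw per data centre, in MW

print(f"Total energy: {required_twh * TWH_TO_JOULES:.1e} J per year")                  # ~5.4e20 J
print(f"Per data centre: {per_dc_gwh:.0f} GWh/year (~{avg_power_mw:.0f} MW average)")  # ~150 GWh, ~17 MW
```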
But keep in mind this would only achieve a modest performance gain. You might not even notice it as a user of LLMs. This is not about what it takes to get to AGI. This is about what it would take just to continue the scaling trend of reinforcement learning for LLMs. That’s a very low bar.[2]
Second, it’s technologically impossible given the current unsolved problems in fundamental AI research. For instance, AI models can’t learn from video data, at least not in anything like the way LLMs learn from text. This is an open research problem that has received considerable attention for many years. Does AGI need to be able to see? My answer is yes. Well, we currently don’t have AI models that can learn how to see to even the level of competence LLMs have with text (which is far below human-level) and it’s not for a lack of compute or data, so scaling isn’t a solution.
Sure, eventually this will be solved, but everything will be solved eventually, barring catastrophe. If you’re willing to hand-wave away sub-problems required to build AGI, you might as well hand-wave all the way, and just assume the overall problem of how to build AGI will be solved whenever you like. What year sounds interesting? Say, 2029? It has the weight of tradition behind it, so that’s a plus.[3]
Third, it’s practically impossible given the datasets we currently lack. Humans have a large number of heterogeneous skills. The amount of text available on the Internet is an unusual exception when it comes to the availability of data that can be imitation learned from, not the norm.
For instance, there are almost no recordings or transcripts (which, it should be noted, lose important information) of psychotherapy sessions, primarily due to privacy concerns. Clinical psychology professors have difficulty teaching their students how to practice psychotherapy because of the ethical concerns of showing a recording of some real person’s real therapy session to a classroom of students. If this scarcity of data poses challenges even for humans, how could AI systems that require three or more orders of magnitude more data to learn the same thing (or less) ever learn enough to become competent therapists?
That’s just one example. What about high-stakes negotiations or dealmaking that happens behind closed doors, in the context of business or government? What about a factory worker using some obscure tool or piece of equipment about which the number of YouTube videos is either zero or very few? (Not that AI models can currently learn from video, anyway.) If we’re talking about AI models learning how to do everything… everything in the world… that’s a lot of data we don’t have.
Fourth, many skills require adapting to changing situations in real time, with very little data. If AI systems continue to require more than 2,500x as much data as humans to learn the same thing (or less), there will never be enough data for AI systems to attain human-level general intelligence. If the strategy or tactics of StarCraft II change, AI systems will be left flatfooted. If the strategy or tactics of anything change, AI systems will be left flatfooted. If anything changes significantly enough that it no longer matches what was in the training data, AI systems that generalize as poorly as current AI systems will not succeed in that domain. Arguably, nearly all human occupations — and nearly all realms of human life — involve this kind of continuous change, and require a commensurate level of adaptability. Artificial general intelligence has always been a question of generalization, not just learning a bunch of narrowly construed skills that can be tested against frozen benchmarks or a frozen world — the world isn’t frozen.
This gets to the question of what “having a skill” really means. When we say a human has a certain skill, we implicitly mean they have the ability to adapt to change. If we say that MaNa can play StarCraft II, we mean that if he faces another professional player who suddenly tries some off-the-wall strategies or tactics never before seen in the game, he will be able to adapt on the fly. The element of surprise might trip him up in the first round, or the first five, but over the course of more games over more time, he will adapt and respond. He isn’t a collection of frozen weights instantiating frozen skills interacting with a frozen world; he’s a general intelligence that can generalize, evolved in a world that changes.
When we talk about what an AI system “can do”, what “skills it has”, we are often bending the definition so that what capability or skill means no longer fits the real-world, everyday definition we apply to humans. We don’t think about whether, as is always true for humans, the AI has the ability to adapt on the fly, to change in response to change, to generate non-random, intelligent, reasonable, novel behaviour in response to novelty. If AI can hit a fixed target, even though all targets in the real world are always and forever moving, we say that’s good enough, and that’s equivalent to what humans do. But it isn’t. And we know this. We just have to think about it.
One of the most talked-about imagined use cases of AI is to use AI recursively for AI research. But the job of a researcher is one of the most fluid, changing, unfrozen occupations I can think of. There is no way an AI system that can’t adapt to change with only a small amount of data can do research, in the sense that a human does research.
Fifth, even in contexts where the data sets are massive and the problems or tasks aren’t changing, AI systems can’t generalize. LLMs have been trained on millions of books, likely also millions of academic papers, everything in the Common Crawl dataset, and more. GPT-4 was released 2 years and 8 months ago. Likely somewhere around 1 billion people use LLMs. Why, in all the trillions of tokens generated by LLMs, is there not one example of an LLM generating a correct and novel idea in any scientific, technical, medical, or academic field? LLMs are equipped with as close as we can get to all the written knowledge in existence. They have been prompted billions of times. Where is the novel insight? ChatGPT is a fantastic search engine, but a miserable thinker. Maybe we shouldn’t think that if we feed an AI model some training data, it will have mastery over much more than literally exactly the data we fed it. In other words, LLMs’ generalization is incredibly weak.
Generalization is not something that seems to be improved with scaling, except maybe very meagerly.[4] If we were to somehow scale the training data and compute for LLMs by another 1 million times (which is probably impossible), it’s not clear that, even then, LLMs could generate their first novel and correct idea in physics, biology, economics, philosophy, or anything else. I reckon this is something so broke scaling ain’t gonna fix it. This is fundamental. If we think of generalization as the main measure of AGI progress, I’m not sure there’s been much progress in the last ten years. Maybe a little, but not a lot.
There have been many impressive, mind-blowing results in AI, to be sure. AlphaStar and ChatGPT are both amazing. But these are systems that rely on not needing to generalize much. They rely on a superabundance of data that covers a very large state space, and the state space in which they can effectively operate extends just barely beyond that. That’s something, but it’s not general intelligence.
Conclusion
General intelligence is (or at least, requires) the ability to learn quickly from very little new data. Deep learning and deep reinforcement learning, in their current state, require huge quantities of data or experience to learn. Data efficiency has been improving over the last decade, but not nearly fast enough to make up the gap between AI and humans within the next decade. The dominant view among people who think very near-term AGI is very likely (with high confidence) is that scaling up the compute and data used to train AI models will cover either all or most of the ground between AI and humans. I gave five reasons this isn’t true:
- Physical limits like electricity make it unrealistic to even continue existing scaling trends, let alone whatever amount of scaling might be required to make up for the data inefficiency of AI relative to humans.
- AI can’t learn from video data (at least not in any way like how LLMs learn from text) and this is not a scaling problem, it’s a fundamental research problem. An AGI will need to see, so this needs to be solved.
- We don’t have anything anywhere close to datasets encompassing all the skills in the world, and certainly not of the huge size that deep learning models require.
- To really possess a skill, an AI system must be able to adapt to change quickly based on very little data. This means that even with continued scaling, even with the ability to learn from video and other modalities, and even with datasets encompassing every skill, AI systems would still quickly break in real world contexts.
- Not even considering adaptation to change, current AI systems’ lack of generalization means they already fail at real world tasks, even where plenty of text data exists, and even where the task more or less stays the same over time. This appears to be a fundamental feature of deep learning and deep reinforcement learning as we know them (although new fundamental ideas within those paradigms could one day change that).
It is always possible to hand-wave away any amount of remaining research progress that would be required to solve a problem. If I assume scientific and technological progress will continue for the next 1,000 years, then surely at some point the knowledge required to build AGI will be obtained. So, why couldn’t that knowledge be obtained soon? Well, maybe it could. Or maybe it will take much longer than 100 years. Who knows? We have no particular reason to think the knowledge will be obtained soon, and we especially have no reason to think it will be obtained suddenly, with no warning or lead up.
More practically, if this is what someone really believes, then arguably they should not have pulled forward their AGI forecast based on the last ten years of AI progress. Since almost all the energy around near-term AGI seems to be coming as a response to AI progress, and not from a sudden conversion to highly abstract, hypothetical views about how AGI could be invented suddenly, I choose to focus on views that see recent AI progress as evidence for near-term AGI.
So, that amounts to arguing against the view that all, almost all, or most of the fundamental knowledge needed to build AGI has already been obtained, and that what remains is entirely, almost entirely, or mostly a matter of scaling up AI models by some number of orders of magnitude that is attainable within the next decade. Scaling is running out of steam. The data is running out, supervised pre-training has been declared over or strongly deemphasized by credible experts, and training via reinforcement learning won’t scale much further. This will probably become increasingly apparent over the coming years.
I don’t know to what extent people who have a high credence in near-term AGI will take this as evidence of anything, but it seems inevitable that the valuations of AI companies will have to come crashing down because the AI models’ capabilities can’t catch up to the financial expectations those valuations are based on. I think people should take that as evidence, because the real world is a much better test of AI capabilities than artificially constructed, frozen benchmarks, which are always, in some sense, designed to be easy for current AI systems.
In general, people curious about the prospects of near-term AGI should engage more with real-world applications of AI, such as LLMs in a business context or a robotics use case like self-driving cars, since real-world applications are much more like the real world than benchmarks are, and AGI is defined by how it will perform in the real world, not on benchmarks. Benchmarks are a bad measure of AGI progress, and without benchmarks, it’s not clear what other evidence for rapid AGI progress or near-term AGI there really is.
- ^
I chose the end of 2032 or around 7 years from now as a direct response to the sort of AGI timelines I’ve seen from people in effective altruism, such as the philosopher and effective altruism co-founder Will MacAskill.
- ^
I haven’t really given any thought to how you’d do the math for this — obviously, it would just be a toy calculation, anyway — but I wouldn’t be surprised if you extrapolated the scaling of reinforcement learning compute forward to get to some endpoint that serves as a proxy for AGI and it turned out it would require more energy than is generated by the Sun and more minerals than are in the Earth’s crust.
For example, if you thought that AGI would require reinforcement learning training compute to be scaled up not just as much as it has been already, but by that much again one more time, then 1 trillion data centres would be required (more than 100 per person on Earth), and if by that much again two more times, then 1 quintillion data centres would be required (more than 100 million per person on Earth). But I suspect even this is far too optimistic. I suspect you’d get into the territory where you’d start counting the number of Dyson spheres required, rather than data centres.
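As a toy illustration of how quickly those numbers blow up under the assumptions above (each further repetition of the scaling already done multiplying the data centre count by a million):
```python
# Toy extrapolation of the numbers above: each further repetition of the
# scaling already done multiplies the required data centres by one million.
WORLD_POPULATION = 8e9      # rough figure, an assumption

for extra_repetitions in range(3):
    data_centres = 1_000_000 ** (extra_repetitions + 1)
    per_person = data_centres / WORLD_POPULATION
    print(f"{extra_repetitions} extra repetition(s): {data_centres:.0e} data centres "
          f"(~{per_person:.3g} per person)")
```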
Combinatorial explosion never stops producing shocking results. For instance, according to one calculation, all the energy in the observable universe (and all the mass, converted to energy), if used by a computer as efficient as physics allows, would not be sufficient to have more than a one in a million chance of brute-forcing a randomly generated 57-character password made up of numbers, letters, and symbols. Reinforcement learning is far more efficient than brute force, but the state space of the world is also astronomically larger than the possible combinations of a 57-character password. We should be careful that the idea of scaling up compute all the way to AGI doesn’t implicitly assume harnessing the energy of billions of galaxies, or something like that.
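For what it’s worth, here is a rough reconstruction of that kind of calculation, using the Landauer limit at the cosmic background temperature. The constants are my own assumptions for a sanity check, not the original calculation’s numbers:
```python
import math

# Rough reconstruction of this sort of thermodynamic limit argument.
# All constants below are my own assumptions for a back-of-envelope check.
k_B = 1.380649e-23                          # Boltzmann constant, J/K
T = 2.7                                     # cosmic microwave background temperature, K
energy_per_guess = k_B * T * math.log(2)    # Landauer limit: minimum energy per bit flip
                                            # (generously counting one bit flip per guess)

ordinary_matter_kg = 1.5e53                 # assumed ordinary matter in the observable universe
c = 3.0e8                                   # speed of light, m/s
total_energy = ordinary_matter_kg * c**2    # all that mass converted to energy, J

max_guesses = total_energy / energy_per_guess
keyspace = 95 ** 57                         # ~95 printable characters, 57-character password

print(f"max guesses:       {max_guesses:.1e}")                 # ~5e92
print(f"keyspace:          {keyspace:.1e}")                    # ~5e112
print(f"chance of success: {max_guesses / keyspace:.1e}")      # ~1e-20
```
On these assumptions the odds come out around 10^-20, comfortably consistent with the “not more than one in a million” framing, and that is with the absurdly generous assumption that each guess costs only a single bit flip.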
- ^
My point here isn’t that we know it’s extremely unlikely the problem of how to learn from video data will be solved within the next 7 years. My point is that we have no idea when it will be solved. If people were saying that they had no idea when AGI will be created, I would have no qualms with that, and I wouldn’t have written this post.
- ^
Please don’t confuse generalization, here, with an AI model being able to do more things (or talk about more things) simply because it was trained on data about more things. That’s not generalization, that’s just training. Generalization is a system’s ability to think or to generate intelligent behaviour in situations that go beyond what was covered in the data it was trained on.