
Background

Ilya Sutskever is a renowned machine learning researcher who co-authored the 2012 AlexNet paper that helped kick off the deep learning revolution. Sutskever co-founded OpenAI and served as Chief Scientist until 2024. He participated in the temporary ouster of CEO Sam Altman and subsequently left the company. Following his departure, he founded his own company, Safe Superintelligence, which has raised $3 billion, but about which little else is known.

Sutskever’s second-ever interview with Dwarkesh Patel was released today, November 25, 2025. Here are the highlights from my point of view, along with my commentary.

Research vs. scaling

Sutskever says that, plus or minus a few years, 2012 to 2020 was an age of research, 2020 to 2025 was an age of scaling, and 2026 onward will be another age of research. This reiterates comments he has made before. Sutskever specifically predicts that another 100x scale-up of AI models would make a difference, but would not transform AI capabilities.

Self-supervised pre-training vs. reinforcement learning-based training for LLMs

Sutskever says “based on what people say on Twitter” — which is a vague, sketchy source — companies “spend more compute on RL than on pre-training at this point, because RL can actually consume quite a bit of compute.” Sutskever notes that reinforcement learning provides “a relatively small amount of learning” for the compute it uses.

If it’s true that reinforcement learning (RL) compute now exceeds self-supervised pre-training compute, this confirms part of Toby Ord’s incredibly important post “How Well Does RL Scale?”. If true, it also spells trouble for large language model (LLM) scaling. LLM companies like Anthropic are banking on RL training. If RL training already uses about as much compute as pre-training, then scaling up RL training 10x or 100x would require roughly a 5x or 50x scale-up in total training compute (see the sketch below). At the size of current training runs for the latest LLMs, scaling up RL compute enough to make a big difference is getting prohibitively expensive, particularly as RL training is much less efficient than pre-training.
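To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python (my own illustration, not anything from the interview or from Ord’s post), under the assumption that RL compute is currently roughly equal to pre-training compute and that pre-training compute stays fixed while only RL is scaled up:

```python
# Back-of-the-envelope sketch (my illustration, not from the interview):
# how much total training compute grows when only the RL share is scaled,
# assuming RL compute currently roughly equals pre-training compute.

def total_compute_multiplier(rl_scale: float, rl_to_pretrain_ratio: float = 1.0) -> float:
    """Factor by which total compute grows if RL compute is scaled by rl_scale
    while pre-training compute stays fixed."""
    pretrain = 1.0
    rl = rl_to_pretrain_ratio
    return (pretrain + rl * rl_scale) / (pretrain + rl)

for k in (10, 100):
    print(f"Scaling RL {k}x -> total training compute grows ~{total_compute_multiplier(k):.1f}x")
# Scaling RL 10x -> total training compute grows ~5.5x
# Scaling RL 100x -> total training compute grows ~50.5x
```

Even under these assumptions, making RL training 100x bigger makes the whole training run roughly 50 times more expensive, before accounting for the point, which Ord and Sutskever both make, that RL yields relatively little learning per unit of compute.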

Confusion about LLMs’ lack of economic impact

Sutskever expresses confusion about why LLMs look so good on benchmarks but fail so badly in practical applications and have yet to impact the economy in any meaningful way. He says LLMs’ generalization ability is “inadequate” — hey, I’m in good company! — and speculates that RL training or poor data curation may be the culprit.

This is an important discussion to have. In my opinion, Sutskever still seems to be stuck in the AI industry reality distortion field, where it’s dubiously assumed that benchmark performance should have something to do with real-world usefulness. When your measurements turn out not to measure the thing you were trying to measure, go back to the drawing board and think about new measurements. The AI researcher François Chollet has done this and created the ARC-AGI benchmarks. More researchers should follow Chollet’s example and try to come up with new, better benchmarks.

A lot of seat-of-the-pants speculation about the nature of intelligence

The conversation is peppered with seat-of-the-pants speculation about big ideas, such as whether LLM pre-training is analogous to the evolution of mammalian vision and locomotion.

I don’t blame anyone too much for engaging in this kind of speculation, since the study of consciousness, intelligence, minds, cognition, etc. is in a pre-paradigmatic and mostly pre-scientific state. The aspiration of the great philosopher of mind Daniel C. Dennett (who sadly passed away in 2024) was to make enough progress in philosophy so that a science of consciousness could be created. In my opinion, Dennett was astonishingly successful in making progress on philosophy, but a true, full-fledged science of consciousness wasn’t created during his lifetime. Time will tell what Dennett’s influence ends up being. The comparison I think of is that Charles Darwin published On the Origin of Species in 1859, but the “modern synthesis” that combined Darwin’s theory of evolution by natural selection with Mendelian genetics didn’t begin until 1900 at the earliest, 41 years later.

Sutskever — and he’s hardly alone in this — sometimes engages in seat-of-the-pants theorizing that strikes me as incredibly simplistic and fanciful. His talk at the 2024 NeurIPS AI conference contains some striking examples. For instance, he assumes that an artificial neuron in a deep neural network is the computational equivalent of a biological neuron in a human brain. On this basis, he argues that a 10-layer artificial neural network should be able to accomplish any cognitive task that a human brain can accomplish in 0.1 seconds. This is extraordinarily dubious.
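For what it’s worth, here is my reconstruction of the arithmetic behind that claim (a sketch of the argument as I understand it, with illustrative numbers, not Sutskever’s exact figures):

```python
# Reconstruction of the "10-layer" argument as I understand it (illustrative
# numbers, not Sutskever's exact figures): biological neurons fire on the
# order of 100 times per second, so a 0.1-second cognitive task allows only
# about 10 sequential neuron-to-neuron steps, which then gets mapped onto
# the layers of an artificial network.

firing_rate_hz = 100   # assumed ceiling on biological neuron firing rate
task_seconds = 0.1     # the fast cognitive task in the argument

sequential_steps = firing_rate_hz * task_seconds
print(f"~{sequential_steps:.0f} sequential steps, hence the claim that a 10-layer network should suffice")
```

The weak link, of course, is the assumed equivalence between an artificial neuron and a biological one, which is exactly the assumption I find dubious.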

This kind of thing reminds me of Dennett’s concept of greedy reductionism, wherein philosophers or scientists rush to come up with an overly simplistic theory of some complex phenomenon, particularly human behaviour or cognition. Prematurely formalizing, operationalizing, or building complete theories of big informal concepts like intelligence is the bane of good philosophical and scientific understanding.

The role of emotion in cognition

Sutskever recalls a case study of a man with severe brain damage that inhibited his ability to feel emotions. In Sutskever’s retelling, the man’s decision-making was severely impeded: he spent hours trying to decide which socks to wear and made irrational financial decisions. Sutskever uses this case study to inspire a discussion of what might be missing in machine learning.

Assuming Sutskever’s retelling is accurate — which I would guess it is not entirely — two questions about the case study immediately come to mind. First, Sutskever characterizes this as a complete loss of emotion, but is it a complete loss or a severe, yet not complete, inhibition of emotion? The brain’s functions are widely distributed, so perhaps not all emotional processing was destroyed. It’s also hard to understand how someone could function at all, even at the level of putting on socks and making financial decisions, with a complete loss of emotion. Second, can we say with confidence that the brain damage destroyed only brain structures used for emotion and not ones used for cognition? The impediments to cognition could be downstream of the impediments to emotion, or the brain damage may have disrupted both emotional and cognitive processing.

My other reaction to this segment of the interview is that Sutskever evinces an anti-emotion prejudice that has a long history in the field of artificial intelligence. The prejudice is that emotions are simple and primitive, whereas thoughts are complex and sophisticated. This is just wrong.

First, the distinction between emotion and thought or cognition is hard to cleanly draw, either from a scientific point of view or from one’s own first-person, phenomenological point of view. Let’s say you have an intuition that something is true or false. Is that intuition a thought or a feeling? If it’s a thought, then is anger a thought? If it’s a feeling, then would you accord the proper level of credence to propositions without such feelings? And if not, then why do we say it’s “just” a feeling and not a part of cognition? Are we really talking about a distinction that exists in reality or are we imposing a cultural, historically contingent distinction on a non-binary reality?

Second, emotions are incredibly complex! People can only have the impression that emotions are simple because most of us don’t understand much about them. For example, the typical person would be hard-pressed to draw the distinction between awe and wonder, jealousy and envy, shame and guilt, embarrassment and humiliation, stress and overwhelm, or empathy and sympathy. Most people have never reflected much on how to define grief or love. Are emotions simple, or is most people’s understanding of them simple? For more on this, watch the emotions researcher Brené Brown’s TV show Atlas of the Heart or read her book of the same name.

Conclusion

Sutskever’s comments on LLM scaling undermine the case that LLMs will scale to artificial general intelligence (AGI) in the not-too-distant future. He reframes AGI as a set of open research problems (as I also recently did). He is extremely optimistic about solving those research problems soon, but it’s crucial to distinguish between AGI as a matter of scaling and AGI as a matter of research.

Parts of the interview remind me that many fundamental theoretical and conceptual problems persist in philosophy of mind, cognitive science, and artificial intelligence, such as which human cognitive capacities are evolved versus learned and what role emotion plays in cognition. Some of these problems remain wide open — we are still in a pre-paradigmatic, largely pre-scientific state.
