Yarrow Bouchard🔸

1014 karma · Joined · Canada · medium.com/@strangecosmos

Bio

Pronouns: she/her or they/them. 

I got interested in effective altruism back before it was called effective altruism, back before Giving What We Can had a website. Later on, I got involved in my university EA group and helped run it for a few years. Now I’m trying to figure out where effective altruism can fit into my life these days and what it means to me.

Comments
255

Topic contributions
1

This is, of course, sensitive to your assumptions.

In principle, yes, but in a typical bet structure, there is no upside for the person taking the other side of that bet, so what would be the point of it for them?

Sometimes these bets are structured as loans. As in, I would loan someone money and they would promise to pay me that money back plus a premium after 7 years. But I don’t want to give a stranger from another country a 7-year loan that I wouldn’t be able to compel them to repay once the time is up. 

There is Long Bets, which is a nice site, but since everything goes to charity, it’s largely symbolic. (Also, the money is put up by both sides in advance, and the Long Now Foundation just holds onto it until the bet is resolved. So, it's a little bit wasteful in that respect.)

What part do you think is uncertain? Do you think RL training could become orders of magnitude more compute efficient? 

The recent anthology Essays on Longtermism, which is open access and free to read here, has several essays with good criticisms of longtermism. You might find some of those essays interesting. The authors included in that anthology are a mix of proponents of longtermism and critics of longtermism.

This is not necessarily to disagree with any of your specific arguments or your conclusion, but I think that for people who have not been deeply immersed in effective altruist discourse for years, what has been happening with effective altruism over the last 5-10 years can easily be misdiagnosed.

In the last 5-10 years, has EA shifted significantly toward prioritizing very long-term outcomes (i.e. outcomes more than 1,000 years in the future) over relatively near-term outcomes (i.e. outcomes within the next 100 years)? My impression is no, not really.

Instead, what has happened is that a large number of people in EA have come to believe that there’s more than a 50% chance of artificial general intelligence being created within the next 20 years, with many thinking there’s more than a 50% chance of it being created within 10 years. If AGI is created, many people in EA believe there is a significant risk of human extinction (or another really, really bad outcome). "Significant risk" could mean anywhere from 10% to over 50%. People vary on that. 

This is not really about the very long-term future. It’s actually about the near-term future: what happens within the next 10-20 years. It’s not a pivot from the near-term to the very long-term; it’s a pivot from global poverty and factory farming to near-term AGI. So, it’s not really about longtermism at all.

The people who are concerned about existential risk from near-term AGI don’t think it’s only a justified worry if you account for lives in the distant future. They think it’s a justified worry even if you only account for people who are already alive right now. The shift in opinion has little to do with arguments about longtermism; it comes from people thinking AGI is much more likely much sooner than they previously did, and from accepting arguments that AGI would be incredibly dangerous if created.

The pivot in EA over the last 5-10 years has also not, in my observation, been a pivot from global poverty and factory farming to existential risk in general, but a pivot specifically to existential risk from near-term AGI.

To put my cards on the table, my own personal view is:

  • Longtermism is, in principle, correct (i.e. all else being equal, future lives matter as much as present lives or past lives and, if we can, we should try to ensure there are a lot of good future lives), but even after years of discussion, it seems really hard for anyone to give a good example of what actions longtermism justifies that we wouldn’t have already been doing anyway — other than actions related to existential risk, the discussion of which predates the discussion of longtermism by many years.
  • Whereas many people in EA seem to think the probability of AGI being created within the next 7 years is 50% or more, I think that probability is significantly less than 0.1%.
  • The arguments that AGI would be extremely dangerous rely on assumptions about the underlying technologies used to create AGI. I don’t think we know yet what those technologies will be, so I don’t accept these arguments. I’m not necessarily saying I know for sure AGI won’t be dangerous, either, I’m just saying knowing for sure would require information we don’t have.
  • I miss when EA was much more focused on global poverty and factory farming. (So, I agree with you there.)
  • Some existential risks are probably neglected, but not existential risk from near-term AGI. I think the world most likely still underinvests in global catastrophic risks of the humans vs. nature variety: asteroids, large volcanos, and natural pandemics. These risks are easier to calculate rigorous probabilities for. 

The x-risks you discussed in your post are humans vs. humans risks: nuclear war, bioweapons, and humans creating AGI. These are far more complex. Asteroids don’t respond to our space telescopes by attempting to disguise themselves to evade detection. But with anything to do with humans, humans will always respond to what we do, and that response is always at least somewhat unpredictable.

I still think we should do things to reduce the risk from nuclear war and bioweapons. I’m just saying that these risks are more complex and uncertain than risks from nature. So, it’s harder to do the cost-effectiveness math that shows spending to reduce these risks is justified. However, so much in the world can’t be rigorously analyzed with that kind of math, so that’s not necessarily an argument against it!

As for climate change, I agree it's important, and maybe some people in EA have done some good work in this area — I don't really know — but there's already so much focus on it from so many people, many of whom are extremely competent, that it's hard to see what EA would contribute by focusing on it. By contrast, global poverty charity effectiveness wasn't a topic many people outside of international development thought about — or at least felt they could do anything about — before GiveWell and effective altruism. Moreover, there wasn't any social movement advocating for people to donate 10% of their income to help the global poor.

The Grok chart contains no numbers, which is so strange I don't think you can conclude much from it except "we used more RL than last time."

Isn't the point just that the amount of compute used for RL training is now roughly the same as the amount of compute used for self-supervised pre-training? Because if this is true, then scaling up RL training compute another 1,000,000x is obviously not feasible.

My main takeaway from this post is not whether RL training would continue to provide benefits if it were scaled up another 1,000,000x, just that the world doesn't have nearly enough GPUs, electricity, or investment capital for that to be possible.
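To make that "not nearly enough GPUs" point concrete, here is a minimal back-of-envelope sketch in Python. The numbers are my own rough assumptions (a frontier training run on the order of 1e26 FLOP with roughly half of it spent on RL, and an H100-class GPU sustaining about 5e14 FLOP/s after utilization losses); they are illustrative, not figures taken from the post:

```python
# Back-of-envelope sketch with assumed, illustrative numbers (not figures from the post):
# if RL compute is already roughly at parity with pre-training compute for a frontier
# model, how much hardware would another 1,000,000x of RL compute require?

SECONDS_PER_YEAR = 3.15e7

# Assumption: a current frontier run is on the order of 1e26 FLOP total,
# with roughly half of that going to RL (the rough-parity premise above).
current_rl_flop = 0.5e26

# Assumption: one H100-class GPU sustains ~5e14 FLOP/s after utilization losses.
effective_flop_per_gpu_per_s = 5e14

scaled_rl_flop = current_rl_flop * 1_000_000  # the hypothetical 1,000,000x scale-up

gpu_years_needed = scaled_rl_flop / (effective_flop_per_gpu_per_s * SECONDS_PER_YEAR)
print(f"GPU-years required: {gpu_years_needed:.2e}")  # roughly 3e9 GPU-years
```

Under those assumptions, another 1,000,000x of RL compute works out to billions of GPU-years, which is why the bottleneck argument doesn't depend much on whether RL would keep delivering gains at that scale.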

I disagree with the claim that pre-training returns declined much

Could you elaborate or link to somewhere where someone makes this argument? I'm curious to see if a strong defense can be made of self-supervised pre-training of LLMs continuing to scale and deliver worthwhile, significant benefits.

This is a really compelling post. This seems like the sort of post that could have a meaningful impact on the opinions of people in the finance/investment world who are thinking about AI. I would be curious to see how equity research analysts and so on would react to this post. 

This is a very strong conclusion and seems very consequential if true:

This leaves us with inference-scaling as the remaining form of compute-scaling.

I was curious to see if you had a similar analysis that supports the assertion that "the scaling up of pre-training compute also stalled". Let me know if I missed something important. For the convenience of other readers, here are some pertinent quotes from your previous posts. 

From "Inference Scaling Reshapes AI Governance" (February 12, 2025):

But recent reports from unnamed employees at the leading labs suggest that their attempts to scale up pre-training substantially beyond the size of GPT-4 have led to only modest gains which are insufficient to justify continuing such scaling and perhaps even insufficient to warrant public deployment of those models. A possible reason is that they are running out of high-quality training data. While the scaling laws might still be operating (given sufficient compute and data, the models would keep improving), the ability to harness them through rapid scaling of pre-training may not.

 

There is a lot of uncertainty about what is changing and what will come next.

One question is the rate at which pre-training will continue to scale. It may be that pre-training has topped out at a GPT-4 scale model, or it may continue increasing, but at a slower rate than before. Epoch AI suggests the compute used in LLM pre-training has been growing at about 5x per year from 2020 to 2024. It seems like that rate has now fallen, but it is not yet clear if it has gone to zero (with AI progress coming from things other than pre-training compute) or to some fraction of its previous rate.

 

This strongly suggests that even though there are still many more unused tokens on the indexed web (about 30x as many as are used in GPT-4 level pre-training), performance is being limited by lack of high-quality tokens. There have already been attempts to supplement the training data with synthetic data (data produced by an LLM), but if the issue is more about quality than raw quantity, then they need the best synthetic data they can get.

From "The Extreme Inefficiency of RL for Frontier Models" (September 19, 2025):

LLMs and next-token prediction pre-training were the most amazing boost to generality that the field of AI has ever seen, going a long way towards making AGI seem feasible. This self-supervised learning allowed it to imbibe not just knowledge about a single game, or even all board games, or even all games in general, but every single topic that humans have ever written about — from ancient Greek philosophy to particle physics to every facet of pop culture. While their skills in each domain have real limits, the breadth had never been seen before. However, because they are learning so heavily from human generated data they find it easier to climb towards the human range of abilities than to proceed beyond them. LLMs can surpass humans at certain tasks, but we’d typically expect at least a slow-down in the learning curve as they reach the top of the human-range and can no longer copy our best techniques — like a country shifting from fast catch-up growth to slower frontier growth.

The overall concept we're talking about here is to what extent the outlandish amount of capital that's being invested in AI has increased budgets for fundamental AI research. My sense of this is that it's an open question without a clear answer.

DeepMind has always been doing fundamental research, but I actually don't know if that has significantly increased in the last few years. For all I know, it may have even decreased after Google merged Google Brain and DeepMind and seemed to shift focus away from fundamental research and toward productization.

I don't really know, and these companies are opaque and secretive about what they're doing, but my vague impression is that ~99% of the capital invested in AI over the last three years is going toward productizing LLMs, and I'm not sure it's significantly easier to get funding for fundamental AI research now than it was three years ago. For all I know, it's harder.

My impression is from anecdotes from AI researchers. I already mentioned Andrej Karpathy saying that he wanted to do fundamental AI research at OpenAI when he re-joined in early 2023, but the company wanted him to focus on product. I got the impression he was disappointed and I think this is a reason he ultimately quit a year later. My understanding is that during his previous stint at OpenAI, he had more freedom to do exploratory research. 

The Turing Award-winning researcher Richard Sutton said in an interview something along the lines of: no one wants to fund basic research, or it's hard to get money to do basic research. Sutton personally can get funding because of his renown, but I don't know about lesser-known researchers.

A similar sentiment was expressed by the AI researcher François Chollet here:

Now LLMs have sucked the oxygen out of the room. Everyone is just doing LLMs. I see LLMs as more of an off-ramp on the path to AGI actually. All these new resources are actually going to LLMs instead of everything else they could be going to.

If you look further into the past to like 2015 or 2016, there were like a thousand times fewer people doing AI back then. Yet the rate of progress was higher because people were exploring more directions. The world felt more open-ended. You could just go and try. You could have a cool idea of a launch, try it, and get some interesting results. There was this energy. Now everyone is very much doing some variation of the same thing.

Undoubtedly, there is an outrageous amount of money going toward LLM research that can be quickly productized, toward scaling LLM training, and toward LLM deployment. Initially, I thought this meant the AI labs would spend a lot more money on basic research. I was surprised each time I heard someone such as Karpathy, Sutton, or Chollet giving evidence in the opposite direction.

It's hard to know what's the God's honest truth and what's bluster from Anthropic, but if they honestly believe that they will create AGI in 2026 or 2027, as Dario Amodei has seemed to say, and if they believe they will achieve this mainly by scaling LLMs, then why would they invest much money in basic research that's not related to LLMs or scaling them and that, even if it succeeds, probably won't be productizable for at least 3 years? Investing in diverse basic research would be hedging their bets. Maybe they are, or maybe they're so confident that they feel they don't have to. I don't know.

This is what Epoch AI says about its estimates:

Based on our compute and cost estimates for OpenAI’s released models from Q2 2024 through Q1 2025, the majority of OpenAI’s R&D compute in 2024 was likely allocated to research, experimental training runs, or training runs for unreleased models, rather than the final, primary training runs of released models like GPT-4.5, GPT-4o, and o3.

That's kind of interesting in its own right, but I wouldn't say that money allocated toward training compute for LLMs is the same idea as money allocated to fundamental AI research, if that's what you were intending to say. 

It's uncontroversial that OpenAI spends a lot on research, but I'm trying to draw a distinction between fundamental research, which, to me, connotes things that are more risky, uncertain, speculative, explorative, and may take a long time to pay off, and research that can be quickly productized. 

I don't understand the details of what Epoch AI is trying to say, but I would be curious to learn. 

Do unreleased models include as-yet unreleased models such as GPT-5? (The timeframe is 2024 and OpenAI didn't release GPT-5 until 2025.) Would it also include o4? (Is there still going to be an o4?) Or is it specifically models that are never intended to be released? I'm guessing it's just everything that hasn't been released yet, since I don't know how Epoch AI would have any insight into what OpenAI intends to release or not.

I'm also curious how much trial and error goes into training for LLMs. Does OpenAI often abort training runs or find the results to be disappointing? How many partial or full training runs go into training one model? For example, what percentage of the overall cost is the $400 million estimated for the final training run of GPT-4.5? 100%? 90%? 50%? 10%? 
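Just to make that range concrete, here is a quick arithmetic sketch. It uses only the ~$400 million final-run estimate cited above; the fractions are hypothetical:

```python
# Rough arithmetic for the question above: if the final GPT-4.5 training run cost
# ~$400M and was some fraction of all compute spent on the model (including aborted
# or experimental runs), what total spend would each fraction imply?
final_run_cost_usd = 400e6  # the ~$400M final-run estimate cited above

for fraction in (1.0, 0.9, 0.5, 0.1):
    implied_total = final_run_cost_usd / fraction
    print(f"Final run = {fraction:.0%} of total -> implied total ~${implied_total / 1e9:.2f}B")
```

The spread between roughly $0.4B and $4B is exactly why the answer matters for interpreting Epoch AI's R&D compute estimate.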

Overall, this estimate from Epoch AI doesn't seem to tell us much about what amount of money or compute OpenAI is allocating to fundamental research vs. R&D that can quickly be productized. 

the fact that insane amounts of capital are going into 5+ competing companies providing commonly-used AI products should be strong evidence that the economics are looking good

Can you clarify what you mean by "the economics are looking good"? The economics of what are looking good for what?

I can think of a few different things this could mean, such as:

  • The amount of capital invested, the number of companies investing, and the number of users of AI products indicates there is no AI bubble
  • The amount of capital invested (and the competition) is making AGI more likely/making it come sooner, primarily because of scaling
  • The amount of capital invested (and the competition) is making AGI more likely/making it come sooner, primarily because it provides funding for research

Those aren’t the only possible interpretations, but those are three I thought of. 

if AGI is technically possible using something like current tech, then all the incentives and resources are in place to find the appropriate architectures.

You’re talking about research rather than scaling here, right? Do you think there is more funding for fundamental AI research now than in 2020? What about for non-LLM fundamental AI research?

The impression I get is that the vast majority of the capital is going into infrastructure (i.e. data centres) and R&D for ideas that can quickly be productized. I recall that the AI researcher/engineer Andrej Karpathy rejoined OpenAI (his previous employer) after leaving Tesla, but ended up leaving OpenAI after not too long because the company wanted him to work on product rather than on fundamental research. 
