Dwarkesh Podcast

Dario Amodei — "We are near the end of the exponential"

February 13, 2026 · 2h 22m · 27,761 words

Show notes

Dario Amodei thinks we are just a few years away from AGI, or as he puts it, from having “a country of geniuses in a data center”. In this episode, we discuss what to make of the scaling hypothesis in the current RL regime, why task-specific RL might lead to generalization, and how AI will diffuse throughout the economy. We also dive into Anthropic’s revenue projections, compute commitments, path to profitability, and more.

Sponsors

* Labelbox can get you the RL tasks and environments you need. Their massive network of subject-matter experts ensures realism across domains, and their in-house tooling lets them continuously tweak task difficulty to optimize learning. Reach out at labelbox.com/dwarkesh.
* Jane Street sent me another puzzle… this time, they’ve trained backdoors into 3 different language models, and they want you to find the triggers. Jane Street isn’t even sure this is possible, but they’ve set aside $50,000 for the best attempts and write-ups. They’re accepting submissions until April 1st at janestreet.com/dwarkesh.
* Mercury’s personal accounts make it easy to share finances with a partner, a roommate… or OpenClaw. Last week, I wanted to try OpenClaw for myself, so I used Mercury to spin up a virtual debit card with a small spend limit, and then I let my agent loose. No matter your use case, apply at mercury.com/personal-banking.

Timestamps

(00:00:00) - What exactly are we scaling?
(00:12:36) - Is diffusion cope?
(00:29:42) - Is continual learning necessary?
(00:46:20) - If AGI is imminent, why not buy more compute?
(00:58:49) - How will AI labs actually make profit?
(01:31:19) - Will regulations destroy the boons of AGI?
(01:47:41) - Why can’t China and America both have a country of geniuses in a datacenter?

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Highlighted moments

The most surprising thing has been the lack of public recognition of how close we are to the end of the exponential.
Jump to 0:53 in the transcript
Each model makes money but the company loses money.
Jump to 1:10:30 in the transcript
I think there's actually a stronger history of some of these things seeming like a big deal and then kind of dissolving. Some of them are real. I mean, the need for data is real. Maybe continual learning is a real thing.
Jump to 1:21:24 in the transcript
if you give it a list of rules, it doesn't really understand the rules and it's kind of hard to generalize from them.
Jump to 2:07:03 in the transcript

Transcript

Introduction

0:00So we talked three years ago. I'm curious, in your view, what has been the biggest update of the last three years? What has been the biggest difference between what it felt like three years ago versus now? Yeah, I would say the underlying technology, the exponential of the technology, has gone, broadly speaking, about as I expected it to go. I mean, there's plus or minus a year or two here, plus or minus a year or two there. I don't know that I would have predicted the specific direction of code, but when I look at the exponential, it is roughly what I expected in terms of the march of the models from smart high school student to smart college student to beginning to do Ph.D. and professional stuff.

0:44And in the case of code, reaching beyond that. So, you know, the frontier is a little bit uneven.

Lack of Recognition

0:49It's roughly what I expected. I will tell you, though, what the most surprising thing has been. The most surprising thing has been the lack of public recognition of how close we are to the end of the exponential. To me, it is absolutely wild that you have people, within the bubble and outside the bubble, talking about just the same tired old hot-button political issues while, all around us, we're near the end of the exponential.

Scaling

1:21I want to understand what that exponential looks like right now, because the first question I asked you when we recorded three years ago was, you know, what's up with scaling? How does it work? I have a similar question now, but I feel like it's a more complicated question, at least from the public's point of view. Yes. Three years ago, there were these well-known public trends where, across many orders of magnitude of compute, you could see how the loss improves. And now we have RL scaling and there's no publicly known scaling law for it. It's not even clear what exactly the story is: is this supposed to be teaching the model skills?

1:54Is this supposed to be teaching meta learning? What is the scaling hypothesis at this point?

Hypothesis

1:59Yeah. So I actually have the same hypothesis that I had all the way back in 2017. I think I talked about it last time, but in 2017 I wrote a doc called the Big Blob of Compute Hypothesis. And it wasn't about the scaling of language models in particular. When I wrote it, GPT-1 had just come out, right? So that was one among many things. Back in those days, there was robotics. People tried to work on reasoning as a separate thing from language models. There was scaling of the kind of RL that happened in AlphaGo and that happened with Dota at OpenAI.

2:36And, you know, people remember StarCraft at DeepMind, you know, the AlphaStar.

General Document

2:42So it was written as a more general document. And the specific thing I said was the following. And, you know, Rich Sutton put out The Bitter Lesson a couple of years later, but the hypothesis is basically the same. What it says is that all the cleverness, all the techniques, all the "we need a new method to do something" kind of thing, doesn't matter very much. There are only a few things that matter. And I think I listed seven of them. One is how much raw compute you have.

3:13Another is the quantity of data that you have. The third is the quality and distribution of data, right? It needs to be a broad distribution of data. The fourth is, I think, how long you train for. The fifth is you need an objective function that can scale to the moon. So the pre-training objective function is one such objective function, right? Another is the kind of RL objective function that says you have a goal, and you're going to go out and reach the goal.

3:44Within that, of course, there are objective rewards, like you see in math and coding, and there are more subjective rewards, like you see in RL from human feedback or higher-order versions of that. And then the sixth and seventh were things around normalization or conditioning, just getting the numerical stability so that the big blob of compute flows in this laminar way instead of running into problems.

Pre-trained Models

4:10So that was the hypothesis. And it's a hypothesis I still hold. I don't think I've seen very much that is not in line with that hypothesis. And so the pre-training scaling laws were one example of what we see there. And indeed, those have continued going. I think now it's been widely reported that we feel good about pre-training, that pre-training is continuing to give us gains. What has changed is that now we're also seeing the same thing for RL, right?

4:44So we're seeing a pre-training phase and then an RL phase on top of that. And with RL, it's actually just the same. Even other companies have published things in some of their releases that say, look, we train the model on math contests, you know, AIME or other things like that, and how well the model does is log-linear in how long we've trained it.

5:15And we see that as well. And it's not just math contests. It's a wide variety of RL tasks. And so we're seeing the same scaling in RL that we saw for pre-training.
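To make the "log-linear" claim above concrete, here is a minimal sketch of fitting score = a + b * log10(training steps) by ordinary least squares. The data points are made up purely for illustration and are not from any published result; the function name `fit_log_linear` is likewise hypothetical. Under this shape, every 10x increase in training adds a roughly constant number of points.

```python
import math

def fit_log_linear(steps, scores):
    """Least-squares fit of score = a + b * log10(steps)."""
    xs = [math.log10(s) for s in steps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx            # points gained per 10x more training
    a = mean_y - b * mean_x  # intercept at log10(steps) = 0
    return a, b

# Made-up illustrative data (an assumption, not measured results).
steps = [1e2, 1e3, 1e4, 1e5]
scores = [20.0, 32.0, 44.0, 56.0]

a, b = fit_log_linear(steps, scores)
predicted = a + b * math.log10(1e6)  # extrapolate one more 10x
print(f"gain per 10x training: {b:.1f}; extrapolated score at 1e6 steps: {predicted:.1f}")
```

A real benchmark score saturates at 100%, so a straight-line fit like this can only hold over a limited range.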

RL Scaling

5:27You mentioned Richard Sutton and the Bitter Lesson. Yeah. I interviewed him last year and he is actually very non-LLM-pilled. I don't know if this is his perspective, but one way to paraphrase his objection is something like: look, something which possesses the true core of human learning would not require all these billions of dollars of data and compute and these bespoke environments to learn how to use Excel, how to use PowerPoint, how to navigate a web browser. And the fact that we have to build in these skills using these RL environments hints that we're actually lacking this core human learning algorithm.

Puzzle

6:10And so we're scaling the wrong thing. And so, yeah, that does raise the question: why are we doing all this RL scaling if we do think there's something that's going to be human-like in its ability to learn on the fly? Yeah. So I think this puts together several things that should be thought of differently. I think there is a genuine puzzle here, but it may not matter. In fact, I would guess it probably doesn't matter. So let's take the RL out of it for a second, because I actually think it's a red herring to say that RL is any different from pre-training in this matter.

6:42So if we look at pre-training scaling, it was very interesting back in, you know, 2017 when Alec Radford was doing GPT-1. If you look at the models before GPT-1, they were trained on these data sets that didn't represent a wide distribution of text, right? You had these very standard language modeling benchmarks. And GPT-1 itself was trained on a bunch of, I think it was fan fiction, actually.

7:13But, you know, it was literary text, which is a very small fraction of the text that you get. And what we found with that, and in those days it was like a billion words or something, so small data sets that represented a pretty narrow distribution, right? Like a narrow distribution of what you can see in the world.

Generalization

7:33And it didn't generalize well. If you did better on, I forgot what it was, some kind of fan fiction corpus, it wouldn't generalize that well to the other tasks. We had all these measures of how well the model does at predicting all of these other kinds of text, and you really didn't see the generalization. It was only when you trained over all the tasks on the internet, when you did a general internet scrape, right, from something like Common Crawl or scraping links on Reddit, which is what we did for GPT-2.

8:07It's only when you do that that you started to get generalization. And I think we're seeing the same thing on RL: we're starting first with very simple RL tasks, like training on math competitions. Then we're moving to broader training that involves things like code as a task. And now we're moving to do many other tasks. And then I think we're going to increasingly get generalization. So that takes out the RL versus pre-training side of it.

8:39But I think there is a puzzle here either way, which is that in pre-training, we use trillions of tokens, right? And humans don't see trillions of words. So there is an actual sample efficiency difference here. There is actually something different happening here, which is that the models start from scratch and they need much more training. But we also see that once they're trained, if we give them a long context length, the only thing blocking a long context length is inference.

9:13But if we give them a context length of a million, they're very good at learning and adapting within that context length. And so I don't know the full answer to this, but I think there's something going on where pre-training is not like the process of humans learning. It's somewhere between the process of humans learning and the process of human evolution. We get many of our priors from evolution. Our brain isn't just a blank slate, right? Whole books have been written about that. I think the language models are much more blank slates.

9:45They literally start as random weights, whereas the human brain starts with all these regions, connected to all these inputs and outputs. And so maybe we should think of pre-training, and for that matter RL as well, as something that exists in the middle space between human evolution and human on-the-spot learning.

Hierarchy of Learning

10:07And we should think of the in-context learning that the models do as something between long-term human learning and short-term human learning. So there's this hierarchy: there's evolution, there's long-term learning, there's short-term learning, and there's just human reaction. And the LLM phases exist along this spectrum, but not necessarily at exactly the same points. There's no analog to some of the human modes of learning. The LLMs are kind of falling between the points.

10:38Does that make sense? Yes. Although some things are still a bit confusing. For example, if the analogy is that this is like evolution, so it's fine that it's not that sample efficient, then, well, if we're going to get the kind of super sample-efficient agent from in-context learning, why are we bothering to build in all these skills? There are RL environment companies, and it seems like what they're doing is teaching it how to use this API, how to use Slack, how to use whatever. It's confusing to me why there's so much emphasis on that if the kind of agent that can just learn on the fly is emerging, or is going to soon emerge, or has already emerged.

11:10Yeah, yeah. So, I mean, I can't speak for the emphasis of anyone else. I can only talk about how we think about it. I think the way we think about it is that the goal is not to teach the model every possible skill within RL, just as we don't do that within pre-training, right? Within pre-training, we're not trying to expose the model to every possible way that words could be put together, right? It's rather that the model trains on a lot of things and then it reaches generalization across pre-training, right?

11:43That was the transition from GPT-1 to GPT-2 that I saw up close, which is that the model reaches a point. I had these moments where I was like, oh yeah, you just give the model a list of numbers, like, this is the cost of the house, this is the square feet of the house. And the model completes the pattern and does linear regression, not great, but it does it, and it's never seen that exact thing before. And so to the extent that we are building these RL environments, the goal is very similar to what was done five or 10 years ago with pre-training. We're trying to get a whole bunch of data, not because we want to cover a specific document or a specific skill,

12:32but because we want to generalize. I mean, I think the framework you're laying down obviously makes sense. Like, we're making progress towards AGI. I think nobody at this point disagrees that we're going to achieve AGI in this century. And the crux is you say we're hitting the end of the exponential, and somebody else looks at this and says, oh yeah, we're making progress, we've been making progress since 2012, and then in 2035 we'll have a human-like agent. And so I want to understand what it is that you're seeing. Yeah, obviously we're seeing in these models the kinds of things that evolution did, or that learning within a human lifetime is like.

13:10And why think that it's one year away and not 10 years away? I actually think of it as, there are two cases to be made here, or two claims you could make, one of which is stronger and the other of which is weaker. So starting with the weaker claim: when I first saw the scaling back in, you know, 2019, I wasn't sure. This was kind of a 50/50 thing, right? I thought I saw something, and my claim was, this is much more likely than anyone thinks it is. Like, this is wild.

13:46No one else would even consider this. Maybe there's a 50% chance this happens. On the basic hypothesis of, as you put it, within 10 years we'll get to what I call a country of geniuses in a data center, I'm at like 90% on that. And it's hard to go much higher than 90% because the world is so unpredictable. Maybe the irreducible uncertainty would be if we were at 95%, where you get to things like, I don't know, maybe multiple companies have internal turmoil and nothing happens.

14:21And then Taiwan gets invaded and all the fabs get blown up by missiles. Now you've jinxed us, Dario. You know, you could construct a scenario, you can construct a 5% world where things get delayed for 10 years. That's maybe 5%. There's another 5%, which is that I'm very confident on tasks that can be verified. So I think with coding, except for that irreducible uncertainty, I think we'll be there in one or two years.

14:55There's no way we will not be there in 10 years in terms of being able to do end-to-end coding. My one little bit of fundamental uncertainty, even on long timescales, is this thing about tasks that aren't verifiable, like planning a mission to Mars, like doing some fundamental scientific discovery like CRISPR, like writing a novel. It's hard to verify those tasks. I am almost certain that we have a reliable path to get there, but if there is a little bit of uncertainty, it's there.

15:33So on the 10 years, I'm at like 90%, which is about as certain as you can be. I think it's crazy to say that this won't happen by 2035. In some sane world, it would be outside the mainstream. But the emphasis on verification hints to me at a lack of belief that these models generalize. If you think about humans, we are good both at things for which we get verifiable reward and at things for which we don't.

16:04You were like, you haven't just started. No, no, no, no. This is why I'm almost sure. We already see substantial generalization from things that verify to things that don't verify. We're already seeing that. But it seems like you were emphasizing this as a spectrum which will split apart which domains we see more progress in. And I'm like, that doesn't seem like how humans get better. The world in which we don't get there is the world in which we do all the things that are verifiable, and then many of them generalize, but we kind of don't get fully there.

16:36We don't fully color in this side of the box. It's not a binary thing. But it also seems to me that even in the world where generalization is weak, where you're limited to verifiable domains, it's not clear to me that in such a world you could automate software engineering. Because in some sense, you are, quote unquote, a software engineer. Yeah. But part of being a software engineer for you involves writing these long memos about your grand vision about different things. Well, I don't think that's part of the job of SWE. That's part of the job of the company. But I do think SWE involves design documents and other things like that, which, by the way, the models are not bad at.

17:11They're already pretty good at writing comments. And again, I'm making much weaker claims here than I believe, to distinguish between two things. We're already almost there for software engineering. We are already almost there. By what metric? There's one metric, which is how many lines of code are written by AI. And if you consider other productivity improvements in the course of the history of software engineering, compilers write all the lines of software. But there's a difference between how many lines are written and how big the productivity improvement is.

17:42Oh yeah. And then "we're almost there" meaning how big is the productivity improvement, not just how many lines are written. Yeah, yeah. So I actually agree with you on this. I've made this series of predictions on code and software engineering, and I think people have repeatedly misunderstood them. So let me lay out the spectrum, right? I think it was, you know, eight or nine months ago or something, I said the AI model will be writing 90% of the lines of code in, like, three to six months, which happened, at least at some places, right?

18:19It happened at Anthropic, it happened with many people downstream using our models. But that's actually a very weak criterion, right? People thought I was saying we won't need 90% of the software engineers. Those things are worlds apart, right? I would put the spectrum as: 90% of code is written by the model. 100% of code is written by the model. And that's a big difference in productivity. 90% of the end-to-end SWE tasks, right? Including things like compiling, including things like setting up clusters and environments, testing features, writing memos.

18:5490% of the SWE tasks are written by the models. 100% of today's SWE tasks are written by the models. And even when that happens, it doesn't mean software engineers are out of a job. There are new higher-level things they can do, where they can manage. And then further down the spectrum, there's 90% less demand for SWEs, which I think will happen. But this is a spectrum. And I wrote about it in The Adolescence of Technology, where I went through this kind of spectrum with farming. And so I actually totally agree with you on that.

19:28It's just that these are very different benchmarks from each other, but we're proceeding through them super fast. It seems like part of your vision is that going from 90 to 100, one, is going to happen fast, and two, that somehow that leads to huge productivity improvements. Whereas what I notice, even in greenfield projects that people start with Claude Code or something, is that people report starting a lot of projects. And I'm like, do we see in the world out there a renaissance of software, all these new features that wouldn't exist otherwise? And at least so far, it doesn't seem like we see that.

20:00And so that does make me wonder. Even if I never had to intervene on Claude Code, there is this thing of, the world is complicated. Jobs are complicated. And closing the loop on self-contained systems, whether it's just writing software or something, how much broader gains would we see just from that? And so maybe this should dilute our estimation of the country of geniuses. Well, I simultaneously agree with you, agree that it's a reason why these things don't happen instantly.

20:35But at the same time, I think the effect is going to be very fast. So, I don't know, you can have these two poles, right? One is, you know, AI is not going to make progress. It's slow. It's going to take kind of forever to diffuse within the economy, right? Economic diffusion has become one of these buzzwords that's a reason why we're not going to make AI progress or why AI progress doesn't matter. And the other axis is, we'll get recursive self-improvement, the whole thing, can't you just draw an exponential line on the curve?

21:07We're going to have Dyson spheres around the sun so many nanoseconds after we get recursive self-improvement. I mean, I'm completely caricaturing the view here, but there are these two extremes. But what we've seen from the beginning, at least if you look within Anthropic, is this bizarre 10x per year growth in revenue, right? So in 2023, it was like zero to a hundred million. In 2024, it was a hundred million to a billion. In 2025, it was a billion to like nine or 10 billion.
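As a quick check on the arithmetic, the figures quoted above can be turned into year-over-year multiples. This is a rough sketch only: the 2025 value uses 9.5 billion as an assumed midpoint of "nine or 10 billion", the $100 trillion ceiling is a hypothetical round number on the order of world GDP, and `years_until` is an illustrative helper rather than anything from the conversation.

```python
# Revenue reached by the end of each year in USD, as quoted in the conversation.
# 2025 was given as "nine or 10 billion"; 9.5e9 is an assumed midpoint.
revenue = {2023: 100e6, 2024: 1e9, 2025: 9.5e9}

years = sorted(revenue)
# Growth multiple of each year over the year before it.
multiples = {y: revenue[y] / revenue[prev] for prev, y in zip(years, years[1:])}
for y, m in multiples.items():
    print(f"{y}: {m:.1f}x over the prior year")

def years_until(start, factor, ceiling):
    """Count whole years of constant `factor` growth until `start` reaches `ceiling`."""
    n = 0
    while start < ceiling:
        start *= factor
        n += 1
    return n

# Hypothetical ceiling of $100 trillion, roughly the order of world GDP.
print(years_until(revenue[2025], 10, 100e12), "years of 10x growth to reach the ceiling")
```

The second calculation only illustrates why a constant 10x multiple cannot persist for long: a handful of further years would exhaust any plausible ceiling.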

21:45And then. You guys should have just bought a billion dollars of your own products so you could have a clean 10x. And the first month of this year, you would think that exponential would slow down, but we added another few billion to revenue in January. And so, obviously that curve can't go on forever, right? The GDP is only so large. I would even guess that it bends somewhat this year, but that is a fast curve, right?

22:18That's a really fast curve. And I would bet it stays pretty fast, even as the scale goes to the entire economy. So I think we should be thinking about this middle world where things are extremely fast, but not instant, where they take time because of economic diffusion, because of the need to close the loop, because, you know, it's this fiddly, oh man, I have to do change management within my enterprise. I set this up, but I have to change the security permissions on this in order to make it actually work.

22:54Or I had this old piece of software that checks the model before it's compiled and released, and I have to rewrite it. And yes, the model can do that, but I have to tell the model to do that, and it has to take time to do that. And so I think everything we've seen so far is compatible with the idea that there's one fast exponential that's the capability of the model. And then there's another fast exponential that's downstream of that, which is the diffusion of the model into the economy. Not instant, not slow, much faster than any previous technology, but it has its limits.

23:33And this is what we see when I look inside Anthropic, when I look at our customers: fast adoption, but not infinitely fast. Can I try a hot take on you? Yeah. I feel like diffusion is cope that people use when the model isn't able to do something. They're like, oh, but it's a diffusion issue. But then you should use the comparison to humans. You would think that the inherent advantages that AIs have would make diffusion a much easier problem for new AIs getting onboarded than new humans getting onboarded.

24:05So an AI can read your entire Slack and your drive in minutes. They can share all the knowledge that other copies of the same instance have. You don't have this adverse selection problem when you're hiring AIs because you can just hire copies of a vetted AI model. Hiring a human is so much more hassle, and people hire humans all the time, right? We pay humans upwards of $50 trillion in wages because they're useful, even though in principle it would be much easier to integrate AIs into the economy than it is to hire humans. I feel like diffusion doesn't really explain it. I think diffusion is very real and doesn't exclusively have to do with limitations on the AI models.

24:45Like, again, there are people who use diffusion to, you know, as kind of a buzzword to say this isn't a big deal. I'm not talking about that. I'm not talking about, you know, AI will diffuse at the speed that previous. I think AI will diffuse much faster than previous technologies have, but not infinitely fast. So I'll just give an example of this, right? Like there's like cloud code. Like cloud code is extremely easy to set up. You know, if you're a developer, you can kind of just start using cloud code. There is no reason why a developer at a large enterprise should not be adopting cloud code as quickly as, you know, an individual developer or developer at a startup.

25:24And we do everything we can to promote it, right? We sell cloud code to enterprises and big enterprises like, you know, big financial companies, big pharmaceutical companies, all of them, they're adopting cloud code much faster than enterprises typically adopt new technology, right? But, but again, it like, it, it, it, it, it takes time. Like any given feature or any given product like cloud code or like co-work will get adopted by the, you know, the individual developers who are on Twitter all the time,

25:59by the like series A startups many months faster than, you know, than they will get adopted by like, you know, a like large enterprise that does food sales. Um, there are a number of factors, like you have to go through legal, you have to provision it for everyone. It has to, you know, like it has to pass security and compliance. The leaders of the company who are further away from the AI revolution, you know, are, are forward looking, but they have to say, oh, it makes sense for us to spend 50 million.

26:31This is what this cloud code thing is. This is why it helps our company. This is why it makes us more productive. And then they have to explain to the people two levels below and they have to say, okay, we have 3000 developers. Like here's how we're going to roll it out to our developers. And we have conversations like this every day. Like, you know, we are doing everything we can to make Anthropics revenue grow 20 or 30 X a year instead of 10 X a year. Um, you know, and, and, and again, you know, many enterprises are just saying, this is so productive.

27:01Like, you know, we're going to take shortcuts on our usual procurement process, right? They're moving much faster than, you know, when we tried to sell them just the ordinary API, which many of them use, but Claude Code is a more compelling product. Um, but it's not an infinitely compelling product. And I don't think even AGI or powerful AI or country of geniuses in the data center will be an infinitely compelling product. It will be a compelling product enough, maybe to get three or five or 10 X a year growth, even when you're in the hundreds of billions of dollars, which is extremely hard to do.

27:31And it has never been done in history before, but not infinitely fast. I, I, I buy that it would be a slight slowdown and maybe this is not your claim, but sometimes people talk about this, like, oh, the capabilities are there, but because of diffusion, um, otherwise like we're basically at AGI and then. I, I, I don't believe we're basically at AGI. I think if you had the country of geniuses in a data center, if your company didn't adopt the country of geniuses in a data center. If you had the country of geniuses in a data center, we would know it. Right. Yeah. We would know it if you had the country of geniuses in a data center, like everyone in this room would know it.

28:01Everyone in Washington would know it. Like, you know, people in rural, rural parts that might not know it, but, but, but like we would know it. We don't have that now. That's very clear. As Dario was hinting at, to get generalization, you need to train across a wide variety of realistic tasks and environments. For example, with a sales agent, the hardest part isn't teaching it to mash buttons in a specific database in Salesforce. It's training the agent's judgment across ambiguous situations. How do you sort through a database with thousands of leads to figure out which ones are hot?

28:33How do you actually reach out? What do you do when you get ghosted? When an AI lab wanted to train a sales agent, Labelbox brought in dozens of Fortune 500 salespeople to build a bunch of different RL environments. They created thousands of scenarios where the sales agent had to engage with a potential customer, which was role-played by a second AI. Labelbox made sure that this customer AI had a few different personas, because when you cold call, you have no idea who's going to be on the other end. You need to be able to deal with a whole range of possibilities. Labelbox's sales experts monitored these conversations turn by turn, tweaking the role-playing agent to ensure it did the kinds of things an actual customer would do.

29:08Labelbox could iterate faster than anybody else in the industry. This is super important because RL is an empirical science. It's not a solved problem. Labelbox has a bunch of tools for monitoring agent performance in real time. This lets their experts keep coming up with tasks so that the model stays in the right distribution of difficulty and gets the optimal reward signal during training. Labelbox can do this sort of thing in almost every domain. They've got hedge fund managers, radiologists, even airline pilots. So whatever you're working on, Labelbox can help. Learn more at labelbox.com/dwarkesh.

29:42Coming back to concrete predictions, because I think because there's so many different things to disambiguate, it can be easy to talk past each other when we're talking about capabilities. So, for example, when I interviewed you three years ago, I asked you a prediction about what should we expect three years from now. And I think you were right. So you said we should expect systems, which if you talk to them for the course of an hour, it's hard to tell them apart from a generally well-educated human. Yes. I think you were right about that. And I think spiritually I feel unsatisfied because my internal expectation was that such a system could automate large parts of white-collar work.

30:15And so it might be more productive to talk about the actual end capabilities. You want such a system. So I will basically tell you where I think we are. But let me ask you a very specific question so that we can figure out exactly what kinds of capabilities we should expect soon. So maybe I'll ask about it in the context of a job I understand well, not because it's the most relevant job, but just because I can evaluate the claims about it. Take video editors, right? I have video editors. And part of their job involves learning about our audience's preferences, learning about my preferences and tastes and the different trade-offs we have and just over the course of many months building up this understanding of context.

30:54And so the skill and ability they have six months into the job, a model that can pick up that skill on the job, on the fly, when should we expect such an AI system? Yeah. So I guess what you're talking about is like, you know, we're doing this interview for three hours and then like, you know, someone's going to come in, someone's going to edit it. They're going to be like, oh, you know, you know, I don't know, Dario like, you know, scratched his head and, you know, we could edit that out. And, you know, there was this like long discussion that like is less interesting to people. And then, you know, then there's other thing that's like more interesting to people.

31:27So, you know, let's kind of make this edit. So, you know, I think the country of geniuses in a data center will be able to do that. The way it will be able to do that is, you know, it will have general control of a computer screen, right? Like, you know, and you'll be able to feed this in and it'll be able to also use the computer screen to like go on the web, look at all your previous, look at all your previous interviews. Like look at what people are saying on Twitter in response to your interviews. Like talk to you, ask you questions, talk to your staff, look at the history of kind of edits, edits that you did.

31:59And from that, like do the job. So, I think that's dependent on several things. One that's dependent and I think this is one of the things that's actually blocking deployment, getting to the point on computer use where the models are really masters at using the computer, right? And, you know, we've seen this climb in benchmarks and benchmarks are always, you know, imperfect measures. But like, you know, OSWorld, you know, went from, you know, like 5%, you know, like I think when we first released, you know, computer use like a year and a quarter ago, it was like maybe 15%.

32:33I don't remember exactly, but we've climbed from that to like 65% or 70%. And, you know, there may be harder measures as well, but I think computer use has to pass a point of reliability. Can I just ask a follow-up on that before you move on to the next point? I often, for years, I've been trying to build different internal LLM tools for myself. And I often, I have these text in, text out tasks, which should be dead center in the repertoire of these models. And yet, I still hire humans to do them just because it's, if it's something like make, identify what the best clips would be in this transcript.

33:07And maybe they'll do like a 7 out of 10 job at them. But there's not this ongoing way I can engage with them to help them get better at the job the way I could with a human employee. And so that missing ability, even if you saw computer use, would still block my ability to like offload an actual job to them. Again, this gets back to kind of what we were talking about before with learning on the job where it's very interesting. You know, I think with the coding agents, like, I don't think people would say that learning on the job is what is, you know, preventing the coding agents from like, you know, doing everything end-to-end.

33:43Like, they keep getting better. We have engineers at Anthropic who like don't write any code. And when I look at the productivity to your previous question, you know, we have folks who say this GPU kernel, this chip, I used to write it myself. I just have Claude do it. And so there's this enormous improvement in productivity. And I don't know, like when I see Claude Code, like familiarity with the code base or like, you know, or a feeling that the model hasn't worked at the company for a year, that's not high up on the list of complaints I see.

34:18And so I think what I'm saying is we're like, we're kind of taking a different path. Don't you think with coding that's because there is an external scaffold of memory which exists instantiated in the code base, which I don't know how many other jobs have coding made fast progress precisely because it has this unique advantage that other economic activity doesn't. But when you say that, what you're implying is that by reading the code base into the context, I have everything that the human needed to learn on the job. So that would be an example of whether it's written or not, whether it's available or not, a case where everything you needed to know you got from the context window, right?

34:59And that what we think of as learning, like, oh, man, I started this job. It's going to take me six months to understand the code base. The model just did it in the context. Yeah, I honestly don't know how to think about this because there are people who qualitatively report what you're saying. There was a METR study I'm sure you saw last year where they had experienced developers try to close pull requests in repositories that they were familiar with. And those developers reported an uplift. They reported that they felt more productive with the use of these models.

35:31But in fact, if you look at their output and how much was actually merged back in, there's a 20% downlift. They were less productive as a result of these models. And so I'm trying to square the qualitative feeling that people feel with these models versus, one, in a macro level, where is this, like, renaissance of software? And then, two, when people do these independent evaluations, why are we not seeing the productivity benefits that we would expect? Within Anthropic, this is just really unambiguous, right? We're under an incredible amount of commercial pressure and make it even harder for ourselves because we have all this safety stuff we do that I think we do more than other companies.

36:06So, like, the pressure to survive economically while also keeping our values is just incredible, right? We're trying to keep this 10x revenue curve going. There's, like, there is zero time for bullshit. There is zero time for feeling like we're productive when we're not. Like, these tools make us a lot more productive. Like, why do you think we're concerned about competitors using the tools? Because we think we're ahead of the competitors. And, like, we don't want to excel.

36:38We wouldn't be going through all this trouble if this was secretly reducing our productivity. Like, we see the end productivity every few months in the form of model launches. Like, there's no kidding yourself about this. Like, the models make you more productive.

36:57One, people feeling like they're more productive is qualitatively predicted by studies like this. But, two, if I just look at the end output, obviously, you guys are making fast progress. But the fact, you know, the idea was supposed to be with recursive self-improvement is that you make a better AI. The AI helps you build a better next AI, et cetera, et cetera. And what I see instead, if I look at the you, OpenAI, DeepMind, is that people are just shifting around the podium every few months. And maybe you think that stops because you've won or whatever. But why are we not seeing the person with the best coding model have this lasting advantage if, in fact, there are these enormous productivity gains from the last coding model?

37:36Yeah. So, no, no, no. I mean, I think it's all, like, my model of the situation is there's an advantage that's gradually growing. Like, I would say right now the coding models give maybe, I don't know, a, like, 15%, maybe 20% total factor speed up. Like, that's my view. And six months ago, it was maybe 5%. And so it didn't matter. Like, 5% doesn't register. It's now just getting to the point where it's, like, one of several factors that kind of matters.

38:09And that's going to keep speeding up. And so I think six months ago, like, you know, there were several companies that were at roughly the same point because, you know, this wasn't a notable factor. But I think it's starting to speed up more and more. You know, I would also say there are multiple companies that, you know, write models that are used for code. And, you know, we're not perfectly good at, you know, preventing some of these other companies from using – from kind of using our models internally.

38:39So, you know, I think everything we're – kind of everything we're seeing is consistent with this kind of – this kind of snowball model where, you know, there's no hard – again, my theme in all of this is, like, all of this is soft takeoff. Like, soft, smooth exponentials, although the exponentials are relatively steep. And so we're seeing this snowball gather momentum where it's, like, 10%, 20%, 25%, you know, 40%.

39:11And as you go, yeah, Amdahl's Law, you have to get all the, like, things that are preventing you from closing the loop out of the way. But, like, this is one of the biggest priorities within Anthropic. Like, stepping back, I think before in the stack we were talking about, well, when do we get this on-the-job learning? And it seems like the coding – the point you were making of the coding thing is we actually don't need on-the-job learning. That you can have tremendous productivity improvements. You can have potentially trillions of dollars of revenue for AI companies without this basic human ability – maybe that's not your claim, you should clarify.

39:42But without this basic human ability to learn on the job. But I just look at, like, in most domains of economic activity, people say, I hired somebody, they weren't that useful for the first few months, and then over time they built up the context understanding. It's actually hard to define what we're talking about here. But they got something. And then now they're a power horse and they're so valuable to us. And if AI doesn't develop this ability to learn on the fly, I'm not – I'm a bit skeptical that we're going to see huge changes to the world without that ability.
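
A rough way to read the Amdahl's Law remark above: if AI speeds up only part of a workflow, the overall speedup is capped by the part that is not sped up. The sketch below is illustrative only. The function name, the fractions, and the per-task speedups are assumptions chosen for the example, not figures from the conversation.

```python
def amdahl_speedup(accelerated_fraction: float, task_speedup: float) -> float:
    """Overall speedup when only part of the work gets faster (Amdahl's Law).

    accelerated_fraction: share of total work time that the tool speeds up (0..1).
    task_speedup: how many times faster that share becomes (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if task_speedup <= 0:
        raise ValueError("task_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # work that stays at the old speed
    return 1.0 / (remaining + accelerated_fraction / task_speedup)


if __name__ == "__main__":
    # Hypothetical scenarios: the share of work that is sped up, and by how much.
    for fraction in (0.2, 0.5, 0.9):
        for speedup in (2.0, 10.0, 1000.0):
            overall = amdahl_speedup(fraction, speedup)
            print(f"fraction={fraction:.1f} task_speedup={speedup:>6.0f}x "
                  f"-> overall {overall:.2f}x")
```

Even an arbitrarily large speedup on half of the work yields less than a 2x overall gain, which is one way to see why removing the remaining bottlenecks matters for closing the loop.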

40:13Yeah, so I think two things here, right? There's the state of the technology right now, which is, again, we have these two stages. We have the pre-training and RL stage where you throw a bunch of data and tasks into the models, and then they generalize. So it's like learning, but it's like learning from more data and not learning over kind of one human or one model's lifetime. So, again, this is situated between evolution and human learning. But once you learn all those skills, you have them.

40:45And just like with pre-training, just how the models know more – if I look at a pre-trained model, it knows more about the history of samurai in Japan than I do. It knows more about baseball than I do. It knows, you know, it knows more about, you know, low-pass filters and electronics. You know, all of these things, its knowledge is way broader than mine. So I think even just that, you know, may get us to the point where the models are better at – you know, kind of better at everything.

41:17And then we also have, again, just with scaling the kind of existing setup, we have the in-context learning, which I would describe as kind of like human on-the-job learning but like a little weaker and a little short-term. Like you look at in-context learning. You give the model a bunch of examples. It does get it. There's real learning that happens in context. And like a million tokens is a lot. That's – you know, that can be days of human learning, right? You know, if you think about the model, you know, kind of reading a million words, you know, it takes me – how long would it take me to read a million?

41:51I mean, you know, like days or weeks at least. So you have these two things, and I think these two things within the existing paradigm may just be enough to get you the country of geniuses in the data center. I don't know for sure, but I think they're going to get you a large fraction of it. There may be gaps, but I certainly think just as things are, this, I believe, is enough to generate trillions of dollars of revenue. That's one. That's all one. One, two, is this idea of continual learning, this idea of a single model learning on the job.

42:23I think we're working on that too, and I think there's a good chance that in the next year or two we also make – we also solve that. Again, I – you know, I think you get most of the way there without it. I think the trillions of dollars of, you know, the – I think the trillions of dollars a year market, maybe all of the national security implications and the safety implications that I wrote about in The Adolescence of Technology can happen without it. But I also think we – and I imagine others are working on it, and I think there's a good chance that, you know, that we get there within the next year or two.

43:03There are a bunch of ideas. I won't go into all of them in detail, but, you know, one is just make the context longer. There's nothing preventing longer context from working. You just have to train at longer context and then learn to serve them at inference, and both of those are engineering problems that we are working on and that I would assume others are working on as well. Yeah, so this context length increase, it seemed like there was a period from 2020 to 2023 where from GPT-3 to GPT-4 Turbo, there was an increase from, like, 2,000 context length to 128K. I feel like for the next – for the two-ish years since then, we've been in the same-ish ballpark.

43:36Yeah. And when context lengths get much longer than that, people report qualitative degradation in the ability of the model to consider that full context. So I'm curious what you're internally seeing that makes you think, like, oh, 10 million context, 100 million context, to get human, like, six-month learning, billion context. This isn't a research problem. This is an engineering and inference problem, right? If you want to serve long context, you have to, like, store your entire KV cache. You have to – you know, it's difficult to store all the memory in the GPUs, to juggle the memory around.
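
To make the KV cache point concrete, here is a back-of-the-envelope sketch of how cache memory grows with context length for a standard transformer with grouped-query attention. The model shape (80 layers, 8 key/value heads, head dimension 128, 2 bytes per value) is a hypothetical configuration picked for illustration and does not describe any specific model discussed here.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 covers one key tensor plus one value tensor per layer.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical model shape, assumed for illustration only.
    layers, kv_heads, head_dim = 80, 8, 128
    for tokens in (128 * 1024, 1_000_000, 10_000_000, 100_000_000):
        size = kv_cache_bytes(layers, kv_heads, head_dim, tokens)
        print(f"{tokens:>11,} tokens -> {size / GIB:,.1f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with context, so a 100x longer context needs 100x the memory per sequence, which is the serving cost being described.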

44:11I don't even know the detail, you know, at this point, this is at a level of detail that I'm no longer able to follow, although, you know, I knew it in the GPT-3 era of, like, you know, these are the weights, these are the activations you have to store. But, you know, you know, these days the whole thing has flipped because we have MoE models and kind of all of that. But – and this degradation you're talking about, like, again, without getting too specific, like, a question I would ask is, like, there's two things. There's the context length you train at and there's a context length that you serve at.

44:43If you train at a small context length and then try to serve at a long context length, like, maybe you get these degradations. It's better than nothing. You might still offer it, but you get these degradations. And maybe it's harder to train at a long context length. So, you know, there's a lot. I want to, at the same time, ask about, like, maybe some rabbit holes of, like, well, wouldn't you expect that if you had to train on longer context length, that would mean that you're able to get sort of, like, less samples in for the same amount of compute? But before – maybe it's not worth diving deep on that. I want to get an answer to the bigger picture question, which is, like, okay, so I don't feel a preference for a human editor that's been working for me for six months versus an AI that's been working with me for six months.

45:27What year do you predict that that will be the case? I mean, you know, my guess for that is, you know, there's a lot of problems that are basically like we can do this when we have the country of geniuses in a data center. And so, you know, my picture for that is, you know, again, if you made me guess, it's like one to two years, maybe one to three years. It's really hard to tell. I have a strong view, 99%, 95% that, like, all this will happen in 10 years.

45:57Like, I think that's just a super safe bet. And then I have a hunch this is more like a 50-50 thing, that it's going to be more like one to two, maybe more like one to three. So one to three years. The country of geniuses, and the slightly less economically valuable task of editing videos. It seems pretty economically valuable, let me tell you. It's just there are a lot of use cases like that, right? There are a lot of similar ones. Exactly. So you're predicting that within one to three years. And then generally, Anthropic has predicted that by late 26, early 27, we will have AI systems that are, quote, have the ability to navigate interfaces available to humans doing digital work today, intellectual capabilities matching or exceeding that of Nobel Prize winners, and the ability to interface with the physical world.

46:40And then you gave an interview two months ago with DealBook, where you're emphasizing your company's more responsible compute scaling as compared to your competitors. And I'm trying to square these two views, where if you really believe that we're going to have a country of geniuses, you want as big a data center as you can get. There's no reason to slow down. The TAM of a Nobel Prize winner that is actually can do everything a Nobel Prize winner can do is, like, trillions of dollars. And so I'm trying to square this conservatism, which seems rational if you have more moderate timelines, with your stated views about AI progress.

47:16Yeah. So it actually all fits together. And we go back to this fast but not infinitely fast diffusion. So, like, let's say that we're making progress at this rate. You know, the technology is making progress this fast. Again, I have, you know, very high conviction that, like, it's going, you know, we're going to get there within a few years. I have a hunch that we're going to get there within a year or two. So a little uncertainty on the technical side but, like, you know, pretty strong confidence that it won't be off by much.

47:48What I'm less certain about is, again, the economic diffusion side. Like, I really do believe that we could have models that are a country of geniuses in a data center in one to two years. One question is, how many years after that do the trillions in, you know, do the trillions in revenue start rolling in? I don't think it's guaranteed that it's going to be immediate. You know, I think it could be one year.

48:22It could be two years. I could even stretch it to five years, although I'm, like, I'm skeptical of that. And so we have this uncertainty, which is even if the technology goes as fast as I suspect that it will, we don't know exactly how fast it's going to drive revenue. We know it's coming, but with the way you buy these data centers, if you're off by a couple years, that can be ruinous. It is just like how I wrote, you know, in Machines of Loving Grace, I said, look, I think we might get this powerful AI, this country of geniuses in the data center.

48:56That description you gave comes from the Machines of Loving Grace. I said, well, get that 2026, maybe 2027 again. That is my hunch. Wouldn't be surprised if I'm off by a year or two, but, like, that is my hunch. Let's say that happens. That's the starting gun. How long does it take to cure all the diseases, right? That's one of the ways that, like, drives a huge amount of economic value, right? Like, you cure every disease. You know, there's a question of how much of that goes to the pharmaceutical company, to the AI company, but there's an enormous consumer surplus because everyone, you know, assuming we can get access for everyone, which I care about greatly.

49:30We, you know, we cure all of these diseases. How long does it take? You have to do the biological discovery. You have to, you know, you have to, you know, manufacture the new drug. You have to, you know, go through the regulatory process. I mean, we saw this with, like, vaccines and COVID, right? Like, there's just this, we got the vaccine out to everyone, but it took a year and a half, right? And so my question is, how long does it take to get the cure for everything, which AI is the genius that can, in theory, invent out to everyone?

50:00How long from when that AI first exists in the lab to when diseases have actually been cured for everyone, right? And, you know, we've had a polio vaccine for 50 years. We're still trying to eradicate it in the most remote corners of Africa. And, you know, the Gates Foundation is trying as hard as they can. Others are trying as hard as they can. But, you know, that's difficult. Again, I, you know, I don't expect most of the economic diffusion to be as difficult as that, right? That's, like, the most difficult case. But there's a real dilemma here, and where I've settled on it is it will be faster than anything we've seen in the world, but it still has its limits.

50:41And so then when we go to buying data centers, you know, you, again, again, the curve I'm looking at is, okay, we, you know, we've had a 10x a year increase every year. So beginning of this year, we're looking at 10 billion in annual, you know, rate of annualized revenue at the beginning of the year. We have to decide how much compute to buy.

51:06And, you know, it takes a year or two to actually build out the data centers, to reserve the data centers. So basically I'm saying, like, in 2027, how much compute do I get? Well, I could assume that the revenue will continue growing 10x a year, so it'll be 100 billion at the end of 2026 and 1 trillion at the end of 2027. And so I could buy a trillion dollars.

51:37Actually, it would be like $5 trillion of compute because it would be a trillion dollar a year for five years, right? I could buy a trillion dollars of compute that starts at the end of 2027. And if my revenue is not a trillion dollars, if it's even 800 billion, there's no force on earth. There's no hedge on earth that could stop me from going bankrupt if I buy that much compute. And so even though a part of my brain wonders if it's going to keep growing 10x, I can't buy a trillion dollars a year of compute in 2027.

52:14If I'm just off by a year in that rate of growth or if the growth rate is 5x a year instead of 10x a year, then, you know, you go bankrupt. And so you end up in a world where, you know, you're supporting hundreds of billions, not trillions, and you accept some risk that there's so much demand that you can't support the revenue. And you accept still some risk that, you know, you got it wrong and it's still slow. And so when I talked about behaving responsibly, what I meant actually was not the absolute amount.

52:49That actually was not, you know, I think it is true we're spending somewhat less than some of the other players. It's actually the other things like have we been thoughtful about it or are we YOLOing and saying, oh, we're going to do $100 billion here, $100 billion there. I kind of get the impression that, you know, some of the other companies have not written down the spreadsheet, that they don't really understand the risks they're taking. They're just kind of doing stuff because it sounds cool. And we've thought carefully about it, right? We're an enterprise business.

53:20Therefore, you know, we can rely more on revenue. It's less fickle than consumer. We have better margins, which is the buffer between buying too much and buying too little. And so I think we bought an amount that allows us to capture pretty strong upside worlds. It won't capture the full 10x a year. And things would have to go pretty badly for us to be in financial trouble. So I think we've thought carefully and we've made that balance. And that's what I mean when I say that we're being responsible.
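
The compute-buying arithmetic above can be sketched in a few lines. This uses only the round numbers from the conversation ($10 billion annualized revenue at the start of 2026, 10x versus 5x yearly growth, a $1 trillion per year commitment starting at the end of 2027, held for five years). The function names and the simple shortfall measure are assumptions made for the example and are not a model of any company's actual finances.

```python
def project_revenue(start_revenue_b: float, growth_per_year: float, years: int) -> list:
    """Annualized revenue (in $ billions) at year 0, 1, ..., years."""
    return [start_revenue_b * growth_per_year ** y for y in range(years + 1)]


def shortfall(commitment_b: float, revenue_b: float) -> float:
    """How far revenue falls short of a yearly compute commitment ($ billions)."""
    return max(0.0, commitment_b - revenue_b)


if __name__ == "__main__":
    start_b = 10          # ~$10B annualized at the start of 2026 (from the conversation)
    commitment_b = 1000   # $1T per year of compute starting at the end of 2027
    contract_years = 5    # "a trillion dollar a year for five years"

    print(f"Total commitment: ${commitment_b * contract_years:,}B")

    scenarios = {
        "10x per year": project_revenue(start_b, 10, 2)[-1],
        "5x per year": project_revenue(start_b, 5, 2)[-1],
        # Growth arrives one year late: end-2027 revenue is what was planned for end-2026.
        "10x, one year late": project_revenue(start_b, 10, 1)[-1],
    }
    for name, revenue_b in scenarios.items():
        gap = shortfall(commitment_b, revenue_b)
        print(f"{name:>20}: end-2027 revenue ${revenue_b:,}B, shortfall ${gap:,.0f}B")
```

With 5x growth or a one-year delay, end-2027 revenue lands at $250 billion or $100 billion against a $1 trillion yearly bill, which is the asymmetry behind committing to hundreds of billions rather than trillions.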

53:50Okay. So it seems like it's possible that we actually just have different definitions of the country of geniuses in a data center. Because when I think of like actual human geniuses, an actual country of human geniuses in a data center, I'm like, I would happily buy $5 trillion worth of compute to run actual country of human geniuses in a data center. So let's say JP Morgan or Moderna or whatever doesn't want to use them. Also, I've got a country of geniuses. They'll start their own company. And if like they can't start their own company and they're bottlenecked by clinical trials, it is worth stating with clinical trials. It's like most clinical trials fail because the drug doesn't work.

54:23There's not efficacy, right? And I make exactly that point in Machines of Loving Grace. I say the clinical trials are going to go much faster than we're used to, but not instant, not infinitely fast. And then suppose it takes a year for the clinical trials to work out so that you're getting revenue from that and can make more drugs. Okay, well, you've got a country of geniuses and you're an AI lab and you have – you could use many more AI researchers. You also think that there's these like self-reinforcing gains from smart people working on AI tech.

54:53So like, okay, you can have the – That's right. You can have the data center working on like AI progress. Is there more gains from buying – like substantially more gains from buying a trillion dollars a year of compute versus $300 billion a year of compute? If your competitor is buying a trillion, yes, there is. Well, no, there's some gain, but then – but again, there's this chance that they go bankrupt before – you know, again, if you're off by only a year, you destroy yourselves. That's the balance. We're buying a lot.

55:24We're buying a hell of a lot. Like we're not – you know, we're buying an amount that's comparable to that that, you know, the biggest players in the game are buying. But if you're asking me why haven't we signed, you know, $10 trillion of compute starting in mid-2027, first of all, it can't be produced. There isn't that much in the world. But second, what if the country of geniuses comes but it comes in mid-2028 instead of mid-2027?

55:54You go bankrupt. So if your projection is one to three years, it seems like you should want $10 trillion of compute by 2029, 2020, maybe 2020. I mean, like – I mean, you know – But like are you – like it seems like even in your – the longest version of the timelines you state, the compute you are ramping up to build doesn't seem in accordance. What makes you think that? Well, as you said, you would want the $10 trillion – like human wages, let's say, are on the order of $50 trillion a year. If you look at – so I won't talk about Anthropic in particular.

56:27But if you talk about the industry, like the amount of compute the industry – you know, the amount of compute the industry is building this year is probably in the, you know, I don't know, very low tens of – you know, call it 10, 15 gigawatts next year. I, you know, it goes up by roughly 3X a year. So like next year is 30 or 40 gigawatts and 2028 might be 100, 2029 might be like 300 gigawatts.

56:58And like each gigawatt costs like maybe 10 – I mean, I'm doing the math in my head. But each gigawatt costs maybe $10 billion, you know, order $10 to $15 billion a year. So, you know, you kind of – you know, you put that all together and you're getting about what you described. You're getting multiple trillions a year by 2028 or 2029. So you're getting exactly that. You're getting exactly what you predict. That's for the industry. That's for the industry. That's right. So suppose Anthropic's compute keeps 3X-ing a year and then by like 27, you have – or 27, 28, you have 10 gigawatts.
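
The industry-level mental math above can be reproduced with a short projection. It takes the rough figures quoted in the conversation (10 to 15 gigawatts this year, roughly 3x growth per year, $10 to $15 billion per gigawatt per year) and assumes "this year" means 2026, based on the episode date. The function names are hypothetical, and the outputs are order-of-magnitude illustrations, not forecasts.

```python
def project_gigawatts(start_gw: float, growth: float,
                      start_year: int, end_year: int) -> dict:
    """Gigawatts per year, compounding by `growth` each year."""
    return {year: start_gw * growth ** (year - start_year)
            for year in range(start_year, end_year + 1)}


def annual_cost_billions(gigawatts: float, cost_per_gw_b: float) -> float:
    """Yearly spend in $ billions at a given cost per gigawatt-year."""
    return gigawatts * cost_per_gw_b


if __name__ == "__main__":
    # Low and high ends of the quoted ranges; 2026 as the base year is an assumption.
    low = project_gigawatts(10, 3, 2026, 2029)
    high = project_gigawatts(15, 3, 2026, 2029)
    for year in low:
        cost_low = annual_cost_billions(low[year], 10)
        cost_high = annual_cost_billions(high[year], 15)
        print(f"{year}: {low[year]:>4}-{high[year]:<4} GW, "
              f"about ${cost_low:,.0f}B-${cost_high:,.0f}B per year")
```

On these inputs the 2028 and 2029 rows land in the range of roughly one to several trillion dollars per year, consistent with the "multiple trillions a year" figure for the industry as a whole.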

57:34And like multiply that by, as you say, 10 billion. So then it's like 100 billion a year. But then you're saying the TAM by 2028, 29 – Again, I don't want to give exact numbers for Anthropic, but these numbers are too small. These numbers are too small. Okay, interesting. I'm really proud that the puzzles I've worked on with Jane Street have resulted in them hiring a bunch of people from my audience. Well, they're still hiring, and they just sent me another puzzle. For this one, they spent about 20,000 GPU hours training backdoors into three different language models. Each one has a hidden prompt that elicits completely different behavior.

58:07You just have to find the trigger. This is particularly cool because finding backdoors is actually an open question in frontier AI research. Anthropic actually released a couple of papers about sleeper agents, and they show that you can build a simple classifier on the residual stream to detect when a backdoor is about to fire. But they already knew what the triggers were because they built them. Here, you don't. And it's not feasible to check the activations for all possible trigger phrases. Unlike the other puzzles they made for this podcast, Jane Street isn't even sure this one is solvable. But they've set aside $50,000 for the best attempts and write-ups.

58:40The puzzle's live at janestreet.com/dwarkesh. And they're accepting submissions until April 1st. All right, back to Dario. You've told investors that you plan to be profitable starting in 28. And this is the year where we're, like, potentially getting the country of geniuses in a data center. And, you know, this is, like, going to now unlock all this progress in medicine and health and et cetera, et cetera, and new technologies. Wouldn't this be exactly the time where you'd, like, want to reinvest in the business and build bigger countries so they can make more discoveries?

59:15So, I mean, profitability is this kind of, like, weird thing in this field. Like, I don't think in this field profitability is actually a measure of, you know, kind of spending down versus investing in the business. Like, let's just take a model of this. I actually think profitability happens when you underestimated the amount of demand you were going to get and loss happens when you overestimated the amount of demand you were going to get because you're buying the data centers ahead of time.

59:48So, think about it this way. Ideally, you would like – and, again, these are stylized facts. These numbers are not exact. I'm just trying to make a toy model here. Let's say half of your compute is for training and half of your compute is for inference. And, you know, the inference has some gross margin that's, like, more than 50%. And so, what that means is that if you were in steady state, you build a data center. If you knew exactly the demand you were getting, you would, you know, you would get a certain amount of revenue.

1:00:21Say, I don't know, let's say you pay $100 billion a year for compute. And on $50 billion a year, you support $150 billion of revenue. And the other $50 billion are used for training. So, basically, you're profitable. You make $50 billion of profit. Those are the economics of the industry today. Or, sorry, not today, but, like, that's where we're projecting forward in a year or two. The only thing that makes that not the case is if you get less demand than $50 billion, then you have more than 50% of your data center for research and you're not profitable.

1:00:59So, you know, you train stronger models, but you're, like, not profitable. If you get more demand than you thought, then your research gets squeezed. But, you know, you're kind of able to support more inference and you're more profitable. So, it's – maybe I'm not explaining it well. But the thing I'm trying to say is you decide the amount of compute first. And then you have some target desire of inference versus training. But that gets determined by demand. It doesn't get determined by you. So, what I'm hearing is the reason you're predicting profit is that you are systematically underinvesting in compute, right?

1:01:36Because if you actually – No, no, no. I'm saying it's hard to predict. So, these things about 2028 and when it will happen, that's our attempt to do the best we can with investors. All of this stuff is really uncertain because of the cone of uncertainty. Like, we could be profitable in 2026 if the revenue grows fast enough. And then, you know, if we overestimate or underestimate the next year, that could swing wildly. Like, what I'm trying to get at is you have a model in your head of, like, the business invests, invests, invests, invests, gets scale and kind of then becomes profitable.

1:02:13There's a single point at which things turn around. I don't think the economics of this industry work that way. I see. So, if I'm understanding correctly, you're saying because of the discrepancy between the amount of compute we should have gotten and the amount of compute we got, we were, like, sort of forced to make profit. But that doesn't mean we're going to continue making profit. We're going to, like, reinvest the money because, well, now AI has made so much progress and we want the bigger country of geniuses. And so, then we're back to: revenue is high but losses are also high. If we predict – if every year we predict exactly what the demand is going to be, we'll be profitable every year because spending 50% of your compute on research, roughly, plus a gross margin that's higher than 50% and correct demand prediction leads to profit.

1:03:03That's the profitable business model that I think is kind of, like, there but, like, obscured by these, like, building ahead and prediction errors. I guess you're treating the 50% as a sort of, like, you know, just like a given constant. Yes. Whereas, in fact, if AI progress is fast and you can increase the progress by scaling up more, you should just have more than 50% and not make profit. Here's what I'll say. You might want to scale it up more. But, you know, remember the log returns to scale, right? If 70% would get you a very little bit of a smarter model through a factor of 1.4x, right?

1:03:41Like, that extra $20 billion is, you know, that each dollar there is worth much less to you because of the log-linear setup. And so you might find that it's better to invest that $20 billion in, you know, in serving inference or in hiring engineers who are kind of better at what they're doing. So the reason I said 50%, that's not exactly our target. It's not exactly going to be 50%. It will probably vary over time.
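To make the log-returns point concrete, here is a minimal sketch. It assumes, purely for illustration, that model quality improves in proportion to the logarithm of training spend; the transcript only says returns are "log linear", so the exact functional form and the comparison against a 10x scale-up are assumptions rather than anything stated in the conversation.

```python
import math


def log_gain(old_spend: float, new_spend: float) -> float:
    """Improvement, in natural-log units of spend, from scaling training up.

    Assumption: quality is proportional to log(training spend).
    """
    return math.log(new_spend / old_spend)


if __name__ == "__main__":
    total_compute = 100.0                 # stylized $100B a year of compute
    half = 0.5 * total_compute            # 50% of compute on training
    seventy = 0.7 * total_compute         # 70% of compute on training

    shift = log_gain(half, seventy)       # the 1.4x step from the conversation
    tenfold = log_gain(half, 10 * half)   # a full 10x scale-up, for comparison

    print(f"50% -> 70% is a {seventy / half:.1f}x step, gain {shift:.3f}")
    print(f"a 10x step has gain {tenfold:.3f}")
    print(f"the extra $20B buys {shift / tenfold:.0%} of a 10x step")
```

On this assumed curve, moving $20 billion from inference to training buys only about a seventh of what a tenfold scale-up would, which is the sense in which each of those dollars is "worth much less".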

1:04:12What I'm saying is the, like, log linear return, what it leads to is you spend of order one fraction of the business, right? Like, not 5%, not 95%. And then, you know, then you get diminishing returns because of the log scale up. I feel strange that I'm, like, convincing Dario to, like, believe in AI progress or something. But, like, okay, you don't invest in research because it has diminishing returns, but you invest in the other things you mentioned. Again, again, we're talking about diminishing returns after you're spending $50 billion a year, right?

1:04:45Like, this is a point I'm sure you would make, but, like, diminishing returns on a genius could be quite high. And more generally, like, what is profit in a market economy? Profit is basically saying the other companies in the market can, like, do more things with this money than I can. Yeah, I mean, put aside Anthropic. I'm just trying to, like, because, you know, I don't want to give information about Anthropic is why I'm giving these stylized numbers. But, like, let's just derive the equilibrium of the industry, right? I think the – so why doesn't everyone spend 100% of their, you know, 100% of their compute on training and not serve any customers, right?

1:05:23It's because if they didn't get any revenue, they couldn't raise money, they couldn't do compute deals, they couldn't buy more compute the next year. So there's going to be an equilibrium where every company spends less than 100% on training and certainly less than 100% on inference. It should be clear why you don't just serve the current models and, you know, and never train another model because then you don't have any demand because you'll fall behind. So there's some equilibrium. It's not going to be 10%. It's not going to be 90%. Let's just say as a stylized fact it's 50%.

1:05:55That's what I'm getting at. And I think we're going to be in a position where that equilibrium of how much you spend on training is less than the gross margins that you're able to get on compute. And so the underlying economics are profitable. The problem is you have this hellish demand prediction problem when you're buying the next year of compute. And you might guess under and be very profitable but have no compute for research. Or you might guess over and, you know, you are not profitable and you have all the compute for research in the world.
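The demand-prediction dynamic can be sketched as a toy model using the stylized numbers from earlier in the conversation: $100 billion a year of compute bought ahead of time, and $50 billion of inference compute supporting $150 billion of revenue. The function name and the assumption that revenue scales linearly with inference compute are illustrative choices, not part of the original discussion.

```python
def yearly_outcome(compute_budget: float,
                   demand_revenue: float,
                   revenue_per_inference_dollar: float = 3.0):
    """Return (revenue, training_compute, profit) in billions of dollars.

    compute_budget: compute bought in advance, fixed for the year.
    demand_revenue: revenue customers would pay for if fully served.
    revenue_per_inference_dollar: $150B revenue on $50B inference gives 3.0
    (assumed to scale linearly).
    """
    inference_needed = demand_revenue / revenue_per_inference_dollar
    # Inference is served first, capped by the compute actually owned.
    inference_used = min(inference_needed, compute_budget)
    revenue = inference_used * revenue_per_inference_dollar
    # Whatever is left over goes to training and research.
    training_compute = compute_budget - inference_used
    profit = revenue - compute_budget
    return revenue, training_compute, profit


if __name__ == "__main__":
    for label, demand in [("demand as predicted", 150.0),
                          ("demand overestimated", 90.0),
                          ("demand underestimated", 210.0)]:
        revenue, training, profit = yearly_outcome(100.0, demand)
        print(f"{label}: revenue ${revenue:.0f}B, "
              f"training ${training:.0f}B, profit ${profit:.0f}B")
```

With the same $100 billion of compute, guessing demand too high leaves more compute for research but a loss, while guessing too low squeezes research and raises profit, which is the trade-off described above.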

1:06:34Does that make sense just as a dynamic model of the industry? Maybe stepping back I'm like I'm not saying I think the country of geniuses is going to come in two years and therefore you should buy this compute. To me what you're saying the end conclusion you're arriving at makes a lot of sense. But that's because like oh it seems like country of geniuses is hard and there's a long way to go. And so the stepping back the thing I'm trying to get at is more like it seems like your worldview is compatible with somebody who says we're like 10 years away from a world in which like we're generating trillions of dollars.

1:07:09And that's just not my view. Yeah. That is not my view. Like I – so I'll like make another prediction. It is hard for me to see that there won't be trillions of dollars in revenue before 2030. Like I can construct a plausible world. It takes maybe three years. So that would be the end of what I think it's plausible. Like in 2028 we get the real country of geniuses in the data center. You know, the revenue has been going into the – maybe is in the low hundreds of billions by 2028.

1:07:44And then the country of geniuses accelerates it to trillions, you know, and we're basically on the slow end of diffusion. It takes two years to get to the trillions. That would be the world where it takes until – that would be the world where it takes until 2030. I suspect even composing the technical exponential and diffusion exponential will get there before 2030. So you laid out a model where Anthropic makes profit because it seems like fundamentally we're in a compute-constrained world.

1:08:14And so it's like eventually we keep growing compute. No, I think the way the profit comes is – again, and, you know, let's just abstract the whole industry here. Like we have a – you know, let's just imagine we're in like an economics textbook. We have a small number of firms. Each can invest a limited amount in – or like each can invest some fraction in R&D. They have some marginal cost to serve. The margins on that – the profit – the gross profit margins on that marginal cost are like very high because inference is efficient.

1:08:47There's some competition, but the models are also differentiated. There's some, you know, companies will compete to push their research budgets up. But like because there's a small number of players, you know, we have the – what is it called? The Cournot equilibrium I think is what the small number of firm equilibrium is. The point is it doesn't equilibrate to perfect competition with zero margins. If there's like three firms – if there's three firms in the economy, all are kind of independently behaving rationally, it doesn't equilibrate to zero.

1:09:22So help me understand that because right now we do have three leading firms and they're not making profit.

1:09:28And so what – yeah, what is changing? Yeah. So the – again, the gross margins right now are very positive. What's happening is a combination of two things. One is we're still in the exponential scale-up phase of compute. So what – basically what that means is we're training – like a model gets trained. It costs – you know, let's say a model got trained that costs a billion dollars last year. And then this year it produced $4 billion of revenue and cost $1 billion to inference from.

1:10:09So, you know, again, I'm using stylized numbers here but, you know, that would be 75 percent, you know, gross margins and, you know, this 25 percent tax. So that model as a whole makes $2 billion. But at the same time, we're spending $10 billion to train the next model because there's an exponential scale-up. And so the company loses money. Each model makes money but the company loses money. The equilibrium I'm talking about is an equilibrium where we have the country of geniuses. We have the country of geniuses in the data center but that model training scale-up has equilibrated more.
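The difference between per-model and per-company accounting can be written out with the stylized numbers above: a model that cost $1 billion to train, earns $4 billion, and costs $1 billion to serve, while the next model costs $10 billion to train. The function names and the `training_scale_up` parameter are illustrative, and the scale-up of 1 is just one way to represent training costs that have "equilibrated".

```python
def gross_margin(revenue: float, inference_cost: float) -> float:
    """Fraction of revenue left after paying for inference."""
    return (revenue - inference_cost) / revenue


def model_lifetime_profit(training_cost: float, revenue: float,
                          inference_cost: float) -> float:
    """Profit if each model is treated as its own company."""
    return revenue - inference_cost - training_cost


def company_year_profit(training_cost: float, revenue: float,
                        inference_cost: float,
                        training_scale_up: float) -> float:
    """Profit in a year where the next model is being trained.

    training_scale_up: next model's training cost relative to the current one.
    """
    next_training_cost = training_cost * training_scale_up
    return revenue - inference_cost - next_training_cost


if __name__ == "__main__":
    # All figures in billions of dollars.
    print("gross margin:", gross_margin(4.0, 1.0))
    print("model as a company:", model_lifetime_profit(1.0, 4.0, 1.0))
    print("company, 10x scale-up:", company_year_profit(1.0, 4.0, 1.0, 10.0))
    print("company, flat training:", company_year_profit(1.0, 4.0, 1.0, 1.0))
```

The same model that earns $2 billion on its own sits inside a company that loses $7 billion while training costs grow tenfold, and the company turns profitable once the next model costs about as much as the last.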

1:10:46Maybe it's still going up. We're still trying to predict the demand but it's more leveled out. I'm going to use a couple of things there. So let's start with the current world. In the current world, you're right that, as you said before, if you treat each individual model as a company, it's profitable. But, of course, a big part of the production function of being a frontier lab is training the next model, right? Yes, that's right. So if you didn't do that, then you'd make profit for two months. That's right. And then you wouldn't have margins because you wouldn't have the best model.

1:11:18And then so, yeah, you can make profit for two months in the current system. But at some point, that reaches the biggest scale that it can reach. And then in equilibrium, we have algorithmic improvements but we're spending roughly the same amount to train the next model as we spent to train the current model. So this equilibrium relies – I mean at some point, you run out of money in the economy. A fixed lump of labor fallacy – the economy is going to grow, right? That's one of your predictions. We're going to have data centers in space. But this is another example of the theme I was talking about, which is that the economy will grow much faster with AI than I think it ever has before.

1:11:55But it's not – like right now, the compute is growing 3x a year. Yeah. I don't believe the economy is going to grow 300% a year. Like I said this in Machines of Loving Grace. Like I think we may get 10% or 20% per year growth in the economy. But we're not going to get 300% growth in the economy. So I think in the end, you know, if compute becomes the majority of what the economy produces, it's going to be capped by that. So let's – okay. Now let's assume a model where compute stays capped. Yeah. The world where frontier labs are making money is one where they continue to make fast progress because fundamentally your margin is limited by how good the alternative is.
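The cap on compute growth can be illustrated with a rough calculation. The 3x a year compute growth and the 20% a year economic growth come from the conversation; the starting values of $150 billion of yearly compute spend and a $100 trillion economy are assumptions added here only to make the arithmetic concrete.

```python
def years_until_share(compute_spend: float, economy: float,
                      compute_growth: float, economy_growth: float,
                      target_share: float) -> int:
    """Years until compute spend reaches target_share of the economy.

    Both quantities compound yearly at their own growth factors.
    """
    years = 0
    while compute_spend / economy < target_share:
        compute_spend *= compute_growth
        economy *= economy_growth
        years += 1
    return years


if __name__ == "__main__":
    # Assumed starting points, in trillions of dollars per year.
    start_compute = 0.15
    start_economy = 100.0
    n = years_until_share(start_compute, start_economy,
                          compute_growth=3.0, economy_growth=1.2,
                          target_share=0.5)
    print(f"compute would pass half of the economy after {n} years")
```

Under these assumptions a 3x a year trend would make compute the majority of the economy in well under a decade, so the trend has to slow toward the economy's own growth rate before then.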

1:12:36And so you are able to make money because you have a frontier model. If you didn't have a frontier model, you wouldn't be making money. Well, you – I mean – And so this model requires there never to be a steady state. Like forever and ever you keep making more algorithmic progress. I don't think that's true. I mean I feel like we're like – we're – you know, I feel like this is an economics – like, you know, this is like an economics class. Do you know the Tyler Cowen quote? We never stop talking about economics. We never stop talking about economics. So, no, but there are worlds in which – you know, so I don't think this field is going to be – I don't think this field is going to be a monopoly.

1:13:12All my lawyers never want me to say the word monopoly. But I don't think this field is going to be a monopoly. But you do get – you get industries in which there are a small number of players. Not one but a small number of players. And ordinarily, like the way you get monopolies like Facebook or Meta – I always call them Facebook – is these kind of network effects. The way you get industries in which there are a small number of players is very high costs of entry, right?

1:13:43So, you know, cloud is like this. I think cloud is a good example of this. You have three, maybe four players within cloud. I think that's the same for AI. Three, maybe four. And the reason is that it's so expensive. It requires so much expertise and so much capital to like run a cloud company, right? And so you have to put up all this capital. And then in addition to putting up all this capital, you have to get all of this other stuff that like, you know, requires a lot of skill to, you know, to make it happen.

1:14:15And so it's like if you go to someone and you're like, I want to disrupt this industry, here's $100 billion, you're like, okay, I'm putting $100 billion and also betting that you can do all these other things that these people have been doing. Only to decrease the profit in the industry. And then the effect of your entering is the profit margins go down. So, you know, we have equilibria like this all the time in the economy where we have a few players. Profits are not astronomical. Margins are not astronomical, but they're not zero, right? And, you know, I think that's what we see on cloud.

1:14:46Cloud is very undifferentiated. Models are more differentiated than cloud, right? Like everyone knows Claude is good at different things than GPT is good at, than Gemini is good at. And it's not just Claude's good at coding, GPT is good at, you know, math and reasoning, you know. It's more subtle than that. Like models are good at different types of coding. Models have different styles. Like I think these things are actually, you know, quite different from each other. And so I would expect more differentiation than you see in cloud.

1:15:20Now, there actually is a counter – there is one counterargument. And that counterargument is that if all of that, the process of producing models becomes – if AI models can do that themselves, then that could spread throughout the economy. But that is not an argument for commoditizing AI models in general. That's kind of an argument for commoditizing the whole economy at once. I don't know what quite happens in that world where basically anyone can do anything, anyone can build anything, and there's like no moat around anything at all.

1:15:53I mean, I don't know. Maybe we want that world. Like maybe that's the end state here. Like maybe, you know, maybe when kind of AI models can do – you know, when AI models can do everything, if we've solved all the safety and security problems, like, you know, that's one of the mechanisms for, you know, just kind of the economy flattening itself again. But that's kind of like post – like far post-country of geniuses in a data center. Maybe a finer way to put that potential point is, one, it seems like AI research is especially loaded on raw intellectual power, which will be especially abundant in a world with AGI.

1:16:37And, two, if you just look at the world today, there's very few technologies that seem to be diffusing as fast as AI algorithmic progress. And so that does hint that this industry is sort of structurally diffusive. So I think coding is going fast, but I think AI research is a superset of coding, and there are aspects of it that are not going fast. But I do think, again, once we get coding, once we get AI models going fast, then, you know, that will speed up the ability of AI models to kind of do everything else.

1:17:10So I think while coding is going fast now, I think once the AI models are building the next AI models and building everything else, the kind of whole – the whole economy will sort of kind of go at the same pace. I am worried geographically, though. I'm a little worried that, like, just proximity to AI, having heard about AI, that that may be one differentiator. And so when I said the, like, you know, 10% or 20% growth rate, a worry I have is that the growth rate could be, like, 50% in Silicon Valley.

1:17:43And, you know, parts of the world that are kind of socially connected to Silicon Valley and, you know, not that much faster than its current pace elsewhere. And I think that would be a pretty messed up world. So one of the things I think about a lot is how to prevent that. Yeah. Do you think that once we have this country of geniuses in a data center that robotics is sort of quickly solved afterwards because it seems like a big problem with robotics is that a human can learn how to teleoperate current hardware, but current AI models can't, at least not in a way that's super productive.

1:18:15And so if we have this ability to learn like a human, should it solve robotics immediately as well? Yeah. I don't think it's dependent on learning like a human. It could happen in different ways. Again, we could have trained the model on many different video games, which are like robotic controls or many different simulated robotics environments, or just, you know, train them to control computer screens and they learn to generalize. So it will happen.

1:18:37It's not necessarily dependent on human-like learning. Human-like learning is one way it could happen. If the model's like, oh, I pick up a robot. I don't know how to use it. I learn. That could happen because we discovered continual learning. That could also happen because we trained the model on a bunch of environments and then it generalized, or it could happen because the model learns that in the context length. It doesn't actually matter which way. If we go back to the discussion we had like an hour ago, that type of thing can happen in several different ways.

1:19:08But I do think when, for whatever reason, the models have those skills, then robotics will be revolutionized, both the design of robots, because the models will be much better than humans at that, and also the ability to kind of control robots. So we'll get better at building the physical hardware, building the physical robots, and we'll also get better at controlling it. Now, you know, does that mean the robotics industry will also be generating trillions of dollars of revenue? My answer there is yes, but there will be the same extremely fast but not infinitely fast diffusion.

1:19:43So will robotics be revolutionized? Yeah. Maybe tack on another year or two. That's the way I think about these things. There's a general skepticism about extremely fast progress. Like, here's my view, which is like, it sounds like you are going to solve continual learning one way or another within a matter of years. But just as people weren't talking about continual learning a couple years ago, and then we realized, oh, why aren't these models as useful as they could be right now, even though they are clearly passing the Turing test and are experts in so many different domains? Maybe it's this thing, and then we solve this thing, and we realize, actually, there's another thing that human intelligence can do, and that's a basis of human labor that these models can't do.

1:20:23And then, so why not think there will be more things like this? Why think that, like, we're, you know, we've, like, found the pieces of human intelligence? Well, to be clear, I mean, I think continual learning, as I've said before, might not be a barrier at all, right? Like, you know, I think we maybe just get there by pre-training generalization and RL generalization. Like, I think there just might not be, there basically might not be such a thing at all. In fact, I would point to the history in ML of people coming up with things that are barriers that end up kind of dissolving within the big blob of compute, right?

1:20:58That, you know, people talked about, you know, how do you have, you know, how do your models keep track of nouns and verbs? And, you know, how do they, you know, they can understand syntactically, but they can't understand semantically. You know, it's only statistical correlations. You can understand a paragraph, but you can't understand a word. There's reasoning. You can't do reasoning. But then suddenly it turns out you can do code and math very well. So I think there's actually a stronger history of some of these things seeming like a big deal and then kind of dissolving.

1:21:34Some of them are real. I mean, the need for data is real. Maybe continual learning is a real thing. But again, I would ground us in something like code. Like, I think we may get to the point in like a year or two where the models can just do SWE end-to-end. Like, that's a whole task. That's a whole sphere of human activity that we're just saying models can do it now. When you say end-to-end, do you mean setting technical direction, understanding the context of the problem, et cetera?

1:22:06Yes. Yes. I mean all of that. Interesting. I mean, that is, I feel like, AGI complete, which maybe is internally consistent. But it's not like saying 90% of code or 100% of code. It's like, no, no. The other parts of the job as well. No, no, no. I gave this spectrum 90% of code, 100% of code, 90% of end-to-end SWE, 100% of end-to-end SWE. New tasks are created for SWEs. Eventually, those get done as well. But it's a long spectrum there. But we're traversing the spectrum very quickly. I do think it's funny that I've seen a couple of podcasts you've done where the host will be like, oh, but Dwarkesh wrote the essay about the continual learning thing.

1:22:43And it always makes me crack up because you're like, you've been an AI researcher for like 10 years. I'm sure there's like some feeling of like, okay, so a podcaster wrote an essay. And then like every interview I get asked about it. You know, the truth of the matter is that we're all trying to figure this out together. Yeah. There are some ways in which I'm able to see things that others aren't. These days, that probably has more to do with like I can see a bunch of stuff within Anthropic and have to make a bunch of decisions than I have any great research insight that others don't.

1:23:16Right? You know, I'm running a 2,500-person company. Like it's actually pretty hard for me to have concrete research insight, you know, much harder than, you know, than it would have been, you know, 10 years ago or, you know, or even two or three years ago.

1:23:32As we go towards a world of a full drop-in remote worker replacement, does an API pricing model still make the most sense? And if not, what is the correct way to price AGI or serve AGI? Yeah, I mean, I think there's going to be a bunch of different business models here sort of all at once that are going to be experimented with. I actually do think that the API model is more durable than many people think. One way I think about it is if the technology is kind of advancing quickly, if it's advancing exponentially, what that means is there's always kind of like a surface area of kind of new use cases that have been developed in the last three months.

1:24:19And any kind of product surface you put in place is always at risk of sort of becoming irrelevant, right? Any given product surface probably makes sense for, you know, a range of capabilities of the model, right? The chatbot is already running into limitations of, you know, making it smarter doesn't really help the average consumer that much. But I don't think that's a limitation of AI models. I don't think that's evidence that, you know, the models are good enough and they're, you know, them getting better doesn't matter to the economy.

1:24:51It doesn't matter to that particular product. And so I think the value of the API is the API always offers an opportunity, you know, very close to the bare metal to build on what the latest thing is. And so, you know, there's kind of always going to be this, you know, this kind of front of new startups and new ideas that weren't possible a few months ago and are possible because the model is advancing. And so I actually – I kind of actually predict that we are – it's going to exist alongside other models, but we're always going to have the API business model because there's always going to be a need for a thousand different people to try experimenting with the model in a different way.

1:25:37And a hundred of them become startups and ten of them become big successful startups and, you know, two or three really end up being the way that people use the model of a given generation. So I basically think it's always going to exist. At the same time, I'm sure there's going to be other models as well. Like not every token that's output by the model is worth the same amount. Think about, you know, what is the value of the tokens that are like, you know, that the model outputs when someone, you know, calls them up and says, my Mac isn't working or something, you know, the model's like, restart it, right?

1:26:14And like, you know, someone hasn't heard that before, but like, you know, the model said that like 10 million times, right? You know, maybe that's worth like a dollar or a few cents or something. Whereas if the model goes to, you know, one of the pharmaceutical companies and it says, oh, you know, this molecule you're developing, you should take the aromatic ring from that end of the molecule and put it on that end of the molecule. And, you know, if you do that, wonderful things will happen.

1:26:45Like, those tokens could be worth, you know, tens of millions of dollars, right? So I think we're definitely going to see business models that recognize that. You know, at some point we're going to see, you know, pay for results in some form, or we may see forms of compensation that are like labor, you know, that kind of work by the hour. You know, I don't know. I think because it's a new industry, a lot of things are going to be tried and I, you know, I don't know what will turn out to be the right thing.

1:27:20I take your point that people will have to try things to figure out what is the best way to use this blob of intelligence. But what I find striking is Claude Code. So I don't think in the history of startups, there has been a single application that has been as hotly competed in as coding agents. And Claude Code is a category leader here. And that seems surprising to me. Like, it doesn't seem intrinsically like Anthropic had to build this.

1:27:51And I wonder if you have an accounting of why it had to be Anthropic or why, how Anthropic ended up building an application in addition to the model underlying it. Yeah. So it actually happened in a pretty simple way, which is we had our own, you know, we had our coding models, which were good at coding. And, you know, around the beginning of 2025, I said, I think the time has come where you can have non-trivial acceleration of your own research if you're an AI company by using these models. And of course, you know, you need an interface, you need a harness to use them.

1:28:25And so I encouraged people internally, you know, I didn't say this is one thing that, you know, that you have to use. I just said people should experiment with this.
