
Dylan Patel — Deep dive on the 3 big bottlenecks to scaling AI compute
March 13, 2026 · 2h 30m · 31,368 words
Show notes
Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power. He also walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy!
Watch on YouTube; read the transcript.
Sponsors
* Mercury has already saved me a bunch of time this tax season. Last year, I used Mercury to request W-9s from all the contractors I worked with. Then, when it came time to issue 1099s this year, I literally just clicked a button and Mercury sent them out. Learn more at mercury.com.
* Labelbox noticed that even when voice models appear to take interruptions in stride, their performance degrades. To figure out why, they built a new evaluation pipeline called EchoChain. EchoChain diagnoses voice models’ specific failure modes, letting you understand what your model needs to truly handle interruptions. Check it out at labelbox.com/dwarkesh.
* Jane Street is basically a research lab with a trading desk attached – and their infrastructure backs this up. They’ve got tens of thousands of GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This is what it takes to find subtle signals hidden deep within noisy market data. If this sounds interesting, you can explore open positions at janestreet.com/dwarkesh.
Timestamps
(00:00:00) – Why an H100 is worth more today than 3 years ago
(00:24:52) – Nvidia secured TSMC allocation early; Google is getting squeezed
(00:34:34) – ASML will be the #1 constraint for AI compute scaling by 2030
(00:55:47) – Can't we just use TSMC's older fabs?
(01:05:37) – When will China outscale the West in semis?
(01:16:01) – The enormous incoming memory crunch
(01:42:34) – Scaling power in the US will not be a problem
(01:54:44) – Space GPUs aren't happening this decade
(02:14:07) – Why aren't more hedge funds making the AGI trade?
(02:18:30) – Will TSMC kick Apple out from N2?
(02:24:16) – Robots and Taiwan risk
Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Highlighted moments
“you can't make the tools without the chips from Taiwan, which you can't use without the tools in Taiwan.”
Transcript
Introduction
0:00 All right. This is the episode of My Roommate Teaches Me Semiconductors. It's also the send-off for this current set. Yeah. After you use it, I'm like, I can't use this again. I've got to get out of here. No sloppy seconds for Dwarkesh.
0:14 Okay. Dylan is the CEO of SemiAnalysis. Dylan, here's the burning question I have for you. If you add up the big four (Amazon, Meta, Google, Microsoft), their combined forecasted CapEx for this year, which you published recently, is $600 billion. Given yearly prices of renting that compute, that would be close to 50 gigawatts. Now, obviously, we're not putting on 50 gigawatts this year. So presumably that's paying for compute that is going to come online over the coming years.
0:44 So I have a question about how to think about the timeline for when that CapEx comes online.
Compute Timeline
0:50 A similar question for the labs: OpenAI just announced that they raised $110 billion, and Anthropic just announced they raised $30 billion. If you look at the compute they have coming online this year (you should tell me how much it is), isn't it another four gigawatts total that they'll have this year? At $10 to $13 billion a gigawatt to rent that compute, those individual raises alone are enough to cover
1:21 their compute spend for the year. And that's not even including the revenue they're going to earn this year. So help me understand, first, on what timescale is the big tech CapEx actually coming online? And second, what are the labs raising all this money for if the yearly price of a one-gigawatt data center is something like $13 billion?
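Both questions rest on simple division, which can be written out as a quick check. This is a minimal sketch, assuming a rental price of about $12 billion per gigawatt-year (inside the $10 to $13 billion range quoted above, and the value that turns $600 billion into the 50 gigawatts mentioned) and the raise figures as stated; the function names are illustrative only:

```python
def gigawatts_fundable(spend_billion: float, price_per_gw_year_billion: float) -> float:
    """Gigawatts of capacity that a given spend could rent for one year."""
    return spend_billion / price_per_gw_year_billion


def rental_cost_billion(gigawatts: float, price_per_gw_year_billion: float) -> float:
    """Annual rental cost, in billions of dollars, of a given amount of capacity."""
    return gigawatts * price_per_gw_year_billion


# Big-four CapEx of ~$600B at an assumed ~$12B per GW-year.
big_four_gw = gigawatts_fundable(600, 12)

# OpenAI ($110B) plus Anthropic ($30B) raises versus ~4 GW of new capacity this year,
# priced at the low and high ends of the quoted $10B to $13B per GW-year range.
raised = 110 + 30
low, high = rental_cost_billion(4, 10), rental_cost_billion(4, 13)

print(f"Big-four CapEx rents ~{big_four_gw:.0f} GW for a year")
print(f"4 GW costs ${low:.0f}B to ${high:.0f}B per year vs ${raised}B raised")
```

Even at the high end, a year of rent on four gigawatts is well under the combined raise, which is the tension the question is pointing at.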
Supply Chain
1:41 So when you talk about the CapEx of these hyperscalers, right, on the order of $600 billion, and you look across the rest of the supply chain, which gets you to on the order of a trillion dollars, a portion of this is, you know, immediately for compute going online this year, right? The chips and the other parts of CapEx that do get paid this year. But there's a lot of setup CapEx as well, right? So when we're talking about roughly 20 gigawatts this year in America... Incremental? Incremental added capacity. A portion of this is not spent this year. A portion of that CapEx is
2:16 actually spent in the prior year. And so when you look at, hey, Google's got $180 billion, a big chunk of that is actually spent on turbine deposits for 2028 and 2029. A chunk of that is spent on data center construction for 2027. A chunk of that is spent on power purchase agreements and down payments and all these other things they're doing further out into the future, so that they can set up this super fast scaling, right? And this applies to all the hyperscalers and other people in the supply chain. And so, you know, roughly 20 gigawatts deployed this year,
2:48 a big chunk of that being hyperscalers, a chunk not. And all of these companies,
Hyperscalers
2:52 their biggest customers are Anthropic and OpenAI. Anthropic and OpenAI are at, you know, roughly two to two and a half gigawatts and one and a half gigawatts right now. They're trying to scale much larger, right? If you look at what Anthropic has done over the last few months, you know, $4 billion, $6 billion of revenue added a month. And if we just draw a straight line, hey, they'll add another $6 billion of revenue a month (people would argue that's bearish and that they should go faster), what that implies is that they're going to add $60 billion of revenue across the next 10 months, right? And $60 billion of revenue at the current
3:27 gross margins that Anthropic had, at least as last reported by the media, would imply roughly $40 billion of compute spend on inference for that $60 billion of revenue. That $40 billion of compute at roughly $10 billion per gigawatt of rental cost means they need to add four gigawatts of inference capacity just to grow revenue. And that assumes their research and development training fleet stays flat, right? So, you know, in a sense, Anthropic needs to get to
3:57 well above five gigawatts by the end of this year. And it's going to be really tough for them to get there, but it's possible. Can I ask a question about that? So if Anthropic was not on track to
Acquiring Compute
4:06 have five gigawatts by the end of this year, but it needs that to serve both the revenue that's gone crazier than expected, and maybe will go even higher, plus the research and training to make sure its models are good enough for next year, where is that going to come from? You know, Dario, when he was on your podcast, was very, very conservative. He was like, you know, I'm not going to go crazy on compute, because if my revenue inflects at a different rate or at a different point, I don't want to go bankrupt. I want to make sure that we're being responsible with this scaling. But in reality, he's definitely screwed the pooch in terms
4:38 of going like OpenAI, which was, let's just sign these crazy fucking deals. Right. And OpenAI has kind of got way more access to compute than Anthropic by the end of the year. And so what does Anthropic have to do to get the compute? Well, they have to go to lower-quality providers that they would not have gone to before. Right. Optimally, at least historically, Anthropic has had the best-quality providers, like Google and Amazon, the biggest companies in the world, and now Microsoft, and now they're expanding across the supply chain and going to other players that are newer. OpenAI has been, you know, a bit
5:12 more aggressive about going to many players. Yes, they have tons of capacity from Microsoft. They have Google and Amazon as well, but they also have tons with CoreWeave and Oracle. And they've gone to what one would think are random companies, like SoftBank Energy, which has never built a data center in its life but is building data centers now for OpenAI. And they've gone to many others, like Nscale, to get capacity. And so there's this conundrum for Anthropic, because they were so conservative on compute, because they didn't want to go crazy. Right. And in some sense, a lot of the financial freakouts
5:46 in the second half of last year were like, OpenAI signed all these deals, but they don't have the money to pay for them. OK, Oracle stock's going to tank. OK, CoreWeave stock's going to tank. All these companies' stocks tanked and credit markets went crazy because people thought the end buyer can't pay for this. Now it's like, oh, wait, they raised a ton of money. OK, fine, they can pay for it. But in that sense, Anthropic was a lot more conservative. They were like, we'll sign contracts, but we'll be principled, and we'll purposely undershoot what we think we can possibly do and be conservative, because we don't want to potentially go bankrupt. But the thing I want to
6:18 understand is: what does it mean to have to acquire compute in a pinch? Is it that you have to go with neoclouds? Is it that they have worse computers? In what way is it worse? Is it that you have to pay gross margins to a cloud provider that you wouldn't otherwise have had to pay, because you're coming in at the last minute? Who built the spare capacity such that it's available for Anthropic and OpenAI to get at the last minute? And basically, what is the concrete advantage that OpenAI has gotten if they end up at similar compute numbers by 2027? Is it just that they're going to end this year with different gigawatts? If so,
6:50 how many gigawatts are Anthropic and OpenAI going to have by the end of this year?
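Dylan's earlier back-of-envelope estimate of Anthropic's extra inference capacity can be written out as a short calculation. This is a minimal sketch under stated assumptions: the ~33% gross margin is not a reported figure but simply the margin implied by $40 billion of compute on $60 billion of revenue, the revenue is treated as annualized run-rate, and the training fleet is held flat as in his framing. The constant and function names are illustrative:

```python
# Back-of-envelope: incremental inference capacity Anthropic needs (figures from the episode).
MONTHLY_REVENUE_ADD_B = 6.0   # ~$6B of new (annualized) revenue added per month, straight-line
MONTHS = 10
IMPLIED_GROSS_MARGIN = 1 / 3  # assumption: implied by ~$40B of compute on ~$60B of revenue
RENTAL_B_PER_GW_YEAR = 10.0   # ~$10B per gigawatt per year of rented compute


def inference_gw_needed(added_revenue_b: float, gross_margin: float, price_per_gw_b: float) -> float:
    """Gigawatts of inference capacity needed to serve added revenue at a given gross margin."""
    compute_spend_b = added_revenue_b * (1 - gross_margin)  # cost of serving = compute rent
    return compute_spend_b / price_per_gw_b


added_revenue_b = MONTHLY_REVENUE_ADD_B * MONTHS
new_gw = inference_gw_needed(added_revenue_b, IMPLIED_GROSS_MARGIN, RENTAL_B_PER_GW_YEAR)
print(f"${added_revenue_b:.0f}B of added revenue needs ~{new_gw:.1f} GW of new inference capacity")
```

Adding this inference capacity on top of the existing fleet and an unchanged training fleet is how the "well above five gigawatts" figure follows.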
Excess Compute
6:53 Yeah. So to acquire excess compute, I mean, yes, there is capacity at hyperscalers, and not all contracts for compute are long-term, right? Five years, right? There's compute from 2023, 2024, 2025, H100s, that was not signed on five-year deals, right? For OpenAI, the vast majority of their compute is signed on five-year deals, but there were many other customers that had one-year, two-year, three-year deals, six-month deals, on-demand. And as these contracts roll off, who is the participant in the market most willing to pay the price? And in
7:28 this sense, right, we've seen H100 prices inflect a lot and go up, and people are willing to sign long-term deals at, you know, even above $2, right? I've seen deals where certain AI labs (I'm going to be a little bit vague here for a reason) have signed at as high as $2.40 for two to three years for H100s. If you think about the margin, Hopper cost about $1.40 an hour to build out across five years when it was released. And now, two years in, you're signing deals that are two
7:59 to three years at $2.40. Those margins are way higher, right? And so now you can crowd out other buyers at all of these suppliers, whether it's Amazon or CoreWeave or Together AI or Nebius or whoever it is, right? These neoclouds are the firms that had a higher percentage of Hopper in general: A, because they were more aggressive on it, and B, because they tended to sign shorter-term deals. Not CoreWeave, but the others tended to sign shorter-term deals. And so, hey, if I want Hopper, there is some capacity out there. And then also, while most
8:35 of the capacity at an Oracle or a CoreWeave is signed on long-term deals, and in terms of Blackwell, anything that's going online this quarter is already sold. In some cases, they're not even hitting all the numbers they promised, because there are some data center delays, and not just at those two, but at Nebius and all the other folks: Microsoft, Amazon, Google. But there are a lot of neoclouds, as well as some of the hyperscalers, who have capacity they're building that they haven't sold yet, or capacity they were going to allocate to some internal use that is not necessarily super AGI-focused, that they may now turn around and sell.
9:06 Or, in the case of Anthropic, they don't have to have all the compute directly, right? Amazon can have the compute and serve Bedrock, or Google can have the compute and serve Vertex, or Microsoft can have the compute and serve Foundry, and then do a revenue share with
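The margin gap Dylan describes comes from the standard gross-margin formula, (price minus cost) divided by price. A minimal sketch, assuming the $1.40 per GPU-hour all-in cost he cites is the entire cost of goods sold; the function and constant names are illustrative:

```python
def gross_margin(price_per_hour: float, cost_per_hour: float) -> float:
    """Gross margin as a fraction of revenue: (price - cost) / price."""
    return (price_per_hour - cost_per_hour) / price_per_hour


HOPPER_COST_PER_HOUR = 1.40  # all-in $/GPU-hour over a five-year life, as quoted

margins = {price: gross_margin(price, HOPPER_COST_PER_HOUR) for price in (2.00, 2.40)}
for price, margin in margins.items():
    print(f"${price:.2f}/hr -> {margin:.0%} gross margin")
```

Dylan later quotes roughly 35% for a $2.00 deal; these two numbers alone give 30%, so treat this as the simplest possible version rather than the TCO model he describes. It also ignores that the $2.40 deals start two years into a chip's life.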
Revenue Share
9:19 Anthropic, or vice versa. Okay, so basically you're saying Anthropic is having to pay either this, like, 50% markup in the form of the revenue share, or last-minute spot compute prices that they wouldn't otherwise have had to pay had they bought the compute early. Right. And, you know, there's a trade-off there. But at the same time, for a solid, like, four months, everyone was like, OpenAI, we're not going to sign deals with you. That sounds crazy, right? You guys don't have the money. Now everyone's like, yeah, OpenAI, we believed you the whole time. We can sign any deal because you've raised all this money.
9:49 But in a sense, Anthropic is constrained there. There are not that many incremental buyers of compute yet, because Anthropic hit these capabilities first, where their revenue is mooning. Oh, that's interesting. Because otherwise you'd think, well, having the best model is an extremely fast-depreciating asset: three months later, you don't have the best model. But the reason it's important is that you can sign these deals, lock in the compute in advance, and get better prices. Doesn't this
10:19 also imply, by the way, and maybe this is an obvious point... At least until recently, people had made this huge point about what the depreciation cycle of a GPU is. And the bears, the Michael Burrys or whoever, have said, look, people are saying four or five years for these GPUs, but in fact, maybe because the technology is improving so fast, it makes sense to have two-year depreciation cycles for these GPUs, which increases the reported amortized CapEx in a given year. And so
10:52 it makes building all these clouds maybe less financially lucrative. But in fact, you're pointing out that maybe the depreciation cycle is even longer than five years, because we're still using Hoppers, and especially if AI really takes off and in 2030 we're like, fuck, we've got to get the seven-nanometer fabs up, we've got to go back and turn on the A100s again, then actually the depreciation cycle is incredibly long. So I think that's an interesting financial implication of what you're saying. There are a few strings to pull on there. One is what happens to depreciation
11:27 of GPUs, right? And I guess I didn't answer your prior question, which is that Anthropic, I think, will be able to get to five gigawatts or so, maybe a little bit more, by the
Gigawatts Projection
11:37 end of the year, on their own as well as through their product being served via Bedrock, Vertex, or Foundry. I think they'll be able to get to five or six gigawatts, which is way above their initial plans. Right. And OpenAI will be roughly the same, actually a little bit higher based on our numbers. But anyway, the depreciation cycle of a GPU, right. Michael Burry was saying it's three years or less, right? That's sort of his argument. And there are sort of two lenses to look at this through. Mechanically,
12:11 there's a TCO model, right? A total cost of ownership model for a GPU, where we project pricing out for GPUs and build up the total cost of a cluster. There are a number of costs, right? There's your data center cost. There's your networking cost. There's your smart hands, the people in the data center swapping stuff out. There's your spare parts. There's your actual chip cost. There's your server cost. All these various costs get lumped together, and there are depreciation cycles on them. There are certain credit costs on them. And that's how you build up to: hey, an H100 costs $1.40 an hour to deploy at volume across five years, if your
12:45 depreciation is five years. And then if you sign a deal at $2 an hour for those five years, your gross margin is roughly 35%. It's a little bit above that, but, you know, if you sign it for $1.90, it's roughly 35%. And then you assume that at the fifth year, the GPU falls off a bus, right? It's dead. And, sort of, the argument people are making is: well, if you didn't sign a long-term deal, because every two years Nvidia is tripling or quadrupling the performance while only 2x-ing the price, or increasing the price 50%,
13:15 then the price of an H100, sure, maybe its market value was $2 at 35% gross margins in 2024, but in 2026, when Blackwell is in super high volume and deploying millions a year, it's actually now worth a dollar an hour. And when Rubin is in super high volume in 2027 (even though it starts shipping this year, it hits super high volume next year, with millions of chips a year deployed into clouds), you've got another 3x in performance and another 50% or 2x in price. Then the Hopper is only worth 70 cents an hour. And so the price
13:48 of a GPU would continue to fall. That's one lens. The other lens is: what is the utility you get out of the chip, right? Because if you could build infinite Rubin, or infinite amounts of the newest chip, then yes, that's exactly what would happen. The price of a Hopper would fall, at a spot or short-term contract rate, as the new chips come out and performance per dollar goes up. But because you are so limited on semiconductors and deployment timelines and all these things, what actually prices these chips is not, hey, what's the comparable thing I can
14:21 buy today? It's actually: what is the value I can derive out of this chip today? Right. And in that sense, let's take GPT-5.4. GPT-5.4 is way cheaper to run than GPT-4 and has fewer active parameters. It's much smaller in terms of active parameters, plus it's a sparser MoE, versus GPT-4 being a coarser MoE. There have also been so many other advancements in training, RL, model architecture, data quality, et cetera, all these
14:52 things that have made GPT-5.4 way better than GPT-4, and it's cheaper to serve. And so when you look at an H100, it can serve more tokens of GPT-5.4 per GPU than if you had run GPT-4 on it. Right. So in some sense it's producing more tokens of a model that is of higher quality. And obviously, for GPT-4, what is the maximum TAM for its tokens? Maybe it was a few billion dollars, maybe it was tens of billions of dollars. For GPT-5.4, that number is probably north of a hundred billion, but adoption takes time; there's an adoption lag
15:24 and there's competition. Other people are getting there, and there are the constant improvements everyone else is making. So if improvements stopped here, the value of an H100 is now predicated on the value that GPT-5.4 can get out of it instead of the value that GPT-4 can get out of it, along with the margins these labs charge, and they're in a competitive environment, so their margins can't go to infinity. So you have this quite interesting dynamic where an H100 is worth more today than it was three years ago. That's crazy. And I mean, it's also interesting from the perspective of,
15:54 just take that forward. If we had actual AGI models developed, if we genuinely had a human on a server... On a FLOP basis (these are such hand-wavy numbers about how many FLOPs the brain can do), an H100 does about 1e15 FLOPs per second, which is roughly how much some people estimate the human brain does. Obviously, in terms of memory, the human brain has way more: an H100 has 80 gigabytes, and the brain might have petabytes. Oh yeah? You've got petabytes? Name a petabyte of ones and zeros, bro. Name me a string. Well, this is actually the
16:32 point. No, we've just got the best sparse attention techniques ever. Genuinely, right? In terms of the amount of information that's compressed, it might be petabytes, but the actual thing is an extremely sparse MoE. But anyway, a human knowledge worker can produce six figures a year of value. So if an H100 can produce something close to that, if we had actual humans on a server, an H100 could repay itself in the course of a couple of months. So as I've been going
17:03 through everything to prep for taxes, I realized that I worked with over 50 different contractors last year, from cinematographers to audio technicians to editors. And I owed all of them 1099s. In the past, I've just used a spreadsheet and a big folder of invoices to figure out who I need to collect tax forms from. But with so many contractors, this takes a bunch of time, and I've almost missed some people. This year, though, Mercury made my process way more straightforward. Whenever I paid somebody in 2025, I just hit a toggle to have Mercury request a W-9 from them. Because of that, everything that I needed to issue 1099s got sent directly to Mercury. I literally just clicked a
17:35 button and Mercury generated and sent them all out. This is just one of the many things that I never would have assumed a banking platform could just handle for me. Mercury has a bunch of features like this, which are going to collectively save me multiple days this tax season. You can learn more at mercury.com. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC. So when I interviewed Dario,
Dario Interview
18:01 the point I was trying to make is not that I think the singularity is two years away and therefore Dario desperately needs to buy more compute, although the revenue certainly says he needs to buy more compute. The point I was trying to make is that, given his statements that we're two years away from a data center of geniuses, certainly not more than five years away, and that a data center of geniuses should be earning trillions upon trillions of dollars of revenue, it just does not make sense that he keeps making these statements about being more conservative on compute, or, to your point,
18:31 being less aggressive than OpenAI on compute. And I guess that point got lost, because then people were roasting me like, oh, this podcaster was trying to convince the CEO of a multi-hundred-billion-dollar company, why don't you YOLO it, bro? But no, I was trying to say that his statements are internally inconsistent. Anyway, it's good to iron it out. Yeah, I think, going back to the earlier view: if the models are so powerful, the value of a GPU goes up over time. And as we approach closer and closer to, let's say, a point where right now only
19:06 OpenAI and Anthropic have that viewpoint, as we go further and further out, actually everyone, even with open-source models, is going to start to see that value per GPU skyrocket. And so in that sense, you should commit to compute now. But interestingly, in an Anthropic fashion, right, there's a bit of a meme that they have commitment issues and they're sort of polyamorous. Not Dario, but, you know, this is a bit of a meme.
19:39 Explains everything. By the way, there's this interesting economics effect called the Alchian-Allen effect, which is the idea that if you add the same fixed per-unit cost to two goods, one higher quality and one lower quality, people will choose the higher-quality good on the margin. To give a specific example, suppose the better-tasting apple costs $2 and the shittier apple costs $1. Okay. Now suppose you put an import tariff on them, so now it's $3 versus $2 for the great apple versus the medium apple, right?
20:15 Is that because they both increased by a dollar, or should it be like a 50% increase? No, no, no. Because they both increased by a dollar. The whole effect is that if there's a fixed cost applied to both, the relative price, the ratio between them, changes. Previously, the more expensive one was 2x more expensive. Now it's just 1.5x more expensive. So I wonder, applied to AI, whether that would mean that if GPUs are going to get more expensive, there will be a fixed-cost increase in the price of compute. Yes. And as a result, that will push people to be willing to pay higher margins for slightly better models
20:51 because the calculus is, I'm going to be paying all this money for the compute anyway, so I might as well pay slightly more to make sure it's the very best model rather than a model that's slightly worse. Right. So the Hopper went from $2 to $3. And if a Hopper can make a million tokens of Opus, or 2 million tokens of Sonnet, the price differential between Opus and Sonnet has decreased, because the price of the GPU has increased by a dollar, from two to three. Yeah, exactly. Interesting. I think that makes a ton of sense. Also, I think we just see that all of the
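The apple example can be checked directly: adding the same per-unit cost to both goods leaves the dollar gap unchanged but shrinks the price ratio, which is the Alchian-Allen effect. A minimal sketch using the prices from the example; the function and constant names are illustrative:

```python
def relative_price(premium: float, basic: float, added_cost: float = 0.0) -> float:
    """Price of the premium good relative to the basic one after a per-unit cost is added to both."""
    return (premium + added_cost) / (basic + added_cost)


GREAT_APPLE, MEDIUM_APPLE, TARIFF = 2.0, 1.0, 1.0  # dollars per apple

ratio_before = relative_price(GREAT_APPLE, MEDIUM_APPLE)
ratio_after = relative_price(GREAT_APPLE, MEDIUM_APPLE, TARIFF)
gap_after = (GREAT_APPLE + TARIFF) - (MEDIUM_APPLE + TARIFF)  # the absolute gap is unchanged

print(f"premium/basic price ratio: {ratio_before:.1f}x -> {ratio_after:.1f}x (gap stays ${gap_after:.2f})")
```

Because the great apple is now only 1.5x the price of the medium one, buyers who were on the fence shift toward it; the same flat $1 would matter less and less as the added cost grows relative to the base prices.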
21:24 volume is on the best models today; all the revenue is on the best models today. And in a compute-limited world, there are sort of two things that happen, right? A, companies that have locked up compute and don't have commitment issues, that have these five-year contracts for compute, have locked in a humongous margin advantage, because they've locked in compute for five years at the price it transacted at five years ago, or three years ago, or two years ago, whatever it is. Whereas if you're now three years into that five-year contract, and someone else's two-year or three-year contract rolled
21:57 off, and now you're trying to buy that at modern pricing, priced to the value of the models, the price is going to be up a lot more. And so in a sense, the person who committed early has better margins in general. And the percentage of the market that is in long-term contracts is much larger than the percentage in short-term contracts, which is the sort of flex capacity that you add at the last second. At the same time, right, where does the margin go, since models get more valuable? How much can the cloud players flex their pricing?
22:32 Well, if you look at CoreWeave, their average contract term is over three years right now. For 90%-plus of their compute, it's over three years. And so they end up with this conundrum: they can't actually flex price, but every year they're adding incrementally way more capacity than they had previously. Right. This year alone, Meta is adding as much capacity as their entire 2022 fleet of compute and data centers for all purposes: serving WhatsApp and Instagram and Facebook, and doing AI, right? They're adding that this year alone. So in the same sense,
23:05 you know, Meta's doing that, and CoreWeave and Google and Amazon, all these companies are adding insane amounts of compute year on year on year, and that new compute gets transacted at the new price. So in a sense, yes, you've locked in, but as long as we're in a sort of takeoff, right (OpenAI went from 600 megawatts to two gigawatts last year, from two gigawatts to six-plus this year, and will go from six to 12 next year), the incremental added compute is where all the cost is, not the prior long-term contracts. So then who holds the cards
Infra Providers
23:35 for charging margin? Is it the infra providers? Right. So can the cloud players, the neoclouds or the hyperscalers, charge the margin now? They can to some extent, but then as you go upstream: well, who has access to all the memory and logic capacity? It's Nvidia, for the most part. They've signed a lot of long-term contracts. They've got something like $90 billion of long-term contracts today, and they're negotiating three-year deals with the memory vendors today. You've also got, obviously, Amazon and Google through Broadcom, Amazon directly, and AMD. These companies hold all the cards because they've secured the capacity.
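Why incremental capacity dominates the cost: if contracts signed in earlier years keep their old prices and only capacity added during a year transacts at that year's price, then the share of the fleet priced at the new rate is just the added capacity divided by year-end capacity. A minimal sketch using the OpenAI trajectory quoted above, taking "six plus" as exactly six gigawatts; the locked-price assumption and the names are illustrative:

```python
# (year, start-of-year GW, end-of-year GW) for OpenAI, as quoted in the episode.
# Assumption: "six plus" this year is taken as exactly 6 GW.
TRAJECTORY = [
    ("last year", 0.6, 2.0),
    ("this year", 2.0, 6.0),
    ("next year", 6.0, 12.0),
]


def share_at_new_price(start_gw: float, end_gw: float) -> float:
    """Fraction of year-end capacity added (and so priced) during the year,
    assuming all earlier capacity stays locked at its old contract price."""
    return (end_gw - start_gw) / end_gw


shares = {year: share_at_new_price(start, end) for year, start, end in TRAJECTORY}
for year, share in shares.items():
    print(f"{year}: {share:.0%} of year-end capacity transacted at that year's price")
```

Even as growth slows from 3x to 2x a year, half or more of the fleet is bought at current prices, which is why locked-in contracts protect less than they might seem to during a takeoff.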
24:09 And TSMC is not raising prices, but the memory vendors are, to some extent, raising prices a lot, right? They're going to double or triple prices again. But then they're also signing these long-term deals. So who is able to accrue all the margin dollars is actually potentially the cloud, potentially the chip vendors, and the memory vendors, until TSMC or ASML break out and say, no, actually we're going to charge a lot more. But at the same time, do the model vendors get to charge crazy margins? I think at least this year, we're going to see margins for the model vendors go up a lot,
24:41 right? Because they're so capacity-constrained, they have to destroy demand, right? There's no way Anthropic can continue at the current pace without destroying demand. Yeah. Let's get into logic and memory, and specifically how Nvidia has been able to lock up so much of both. I think according to your numbers, by 2027 Nvidia is going to have something like 70-plus percent of N3 wafer capacity, or around that area. And then I
25:12 forget what the numbers were for memory at SK Hynix and Samsung and so forth. But think about how the neocloud business works and how Nvidia works with that, or how the RL environment business works and how Anthropic works with that. In both of those cases, the big player is purposely trying to fracture the complementary industry to make sure they have as much leverage as possible. So Nvidia is giving allocation to random neoclouds to make sure there's not one player that has all the compute. Similarly, Anthropic or OpenAI, when they're working with the data providers, say, no, we're going to seed a huge
25:46 industry of these things so that we're not locked into any one supplier for data environments. So I wonder: the three-nanometer process is going to be Trainium 3, TPU v7, and potentially other accelerators. Why is TSMC just giving it all up to Nvidia rather than trying to fracture the market? Yeah. So I think there are a couple of points here, right? On three nanometer, if we go back to last year, the vast majority of three
26:16nanometer was Apple, right? Apple's being moved to two nanometer, memory prices are going up. So Apple's volumes may go down, right? Because as memory prices go up, they either have to cut margin or, uh, move, move on. You know, there, there's some time lag because they have long-term contracts, but basically Apple likely reduces demand slash moves to two nanometer faster, where two nanometer is only capable of mobile chips today. Um, and in the future AI chips will move there. So sort of Apple has that. And then Apple's also talking to, uh, third-party vendors because they're getting squeezed out of TSMC a little bit. Um, because
26:48TSMC's margins on high-performance computing, um, HPC, AI chips, et cetera, are higher than they are for mobile. Um, because they have a bigger advantage in HPC than they do in mobile. But anyways, when you look at TSMC's running calculus here, actually they're providing really good, um, allocations to companies that are doing CPUs, right? So when you think about, hey, Amazon has Trainium and Amazon has, uh, Graviton, both of those are on three nanometer, Graviton being their CPU, Trainium being their, their AI chip. They're actually, TSMC is much more
27:21excited to give allocation to Graviton than they are to Trainium because they view the CPU business as more stable long-term growth, right? And as a company that is conservative and doesn't want to ride cycles of growth too hard, you actually want to allocate to the, uh, the market that is more stable and lower growth rate first, before you allocate all the incremental capacity to the fast growth rate market. Now that is, that is the case generally. And so when you look at like, hey, same for AMD, right? The allocations they get on, um, you know, their CPUs, like, TSMC is much
27:56more excited about those than they are for GPUs. Um, likewise for Amazon. And NVIDIA, um, is, is a bit unique because, yes, they have CPUs. Yes. They make switches. Yes. They make networking. Um, they make NVLink, they make InfiniBand, Ethernet, all these different products and X, um, by and large, most of these things will be on three nanometer by the end of this year with the Rubin launch and all the chips that are in that family. Um, the GPU being the most important one. And yet NVIDIA is getting the majority of supply, right? Part of this is because you look at the market and you like, sort of like, you know, TSMC and others, like they,
28:30there are many ways that they forecast market demand. Um, but also it's market signals, right? The market signaled, Hey, we need this much capacity next year. We need this much. We need this much. We'll sign non-cancelable, non-returnable. We may even pay deposits, right? Things like this. NVIDIA just did it way earlier than Google or Amazon. And in some cases, Google and Amazon had stumbling blocks. You know, there was one, one of the chips got delayed slightly by a couple of quarters, uh, Trainium, and all these sorts of things happen. And then,
29:01so in that case, there was a huge sort of like, okay, well, these guys are delaying, but NVIDIA is wanting more, more, more, more, more. And we are checking with the rest of the supply chain. Is there enough capacity? Right? So they're going to all the PCB vendors and they're saying, Hey, is there enough, uh, Victory Giant? Is there enough PCB? This is like one of the largest suppliers of PCBs to NVIDIA and they're a Chinese company. All the, all the PCBs come from China, sort of from them, um, or many of them. And anyways, they're like, do you have enough PCB capacity? Great. Oh, Hey, uh, memory vendors, who has all the memory capacity? Oh, okay. NVIDIA does. Great. Um, so when you look at sort of in the same way, you know, who, who is AGI pilled enough to
29:36buy compute in long timelines at levels that seem ridiculous to people who aren't AGI pilled, but nonetheless, they're willing to pay a pretty good margin, um, and sign it now because they view in the future that, that ratio is screwed up. The same thing happens with the supply chain for semiconductors, right? NVIDIA was, well, I don't think NVIDIA is quite AGI pilled, right? You know, Jensen doesn't believe software is going to be automated fully and all these things, right? Accelerated computing, not AI chips, right? It's AI chips, right? But that's what he calls it, right? Yeah. Cause I mean, I think there's a broader term, right? AI is within that, but like
30:08physics modeling and simulations and like, or really just like, he's not embracing the sort of like main use case and I think he's embracing it, but like, I just don't think he's like AGI pilled like Dario, right? Or Sam, but he's still way, way more AGI pilled than Google was at Q3 of last year or Amazon was at Q3 of last year. And he saw way more demand, right? Um, and, and, and, and the reason is pretty simple. You know, you can see all the data center construction. He's like, okay, I want to have this market share. Um, you know, we sort of like have all the data centers tracked and, you know, you can see, you know, there's, there's a lot of data centers that you
30:41could say, well, they could be one or the other. Right. And so in some, to some extent, Google and Amazon, you know, Google, especially, even though they're, you know, their TPU is just better for them to deploy, they have to deploy a crap load of GPUs because they don't have enough TPUs to fill up their data centers. They can't get them fabbed. Wait, can I, so I have a question about that. Google sold, I think a million, was it the v7s, the Ironwoods, to Anthropic. And you're saying in general, there's this big bottleneck right now, this year, next year, I mean, I guess going forward forever now is going to be the, you know, logic memory, the stuff that like it takes to build
31:14these chips. And Google has DeepMind. This is the third prominent AI lab. And if this is the big bottleneck, why would they sell it rather than just giving it to DeepMind? Right. So, so this is again, like a problem with like, you know, DeepMind people were like, this is insane. Why did we do this? Yeah. Right. But then Google Cloud people and Google executives saw a different like thought process. Right. And basically, um, you, you know, you and I know the compute team. There's one guy from, you know, both of them actually came from Google, uh, at, uh, the main people on, on, uh, the compute team at Anthropic, they saw this dislocation, they negotiated a deal and they
31:47were able to get access to these, to this compute before Google realized. And so the, actually the chain of events, at least from our data that we found was in, in early Q3, um, we saw over the course of two, uh, over the course of like six weeks, we, we saw capacity on, um, Anthropic or sorry, on TPUs go up by a significant amount over the course of those six weeks. And it went up like multiple times in those six weeks. Right. There were multiple requests. Google even had to go to TSMC and explain to them why they needed this, uh, increase in capacity
32:20because it was so sudden. But that, a lot of that capacity increase was for selling to Anthropic because Anthropic saw it before Google. And then Google had Nano Banana and Gemini 3, which caused their user metrics to skyrocket and leadership at Google was like, Oh, and then they started making the statement of, we have to double compute every, is it six months? Or I don't remember the exact number that they said. Um, but they, they really woke up a lot more. And then they're like, Oh, Hey, TSMC, we want more. We want more. And it's like, well, sorry, guys, like we're sold out for next year. Um, we can work on next year.
32:50We can maybe get like five, 10% more for '26, but really we're going to work on '27. Right. It's sort of like, you know, there's this like information asymmetry of the labs in my mind. Right. I don't know if this is exactly, it's the narrative I've spun myself from seeing all the data in the supply chain on like wafer orders and like what's going on with the data centers that, you know, Anthropic signed and Fluidstack signed and all this, like sort of, it's, it's, it's, it's pretty clear to me that Google screwed up and you can see this from Google's Gemini ARRs, right. Um, they had next to nothing in Q1, Q3, uh, Q3 a little bit, right. Once they started
33:20inflecting, but Q4, they were at like 5 billion ARR, right? Um, exiting or something like this. So it's like, or 5 billion revenue for Q4, uh, on an ARR basis. Um, and so it's clearly like Google didn't see revenue skyrocket. Um, and in a sense, right, Anthropic was not willing, you know, it kind of had like a little bit of commitment issues before their ARR exploded, even though they have far more information asymmetry and see what's coming down the pipe. A, Google is going to be more conservative than Anthropic is, and B, Google had even less ARR. Um, so they, they sort of were
33:53like, I think just not willing to like sort of do it. And then they realized they should do it. And so now since then, Google, um, has gotten absurdly AGI pilled, right. Uh, in terms of like what they're doing, they bought an energy company, they're buying, putting deposits down for turbines. Uh, they're buying a ridiculous percentage of the powered land. Uh, they're going to utilities and negotiating long-term agreements or doing this on the data center and, um, power side, um, very, very aggressively. Right. So, you know, I think Google woke up towards the end of last year,
34:24but it took them some time. And how many gigawatts do you think Google will have by the end of next
34:28year? Buy my data. You charge for that kind of information. Um, I feel like every year the bottleneck for what is preventing us from scaling AI compute keeps changing. Uh, a couple of years ago it was CoWoS. Last year it was power. You'll tell me what the bottleneck is this year, but I want to understand five years out, what will be the thing that is constraining us from deploying the singularity? Yeah. I think the biggest bottleneck is compute. And for that, the longest lead time supply chains are not power or data centers. They're actually the semiconductor
35:00supply chain themselves, right? It switches back from being power and data center, uh, as a major bottleneck to chips. And in the chip supply chain, there's a number of different bottlenecks, right? There's memory, there's, uh, logic wafers from TSMC. There's, uh, there's fabs themselves. Construction of the fabs takes a couple years, three, two to three years versus a data center takes, uh, less than a year, right? Uh, we've seen Amazon build data centers in as fast as eight months, right? So there's a big difference in lead times because of the complexity of the building,
35:31the fab that actually makes the chips and then the tools, right? Those also have really long lead times. And so the bottlenecks as we've scaled have shifted from, Hey, what is the supply chain currently not? What is it currently not able to do? Um, which was CoWoS and power and data centers, but those were all shorter lead time items, right? CoWoS is a much simpler process of packaging chips together. Um, power and data centers are ultimately way simpler than the actual manufacturing of the chips. And so there's been some sliding of, of, of capacity across, you know, mobile or PC to data
36:07center chips, but that's been somewhat fungible. Whereas CoWoS and power and data centers have sort of had to start anew as supply chains, but now there's sort of no more capacity for the mobile and PC industries, which used to be the majority of the semiconductor industry, to shift over to AI, right? NVIDIA is now the largest customer at TSMC and NVIDIA is the largest customer at SK Hynix, the largest memory manufacturer, right? So it's sort of impossible for the scaling or the sliding of resources away from the common person, right? PCs and, and smartphones to shift any more
36:43towards the AI chips. And so now how do we scale the AI chip production? And that's the biggest bottleneck as we go to 2030 is those. It'd be very interesting if there's an absolute gigawatt ceiling that you can project out to 2030 based just on, Hey, we can't produce more than this many EUV machines. Right. So to scale compute further, right? There's some different bottlenecks this year, next year. But ultimately by 28, 29, the bottleneck falls to the lowest rung on the supply chain,
37:15which is ASML, right? ASML makes the world's most complicated machine, i.e. an EUV tool. Um, and the selling price for those is 300, 400 million dollars. And currently they can make about 70. Next year they'll get to 80. Uh, even under very aggressive supply chain expansion, they only get to a little bit over a hundred by the end of the decade. And so what does that mean? Okay. They can make a hundred of these tools by the end of the decade and, um, you know, 70 right now. How does that actually translate to AI compute? Right. We, we see all these numbers from
37:45Sam Altman and, and many others across the supply chain, gigawatts, gigawatts, gigawatts, right? How many gigawatts are we adding? Um, and we see, you know, Elon saying, Hey, a hundred gigawatts in space a year, right? The, the problem with any of, uh, these numbers, or the challenge to these numbers, is, you know, actually not the power, not the data center. We can dive into that, but it's, it's, it's manufacturing the chips, right? So a gigawatt of, you know, NVIDIA's Rubin chips, right? So Rubin is announced at GTC, uh, I believe the week this podcast goes live, and to make a
38:18gigawatt worth of data center capacity of NVIDIA's latest chip that they're releasing at the end of this year, towards the end of this year, you need, you know, a few different wafer technologies, right? Um, you need about 55,000 wafers of three nanometer. You need about 6,000 wafers of five nanometer, and then you need about 170,000 wafers of DRAM, right? Memory. And so across these three different buckets, um, each of these requires different amounts of EUV, right? So when you manufacture a wafer, uh, there's thousands and thousands of process steps where you're
38:50depositing material, removing it, but the sort of key critical step, which at least in advanced logic is like 30% of the cost of the chip, is something that doesn't actually put anything on the wafer, right? You take the wafer, you deposit photoresist, which is like a chemical that basically changes when you expose it to light. And then you stick it into the EUV tool, which shines light at it in a certain way. It patterns it, right? Cause there is what's called a mask, which is a stencil effectively for the design. And so when you look at a wafer, um, you know, a leading edge three nanometer wafer has 70 or so masks, right? 70 or
39:21so layers of lithography, but 20 of them are the most advanced EUV, right? And that specifically, you know, if you think about, okay, well, if I need 55,000 wafers for a gigawatt, if I do 20 EUV passes per wafer, then you can do the math. That's like, okay, that's 1.1 million passes of EUV for a single gigawatt. So actually like, it's pretty simple. And then once you add the five nanometer and all the memory, you're at roughly, um, 2 million EUV passes for a single gigawatt. You know, these, these tools are very
39:55complicated. So, um, when you think about what it's doing across a wafer, it's taking the wafer and it's scanning and it's stepping across, right? It's scanning, stepping across. And it does this hundreds of times across the entire, or dozens of times across the whole wafer. And, and so when you're talking about, Hey, how many EUV passes, that's the entire wafer being exposed, um, at a certain rate. An EUV tool can do roughly 75 wafers per hour. Um, and the tool is up roughly 90% of the time, right? So in the end you end up with, actually, I need about three and a half
40:26EUV tools to do the 2 million EUV wafer passes for the gigawatt. So three and a half EUV tools, uh, satisfies a gigawatt. So it's funny to think about the numbers, right? Because we're talking about, Oh, what's the gigawatt cost? It costs like $50 billion roughly. Right. Whereas what do three and a half EUV tools cost? That's like $1.2 billion, right? Um, it's actually like quite a lower number, which is, which is interesting to think about, like, Oh, $50 billion of economic, you know, sort of CapEx in, in the data center. And what gets built on top of that in terms of tokens is even larger, right? It might be a hundred billion dollars worth of AI value into the supply chain
41:00is held up by this $1.2 billion worth of tooling that simply just cannot expand its supply chain quickly. And I think, so you, you, you had this article recently where you were saying over the last three years, TSMC has done a hundred billion dollars of CapEx. So it's like 30, 30, uh, 40. And if you think about, I mean, a small fraction of that is sort of like being used by NVIDIA for the three nanometer that it's going to, or, you know, previously four nanometer that it's using for its chips. Um, but NVIDIA has turned that into what was, what are, it's like their earnings
41:32last quarter was like 40 billion and so 40 billion times four. So $160 billion. So NVIDIA alone is turning some small fraction of a hundred billion in CapEx. It's going to be depreciated over many years, not just this one year, into $160 billion in a single year. And then that gets even more intense when you go down the supply chain to ASML, which is taking a billion dollars worth of machines to produce a gigawatt. And then of course those machines last for more than a year, right? So it's, it's doing more than that. Okay. So now I want to understand, okay, well, how many such machines will there be by 2030? If you include not just the ones that are sold that year, but are,
42:04have been accumulating over the previous years. Um, and what does that imply about Sam Altman saying he wants to do a gigawatt a week in 2030? Or, uh, when you add up those numbers, is that compatible with that? Right. That's, that's completely compatible, right? Cause if you think about it, TSMC and the entire ecosystem have something like 250 to 300 EUV tools already. Um, and then you stack on 70 this year, 80 next year, growing to a hundred by 2030, you're at like 700 EUV tools by the end of the decade. Um, 700 EUV tools, three and a half tools per gigawatt, um, assuming it's all allocated to AI,
42:38which it's not, but three and a half tools per gigawatt gets you to 200 gigawatts worth of AI chips for the data centers to deploy. Right. So 200 gigawatts, Sam wants 50 gigawatts, right? 52 gigawatts a year. He's only taking 25% share then, right? Obviously there's some share given to, um, you know, mobile and PC, uh, assuming that, you know, for some reason we're allowed to even have consumer goods still, um, you know, and we don't get priced out of them, but, you know, roughly like he's saying 25%, 50%, you know, 25% market share of the total chips fabbed. That's, that's kind of like
43:10very reasonable given, you know, this year alone, I think he's going to have access to 25% of the Blackwell GPUs that are deployed. Right. So it's, it's not that crazy. I find it surprising that, you know, when was the first, uh, when did ASML start shipping EUV tools, when seven nanometer started? So I don't know when that was exactly, but you're saying in 2030, they're going to be using machines that initially were shipped in 2020. So 10 years, you're using the same most important machine in this most technologically advanced industry in the world. I find that surprising.
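The EUV arithmetic walked through above can be reproduced in a few lines. This is a back-of-envelope sketch using only the rough figures quoted in the conversation (55,000 N3 wafers with about 20 EUV layers each, roughly 2 million total EUV passes per gigawatt, 75 wafers per hour at 90% uptime, about 700 installed tools by 2030, and 52 GW a year for "a gigawatt a week"). The $350 million tool price is an assumption, the midpoint of the quoted $300 to 400 million. The 2 million total is taken as given, because the EUV layer counts for the five nanometer and DRAM wafers aren't stated. Like the quoted 200 GW figure, it assumes every tool works only on AI chips.

```python
# Back-of-envelope EUV math using the rough figures quoted in the conversation.
# Assumption (not from the source): TOOL_PRICE_USD is the midpoint of "$300-400M".

HOURS_PER_YEAR = 24 * 365

# One gigawatt of Rubin-generation capacity, as quoted
N3_WAFERS_PER_GW = 55_000
EUV_LAYERS_PER_N3_WAFER = 20
TOTAL_EUV_PASSES_PER_GW = 2_000_000  # N3 + N5 + DRAM; quoted total, not derived here

# EUV tool throughput and availability, as quoted
WAFERS_PER_HOUR = 75
UPTIME = 0.90
TOOL_PRICE_USD = 350e6  # assumption: midpoint of the quoted range


def n3_passes_per_gw() -> int:
    """EUV exposures needed for the N3 logic wafers alone."""
    return N3_WAFERS_PER_GW * EUV_LAYERS_PER_N3_WAFER


def passes_per_tool_year() -> float:
    """Full-wafer EUV passes one tool delivers in a year at the quoted uptime."""
    return WAFERS_PER_HOUR * HOURS_PER_YEAR * UPTIME


def tools_per_gw() -> float:
    """Tool-years of EUV capacity consumed by one gigawatt of AI chips."""
    return TOTAL_EUV_PASSES_PER_GW / passes_per_tool_year()


def gw_per_year(installed_tools: int) -> float:
    """Annual gigawatts of AI chips if every installed tool worked only on AI."""
    return installed_tools / tools_per_gw()


if __name__ == "__main__":
    installed_2030 = 700  # ~250-300 today plus 70, 80, ... ~100/yr, as quoted
    print(f"N3 EUV passes per GW:  {n3_passes_per_gw():,}")
    print(f"Passes per tool-year:  {passes_per_tool_year():,.0f}")
    print(f"Tools per GW:          {tools_per_gw():.2f}")
    print(f"Tool cost per GW:      ${tools_per_gw() * TOOL_PRICE_USD / 1e9:.2f}B")
    print(f"GW/yr from 700 tools:  {gw_per_year(installed_2030):.0f}")
    print(f"A GW a week (52 GW):   {52 / gw_per_year(installed_2030):.0%} of that")
```

Without rounding, these inputs give about 3.4 tools per gigawatt, roughly $1.2 billion of tooling per gigawatt, and a ceiling of about 207 GW a year from 700 tools, of which 52 GW is about a quarter. Rounding up to three and a half tools per gigawatt gives the 200 GW figure quoted above.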
43:42So ASML has been shipping EUV tools now for roughly a decade, but it only entered mass volume production around 2020. You know, the tool's not the same. Um, you know, back then the tools were even lower throughput. Um, there were, there's various specifications around them called overlay, right? You know, I was mentioning you're stacking layers on top of each other, right? You'll do some EUV, you'll do a bunch of different process steps, depositing stuff, etching stuff, cleaning the wafer, you know, dozens of those steps before you do another EUV layer. Um, there's a spec called overlay, right? Which is, okay, you did all this work, you know,
44:13you drew these lines on the wafer. Um, now I want to draw these dots, right? Let's just say I want to draw these dots to connect this, these lines of metal to, and then, you know, holes. And then the next layer up is another set of lines that goes perpendicular. So now you're connecting wires going perpendicular to each other. Um, there, you have to, you have to be able to land them on top of each other. So it's called overlay and overlay is a spec that's been improved rapidly by ASML. Wafer throughput has been improved rapidly by ASML. And also the price of the tool has gone up, but not as much as the capabilities of the tool, right? Initially the EUV tools were
44:43like 150 million and over time they're now like 400 million. Uh, you know, as I, as I look out to 2028, but the capabilities of the tools have more than doubled as well, right? Especially, um, on throughput and overlay accuracy, which is the ability to stack, you know, accurately align the, the subsequent passes on top of each other. Um, even though you do tons of steps between. And so this is, this is, um, you know, ASML is improving super rapidly. I think it's also something noteworthy to say ASML is, you know, maybe one of the most generous companies in the world,
45:17right? They have this linchpin thing. No one has anything competitive. Maybe China will have some EUV by the end of the decade, but no one else, you know, has anything even close to EUV. Um, and yet they haven't taken price and margins up like crazy, right? You know, you go ask, you know, some other folks, you know, that we talk to all the time, like, you know, for example, Leopold and they're like, well, you know, let's, let's, you know, let's, let's have the price go up. Right. Uh, cause they can, the margin is there. You can, you can take the margin. Like NVIDIA takes the margin. Memory players are taking the margin, but ASML has never raised the price more than they've
45:48increased the capability of the tool. Um, and so in a sense, they've always provided net benefit to their customer. It's not that the tool is stagnant. It's just that like, you know, these tools are old. Yes, you can upgrade them some and the new tools are coming. And for simplicity's sake, we're kind of ignoring, you know, the advances for this podcast, the advances in overlay or throughput per tool. So you say we're producing 60 of these machines, uh, this year and then 70, 80 over subsequent years. What, what would happen if ASML just decided to double its CapEx or triple its
46:18CapEx? What is preventing them from producing more than a hundred in 2030? Why, why, why so confident that even five years out, you can be relatively sure what their production will be? So I think, I think a couple of factors here, right? ASML has not decided to just go YOLO. Let's expand capacity as fast as possible. Right. Um, in general, the semiconductor supply chain has not right. It's lived through the booms and busts and, uh, we can talk a bit more about it, but basically no one, you know, some players as of very recently have like woken up, but in general, no one really
46:50sees demand for 200 gigawatts a year of AI chips or, you know, trillions of dollars of spend a year in the semiconductor supply chain. They're just like, they're not, they're not AI pilled, right? They're not AGI pilled. We're going to get to a trillion dollars this year. Yeah. I, I, I, I, I feel you, but I'm saying like, no one really understands this in the supply chain. Um, constantly we're told our numbers are way too high. And then when they're right, they're like, oh yeah, but your, your next year's numbers are still too high. And it's like, but anyways, like ASML has sort of, their tool has four major components, right? It has, um,
47:24the source, right? Which is made by Cymer in San Diego. Um, it has the, uh, reticle stage, which is made in Wilton, uh, Connecticut, right? It has the wafer stage, um, and the, um, the optics, right? The lenses and such. And those two are made in Europe, right? And so when you, when you look at each of these four, they're tremendously complex supply chains that, A, they have not tried to expand massively and, B, when they try to expand them, the time lag is quite long, right? Um, and so again, this is the most complicated machine that humans make, period,
47:59right, at, at a volume, um, at any sort of volume, but like, let's talk about the source specifically, right? What does the source do? It drops these tin droplets. It hits each one three subsequent times with the laser, perfectly. So the first one, uh, hits this tin droplet and it expands out, it hits it again so it expands out to this perfect shape, and then it blasts it at super high power. And, um, the tin droplets get excited enough that they release EUV light at 13.5 nanometers. And then it's in this thing that is like basically collecting all the light and directing it into the lens stack, right? Then you have the lens stack, which is Carl Zeiss, right? As you mentioned, and, and,
48:32and some other folks, but Zeiss being the most important part of it. Um, they also have not tried to expand production capacity because they don't see any, you know, they, they, they're like, oh yeah, yeah. Like we're growing a lot because of AI. We're growing from 60 to a hundred, right? It's like, no, no, no, no, no. We need to go to like a couple hundred, but it's, it's fine. Whatever. Um, each of these tools has, you know, I think 18, um, of these lenses effectively, um, mirrors, um, they are, they're multi-layer mirrors, which are perfect layers of molybdenum and, uh, ruthenium, if I recall correctly, um, stacked on top of each other in many layers.
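The optics side scales the same way. A minimal sketch, taking the "I think 18" multilayer mirrors per tool at face value and multiplying by yearly tool output: 60 tools today and the hundred ASML is growing toward, both as quoted, plus 200 as a hypothetical stand-in for the "couple hundred" argued for here (that last number is an assumption, not a figure from the conversation).

```python
# Mirror volumes implied by tool output, using the approximate figure quoted here.
MIRRORS_PER_TOOL = 18  # "I think 18" per tool, as quoted


def mirrors_per_year(tools_per_year: int, mirrors_per_tool: int = MIRRORS_PER_TOOL) -> int:
    """Multilayer EUV mirrors needed to build a year's worth of tools."""
    return tools_per_year * mirrors_per_tool


if __name__ == "__main__":
    # 60 and 100 are quoted; 200 is a hypothetical reading of "a couple hundred"
    scenarios = {"today": 60, "planned": 100, "assumed need": 200}
    for label, tools in scenarios.items():
        print(f"{label:>12}: {tools:>3} tools/yr -> {mirrors_per_year(tools):,} mirrors/yr")
```

At today's output that is roughly a thousand mirrors a year, which is why "quite artisanal" fits: even a doubled tool rate means only a few thousand of these near-perfect layer stacks a year.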
49:05And then the light bounces off of it perfectly, but it's not just like, you know, like when we think about a lens, you know, it's, it's like in a shape and it focuses the light. This is a, this is like a mirror that's also a lens. And so it's pretty complicated. Any defect in these like, uh, super thinly deposited stacks will mess it up. Uh, any curvature issues, like there are a lot of challenges with scaling the production. Um, it's quite artisanal, right, in the sense that you're not making tens of thousands of these a year, you're making hundreds, you're making thousands, right. Uh, you know, talk about 60 tools
49:37a year, um, 18 of these per tool, you end up with, you know, you're still in the, um, you know, hundreds, uh, or a thousand, you're at the thousand number roughly for these, these lenses, um, and projection optics. So then you, and then you step forward to the reticle stage, uh, which is also something really, uh, crazy. This thing moves at, I want to say nine Gs, like it, it will shift nine Gs because as you step across a wafer, the tool will go, um, and the wafer stage is complementary. It's the wafer part. So you, you line these two things up, you're taking all the
50:10light through the lenses that's focused and, and here's the reticle, here's the wafer and you're passing, uh, the reticle's moving one direction, the wafer is moving in the other direction as it scans a, uh, 26 by 33 millimeter section of the wafer. And then it stops, it shifts over to another part of the wafer and does it again. And it does that in just seconds. Right. And each of them are moving at nine Gs in opposite directions. So each of these things is like a wonder and marvel of like chemistry, uh, fabrication, you know, um, you know, sort of like mechanic,
50:42mechanical engineering, um, optical engineering, because you have to align all these things and make sure they're perfect. Uh, all of these things have crazy amounts of metrology because you have to perfectly test everything. Cause if anything is messed up, the yield goes to zero, right? Cause this is such a finely tuned system. And by the way, it's so large that you're building it in the factory in Eindhoven, uh, Netherlands, and then deconstructing it and shipping it on many planes to the customer site. And then you're reassembling it there and testing it again. And that process takes many, many months. So like, it's, it's just, there's so
51:16many steps in the supply chain, right? Whether it's Zeiss making their, uh, lenses and projection optics or Cymer, which is an ASML-owned company, making the EUV source. And each of these has its own complex supply chain, right? ASML's commented, their supply chain has over 10,000 people in it, right? Like individual suppliers. Yes. And it might not be directly. It might be through like, Hey, you know, Zeiss has so many suppliers and, you know, XYZ company has so many suppliers, but you know, they, these, you know, if you just think about like, okay, you're talking about two physically moving objects that are like this large and this large, you know, um, the size of a wafer,
51:49right. And it has to be accurate to the level of, you know, single digit nanometers or even smaller because the entire system, the overlay, right? Uh, layer to layer, uh, variation has to be on the order of three nanometers, right? Um, and so if the overlay is three nanometers, that means each individual part, the accuracy of its physical movement has to be even less than that, right? Has to be sub one nanometer in most cases because the, the error of these things stacks up, right? And, and, and so there's no way to like, you know, just like snap your fingers and increase
52:23production, right? You know, even something as simple as power, right? The U.S. going from 0% power growth to 2% power growth, even though China's already at 30, was like so hard for America to do. Right. Um, and, and, and that's a really simple supply chain with very few people in the supply chain, right. Uh, who make difficult things. And there's, you know, probably what, a hundred thousand electricians slash people who work in the supply chain, uh, of electricity, um, or more in the U.S. And, you know, when you look at, Oh, ASML employs like so few people, Carl Zeiss probably employs like
52:56less than a thousand people working on this. And all of those people are like super, super specialized. So it's, you know, you can't just train random people up for this, like in the snap of a finger, you can't just get your entire supply chain to get galvanized. Right. NVIDIA has had to do a lot, um, to get the entire supply chain to even deliver the capacity they're going to make this year. Even though when you go talk to Anthropic, they're like, well, we're short of TPUs, we're short of Trainium, we're short of GPUs. When you go talk to OpenAI, they're like, we're short of these things. Right. Um, so OpenAI and Anthropic, they know they need X. NVIDIA is not quite as AGI
53:27pilled and they're building, uh, you know, X minus one. Um, and you go down the supply chain, everyone's doing minus one. And in some cases they're doing like divided by two, right. Because they just don't, they're not AGI pilled. Right. I think. And, and, and so you end up with the time lag for this whip to react, right. You know, the, the, the sort of AI pilledness is, and, and desire to increase production is so long. And then once they finally understand, Hey, we need to increase production rapidly. Right. And they, they think they understand, Oh, AI means we have to go from 60
53:58to a hundred. And in addition to the tools, all just getting better and faster, you know, the source getting higher power from 500 Watts to a thousand and, you know, all these other aspects of the supply chain, you know, advancing technically plus increase of production. They think they're, they're like actually increasing production a lot. But if you float through the numbers of, Hey, what does Elon want? He wants a hundred gigawatts a year in space by 2028, is it? Um, or 2029. And, you know, Sam Altman wants 50 gigawatts, 52 gigawatts a year, um, by the end of the decade. And you look at, you know, probably anthropic needs the same. And then, you know, Google needs that, you know,
54:31 you go across the supply chain, and it's like, wait, no, the supply chain can't possibly build enough compute capacity for everyone to get what they want. Real conversations are full of fits and starts and pauses and interruptions. I mean, just listen to this episode. At least superficially, voice models have gotten pretty good at handling these kinds of things. But at a deeper level, interruptions can throw off a model's understanding and degrade the quality of its responses, and it's not always clear why. Labelbox realized that this was a huge bottleneck for their customers, so they built an evaluation pipeline called EchoChain to help
55:03 you diagnose and fix your voice model's specific failure modes. EchoChain starts by feeding conversations into your voice model. It then injects interruptions at specific intervals and classifies any failures into one of three modes. One, did it acknowledge the correction but keep the old plan? Two, did it adapt briefly but then slide back to old assumptions? Or three, did it abandon the old task entirely? This is extremely useful information, because Labelbox can get your model the exact data it needs to fix whatever issue is preventing it from being a viable and competent voice model. So if you want to ensure that your voice model maintains its performance in real conversations,
55:37 you should reach out to Labelbox. Go to labelbox.com/dwarkesh.
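To make the overlay point from the start of this segment concrete: suppose a layer-to-layer overlay budget of roughly 3 nm has to absorb errors from many moving parts. In that case each part gets only a fraction of the budget. The sketch below assumes equal contributors and computes the per-part allowance two ways: errors adding linearly (the worst case), or independent errors adding in root-sum-square. The count of 10 contributors is a hypothetical illustration and does not come from the conversation. Nothing here describes how ASML actually allocates its error budget.

```python
import math


def per_component_budget(total_nm: float, n_components: int) -> tuple[float, float]:
    """Split an overlay budget evenly across n error sources.

    Returns (worst_case_linear, root_sum_square) allowances per component:
    - linear: errors add directly, so each source gets total / n
    - RSS: independent random errors add in quadrature, so each gets total / sqrt(n)
    """
    if n_components < 1:
        raise ValueError("need at least one error source")
    linear = total_nm / n_components
    rss = total_nm / math.sqrt(n_components)
    return linear, rss


def combined_error(errors_nm: list[float]) -> tuple[float, float]:
    """Combine per-component errors as (worst-case linear sum, root-sum-square)."""
    linear = sum(abs(e) for e in errors_nm)
    rss = math.sqrt(sum(e * e for e in errors_nm))
    return linear, rss


if __name__ == "__main__":
    # Hypothetical: a 3 nm total overlay budget shared by 10 equal contributors.
    lin, rss = per_component_budget(3.0, 10)
    print(f"per-part allowance: {lin:.2f} nm (linear), {rss:.2f} nm (RSS)")
    # Ten parts each off by 0.9 nm: RSS is about 2.85 nm (fits), worst case is 9 nm (does not).
    lin_total, rss_total = combined_error([0.9] * 10)
    print(f"ten 0.9 nm errors: {lin_total:.1f} nm worst case, {rss_total:.2f} nm RSS")
```

Under either combination rule, a system budget of a few nanometers pushes each individual part below one nanometer, which is the "sub one nanometer in most cases" point.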
55:44 So I feel like in the data center supply chain, for the last few years people have been making arguments of the form: we are bottlenecked by this specific thing, therefore AI compute can't scale more than X. But then, as you've written about, if, say, the grid is a bottleneck, we just go behind the meter on site with gas turbines, et cetera. And if that doesn't work, there are all these other alternatives that people fall back on. I want to ask you whether we can imagine
56:14 a similar thing happening in the semiconductor supply chain. If EUV becomes a bottleneck, what if we just went back to 7 nanometer and did what China is currently doing, producing 7 nanometer chips with multi-patterning on DUV machines? If you look at a 7 nanometer chip like the A100, there's obviously been a lot of progress from the A100 to the B100 or B200. But how much of that progress is just numerics? If you hold numerics constant, say FP16, from A100 to B100,
56:51 the B100 is a little over one petaflop and the A100 is 300 teraflops, so holding numerics constant you have basically a 3x improvement from A100 to B100. Some of that is the process improvement, and some of that is just the accelerator design improving, which we could replicate in the future. So it seems like the process improvement from 7 nanometer to 4 nanometer actually has a very small effect. I don't know the numbers offhand, but let's say
57:24 there's something like 150k wafers per month of 3 nanometer, and eventually similar amounts of 2 nanometer. But there's also a similar amount of 7 nanometer. So if you take all those old wafers with maybe a 50% haircut, because the density per wafer area is, what, 50% less or something, then it doesn't seem that bad to just bring on 7 nanometer wafers. And then, oh, that gives you another 50 or 100 gigawatts.
57:53 Yeah, tell me why that's naive. Yeah. So I think we potentially do go crazy enough that this happens, because we just need incremental compute and the compute is worth the higher cost, power, et cetera, of these chips. But it's also unlikely to a large extent, because some of these are not fair comparisons. For example, going from the A100, which is 312 teraflops, to Blackwell, which is a thousand-ish of FP16, or maybe it's
58:28 2,000, and then Rubin is 5,000 or so of FP16. It's not a fair comparison, because these chips have vastly different design targets. With the A100, what NVIDIA optimized for was FP16 and BF16 numerics. With Hopper, they didn't care as much about that; they cared about FP8. With Rubin, they don't care about FP16 and BF16 as much; they care mostly about FP4 and FP6. So numerics are what they've designed their
59:02 chip for. Okay, let's say we make a new chip design on 7 nanometers. Sure, we can do that, and then it's optimized for the numerics of the modern day. The performance difference is still going to be much larger than the FLOPs difference you mentioned. It's easy to boil things down to FLOPs per watt or FLOPs per dollar, but that's actually not a fair comparison. This is where you can bring in, hey, let's look at Kimi K2.5 and DeepSeek:
59:38 when you look at these two models and their performance on Hopper versus Blackwell with very optimized software, you get vastly different performance. And most of this is not attributable to FLOPs or numerics, because those models are actually 8-bit. Hopper and Blackwell are both optimized for 8-bit, and Blackwell's not really taking advantage of its 4-bit there, yet the performance gulf is much larger. The way you can compare them and think about them is:
1:00:09 sure, it's one thing to shrink process technology and make the transistor smaller so each chip has X number of FLOPs. But what you forget, the big gating factor, is that these models don't run on a single chip. They run on hundreds of chips at a time. If you look at DeepSeek's production deployment, which is well over a year old now, they were running on 160 GPUs; that's what they serve production traffic on. So they split the model across 160 GPUs. Every time you cross the boundary from one chip to another, there is an efficiency loss, because you now have to transmit over high-speed electrical SerDes, and there is a latency cost,
1:00:43 there's a power cost; there are all these dynamics that hurt. As you shrink the process node again and again, you increase the amount of compute in a single chip. On-chip movement of data is at least tens of terabytes a second, if not hundreds of terabytes a second, whereas between chips you're on the order of a terabyte a second. That's for chips that are super close to each other physically, and you can only put so many chips close to each other physically, so you have
1:01:16 to put chips in different racks. Data movement between racks is on the order of hundreds of gigabits a second: 400 or 800 gigabits a second, so roughly 100 gigabytes a second for an 800-gigabit link. So you've got this huge ladder: on-chip, I can communicate at super fast speeds; within the rack, I can communicate about an order of magnitude slower; outside the rack, another order of magnitude lower than that. And as you break the bounds of chips, you take this performance loss. The reason I explain this is that when you look at Hopper versus Blackwell, even if both of them are using a rack's worth of chips, Hopper is
1:01:51 significantly slower, because on Blackwell the amount of compute you can bring to bear within each fast domain (tens of terabytes a second between processing elements on a chip, terabytes a second between chips) is much, much higher, and therefore the performance is much higher. So when you look at inference at, let's say, 100 tokens a second for DeepSeek and Kimi K2.5, the performance difference between Hopper and Blackwell is on the order of 20x.
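One way to see the bandwidth ladder is to work out how long the same payload takes to move at each tier. The tier bandwidths below are rough values I have assumed from the orders of magnitude quoted in the conversation: 20 TB/s stands in for "tens of terabytes a second" on-chip, 1 TB/s for chip-to-chip traffic within a rack, and 400 or 800 Gb/s for a link between racks. The 1 GB payload is hypothetical. This is only an idealized illustration, not a model of any real interconnect.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbit_per_s / 8


# Assumed order-of-magnitude bandwidths, in bytes per second.
TIERS = {
    "on-chip": 20e12,                                     # "tens of terabytes a second"
    "chip-to-chip, same rack": 1e12,                      # "order of a terabyte a second"
    "rack-to-rack (800 Gb/s)": gbit_to_gbyte(800) * 1e9,  # 100 GB/s
    "rack-to-rack (400 Gb/s)": gbit_to_gbyte(400) * 1e9,  # 50 GB/s
}


def transfer_time_us(payload_bytes: float, bytes_per_s: float) -> float:
    """Idealized transfer time in microseconds (ignores latency and protocol overhead)."""
    return payload_bytes / bytes_per_s * 1e6


if __name__ == "__main__":
    payload = 1e9  # hypothetical 1 GB of activations or KV cache to move
    for tier, bw in TIERS.items():
        print(f"{tier:>26}: {transfer_time_us(payload, bw):>9.0f} us")
```

With these assumed numbers, moving 1 GB takes about 50 µs on-chip, 1 ms between chips in a rack, and 10 to 20 ms between racks. Each step down the ladder costs roughly an order of magnitude. The sketch also leaves out per-hop latency and power, which the conversation names as separate costs.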
1:02:21 Interesting. Not 2 or 3x like the FLOPs difference indicates, even though those are on the same process node. Makes sense, yeah. There are just differences in networking technologies and what they've worked on, and you can translate some of these back. But when you look at what they're doing with Rubin on 3 nanometers, some of these things are just not possible to do all the way back at the A100's node, even if you make a new chip on 7 nanometer. There are certain architectural improvements you can port and certain ones you cannot. So the performance difference is not just going to be the difference in FLOPs. It's in some sense cumulative across
1:02:53 the differences in FLOPs per chip, networking speed between chips, how many FLOPs are on a chip versus in a system, and memory bandwidth on a single chip and across an entire system. All of these things compound. Can I ask you a very naive question? As of last year, the B200 has two dies on a single chip, so you get that bandwidth without having to go through NVLink or InfiniBand. And next year, Rubin Ultra will have four dies on one chip. What is preventing us from just doing that on an old node? How many dies could
1:03:24 you have on a single chip and still get these tens of terabytes a second? Yeah. So even within Blackwell, there are differences in performance when you're communicating within a die versus across the dies. Those penalties are obviously much smaller than when you go out of the entire package, but they exist between dies within the package. So when you scale up the number of dies, there is some performance loss. It's not perfect, but it is way better than going across entirely separate packages. Now, how large can advanced packaging scale?
1:03:57 The way NVIDIA is doing it is CoWoS, and the way Google with Broadcom and MediaTek, Amazon with Trainium, and all these other chips are doing it is also CoWoS. But you can go back and look at what Tesla did with Dojo, which they canceled and restarted. Dojo was a chip the size of an entire wafer, with 25 chips on it. There were some trade-offs: they couldn't put HBM on it. But the positive side was that they had 25 chips on one wafer. And so
1:04:30 to date, it is still probably the best chip for running convolutional neural networks. It's just not great at transformers, because the shape of the chip, the memory, the arithmetic, all these various specifications are not well suited for transformers; they're well suited for CNNs. Dojo was optimized around that, and they made a bigger package. But as you make packages bigger and bigger, you run into other constraints: networking speed, memory bandwidth,
1:05:00 cooling capabilities; all of these things start to rear their heads. It's not simple. But yes, you will see a trend line of more chips on the package, and yes, you're going to be able to do that on 7 nanometer. In fact, that's what Huawei did with their Ascend 910C or D: initially they had just one die, and then they did two. They're focusing on scaling the packaging up because that's an area where they can advance faster than process technology, where they can't shrink. But at the end of the day, that's something you can do on leading-edge chips too. Anything you do on 7 nanometer, you can also
1:05:33 probably do on 3 nanometer in terms of packaging. So suppose you end up in a world in 2030 where the West has the most advanced process technology but has not ramped it up as much. China, by contrast, may or may not have EUV and 2 nanometer or whatever by 2030, but it is semiconductor-pilled, so it's producing in mass quantity. Basically, I'm wondering in what year the crossover happens, where our advantage in process technology has faded enough and their advantage in scale has
1:06:05 increased enough. Add to that their advantage of having one country with the entire supply chain indigenized, rather than random suppliers in Germany and the Netherlands and wherever. When would all that put China ahead in its ability to produce FLOPs en masse? Yeah. So to date, China still does not have an entirely indigenized semiconductor supply chain. But would they in 2030? Yeah. By 2030, it's possible that they do. But to date, all of China's
1:06:36 7 nanometer and 14 nanometer capacity uses ASML DUV tools, and the amount they can import from ASML is large. But the point is that the vast majority of ASML's revenue, and on EUV all of it, comes from outside of China. So the scale advantage is still in favor of, let's call it, the West plus Taiwan, Japan, et cetera. But they're trying to make their own DUV and EUV tools; they're trying to do all these things. The question is how fast they can advance and scale up production as well as quality, and to date we haven't seen that. Now, I'm quite
1:07:11 bullish that they're going to be able to do these things over the next five to 10 years: really scale up production, really kick it into high gear. They have more engineers working on it, and they have more desire to throw capital at the problem. So by 2030, do they have fully indigenized DUV? I think for sure. DUV, yes. And fully indigenized EUV by 2030? I think they'll have working tools, but I don't think they'll be able to manufacture a bunch yet. There's having it work, and then there's production hell. ASML had EUV working in the early 2010s at some capacity,
1:07:48 but the tools were not accurate enough, not reliable enough, and not scaled for high-volume manufacturing. Then they had to ramp production, and that all took time. Production hell takes time, which is why it took another five to seven years to get EUV into mass production at a fab rather than just working in the lab. So how many DUV tools do you think they'll manufacture in 2030? ASML? No, China. Oh, that's a great question. Currently it's a bit of a
1:08:20 challenge to look into this supply chain, even though we try really hard. In some instances they're buying stuff from Japanese vendors, and if they want a fully indigenized supply chain, they need to stop buying these lenses, projection optics, or stages from Japanese vendors and build them internally. So it's really tough to say where they'll be able to get to. I honestly think it's a shot in the dark, but it's not unlikely that they'll be able to do on the order of a hundred DUV tools a year, whereas ASML is currently doing hundreds of DUV tools a year. You
1:08:54 know, no company has a process node where they make a million wafers a month. Elon says he wants to do it, and China's obviously going to do it. I don't think TSMC is trying to do that. The memory makers may get to a million wafers a month as well, but not in a single fab. It's mind-boggling to think of that scale, and it's challenging to see the supply chain galvanize for that. So I'm not sure. I don't want to doubt
1:09:28 China's capability to scale. I guess this is an interesting question, and I think at some point SemiAnalysis will do the deep dive on this: by when could indigenized Chinese production be bigger than the rest of the West combined? You'd add up all the inputs to your model: when they'll have DUV machines at scale, when they'll have EUV machines at scale. Because I think there's this question: if you have long timelines on AI, by long meaning 2035, which is not that long
1:10:00 in the grand scheme of things, should you expect a world where China is dominating in semiconductors? I don't think that question gets asked enough. Here in San Francisco, we're thinking on timescales of weeks, and if you're outside of San Francisco, you're not thinking about AGI at all. So there's this question of, okay, what if we have AGI, this transformational thing commanding tens or hundreds of trillions of dollars of economic growth, token output, and so forth, but it happens in 2035? What does that imply for the West versus China? I don't know, I think SemiAnalysis
1:10:33 has got to write the definitive model on this. Yeah. So it's really challenging when you move timescales out that far. What we tend to focus on is tracking every data center, every fab, all the tools, and where they're going. But the lead times for these things are relatively short. We can only make reasonably accurate estimates of data center capacity based on land purchasing, permits, turbine purchasing, and all these
1:11:03 things. We know where all these things are going; that's the data we sell. But as you go out to 2035, things are so radically different and your error bars get so large that it's hard to make an estimate. At the end of the day, though, if takeoff or timelines are slow enough, I don't see why China wouldn't be able to catch up drastically. In some sense, we've got this valley where, call it three
1:11:33 to six months ago, or maybe even now, Chinese models are as competitive as they've ever been. I think Opus 4.6 and GPT-5.4 have really pulled away and made the gap a little bit bigger, but I'm sure some new Chinese models will come out. But consider the shift from companies selling tokens, where they provide the entire reasoning chain and all that, to companies selling automated white-collar work. Think of an automated software engineer: you send the request, you get the result back, and there's a bunch of thinking on the backend that they don't show you. As that happens, A, the ability to distill out of American models into
1:12:04 Chinese models will be harder, and B, there's the scale of the compute that the labs have. OpenAI exited last year with roughly two gigawatts. Anthropic will get to two-plus gigawatts this year. And by the end of next year, they'll both be at something like 10 gigawatts of capacity. China is not scaling its AI lab compute nearly as fast. So at some point, when you can't distill the learnings from these labs into the Chinese models, plus this compute race that OpenAI, Anthropic, Google, Meta, et cetera are all running, you get to a point where
1:12:40 the model performance should start to diverge more. And then there's all of this capex being spent on data centers: Amazon 200 billion, Google 180, and so on. All these companies are spending hundreds of billions of dollars of capex; nearly a trillion dollars of capex is being invested in data centers in America this year, roughly. So you end up asking, okay, what's the return on invested capital here? You and I would think that the return on
1:13:11 invested capital for data center capex is very high, at least if we look at Anthropic's revenues: in January they added something like 4 billion, and in February, which is a shorter month, they added something like 6. We'll see what they can do in March and April, given that compute constraints are what's bottlenecking their growth. The reliability of Claude Code is actually quite low because they're so compute-constrained. But if this continues, then the ROIC on these data centers is super high, and at some point the US economy starts growing faster and faster, this year and next year, because of all this capex and all this revenue that these
1:13:44 models and their downstream supply chain are generating, whereas China doesn't have that yet. They have not built the scale of infrastructure to invest in models, get to the capabilities, and then deploy these models at such scale. Look at Anthropic: they're at, call it, 20 billion ARR, and the margins are sub-50%, at least as last reported by The Information. So you're at something like $13 or $14 billion of compute, in rental-cost terms, that it's running on, which is actually something like $50 billion
1:14:16 worth of capex that someone laid out for Anthropic to generate its current revenue. China has just not done this. If and when Anthropic 10x's its revenue again, and I think our answer would be when, not if, China doesn't have the compute to deploy at that scale. So there is some sense in which we're in fast-takeoff-ish territory. It's not like we're talking about a Dyson sphere by X date. It's more that the revenue is compounding at such a rate that it does affect economic growth, and the resources these labs are gathering are growing so
1:14:50 fast, and China hasn't done that yet. So in that case, the US and the West are actually diverging. The flip side is that these infrastructure investments have middling returns. Maybe they're not as good as hoped; maybe Google is wrong to want to take free cash flow to zero and spend $300 billion on capex next year. Maybe they're just wrong, and the people on Wall Street who are bearish and the people who don't understand AI are correct. In that case, the US is building all this capacity and it doesn't get
1:15:20 really great returns. Meanwhile, China is able to build a fully vertical, indigenized supply chain, while the US, Japan, Korea, Taiwan, Southeast Asia, Europe, all these countries together, build a less vertical one. In that sense, at some point China is able to scale past us if AI takes longer to get to certain capability levels. I would say the vast majority of your guests on this podcast believe: fast timelines, the US wins; long timelines, China wins. But I don't know what fast timelines
1:15:53 means. I don't think you have to believe in AGI to have the timelines where the US wins. Okay, let's go back to memory, because I think people on Wall Street and in the industry understand how big this is, but people generally don't understand how big a deal it is. So we've got this memory crunch you're talking about. Earlier I was asking whether we could solve the EUV tool shortage by going back to 7 nanometer. Let me ask a similar question about memory. HBM is made of DRAM, but it has three to four times fewer bits per wafer area than the DRAM it's made out of. Is it
1:16:29 possible that accelerators in the future could just use commodity DRAM instead of HBM, so we get much more capacity out of the DRAM we make? The reason I think this might be possible is that if we're going to have agents that just go off and do work, rather than a synchronous chatbot application, then you don't necessarily need extremely low latency anymore. So maybe you can live with lower bandwidth, because the reason you stack DRAM and make HBM is for
1:17:04 higher bandwidth. So is it possible to go to non-HBM accelerators and basically have the opposite of Claude Code fast mode, a Claude Code slow mode? Yeah. I think at the end of the day, the incremental purchaser who's willing to pay the highest price for tokens also ends up being the one that's less price-sensitive. In a capitalistic society, compute should be allocated toward the goods that have the highest value, and the private market determines this by willingness to pay. And so
1:17:35 to some extent, sure, Anthropic could actually release a slow mode. They could release Claude slow mode and increase tokens per dollar by a significant amount. They could probably reduce the price of Opus 4.6 by 4x or 5x while reducing the speed by maybe just 2x; that curve of inference throughput versus speed is there already, just on HBM. And yet they don't, because no one actually wants to use a slow model. Furthermore, on these agentic tasks, it's great that the model can run at this time
1:18:10 horizon of hours, but if the model were just running slower, those hours would become a day. Or vice versa: if the model's running faster, those hours become an hour. And no one really wants to move to that day-long wait, because the highest-value tasks also have some time sensitivity to them. So I struggle to see it. Yes, you could use DDR, regular DRAM, but there are a couple of things that are challenging with this. One is that you're still limited by
1:18:44 one of the core constraints of chips: a chip is a certain size, and all of the IO escapes at the edges of the chip. Oftentimes what you see is HBM on the left and right of the chip, with the IO from the chip to the HBM on those sides, and then the top and bottom are IO to other chips. So if you were to change from HBM to DDR, all of a sudden the IO on this edge would have significantly less bandwidth, though you'd have significantly more capacity per chip.
1:19:20 So yes, you're making fewer bits with HBM, but the metric you actually care about is bandwidth per wafer, not bits per wafer. Because the thing constraining the FLOPs is just getting the next matrix in and out, and for that you need more bandwidth. Yeah, getting the weights out, and getting the KV cache in and out. And in many cases, these GPUs are not running at full memory capacity. It's obviously a system design thing, model-hardware-software co-design: how much
1:19:54 KV cache do I keep? How much do I keep on the chip? How much do I offload to other chips and fetch when I need it, for tool calling or whatever? How many chips do I parallelize this across? The search space is very broad, which is why we have InferenceX, an open-source project that searches for the optimal inference points across eight different chips and a variety of models. Anyway, the point is you're not always constrained by memory capacity. You can be constrained by FLOPs, by network bandwidth, by memory
1:20:27 bandwidth, or by memory capacity. If you really simplify it down, there are four constraints, and each of these can break out into more. In this case, if you switch to DDR, yes, you produce 4x the bits per DRAM wafer, but all of a sudden the constraints shift a lot and your system design shifts a lot. You go slower. Is the market smaller? Maybe, possibly. But now all of a sudden all these FLOPs are wasted, because they're just sitting there waiting for memory. And it's like, great, I don't need all that capacity, because I can't really increase batch size, since then the KV cache is going to take even longer to
1:21:00 read. Interesting. What is the bandwidth difference between HBM and normal DRAM? Yeah. So take a stack of HBM4, the stuff that's in Rubin, since that's what we've been indexing on. It's 2,048 bits wide, connected across an area that's roughly 11 to 13 millimeters wide, and it transfers at around 10 gigatransfers a second. That 11 to 13 millimeters is the shoreline you're taking up on the
1:21:33 chip. In that shoreline, you have 2,048 bits transferring at 10 gigatransfers per second. Multiply those together and divide by eight bits per byte, and you're at roughly two and a half terabytes a second per HBM stack. When you look at DDR in that same area, it's maybe 64 or 128 bits wide, and DDR5 transfers at anywhere from 6.4 gigatransfers a second to maybe 8 gigatransfers (8,000 megatransfers) a second. So your bandwidth is significantly lower: 64 bits times 8 gigatransfers divided by eight puts you
1:22:08 at 64 gigabytes a second. Even with a generous 128 bits at 8 gigatransfers, you're at 128 gigabytes a second for the same shoreline, versus two and a half terabytes a second. That's more than an order of magnitude difference in bandwidth per unit of edge. And if your chip is 26 by 33 millimeters, which is the maximum size for an individual die, you only have so much edge. On the inside of that chip you put all your compute. There are things you can do to try to change this, like more SRAM and more caching, but at the end of the day, you're very constrained by bandwidth.
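The shoreline arithmetic above comes down to bus width times transfer rate, divided by eight bits per byte. The minimal sketch below reproduces it from the quoted figures: HBM4 at 2,048 bits and about 10 GT/s, and DDR5 at 64 or 128 bits and up to 8 GT/s in roughly the same edge length. The memory-bound step-time helper and the 100 GB-per-step figure are my own illustrative assumptions, not numbers from the conversation. Each row compares a single HBM stack against DDR in the same shoreline, not a full chip.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Peak interface bandwidth in GB/s = width (bits) x rate (GT/s) / 8 bits per byte."""
    return bus_width_bits * gigatransfers_per_s / 8


def memory_bound_step_s(bytes_read: float, bandwidth_gb_s: float) -> float:
    """Lower bound on one decode step if every byte must be streamed from memory once."""
    return bytes_read / (bandwidth_gb_s * 1e9)


HBM4_STACK = peak_bandwidth_gb_s(2048, 10.0)  # 2,560 GB/s, "roughly two and a half TB/s"
DDR5_NARROW = peak_bandwidth_gb_s(64, 8.0)    # 64 GB/s in similar shoreline
DDR5_WIDE = peak_bandwidth_gb_s(128, 8.0)     # 128 GB/s, the generous case

if __name__ == "__main__":
    print(f"HBM4 / DDR5 (128-bit): {HBM4_STACK / DDR5_WIDE:.0f}x")
    print(f"HBM4 / DDR5 (64-bit):  {HBM4_STACK / DDR5_NARROW:.0f}x")
    # Hypothetical 100 GB of weights + KV cache read per step, same shoreline.
    for name, bw in [("HBM4", HBM4_STACK), ("DDR5 128-bit", DDR5_WIDE)]:
        print(f"{name}: >= {memory_bound_step_s(100e9, bw) * 1e3:.0f} ms per step")
```

The 20x to 40x gap is the "FLOPs sitting there waiting for memory" effect in miniature. Given the same edge length, a memory-bound step on DDR5 takes about 20x longer, however fast the compute inside the die is.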
1:22:40 Interesting. So then there's a question of where you can destroy demand to free up enough for AI. And I guess the picture is especially bad because, as you were saying, if it takes 4x more wafer area to get the same byte for HBM, you have to destroy 4x as much consumer demand for laptops and phones and whatever in order to free up one byte for AI. So what does this imply for the next year or two? Sorry for the run-on question. I think in your newsletter you said 30% of big tech's 2026 capex is going toward memory.
1:23:15 Yes. That's insane, right? Of the 600 billion or whatever, you're saying 30% is going just to memory? Yeah. Obviously there's some level of margin stacking that NVIDIA does, so you'd have to separate that out and apply their margin to the memory and the logic, but at the end of the day, yes, something like a third of their capex is going to memory. That's crazy. Okay, so what is the question I'm trying to ask? Basically, what should we expect over the next year or two as this memory crunch hits? Yeah. So the memory crunch will continue to get harder and harder, prices will continue to go up, and this affects different parts of the market