Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip

May 21, 2026 · 51 min · 9,508 words


Show notes

Size is the name of the game for the AI chipmaker Cerebras: Their chips are truly massive, about the size of a dinner plate. According to Andrew Feldman, CEO and founder of Cerebras, that is about 58 times larger than the average chip. That sheer size enables blazing fast inference for AI queries. Feldman joins us on the week of his company's IPO to talk about his core product and how it fits into the AI boom. We discuss the history of the GPU, competition between open- and closed-source models, the company's relationship with TSMC, and more. Read more: Nvidia Tells Skeptical Investors That AI Is Ready to Go Mainstream; Trump Set to Sign AI Cybersecurity Directive as Soon as Thursday. Only Bloomberg subscribers can get the Odd Lots newsletter in their inbox each week, plus unlimited access to the site and app. Subscribe at bloomberg.com/subscriptions/oddlots. Join the conversation: discord.gg/oddlots. See omnystudio.com/listener for privacy information.

Highlighted moments

“by going to wafer scale, we could use this fast memory. Now, we couldn't make that memory store more information per square millimeter, but we could add square millimeters.”

Jump to 7:36 in the transcript

“the GPU has a characteristic that as you try and go faster, the cost and the power used per token increase. Sort of like as you go faster in your car, your miles per gallon decrease.”

Jump to 16:11 in the transcript

“You invent technology that has been unbuildable, never been invented for 75 years in the history of compute. You write software that is extraordinary. You built a product that is vastly faster than the incumbent. And what are we all constrained by? Buildings.”

Jump to 22:29 in the transcript

Transcript

Introduction to Odd Lots

0:00Odd Lots is brought to you by VanEck. For years, investors basically forgot about real assets, energy, gold, and infrastructure. But look what's driving markets now. Central banks loading up on gold, massive capex cycles, currencies doing weird things. These assets are at the center of it. RAAX, the VanEck Real Asset ETF, is an actively managed one-stop shop for real assets, spanning gold, commodities, natural resource equities, and more. Go to vaneck.com slash R-A-A-X pod to learn more.

0:30Fund disclosures later in this episode.

AI Discussion

0:32So there's a lot of noise about AI, but time's too tight for more promises. So let's talk about results. At IBM, we work with our employees to integrate technology right into the systems they need. Now, a global workforce of 300,000 can use AI to field their HR questions, resolving 94% of common questions. Not noise, proof of how we can help companies get smarter by putting AI where it actually pays off, deep in the work that moves the business. Let's create smarter business, IBM.

1:02Everyone has been there. Your team's feedback is scattered across emails, chats, and sticky notes. It's a mess. But PDF Spaces in Adobe Acrobat gives you one collaborative workspace to streamline every file and comment. So if you need six departments to finally agree on a proposal, do that with Acrobat. Need to turn a mountain of feedback into one plan of action? Do that with Acrobat. Want to stop searching for files and finally get everyone on the same page? Do that, do that, do that with Acrobat.

1:35Learn more at adobe.com slash do that with Acrobat.

1:42Bloomberg Audio Studios. Podcasts, radio, news.

Podcast Introduction

1:46Hello and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal. And I'm Tracy Alloway. Tracy, I have to say, unfortunately, I don't have AI psychosis. I'm certain of that. Debatable. I'm pretty sure I don't have AI psychosis. I do have to say, unfortunately,

2:17like the amount of time now where it's like it feels like AI related questions and there's many of them are sort of like swallowing up the other thoughts that I have in my head of whether it's questions about which model is best and why and what are the economics of inference and how much training is pre-training versus post-training for each model. Like it's just sort of like this blob that's growing that's taking up more and more of my thoughts. What is your definition of AI psychosis?

2:49Because one would argue that maybe thinking about AI literally all the time would be a form of psychosis. Well, let's just say like I'm not the type who thinks that like I don't like think that the AI is a friend for one thing. I'm not in love with the AI models. I don't think that in collaboration with ChatGPT that I'm stumbling on unified theory of physics and things like that. So like... But you do spend a lot of time inputting instructions, pressing the button and seeing what comes out.

3:20And seeing what comes out. I'm just saying I think I'm aware that I'm talking to a machine and that we're not establishing any great breakthroughs of which we are collaborators and partners and friends. Recognizing you have a problem is the first step towards healing, Joe. So seriously, though, there's a good reason to think about AI more and more, which is that a huge chunk of not just the market, but the real economy is now revolving around AI, right? Totally. So anyway, again, within the AI conversation, there are a lot of subcategories.

3:50One of the subcategories happens to be another Odd Lots favorite topic, which is chips. Of course, chips are used in multiple different ways. Chips are used in different parts of the AI supply chain, different types of chips in different roles. And so we have to learn more. We have to learn more. And I have to say, I'm particularly interested in the company we're about to speak to, partly because the two things I know about them are, number one, they just had a huge IPO, right? Raising something like $5.5 billion at a kind of insane multiple.

4:21I can't even do a price to earnings multiple because they're not profitable yet. But I think just on a sales basis, it was like 67 times forward earnings, which is pretty juicy, pretty hot. And the second thing I know about the company is they make giant wafers. Yes. Which is just a fun image to have in your head. That's right. So if you were thinking, it's like, okay, there is a hot entrant in this space. What is their differentiator? Well, one fact about them is their chips

4:51are just enormous, about the size of a dinner plate. One might think you're reading an Onion article, but in fact, it's real. And apparently it actually has some real technical advantages. And it's different to what everyone else is doing. So everyone else is, I guess, doing this sort of like modular networking thing where you get together a bunch of chips and you connect them together and that's how you get more compute, more memory, more power, basically. But this company has done something different in the form of the giant wafer. The giant wafer. And if you figure that to get maximum performance,

5:23you sort of want to lessen the distance between things, then put it all on one wafer. Anyway, we're going to learn a lot more about giant wafers and more. I'm very excited to say we do have the founder and CEO of Cerebras on the podcast, Andrew Feldman, truly the perfect guest.

Interview with Andrew Feldman

5:40So, Andrew, thank you so much for coming on the podcast on the week of your IPO. Well, thank you so much for having me. What a pleasure. Absolutely. Why don't you just start us off with the big giant chip? They're apparently real. They're as big as a dinner plate. What is the technical reason why this actually makes sense as a superior form of architecture for at least some aspect of AI? Well, I think larger chips process more information in less time. Okay. And that produces faster results.

6:11And everybody had gone to bigger chips. NVIDIA had moved from 400 square millimeters to 800 square millimeters over the course of five or six years for this exact reason. And in the compute industry, wafer scale, which is building a chip this big. There it is. By the way, for those who are just listening, Andrew is now holding up the chip and yes, it actually looks bigger than a dinner plate to be honest. But that is a big chip. That's a big chip. That's a big chip. It's beautiful.

6:42It's 58 times larger than any other chip that had ever been built. Wow. And what it did was it allowed us to use a different type of memory. Okay. A type of memory that, at the beginning, there are two types of memory. There's memory that can store a lot, but it's really slow. Okay. And there's memory that can't store very much per square millimeter, but it's blisteringly fast. Okay. And historically, all graphics processing units used this memory

7:12that could store a lot, but was really slow. And that's the reason they do inference so slowly. So if you're using Claude right now or you're using anything but ChatGPT, what you'll frequently feel is you'll enter your prompt and you'll wait for an answer. Mm-hmm. Right? And that's because the memory is slow and they have to move a ton of information from memory to compute. Now, by going to wafer scale, we could use this fast memory. Now, we couldn't make that memory store

7:43more information per square millimeter, but we could add square millimeters. And so by building this big chip, we were able to stuff it to the gills with this fast memory. And that's why we're 15 times faster than the fastest GPU. That's why on some problems we're 50, 100, even 1,000 times faster than graphics processing units. Wait, can you explain how you actually managed to do this? Because I know there have been previous attempts to do wafer scale

8:14and I seem to remember there was even like an early attempt in the 1980s or something to do it. How were you able to pull this off? Yeah, it was an ambitious undertaking, that's for sure. Every previous effort in the 75-year history of our industry had failed, including Gene Amdahl, who's sort of on the Mount Rushmore of compute in our industry. He failed sort of spectacularly in the mid-80s at a company called Trilogy. Not only that, but after we succeeded,

8:45people who had visited us, who'd been in our labs, tried to copy us, and they also failed. And so what we were able to do is solve a set of really fundamental problems. And those problems cut across a wide swath of technology. They cut across lithography, so we had to collaborate closely with TSMC and they turned out to be a great partner. We had to make inventions in material and packaging. That's how you put a processor, how you put a piece of silicon on a motherboard,

9:15deliver power and I/O to it. We had to make inventions in power delivery, right? When you build a giant chip, you're going to deliver way more power to it than if you do a chip the size of a postage stamp. We had to invent ways to cool it. We had to write new types of software that ran on it. All of these had never been done before. And it was a decade-long process. It took us five years and about $500 million to deliver the first one. And it's been

9:46an extraordinary run since. In December, we signed a deal with OpenAI north of $20 billion, one of the largest contracts ever signed in Silicon Valley. And then in March, we signed a deal with AWS where they would deploy our systems in their data centers, in their AWS data centers. And so, it's just been an extraordinary run, but it took a long time. It took extraordinary engineering. And there were certainly long periods of time when it wasn't clear we were going to make this work.

10:17Obviously, you've hit this remarkable milestone. You have, in fact, IPO'd and so forth. And right now, the market's valuing your company at $64 billion, early days of the IPO. Just for the listener to understand, the chips, are they solely for inference as opposed to, you know, training? When we think about AI, I think about, okay, there's training, training the model, and then answer giving. That's the inference. Are the chips just for inference? So, a couple things. I think you framed it exactly right.

10:48Training is how we make AI. Yeah. And inference is how we use AI. And so, what happened was that in sort of 2025, in the first part of 2025, the models we made were smart enough to be useful. Yeah. And there was an explosion of use. And we use AI by doing inference. So, there was this sort of tidal wave of demand on inference. And that has continued in 2026, and we think it will continue for years and years to come. And so,

11:18that's the, what had happened. In 2015, when we began thinking about the company, we knew that AI was on the horizon and it would eat a huge amount of compute. Right? And we made sort of two fundamental bets. We bet that it would need dedicated silicon.

11:38And, right, graphics had needed dedicated silicon. That's how you got the graphics processing unit. Yeah. Mobile compute had needed dedicated compute. That's where you got ARM processors. We made that bet and we made a bet that modifying the GPU architecture wouldn't be right. You needed to start with a clean sheet of paper. And so, what we started with was a new vision and that vision could do training and it could do inference and it was orders of magnitude faster at both.

12:08But right now, what we're seeing is such an explosion in demand for inference that a lot of the business this minute is inference even though we're just as fast at, you know, the same amount faster than GPUs on training. That's interesting. Maybe we'll get more to the theoretical training market a little later. Just real quick on inference. Ben Thompson, who writes a newsletter about tech, he wrote a piece in which he distinguishes between answer inference and agentic inference. So, answer inference is like,

12:39you know, format my resume or whatever or write me an essay on X or Y or answer some questions and then agentic inference is like, okay, here's this thing that's going to go around and do services for you, not producing visual answers. Do you distinguish between those two? Is that a real divide in your view? And can your chips do both? Our chips can do both. I think it is a divide. Okay. I think speed matters equally in both. Okay.

13:09I think if you are engaged with the AI, if you're writing code, which is agentic, if you're writing code or you're doing work, nobody wants to wait. I mean, we could just turn the question around and say, well, how big is the market for slow search? Zero. How big is the market for dial-up internet? Zero. Why is that? Because nobody wants to wait. Right? So, if you're engaged with the AI, speed is of the essence. But if the AI is doing agentic work and your competitor

13:41gets three times, five times, ten times as much work done in 20 minutes than you do, you're going to get smoked. And so, this notion somehow that Ben proposed that speed isn't very important in agentic flows is dead wrong. That speed is important in all aspects of productive work. And that your ability to get more done in less time is a fundamental advantage that accrues over time. Right? If,

14:11while your competitor is doing one unit of work, you can do three. And in the next time they do one unit of work, you do six. Sure. Right? This adds up over time and you beat them in any line of work. And so, speed, which is sort of our specialty, is important across the board. What do giant wafers and speed in general actually mean for, I guess, the economics of tokens? Because one way I think about it, I have this sort of vision in my head,

14:42like, okay, if I'm out shopping for toothpaste, I know I need toothpaste every once in a while and I go into like a CVS, a store, I get one thing of toothpaste and then maybe a week later I get some more toothpaste. Or, I could go to Costco and buy a giant thing of toothpaste and take it home probably at a cheaper cost. And that's sort of how I think of the giant wafers. Maybe it's a bad analogy. But what does speed actually mean for the cost of tokens? Well, I think there are a couple observations. I think people

15:13have chosen so far to price speed a little higher. For example, Anthropic offered a premium service in which they offered tokens twice as fast and charged six times as much. And they sold it out and they couldn't meet the demand. Now, just to give you an idea, we're 15 times faster than they're twice as fast. And so, people value speed because it allows them to do more work.

15:43And they value their time. And when you can do more work in less time, you are making people more productive. That's why people have chosen to price them at a premium. They don't cost more to make. In fact, the GPU architecture is an extremely good architecture and extremely efficient at building very slow tokens. And if you don't mind slow, the cost per token on a GPU is extremely low. But the GPU has a characteristic

16:14that as you try and go faster, the cost and the power used per token increase. Sort of like as you go faster in your car, your miles per gallon decrease. Right? So, what happens is as you try and get fast enough to be useful, fast enough to be interesting, fast enough to keep users' intelligence focused on this product, they become extremely expensive and extremely power hungry.

16:44And so, the question is not just what people are paying for a token, what people are choosing to price them at, but what they actually cost to make. And GPUs make very slow tokens very cheaply and they're unbelievably expensive at fast tokens. We make fast tokens vastly less expensive than GPUs and we use a tiny fraction of the power. Data centers need

17:23electricity, AI needs copper, reshoring needs steel, and gold's run may tell you something about how the world is repricing money and debt. All of those point back to real assets. The RAAX ETF is an actively managed one-stop real asset shop from gold to commodities to natural resource equities, adjusting as conditions change. Visit vaneck.com slash R-A-A-X pod to learn more. And investors should consider the investment objective, risks, charges, and expenses of the fund carefully before investing. To obtain a prospectus and summary

17:54prospectus, which contain this and other information, visit vaneck.com. Please read the prospectus and summary prospectus carefully before investing. RAAX is distributed by VanEck Securities Corporation, distributor. Being a small business owner isn't just a career, it's a calling. Chase for Business knows how much heart and effort go into building something of your own. That's why they make business growth their priority. The Chase team takes the time to understand your mission, where you are now, and where you want to go. Their broad range of solutions is designed with you in mind

18:25so you can bring your ideas to life. From banking to payment acceptance to credit cards, you can conveniently manage all your business finances all in one place with their digital tools. Looking for tips and advice? Their online resources are always available to give you the solutions you need to help your business thrive. See how your business can get stronger and go farther with Chase for Business. Learn more at chase.com slash business. Chase for Business. Make more of what's yours. The Chase mobile app is available for

18:55select mobile devices. Message and data rates may apply. JPMorgan Chase Bank N.A. Member FDIC. Copyright 2026. JPMorgan Chase and Company. Support for the show comes from Public. Lately, it feels like there are two types of investing platforms. Some are traditional brokerages that haven't changed much in decades, and others feel less like investing and more like a game. Public is positioned differently. It's an investing platform for people who are serious about building their wealth. On Public, you can build a portfolio of stocks,

19:26options, bonds, crypto, without all the bugs or the confetti. Retirement accounts, yep. High-yield cash, yes again. They even have direct indexing. Public has modern design, powerful tools, and customer support that actually helps. Go to public.com slash market and earn an uncapped 1% bonus when you transfer your portfolio. That's public.com slash market. Ad paid for by Public Holdings. Brokerage services by Public Investing, member FINRA, SIPC. Advisory services by Public Advisors, SEC-registered advisor.

19:57Crypto services by ZeroHash. All investing involves risk of loss. See complete disclosures at public.com slash disclosures. Let's say we stipulate

Market Share and Growth

20:06that this is all true and everyone wants the fastest and everyone's like, you know what? This is the solution that the Cerebras technology, one big chip, this is really where it's at. How much of like your market share for the inference market when you look out next year, the year after, et cetera, how much is your market share going to be dictated by your ability to get capacity at TSMC fabs? How much is that a gating mechanism

20:36for growth? You know, TSMC is a huge part of the supply chain. Yeah. But we have some real advantages. There are three areas right now that are limiting vendors in building AI compute. Number one is HBM memory. It's this memory we described earlier that can store a lot but it's really slow. That's made by three companies approximately, Samsung, Hynix, and Micron. And it's under unbelievable supply pressure. It's extremely difficult

21:06to get. There are very long lead times. It's unbelievably expensive right now. We don't use it. The second part that's limiting is a process inside of TSMC called CoWoS. And this is a process that NVIDIA and other GPUs use. We don't use it. The third thing is that at TSMC the factory that is under most pressure is their three nanometer factory. We don't use it. We use five nanometer. So we have managed

21:37to avoid some of the most binding supply constraints. Now, TSMC still has to give us a meaningful allocation. And they've been an extraordinary partner from the get-go. And they are the greatest manufacturing company on earth by far. A fab is sort of a modern pyramid. It's an unbelievable thing. And I highly recommend you or any of your listeners, if you get a chance to go to Taipei, go and see them. They are just extraordinary. Can you do fab tours? You can. Oh, I didn't know that.

22:08You can do fab tours. You can go and they have a museum of innovation and it is an extraordinary thing. They are the sort of the national champion of Taiwan. Yeah. But I think today TSMC has given us as many wafers as we've needed. Business today is constrained by data centers. And that's the grand irony, right? You invent technology that has been unbuildable, never been invented for 75 years in the history of compute. You write software that is extraordinary.

22:39You built a product that is vastly faster than the incumbent. And what are we all constrained by? Buildings. Right? Data centers right now are everybody's constraint in the entire industry. Powered buildings. So real estate. It is an amazing thing right now. And that is true sort of across the board. And that will not change for the next 15 or 18 months for sure. I mean, since we're talking physical constraints, I guess I should ask you, we did an episode about helium recently.

23:10Oh, yeah. A helium shortage given the situation in the Strait of Hormuz. And one of the things that helium is used for is lithography on semiconductor chips. Has that affected you at all? Or is that something that you're monitoring? We monitor, but there's not a lot we can do. And there's plenty of stuff to worry about that we can't affect. We obviously are in communication every day with TSMC. We're in communication with our entire supply chain every single day. And we stay abreast of the various issues.

23:42But it has had no impact on us. And we put that in the bucket of things that our manufacturing partners worry about also and that we can't help. You know, so in addition to manufacturing these chips, you actually, I didn't realize this. You have your own cloud. We do. Or you have your own cloud services. We do. I have a bunch of questions about that. But you have your own cloud services through which a user can actually get access to various open source models and so forth.

24:14It looks a little bit sort of visually, it looks a lot like the OpenRouter interface, roughly the same environment, except it's all like the open source. What I'm, something I'm curious about, and maybe you can speak to this, you know, in traditional software, open source, one nice thing about open source is you don't have to pay for it. So it's free. It's a little bit different when we're talking about, there's no really such thing as like free AI software

