
Sergey Levine - Building LLMs for the Physical World - [Invest Like the Best, EP.465]
March 31, 2026 · 1h 6m · 14,164 words
Show notes
My guest today is Sergey Levine, a professor at UC Berkeley and co-founder of Physical Intelligence. The company is building robotic foundation models designed to control any embodied system to do any task in any environment. Sergey argues that solving robotics at full generality is the right path, and that building systems that learn across many robots, environments, and tasks may be the more scalable approach than building narrow specialists. We discuss how these models can perform new tasks without being trained on them directly, and why everyday human actions remain the hardest problems in the field. He also reflects on how human trust and acceptance may matter as much as technical breakthroughs in determining when robots become part of daily life. Please enjoy my conversation with Sergey Levine.
For the full show notes, transcript, and links to mentioned content, check out the episode page here.
-----
Become a Colossus member to get our quarterly print magazine and private audio experience, including exclusive profiles and early access to select episodes. Subscribe at colossus.com/subscribe.
-----
Ramp’s mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to ramp.com/invest to sign up for free and get a $250 welcome bonus.
-----
Trusted by thousands of businesses, Vanta continuously monitors your security posture and streamlines audits so you can win enterprise deals and build customer trust without the traditional overhead. Visit vanta.com/invest.
-----
WorkOS is a developer platform that enables SaaS companies to quickly add enterprise features to their applications. Visit WorkOS.com to transform your application into an enterprise-ready solution in minutes, not months.
-----
Rogo is the AI platform for finance. They're building agents for Wall Street that are trained to understand how bankers and investors actually do work: from diligence and modeling, to turning analysis into deliverables. To learn more, visit rogo.ai/invest.
-----
Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Visit ridgelineapps.com.
-----
Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com).
Timestamps:
(00:00:00) Welcome to Invest Like the Best
(00:02:43) Intro: Sergey Levine
(00:03:29) Why Bet on Generality Over Specialization
(00:07:24) What if PI succeeds?
(00:09:05) Pros and Cons of Humanoid Robotics
(00:11:02) Timeline of Major Milestones in Robotics
(00:15:47) Sergey's Personal Journey
(00:18:22) Making General Intelligence Happen
(00:19:57) Understanding Robot Data Collection
(00:22:12) Most Surprising Discovery at Physical Intelligence
(00:24:48) The Science of Common Sense
(00:25:36) Long-Range Tasks in Robotics
(00:27:24) Why Wouldn’t We Have A Robot in Our Kitchen by 2050
(00:31:21) Other Interesting Approaches
(00:32:38) Cool vs. Useful in Robotics
(00:36:48) Form Factor Innovation
(00:38:22) Physical Intelligence Analogy
(00:39:30) Economic Transformation from Robotics
(00:40:48) Controversies in the Robotics Community
(00:42:16) Arguments Against End-to-End Learning
(00:42:34) Compositional Learning Explained
(00:43:25) Last Tasks Robots will Conquer
(00:44:30) Dark Parts of the Robotics Brain
(00:47:05) What Makes a Great Researcher
(00:50:15) Manufacturing and Scale Challenges
(00:51:17) How Companies Should Prepare for Robotics
(00:53:38) Boston Dynamics' Demos
(00:55:43) Converging Technologies Enabling Robotics
(00:56:47) How to Stay Up To Date in Robotics
(00:59:51) Near Term Objectives
(01:00:49) Confidence Level Among Researchers
(01:03:31) Google's Experimentation Culture
(01:04:24) The Kindest Thing
Highlighted moments
“There isn't like a humanoid problem and a car problem and a bulldozer problem and a robot bolted to the table problem. There's one problem.”
“What if we just add more data labeled with the semantic command? So basically, just take whatever the robot experienced and just label it with some semantic commands, but don't add any more low-level actions. And that actually helps.”
“LLMs make certain kinds of representations very convenient. They make it very convenient to basically turn text into other text. But that's not necessarily the best representation for what an embodied system needs to do. Sometimes it needs to think about things more spatially. Sometimes semantically, sometimes other representations.”
“getting to a particular level of usefulness so that robots can be deployed so they can do useful tasks so they can start collecting data from open world settings at scale. Because that's such a sudden event, getting past the activation energy, I think there is a lot of uncertainty about the timing of that.”
Transcript
1:34Hello and welcome, everyone. I'm Patrick O'Shaughnessy, and this is Invest Like the Best. This show is an open-ended exploration of markets, ideas, stories, and strategies that will help you better invest both your time and your money. If you enjoy these conversations and want to go deeper, check out Colossus, our quarterly publication with in-depth profiles of the people shaping business and investing. You can find Colossus along with all of our podcasts at colossus.com. Patrick O'Shaughnessy is the CEO of Positive Sum. All opinions expressed by Patrick and podcast guests are solely their own opinions
2:07and do not reflect the opinion of Positive Sum. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Positive Sum may maintain positions in the securities discussed in this podcast. To learn more, visit psum.vc. My guest today is Sergey Levine, one of the co-founders and researchers at Physical Intelligence. As a disclaimer, I'm an investor in Physical Intelligence because I believe it's one of the most important companies tackling the problem of robotics.
2:39As you hear us discuss today, robotics has what I would call a scarecrow problem. All of these amazing physical devices are becoming ever more possible in all sorts of cool permutations, but what they all really need is an intelligence, a brain, and that is what they're developing at Physical Intelligence. They're trying to develop foundation models that can make any physical robot do any task in any environment. The nature of our conversation today is all of the problems facing robotics and all of the promise of solving these problems across the world. I hope you enjoy this great conversation with Sergey Levine.
3:13Sergey, this is going to be a real treat and a blast to learn about possibly the most exciting, impactful area of technology being developed. Just to set the stage before we go back in time, maybe you could just define physical intelligence as you see it. Fundamentally, the goal of Physical Intelligence is to develop robotic foundation models that can control basically any embodied system to do any task. Broadly speaking, you could imagine that in the same way that a language model is kind of rapidly evolving towards a system that can do any task
3:43that can be expressed in language, what we would like is to build a new class of models that can do any task that can be done by a physical actuated device. Part of the thesis of this company is that we believe that doing it at the full level of generality might actually in the long run be easier than trying to special case, very specific narrow application domains. Again, in much the same way that for language models, it turned out to be easier in some ways to solve natural language tasks in their full generality than to narrowly target like machine translation or sentiment analysis or whatever.
4:14It may not be obvious why you would make that bet versus a robot that just does your dishes or something. What are the key trade-offs to understand, and why make the decision that you made? In the world of natural language, we saw that there were a lot of efforts to develop domain-specific solutions that tackled specific problems. Somebody would spend a lot of time thinking about how like English differs from French and then build a machine translation system. The reason that language models took over for all of those different application domains is because they can leverage much broader sources of data.
4:45It's not even as simple as saying like, oh, we have this data for this application, this data for this application, we like merge everything. It's actually more than that. It's when you can leverage weakly labeled data, in the case of language models data that you just mine from the web, that you actually learn more about the world. So you establish like a foundation of world understanding. And then on top of that foundation, it turned out to be much more effective to build out different applications. To bring this into robotics, the calculus doesn't look quite the same because in robotics, we don't have like an internet-sized data set that we can just draw on. But this notion of understanding the world, if anything, is actually more important in robotics.
5:18Because if you have many different tasks, maybe even many different physical systems, then you can go from training individual dishwashing specialists or laundry folding specialists and instead train a model that actually understands physical interaction. People can master new skills very, very rapidly because we understand physical interaction. We can intuitively grasp what's going to happen in this new unfamiliar situation. They'll just like bootstrap things really, really quickly. If we can draw on data from many sources, many applications, many robots, then we can have a model that has a physical understanding
5:49and it'll be much, much easier to put new applications on top of that platform. What is the hardest part about building in this way for you when you see other approaches that are more maybe legible to the average person? Oh, there's a robot moving around doing this one specific thing. It looks a certain way. What's the hardest part about this approach as you're doing it? I think this has actually been kind of an issue in my whole career because when you work on robotic learning, the more general, the more this becomes important.
6:19Effective robotic learning, effective generalization, isn't actually the optimal way to have like a really exciting demo. The way to have a really exciting demo is to pick a really cool task, control everything else in the environment, like set it up so that it's perfectly clean, perfectly pristine, and just make it work in that one setting. That's the way you make a robot demo. And generalization, you can't just show it in one spot. The point of generalization is that it does something relatively mundane that any human could do, but it does it in any situation. So we had some demos that we released last April where we showed our robot cleaning kitchens.
6:53It's cool, but if you watch an individual video out of context, it's just like, okay, it's like picking up plates, like anybody can pick up plates, except that we just put it into that home just for that demo and it never had training data from that setting. So obviously you kind of have to like understand what's going on to appreciate why this is actually pushing the frontier. What is your model for the stakes of what you're doing? If you are successful, I'm curious for you to define what successful would mean, other than we cross this chasm of general physical intelligence. But if you cross that line, then what? And one of the things that I think would be really, really exciting that would be enabled
7:26by a general purpose embodied foundation model is the ability to unlock people's imagination in how they build robots and other embodied systems. Personal computers were a really big deal in my mind because it made it possible for lots of people to hack together all sorts of like really cool stuff. And there was this Cambrian explosion of like amazing applications that started in the 90s and so on, and then was further accelerated by the internet. And I think something like that might happen in the world of robotics, but it can't happen today because if you want to put together some cool new robotics application, some cool
7:58new robotics idea, you kind of have to build this monstrous stack and you need to basically solve the intelligence problem. But if there is a solution that someone can build on top of, there's a foundation model that you can prompt that'll provide like basic functionality, and then you can fine tune it a little bit or adjust it in some way to your application. Now it actually makes it a lot more tractable for lots of people, lots of companies, lots of individuals to try all sorts of different things. Sometimes we think that robots are going to be one thing. There's people, and now we're going to make like metal people, and that'll be robots. But I don't think that's how it's going to be because no technology has been like that.
8:30It's going to be more like kind of a toolkit where you can put together all sorts of like really cool applications, get really creative with it. You know, maybe I'm going to make a robot with like five arms, and this one is going to look like that. It's going to move. This one's going to hang from the ceiling and figure out kind of the right thing to tackle your domain, maybe also experiment with software. But you need the right platform on top of which to do that. And I think the foundation model can be that thing. What are the, in your mind, the pros and cons of the humanoid approach to robotics? There's a lot of value to that. There's a lot of value to capturing the imagination, and there's a lot of value getting people to think about what the future might look like in a way that's understandable.
9:03In my mind, it's one of many possible kinds of robots that we're likely to have. The challenge of intelligence looks very similar for all of these different robots. I don't think we should be tackling intelligence in the context of one specific body. I think we should handle it in a general way, because otherwise it's just really hard to get a handle on this. We need lots of data. The cool thing about being able to build robots is that ultimately they don't have to be constrained to look like humans at all. You can build the right tool for the job. You could imagine that you're building a house with a robot that is a swarm of 1,000 quadcopters.
9:36And I think that in the future, we'll have a robotic foundation model, which can then be adapted to all sorts of applications. And they might really run the gamut from bulldozers or something to humanoids to robotic arms. Maybe it would need to be adapted to each one. Maybe it would need to be fine-tuned. Maybe we need something in context to understand how that body works. But the fundamentals of how you interact with objects, how things move in the world, how causality works, that's all conserved for all of these different systems. Do you have a favorite example of what might be possible with true general intelligence
10:07that might not be possible with a humanoid-only intelligence or something? There are a few things that I think are worth thinking about. One is that we can make machines that are very big and machines that are very small. This is not by any means a short-term thing, but in the long run, I think there's lots of really exciting applications in medicine and surgery where we not only might in the long run not be limited to robots that look like humans, we might not be limited to robots that can even be controlled by humans. Currently, for example, in robotic surgery, it's done entirely through teleoperation, so you need something that a person can control in real time with the right level of dexterity.
10:40And of course, that limitation holds for current learning-enabled systems too, but in the long run, we could imagine addressing that. Think about the most important hash marks on the timeline of robotics research that have gotten us to here. I always think it's super helpful to set the historical context before we talk about what the state is today and where we're going. Could you walk us through that? At some level, doing end-to-end control for robotic systems is a very, very old idea. The first, for example, autonomous driving systems that used end-to-end learning, they existed in the 1980s.
11:11ALVINN was 1986 or 87, and that was a driving system that was demonstrated to drive on highways, controlled by a neural network, and then from a camera. The neural network was tiny. There are some very venerable concepts, but historically, what has been really difficult in robotic learning is that you need a system that handles the application you want to address, that is cost-effective to train for that application, meaning that you don't need a huge amount of data for every single application you want to tackle. Handles long-tail scenarios with common sense, so if something weird takes place in the world,
11:43it needs to have a reasonable response to it. And then also, for the thing that it's actually supposed to do, it needs to be robust, fast, and reliable. And getting all those things together is very, very hard, because with machine learning, it works best when there's a lot of data. So if you sort of naively approach a robotic problem and say, like, I want to do washing dishes, the obvious thing to do is to collect, like, an enormous amount of data of washing dishes. But that's not cost-effective, because then you go on to the next application, and you go through that process all over again. Being able to train general-purpose models that can handle many tasks is essential to this,
12:14because now you need a lot less data for each new task. But then even further, and this is the thing that has probably changed the most in the last few years, you also then need to handle the unusual scenarios. For the unusual scenarios, you are probably not going to have experience. What you need to rely on is knowledge that you've acquired from other sources that you can ground in a new situation. And people are extremely good at this. If you're driving a car and there is something going on in the middle of the road, and someone put up a sign saying, don't go here, there's a gas leak or something. You've probably never experienced that before,
12:47but you can put these things together and figure out what you're supposed to do in that unusual situation, because you have common sense. This has been like a huge mystery in robotic learning world. Where do you get that common sense? And this is what's changed in the last few years, because it turns out that multimodal language models are really good at pulling in knowledge and trying to articulate that knowledge. They're not very good at grounding that knowledge in physical situations, but they know stuff. There is a path to get that common sense by essentially leveraging the knowledge that is contained in multimodal LLMs. But
13:19there's also a challenge because you have to somehow plug into that knowledge in the right way. You can't just like show it a picture and say, what would you do here? Because it doesn't have the context. It doesn't know that you're a robot. This is what you look like. This is what's going on. That's a technological challenge. And we've made some headway on addressing that technological challenge, the research community in general has. But most important, it's kind of that light at the end of the tunnel. Now we have this way of pulling in lots of knowledge, which can help us handle those long tail scenarios. Are there hash mark equivalents on the timeline of AlexNet or the Transformer? Are there big major events that you think everyone will point to when
13:49writing the history books about this? I think it's very early on right now to like answer that definitively. Probably the first end-to-end learning systems, which were in the 80s, that's definitely a milestone. The first deep reinforcement learning systems, which were in the early 2010s, those are probably a milestone because deep reinforcement learning gives us a way to go beyond human level performance, which I think will be essential for robotic systems. And then there's the more recent stuff, but I don't know how that's going to shake out as far as whether that's something that people will point to. But I do think that the advent of multimodal LLMs that can be adapted to robotic
14:22control to bring in that common sense, I do think that's a really important advance. I think we're probably going to see quite a few important advances in the next few years, and maybe those will be the things people point to.
15:29Can you tell us your own personal history of approaching the problem? Maybe the origin of when you first became interested and why, and then how you've decided what to spend your personal time and attention on ever since then? So I started working in robotics in 2014 after I finished my graduate degree and started a postdoc with Professor Pieter Abbeel at UC Berkeley. I hadn't worked on robots
before, but I figured I should get a little bit more education after finishing my degree. And his lab worked on robots, so I tried to apply what I had learned previously to robotics. Before that, I worked on computer graphics. The thing that I've always wanted to really figure out is how to get AI systems that get better and better the more they do things. Because I think that's tremendously powerful. If you can have a system that gets better and better the more it does something, and it just keeps getting better and there's no limit, then it can master all the skills you'd want it to do. Initially, I tried to approach it in a very blank slate way. If you start with nothing,
16:36you practice a particular skill, then you get better at that skill. You can do that in a limited setting and you get something that works, but it's very hard to turn that into a general system that can work in open world settings. Because if I practice something over here and then it goes over there, now something is different and needs to practice all over again. When I worked at Google afterwards, I tried to see if we can do that, but now parallelize it across many robots. So collective learning, can you put 20 robots in a room and have them all learn together? And that
17:06works and it generalizes, but it's very hard for that to handle these tail cases, these edge cases. Now it becomes a savant of this particular task and that's all it knows in the world. The next step, which I mentioned before, is combining this ability to practice skills with lots of prior knowledge. And that's actually a really, really hard problem. It's not just in robotics where it's a hard problem. I think it's a hard problem in all of AI because arguably the two big impressive results in AI over the last few decades have been generative AI and deep reinforcement learning. Like if you want a single example to epitomize this, generative AI, that's like LLMs,
17:39deep reinforcement learning, AlphaGo. They're both very, very impressive and they're very impressive for very different reasons. Generative AI is impressive because it can reproduce some of the things that humans can do. Like it can draw pictures that look like human pictures, write text. Deep RL is impressive for the opposite reason. It does things that humans hadn't thought of. The big challenge, and this is what I'm leading up to and what I hope to figure out here at Physical Intelligence, is how to combine those threads, how to bring in all of that knowledge that you get with generative AI, but also go beyond just human level performance with reinforcement learning.
18:13What literally have you done and are you doing to make that happen? In the past few years, we started off first by developing the basic foundations. The basic foundation is what's called a vision language action model. A vision language action model, you can think of as an LLM that has been adapted for robotic control. So the way these things are trained is they're first trained on text data. Then they're adapted with lots of image data from the web to understand images. And then they're adapted to robots with lots of very diverse robot data.
18:44That's a starting point. That's a way to take all of that web knowledge, get it into a model that can control robots, and get some interesting behaviors out of it. And then from there, we studied two threads, how to get this thing to handle unusual situations with common sense, and how to get it to improve with reinforcement learning. The way you get common sense is by essentially using chain of thought. The robot enters a scene, and instead of directly starting to move, it thinks about what it was asked to do. So if it was told, clean up the kitchen, it looks at the scene and says, based on this, I should pick up the plate. And then it goes and does it. So that unlocks all this prior knowledge,
19:20because those intermediate inferences benefit from the web-scale pre-training. That handles edge cases. And then the reinforcement learning part comes in after you've practiced it a few times, and you keep getting better and better at the task directly through your experience. For example, we had this demo on making espresso. That system practiced making those espressos many, many times, and used that to improve robustness, improve speed, improve throughput. And we're not done with that. I think there's a lot more to do there, but we have the starting point. The robot data itself. Is the right way to think about it, I'm looking at the
19:51gen one of these things. I see a camera here, maybe there's some sensors somewhere else. Is it effectively the data being gathered by various sensors strategically placed on the robot at different parts? Yeah. Something I'll say about sensors is that I think you can actually get away with less than one might think and still do quite a lot. This platform here has three cameras, one on each wrist and a base camera. It doesn't have touch sensing. It doesn't have force sensing. It's very bare bones and very low cost. I'm sure that more sensors could make it better, but a good learning method can actually compensate for deficient sensing fairly well.
20:23The wrist cameras are essentially a touch sensor in disguise because you can see local deformations when you touch something. If I think about the analogy to the expert systems of the 80s and 90s in basic AI, to the lesson that scale is all you need and the sort of counterintuitive nature of that, that you're not teaching it any specific thing, just blasting it with data. And there's this reservoir of internet data. Talk about how to create the reservoir of data needed for this. So I don't think anybody really knows how much robot data is needed to have truly generalizable and powerful embodied AI. My sense is that we actually
20:55don't need to know. What we need to do is get to the point where these systems are useful enough that they can go into the world and gather more data themselves. Tesla doesn't worry about how much data their cars can collect. If anything, it's the other way around. That's a little too much data. The key is not so much to quantify. Here is exactly the price tag of getting the ultimate robot data set. The key is to get a system that can go into the world that's useful enough that does a wide variety of different things and they can keep pulling in more data. You brought up the example of Tesla, the beautiful system of a thing that's useful without
21:27the AI to begin with because the human drives it and it gathers data. Why then not start with your best guess at something that's useful as a single robot to have the same sort of flywheel thing happen? I think it's a good idea. And do you think that's an approach that you'll pursue? I don't think that there's like one right answer, right? So I think there are some domains where deploying a system under human control makes a lot of sense. There's some domains where deploying a partially autonomous system is very reasonable. That's kind of domain dependent because robots aren't just one thing. Some people might not want a robot in their home that is
21:57constantly being controlled by a person offsite, but maybe for some applications that doesn't matter. If you mark the start of Physical Intelligence through today, what has been the most surprising thing to you that you've discovered or the nature of how the research has gone? One of the things that's been surprising to me is that I think we've made a lot more progress on dexterity than I thought we would. We had good reason to believe that if we just collect more and more data, that just steadily gets better. What was surprising is that we could also get these systems to perform very dexterous behaviors without really doing anything particularly special for that. The same,
22:32by the way, also applied to getting systems to work on different embodiments where we could get our models to work on all sorts of other robots, including robots with multi-fingered hands, robots with different numbers of degrees of freedom. Obviously, we needed to get data and we needed to fine-tune the model, but the model itself didn't need to change. It didn't even need to be told through any kind of prompt what the robot was. That was also surprising to me because I would have thought that we would need some fancy techniques to adapt the system to faster, more dexterous, more complex tasks, and also to different kinds of embodiments. But
23:03it actually seems to generalize pretty well. I'm always interested in the spectrum of capabilities and especially where the systems today are more advanced than you think people would probably expect and where they're less advanced than people might expect. This is something that's always been very tricky to understand in robotics. There's this idea that roboticists always talk about called Moravec's paradox. It's actually true in all areas of AI, but especially in robotics, this is a big deal. We kind of have a cognitive bias to think that things that are easy for us will be easy for
23:33the machine. Solving calculus problems is difficult for most people. Picking up a cup is easy for most people, so we think, oh, machines should be able to do this. But it's actually the other way around, that there are things that are easy for us because they have to be, otherwise we wouldn't survive. We're very good at spotting the tiger in the jungle because the people that weren't so good at it got eaten by the tiger and they're not around anymore. Because of that, we have this cognitive bias and we think that there are things that should be very easy, but they're actually very difficult engineering challenges. However, something that is changing is that machine learning slightly changes that equation. Programming something by hand to pick up any cup anywhere, that's difficult.
24:08Getting a machine learning system to do it, if you have data for it, it's actually not that difficult. And I think increasingly what we'll see is a shift where domains where collecting data is straightforward, they actually end up falling into the easy bucket over time, even if they are physically intricate. But there will be domains where collecting data is difficult, where you need to use more common sense, where you need to reason at multiple levels of abstraction, connect physical skills that you've learned in other areas to knowledge that you've got from the web. And those will be tough, and that's where we'll need more technology advances. What is the science of common sense? When we say that,
24:43what does that mean? For the purpose of robotic learning, we can think of it as applying semantic inferences, using knowledge learned from other domains, to the current physical task at hand. You can think of common sense as the opposite of muscle memory. So muscle memory, like if you play a sport, you practice something a lot, you hardly think about it, you just kind of do it on autopilot. Common sense, in my mind, I don't know if this is the conventional definition, but I think it's a reasonable definition, is when you know something to be true because
25:13you saw it, or you read about it, or you heard it. And now you are in a situation where that fact is highly pertinent to what you need to do. And you are able to make that connection, apply it to your situation, grounded in the environment that you're in, and make the right decision. One of the other differences that's so interesting to me is that everyone's used a chatbot now. You query it, you get an answer, query it, get an answer. We're now seeing what happens with Claude Code and other things where you give it something complicated, and it's able to do a very long task without failing. What's the similar thing, long range, in robotics?
25:45It's something that we're working on quite a bit right now, and the methodology is not that different at some level. The way that our models work now, as I mentioned, is they use this kind of chain of thought process to reason about the task. When you have that, you can actually do very long horizon tasks. You can have a robot that goes and takes out all the dishes from the dishwasher, puts them in the correct cabinets, wipes down the counter, all that kind of stuff. The interesting thing here is that we found, maybe about six months ago, that our models had gotten to the point where they could
26:16be improved just from supervising them with high-level instructions. You take a robot, you put it in a new kitchen, you ask it to clean the kitchen. It gets to work, and then it fails somewhere. So now, okay, what do you do? Well, you add more data. Traditionally, what we would do in that situation is add more teleoperation data to cover a wider range of kitchens. But what we tried kind of on a whim is to see, okay, well, what if we don't add more teleoperation data? What if we just add more data labeled with the semantic command? So basically, just take whatever the robot experienced and just label it with some semantic commands, but don't add any more low-level actions. And that
26:50actually helps. That actually improves the ability to generalize. So what that means is that the bottleneck had actually shifted from the lowest level, meaning the robot's ability to physically do the task, to this middle level, where now the system is more bottlenecked by its ability to interpret the scene and select the correct next step, which can be supervised with language. That's a big deal, because now that means that someone can literally talk to the robot. It's coaching, basically. Yeah, exactly. And make it better just by talking to it. Say we were in 2050, and there's no robot in my kitchen doing my dishes for me. What do you think the most
27:21likely explanation is for it not having gotten there by that point? I can think of a few reasons. My suspicion is that there is a long tail of challenges that has to do with the interaction of technology and people. Autonomous cars aren't that different in this regard, where getting to a level of comfort with deploying autonomous vehicles on the road was a significant challenge that ran in parallel with getting the technology to that level. Early Tesla self-driving was a bit controversial, because it wasn't perfect. There was a question, like, are people
27:55comfortable with this level of imperfection? Probably there are some tasks for robots where people will be comfortable with something that's not perfect, something that needs to learn from its mistakes. There are some areas where people will not be comfortable. Are you comfortable with occasionally breaking your dishes? Maybe in a few years it will stop breaking those dishes, but maybe in the meantime it's not quite there. Are you comfortable with a robot like that in a home where there's, like, small children? Maybe not, and that's okay. I think that figuring out how those factors interact and what that means for the timeline and for how these systems get better with experience, I think that's a tricky question. And I think it needs to be approached
28:27very carefully with a lot of sensitivity. There may be some domains where it makes a lot more sense for these systems to be deployed and strapped and collect more data, and maybe other domains require more care. Can you imagine a purely technical explanation for why something might not work? I think the place where I would see the biggest technical risk is dealing with the breadth of different situations. If we were talking about a well-defined but slightly chaotic environment like cleaning hotel rooms or assisting human cooks in a restaurant, I have, like, a very
29:01good sense for how to get that under control. If you're imagining a robot going into a home, one place where I can anticipate a challenge is that there are a lot of other unexpected things that can happen, and you need a system that's very good at inferring what's going on and adapting to it or reacting intelligently. And I think we have a lot of ideas for how we can approach it, but that is the hardest part of the problem, because when you're in a situation where just about anything could happen, and you're controlling, like, a physical device that affects the world around it, then you really need to get
29:33things right, at least at some level, pretty much in every case. Like, it doesn't mean that you always have to succeed, but it does mean that you always have to do something sensible that people are okay with. And I think there are a lot of really good ideas for how to do that, but that is probably the most challenging part of the equation. If I go back to thinking about the right model to think about the Physical Intelligence approach to doing this whole exercise, help me make it as simple as possible. So one might be, we're going to build a whole variety of different kinds of form factors to do a whole variety of different kinds of things and mash all this data together and start to, you
30:03know, experiment with how we can, on evals, make it better. Is that just the simplest way of doing it? Is there an even simpler way? And I'm asking because I'd love to then contrast it with some other approaches that you're interested in that you're not doing that others are doing. In my mind, the most important thing to get right is to get the system to be general. In particular, to get it to be general with respect to how it can be improved. For example, hand-designed robotic controllers are not very general with respect to how they can be improved because it requires like a human engineer to go in and improve it. A learning-based perception system is more general because
30:36all it requires is human labelers to go in and label more data. A system that learns autonomously from data that it gathers through its own experience is even more general because you don't need the human labelers. The key is this generality, particularly with respect to improvement. And the decisions we make are to a very large extent centered around that. I don't know if the correct design is for robots to have three cameras. I don't know if it needs like a touch sensor. I think we're very agnostic to that. I think we'll try a lot of those different choices. I'm not even sure if in the long run, it's going to have a language model. Maybe we'll have some other kind of model that's trained on
31:08very diverse data. The key is this level of generality. What other approaches are the most interesting to you? One thing that's like a very important question in this area and something that I think the research community and the tech community has not fully answered is the dichotomy between different data sources, particularly with respect to real data and simulation. It's a very controversial topic. I have a very strong opinion about it, but I think that it's worth acknowledging that if we look, for example, at humanoids, we've seen videos of humanoids doing all these acrobatics. There's a
31:43particular pipeline that makes that work, which is very heavily reliant on simulation and very light on real world data, often actually zero real world data. And then there are the approaches that work well for robotic manipulation that often are the opposite. They often use very little simulated data, often use large amounts of real world data and very large foundation models. And it is surprising that in these two robotic domains, the dominant approaches look so different. It may be that one will win out and there is a particular approach that can handle everything in the long run,
32:17or maybe there's some sort of synthesis of these ideas that's important. I don't know the answer to it. I have subjective opinions. I think the approach we're taking is a very good one, but I think that it's interesting to look at that and see why is it that these things are so different. Can you talk about the contrast between cool and useful? The Boston Dynamics robot is very cool. The backflip is super cool. Inverting the body, it all looks really good. I don't know what I need that requires a robot to do a backflip. So I'm curious how you think about optimizing around cool
32:48versus useful. The strategy we've taken is subject to the constraint that it's useful, make it as cool as possible. We make decisions first and foremost based on our assessment of what will drive the tech forward towards this truly general, broadly applicable robotic foundation model. But in doing that, we try to stress test it against the toughest challenges we can throw at it. The toughest challenges are the ones that look cool. We didn't set out, for example, to build a robot that can make espresso or can fold laundry. But in the process of building these general systems, we figure like these would be
33:22particularly challenging, particularly exciting things to try with them to see how far we can push them. Can you talk about the Robot Olympics? There was a gentleman named Benjie Holson who used to work at Everyday Robots, part of Alphabet before it dissolved. He spends a lot of time thinking about tasks that robots could do. So he wrote a really interesting blog post a while back. There was this Robot Olympics that was held in China where robots would like run around on a track and jump and so on. But maybe these aren't the real challenges you should worry about. How about a Robot Olympics centered around essentially everyday
33:54tasks that people do? That's kind of more of a Moravec's paradox thing: tasks that people find really easy but that robots struggle with. And he had things like opening a door, washing a frying pan with grease on it, using a plastic bag to pick up dog poop, things that people don't find particularly challenging but that no current robotic system can do. And he listed maybe a dozen of these things. This wasn't part of a concerted research project. We had developed processes and systems for just ingesting new tasks that we wanted to use for all sorts of tasks. And we figured, okay,
34:24a good way to test this is to say, okay, here's a big list of tasks. Let's just go through this process that we've developed and see if it works, basically. So it's almost like a test of our internal operations and model training system. And we tried these things and it actually turned out that we could solve almost all of them. Well, there's one we couldn't do, which was turning a dress shirt inside out because the grippers on this thing wouldn't fit inside the sleeve. So we probably need to change the gripper. I think on a technicality, we didn't succeed at peeling an orange because he said, do it with the fingers and our fingers weren't strong enough. We had to use like
34:55a little tool, like a little knife, basically. Everything else we could do. If anybody watches those videos, one thing that I think is important to keep in mind is we didn't like develop anything special for this. We literally used this as a test of our task onboarding process. There's something interesting there because it suggests the power of generality, that when you have this general system, you can really just like onboard all these crazy tasks without really doing anything particularly sophisticated. I was curious before when you said superhuman ability, dexterity or something like that, where we're limited by what we can do or maybe by what we can control, even if it
35:28gets smaller. What are some of the other dimensions that we might surpass human ability on in terms of physical ability? What are the other trend lines? So here's a fun one. We were working on a task where our robot had to plug in things like power cables or ethernet cables or something like that. When a person does this, obviously, if you practice it a lot, you'll get really good at it. But when a person does this without having practiced a lot, you pause frequently, right? Because it's not a physical thing. You just have to process what's going on. You have to make sure that it's like all aligned and all that
35:58stuff. So you do it very slowly. And if you're teleoperating a robot, you do it even more slowly because there's this level of indirection. It turns out to be like pretty straightforward to go in and find all those pauses and remove them. You can speed things up further. So you can get to a task where a person demonstrates what it means to succeed. And then you can have the robot practice the task and succeed in the same way, but a lot more quickly, a lot more efficiently. The most general way to do this is with reinforcement learning. But there are also like some simple tricks you can do if you just want speed. So that's like one example of something where you can have a machine that does it a lot better.
36:29You know, at some level you have like a processing bottleneck. Like that's why the person does it slowly because they have to process what's going on. But speeding up processing is something that people understand quite well in computer science. There's this amazing Michael Crichton novel called Prey, where it seems like for a given problem, there may be an optimal shape or set of optimal shapes of the robot to perform the task. And what you should do is analyze the problem, then have something that can almost like morph or transform into the right form factor. How do you think about that? The innovation on the form factor side rather than the data and model side.
37:04I think that in general, in robotics, the ability to innovate on form factors has been very constrained because of the AI challenge. If you have a traditional AI pipeline, like you're doing some motion planning and stuff like that, it's hard to just go and cobble together some new robot because when you do that, you have to like characterize the dynamics of the system. You have to do sys-ID. You have to build up all this stuff. If you could just put together a robot in your garage, load up a robotic foundation model, and tell it to do a bunch of stuff, like maybe it won't be
37:35perfect at it. Maybe it needs more data to really perfect it, but you can at least like get the thing moving. I think that can be a really powerful engine just to get everybody to experiment with this stuff. I don't think that I'm like the right person to design the perfect robot. There are people here, of course, who are a lot better at that. But in general, I think that it's just like with personal computers. So I think the key is to let people experiment and play around with it and just radically lower the barrier to entry for that. Then we'll see a lot of creativity. When we first started using computers, there was a limited number of form factors. Now you can have a computer in your phone, a computer in your car, embed a computer in your refrigerator. They're everywhere and they're
38:07very different. Generality, good software, good foundation on top of which you can build applications. Those are key to enabling that. Your co-founder, Lachy, once described to me the feeling of physical intelligence for a human as like learning how to ride a bike. Like there's that moment when you didn't know how to do it and then you do know how to do it. And that feeling is physical intelligence, that snap of understanding. There's actually a physiological explanation for this. There are studies that were done in monkeys using tools. And you can actually find where in the brain which neurons activate for the monkey to figure out where its hand is. It turns out that
38:39if it's using a tool, they activate based on the location of the tool tip, not based on location of the hand. The tool being an extension of your body is a real physiological thing. Like your brain literally does that. Knowing that, what does that do to impact the approach to your research? It says that physical intelligence should be at some level agnostic to embodiment. So that a good foundation model should figure out how to manipulate whatever body it's controlling, whatever tools it has at hand. There's basically one problem, not many different problems. There isn't like a humanoid
39:11problem and a car problem and a bulldozer problem and a robot bolted to the table problem. There's one problem. And if you solve it at its full level of generality, that's really, really powerful. We're in the early stages of seeing some of the job and other sorts of transformation in businesses, in the economy, et cetera, that LLMs make possible. Certainly we've seen it in engineering. How do you think about what might happen or what you hope will happen when we're at a similar stage, whenever that happens to be for robotics, where all of a sudden we have this thing that's general, that's useful. The world's very efficient at deploying these things. People
39:44are creative. Where do you expect to see the world start to change most in the early days? I really don't know. I don't think anybody would have been able to predict how the LLM stuff evolved. People would have guessed, but this is why I keep coming back to this idea that maybe the key is to let people try lots of things. One of the really amazing things about applications of LLMs is that they are really accessible and somebody could put together a really cool new prototype that under the hood is just prompting ChatGPT or something, but they can experiment with it. They can try it out, see what it does. And there's an amazing power to having lots of smart people rapidly iterating and
40:19prototyping lots of things. That's a lot of why Physical Intelligence has really put a premium on engagement. We've open-sourced our models. We would like to engage with lots of other companies that are building robots because we all see a lot of power in this effect of having many people trying out lots of things. What are the major controversies in the robotics community? To me, a controversy is when someone gets in an argument with me at a conference, but I can tell you the kind of arguments that I found myself in, and it's kind of an interesting trajectory, that
40:49in the early days, the main argument I would have with people is, does learning have a place in robotic AI? I think part of why that was often a controversial point is that in a traditional engineering pipeline, robots do look very different than software artifacts. They're physical. They can affect stuff around them. There are safety considerations. There are a lot of weird situations they can get into, and it took a really long time for the robotics research community to really internalize that you don't necessarily need to
41:24program in things like knowledge of physics. You don't necessarily need a physics simulator inside your robot when it's planning. We can actually have a learning system figure all that stuff out. That was a very controversial thing for a very long time. I think at this point, there's a lot of acceptance that learning is a really important part of robotics, but I still don't think there's universal acceptance that end-to-end learning is the right way to go. Basically, I don't think there's universal acceptance of the bitter lesson. The bitter lesson says that you should not program the machine to think the way you think it should think, but you should let it learn from data.
41:54And that is not a universally accepted idea. I think there's good arguments against it, but I think that in the long run, if we want that generality, especially generality in the machine's ability to improve, then we need it to primarily be learning from data. What is the good argument against? My best attempt at steelmanning this is that if you want something reliable in a really complicated open world setting, then you can't afford not to use what you already know about the physical world. And we've got textbooks full of this stuff, so why don't we just plug in what we know from the textbooks? What is compositional learning? Can you describe that?
42:27One of my students, he had this idea where he asked a language model to provide a recipe for how to make a sandwich in the International Phonetic Alphabet. The International Phonetic Alphabet is these symbols that they use in a dictionary to explain how to pronounce a word. And it's very peculiar because it only ever appears for individual words in a dictionary. You never see freeform text written in the International Phonetic Alphabet. But if you ask a good language model, it will write paragraphs in IPA for you. And that is compositional generalization. That means that you have never seen
42:58this particular language, this particular alphabet used to write paragraphs, but you understand paragraphs, you understand that it's compositional with different alphabets, so you can solve the problem. You can imagine the same thing coming up in robotics, that you've learned a repertoire of skills, and now you can combine and mix those skills and apply them to solve new problems. It makes me wonder what the last type of tasks you think will be possible for a robotic system to achieve. I think changing a child's diaper will be really, really hard. This really is just Moravec's paradox all over
43:29again, that people are extremely good at certain things. We're very good at physical things. We're also very good at interacting with other people. And it makes sense. We have to be. That's a lot of our existence. So things that involve behaviors that interact with other people, where you have to help somebody, I think that's a lot harder than people appreciate. Elderly care, taking care of small children, I think those things are going to be hard, and they're probably going to be harder than people think. And the stakes are very high. It's not just that. The stakes are high in many places.
44:00It's just that it's probably the pinnacle of something that fools us into thinking that it's easier than it really is. We are so evolved for interacting with people and doing things physically. If you're helping somebody get up the stairs or get out of bed, you don't have to think very carefully about how you're going to do that. So I think it's really the pinnacle of Moravec's paradox. If I think about an LLM as a brain, and now it's effectively studied everything, I don't know how else to put it. And then I think about a robotics model's brain instead. What are the
44:30dark parts of the brain? What has it not been able to study? What are the areas that have just been really difficult that matter, but have been hard for us to get into? One of the things that people are remarkably good at is using physical analogies to understand other situations. I don't know whether this is something that LLMs can or can't do, but it is something that people use a lot. They use it in everyday life, and they also use it for very sophisticated problems. So for example, you could say, that company has a lot of momentum. That's a physical analogy. You know exactly what
45:01it means. I don't have to explain that statement to you. But if you actually think about that, it is quite a complex thing. There's like a lot riding on that word momentum. There is an interview with Richard Feynman where he talks about teaching, but he talks about analogies that he makes in regard to subatomic particles. And he says, we use like the word spin. The thing is not really spinning. It's not like a spinning top. But all those kind of analogies help us make sense of it. And not just in a way that allows explaining concepts, but it actually leads to conclusions. It actually leads to inferences, and those inferences actually make sense.
45:33We are so primed to interact with the physical world, so primed to have physical intelligence that you can use it in everyday speech by saying that company has a lot of momentum, and you can use it when advancing fundamental theoretical physics. That's kind of remarkable. I don't know if LLMs can do that. Maybe they can. But I think that really understanding physical interactions, causal structures, all that kind of stuff, there is something special about that. And it's clear it's something that people get a lot of mileage out of.
47:03I'd love to talk about the role of researchers and the actual people doing the research. In LLM world, it's fairly shocking how few people are at the global scale responsible for basically all the progress in LLMs. Someone like Ilya is an example. What is that like in robotics? How many people in the world are truly impacting this trajectory? And then I want to ask what good research means. I think those questions are often very hard to answer about science because I think that
47:33we sometimes have a tendency, especially when we look at history, to underline particular milestones. And certainly in machine learning, this is the case. AlexNet was a big step forward. That's true. But I think it's also important to remember that these advances, they happen because lots of people are trying lots of things, and even some of the failures are actually very instructive. I complained before a little bit in a low-key way about the controversy around end-to-end robotic learning, but I don't know if robotic learning would have advanced the same way if it were not for the controversy, so to speak. It is true that you can look through the list of
48:08successes and mark down that like, oh, like these folks have a history of repeatedly hitting home runs. But I think in reality, in the scientific community, it's not just the home runs that are responsible for progress. And even some of the failures and even some of the bad ideas are very instructive in pushing towards the good ideas. That's fascinating to think about. The example you gave before is so interesting, where the research insight was like, just give it some coaching, and it gets better. It seems like that sort of insight can be very powerful and high leverage, which makes me wonder, like, what have you learned about what makes for a great
48:39researcher? Research is definitely different from engineering, because in research, the important thing is to get an answer to a question, which often requires cutting some corners. One of the most delicate decisions in research is when do you try new things, and when do you stick with what you're already trying? That's very, very delicate. It's very, very hard to figure that out. And if you get it wrong, then you can miss something really remarkable. If you get it wrong and you don't stick with something for long enough, you might be like right there, you might be about to get to
49:09the answer, and then you stop just short of it. That's terrible. Or you could get stuck hammering against something that's never going to give way for years. Deciding when to turn a little bit and look this way and that to open yourself up to more opportunities, versus when you should keep hammering on the thing because you're about to get the solution, that's often the most important decision. And some people have an instinct for getting that right. That counts for a lot. You've obviously been in and around great researchers. What are these people like as people? How do they tend to be distinctive from the average person?
49:40I think they're just the same. I have a very hard time thinking of a single set of personality traits. There is no constant, basically. There might be a commonality in that to do effective science, you have to be very passionate about that. But even that passion can come from many different places. I've worked with people that were remarkably effective that are just driven purely by the desire for novelty. They don't give a damn about what their technology does. They don't give a damn about whether it's useful. They just want cool new ideas. I've also worked with other people that really want to solve a particular thing, and they're just as
50:11happy building stuff as they are testing out experiments, as they are hammering away at things, whatever it takes. You mentioned that there's research and engineering, which also makes me think of manufacturing. Elon would be fond of saying that the factory is the product. The hardest part of this whole equation is actually the scale up of whatever this thing ends up looking like, making 100 million of those. How do you think about that part of the equation, or is it too remote at this stage? I think it's an important part of the equation. I'm not sure it's like the part of the equation that we most need to figure out right now, but it's certainly part of it. A lot of how
50:46I prefer to think about this is to figure out the hard part and then enable a lot of experimentation on the other parts. Making a robot at scale is difficult. Making a robot at scale is even more difficult if you don't know what kind of software is going to run on it afterwards, and you're not even sure whether it's the right kind of robot. One of the really valuable things we can get out of general purpose AI tools like robotic foundation models is the ability to get a lot of the other stuff figured out so that at least some of the uncertainty goes away, so that when you scale things up, you have some confidence that this is really going to work.
51:19A lot of people that listen to this are entrepreneurs, people that run companies. A very popular question has become, how should a traditional company begin to think about using LLMs or preparing itself for the ongoing improvement of these models? How would you answer the same question for robotics? How would you encourage companies to think about this? The technology is changing so rapidly. I want to illustrate why this question is difficult with an example. Here is a particular uncertainty about the tech. Will the robots rely more on demonstrations or on
51:52reinforcement learning from autonomous data? We're working on both of those things, and they're clearly both important, but how somebody should prepare for the technology will be pretty different if they're expecting that they need lots of teleoperation to produce lots of demonstrations and a little bit of autonomous experience, versus the opposite, like a tiny number of demonstrations and huge amounts of autonomous experience. Is it 90-10 or 10-90? That's something we're hopefully going to learn about over the next few years, but it does change the correct approach pretty dramatically. That's kind of a case study of how changes in technology will dramatically alter this.
52:24From a business standpoint, is the right way to think about it get really clear on the economics of the labor in your business? I'm curious how you think about that, the way that this will change the nature of labor itself. Coding tools are like a really nice example to look at for a template of how this might work. It's not like coding tools came on the scene and suddenly we don't need software engineers anymore. It's that the coding tools increase the productivity of individual software engineers. There's some amount of work that needs to be done to make sure that people are able to use
52:55them. There's some amount of technology development that needs to be done to make them useful for the appropriate use case. And these things are co-evolving and they're also still changing. Coding agents are different than code completion tools and so on. But I think it's like a nice template for us to look at to see how AI tools combine with people doing a job, increase their productivity, and also raise new challenges. And I think we'll actually see something like that with robotics too. That's a more realistic template. It's not like the humanoid goes in and the people just leave. There are some aspects of the job that can be done by a robot. Some that can be done with a robot working together
53:28with a person. Some where the person needs to like do something special to make the robot more productive. Some where it's the other way around, where the robot does something that makes the human more productive. And it'll be this kind of dance that we've seen with coding tools. Do you have a favorite robot that's not part of what Physical Intelligence is doing? And if so, why? It could be anything. It could be a factory robot. It could be an Optimus, a Boston Dynamics. I do really like the Boston Dynamics robot, especially the new version of the Atlas, because it is in some ways very human-like and in some ways very not human-like. They made some interesting
54:00decisions about how they want more range of motion on the joints so it can do some pretty cool things. It's also a very agile robot, which is really cool. It makes for those awesome demos. So I'm a big fan of that. I'm generally a big fan of like everything that Boston Dynamics has done. Should or could anything be read into the fact that Boston Dynamics has been doing very cool demos for a very long time and don't actually do anything useful for customers? I think it's also a fair question for lots of robotics companies, to be fair. There is a lot of value in demos that serve to illustrate challenges on the road to something
54:31useful and productive. Obviously, you can also do a demo without being on the road to something useful and productive. There is value in demos. I think that demos that are used correctly and in service to a mission can provide people with an illustration of what to expect. And they also provide a challenge. You just have to be like honest in setting up the challenge. How much do you think about the business endpoints? To this point, Roomba is like the best-selling robot of all time in the consumer category, which is surprising. And of course, we might be on the edge of some sort of Cambrian explosion. But how much of your cycles do you spend thinking about
55:04this is the shape of a product that might result from this that maybe is the way we bootstrap away all this data? It's just something that's very hard to reduce to like a very concrete answer right now. It's not too bad to like think about a space of possibilities. A lot of what we're doing when we develop our models, when we experiment with different tasks, when we do demos like the Robot Olympics, underneath we're kind of prototyping what does it look like when we try to do something real with this to different degrees of real and what goes wrong. It is something we think about a lot.
55:34It's not something that I have even close to like a concrete answer to, but there's a space of possibilities. And a lot of what we actually are planning to do in 2026 is also experiment with different things in that space. When you study the history of general purpose technologies, which certainly this would be a major one if it comes to fruition, you often find this constellation of things happening around that thing that enable it. LLMs are a direct complement to what you're doing. Are there any other surprising technology areas or trends that help you do what you do, but are different?
56:06Robotics hardware has become dramatically more affordable over the last few years. When I started working in robotics about a decade ago, I worked with a robot called a PR2, which I believe had a cost of about $400,000. When I started my lab at UC Berkeley, I used a robot that was in the ballpark of $30,000. Now each arm on this thing is maybe a tenth of that. We think that can be even less. That's not due to like any one single technology. It involves both hardware and software. So the kind of low-cost arms that we have here, they wouldn't be useful in an industrial setting because traditional
56:41control methods that rely on a great deal of precision wouldn't be able to use them. And I think that does make it a lot more practical to think about general purpose robotics today. For people that would want to be fairly technical about following major milestones that are happening in this field, where does that information show up? What do you read to stay informed about what's going on or watch? So a lot of it shows up in research papers. Research papers, unfortunately, are not a very accessible source of information because it takes a bit of care to like sort through everything and figure out what is the signal and what does something really mean. Research results are
57:16sort of intended for an audience that already understands the starting point from all the past research results. Robotics, and I think technology in general, is one of those things where the public-facing artifacts, the demos and the videos that somebody might post on social media, are often actually not very good for providing a sense for the true underlying state of things because they're sort of meant more as a demonstration at the edge of capability, and grounding what the demo really means requires digging deeper. Probably research papers are the way to go. Sometimes even worse
57:47than that, you have to actually go talk to the individual people and find out what the inside story really is. And maybe that's not a great situation to be in, but that's kind of how science works. As we look forward to the future in your mission, what feels the most uncertain? I do think the timeline is uncertain. If anything, my sense of the timeline has gotten more optimistic since we started, but it's uncertain because of the nature of the technology. This is something where there's a bootstrap challenge, getting to a particular level of usefulness so that robots can be deployed so they can do useful tasks so they can start collecting data from open world settings at scale.
58:21Because that's such a sudden event, getting past the activation energy, I think there is a lot of uncertainty about the timing of that. That's exacerbated by the fact that the timeline looks different depending on what kind of technology is deployed. The example I gave before about whether it should be data collection through teleoperation or data collection with autonomous systems or something in between, maybe shared autonomy, maybe like this coaching kind of thing, those all sort of change the picture in terms of how deployments work and how in-the-wild data collection works. So because of that, I do think there's quite a bit of uncertainty.
58:52You're in such an interesting position because you're at the center of research. Lots of different kinds of people are talking to you, asking you questions. What are questions that you're surprised people don't ask you? Well, I think the question you asked earlier, actually, about how somebody should prepare, there's a variant of that question, which would be something like, if I want to start using autonomous robots for a thing, what should I start setting up? Should I set up operations? Should I modify my task in some way so it's more accessible? Should I design new hardware? Maybe I should design new
59:22hardware so I can plug your software into it. And I think people make a lot of assumptions about that. For example, one assumption is machine learning requires data, so let me just figure out something that will collect data. That's not often the best assumption because you need the right kind of data. Maybe some data is easy. It's easy to get videos of people doing something, but that doesn't mean that's the right kind of data. And it might be domain dependent. It might be dependent on the thesis about the technology that will succeed. So I think that people do make a lot of assumptions about that. Not that I necessarily have a better answer for them, even if they ask me, but it's something where there's a big space of possibilities. We talked about these big, uncertain, long-term
59:55timelines. What is the very next thing you are trying to solve? A big focus for us right now is actually better understanding this mid-level reasoning part of the problem. Because we think that we have a pretty good sense for how to acquire the low-level physical behaviors, but getting those low-level physical behaviors to generalize requires bringing to bear a lot of this common-sense knowledge. The representation of that might be really important. So LLMs make certain kinds of representations very convenient. They make it very convenient to
1:00:26basically turn text into other text. But that's not necessarily the best representation for what an embodied system needs to do. Sometimes it needs to think about things more spatially. Sometimes semantically, sometimes other representations. And trying to figure out exactly how to structure that internal thinking process might be a very important question. The answer to that question might be different in the world of embodied foundation models than it is in the world of LLMs. So that's a concrete thing that we're working on now. If I could somehow get the 100 most informed and active robotics researchers in the room at once and
1:00:58poll them on how certain they are that things will have unlimited capabilities and how soon that might happen, where do you fall in that distribution? Probably I'm on the optimistic end when it comes to established robotics researchers and on the pessimistic end relative to robotics entrepreneurs. I understand the entrepreneur part for sure. You're optimistic by nature. Why are you on the optimistic end of the researcher community? Robotics has a very long history, which has precious few successes, especially when it comes to robotic AI.
1:01:30So I think if we're being honest about it, most robots that are out there doing useful work are still running state-of-the-art technology from the 1980s because the robotics problem is hard. Not our fault. It's just a difficult problem. Because of that, I do think that there is good reason for caution to say that, well, maybe we've made a lot of headway on this part of the problem, but there's like many other problems that still remain. Part of why I'm optimistic about this is that I kind of have like a sense of what has proven tough for me before, and I can see a lot of the puzzle pieces that I'm imagining could be slotted in to address many
1:02:00of those things. As my co-founder Karol likes to say, when you've climbed the mountain, only then do you see if there's another mountain after it. In robotics, there's been a lot of experience of lots of mountains. Some caution is justified. Given that endurance is required, who or what most inspires you? Boston Dynamics. I think there's like a lot of things that we can debate on the technology side, but there is a lot of value in repeatedly showing something that people wouldn't have thought possible, even if there's all sorts of caveats and assumptions and so on.
1:02:35And certainly in robotics, whatever we might say about demos and whatnot, like I think it's very fair to say that people have revised their thoughts about what's possible from seeing some of that stuff. I think I'm also inspired by organizations that create an atmosphere for experimentation. There are some research labs that have done a very good job of this. OpenAI has historically done a great job of this, of creating an atmosphere where individual researchers can experiment with things and be empowered to see those things through. ChatGPT was basically John Schulman's
1:03:08pet experiment for a while. It wasn't a concerted corporate strategy with lots of spreadsheets and pie charts. It was a pet project. I think there's something pretty inspiring about organizations that empower people to have pet projects turn into world-changing successes. Certainly one of the aspirations that I and my co-founders have here at Physical Intelligence is to provide some of that to the best of our ability. It's hard to do. I feel like Google used to have that one day you can do whatever you want thing. Is that the spirit of it? I was absolutely shocked when I started working at
1:03:41Google at the level of leverage that I felt I could have. One of the projects that I did with many of my colleagues there in 2015 was colloquially referred to as the arm farm. So we took a couple dozen robots, put them in a lab, and had them collect data. I found out from somebody that they had a warehouse full of robots that nobody was using. I asked Jeff Dean and Vincent Vanhoucke if we could stick them in a lab. And I was just thinking like, okay, they're not going to take me seriously. I was a level four research scientist. Jeff was like, yeah, let's do it. What do you need? I just remember feeling like, wow, I had never in my life thought that I'd have that leverage. I mean,
1:04:16I was very young at the time. That's very special. And I think getting to a place where people can unlock their creativity and have that kind of agency can make for a very remarkable place. My friend Jesse has this great question, which is for companies that you're not involved with, which one do you most hope succeeds and why? People say Boom a lot because they want to fly places faster. Increasingly, as I've asked this question, people have said PI because the sheer impact that it might have if you're successful is massive on such a global scale. And it's been
1:04:47really fun just to hear about all the ins and outs of how you're thinking about the problem and attacking it. When I do these interviews, I have the same traditional last question for everyone. What is the kindest thing that anyone's ever done for you? It's a tough question to answer because I do think there are many moments in my career where I got a leg up on something. I think I have the kind of personality where I sometimes don't appreciate it in the moment and only reflect on it afterwards. The three moments in my career that stand out, actually one of them I'd already mentioned to you, which was the arm farm thing. I'm especially grateful to Jeff and to Vincent for being willing to take that bet on me and my colleagues.
1:05:20And there are a couple other moments. When I started my postdoc with Pieter Abbeel at Berkeley, I had zero robotics experience. I had done virtual character animation and computer graphics. I felt like that was a bet on my potential more so than my actual accomplishments. And there was another moment even earlier on, I got an internship at NVIDIA that got me to like experience some cool stuff when I was just like a sophomore. And I think the hiring manager for that also took a bet on me. And I think that these kinds of things, they really matter in a person's career. And I think that at the moment I should have been more grateful, but certainly in hindsight,
1:05:51it's something that made a big difference. And hopefully I can make that difference in other people's careers as well. Well, I've learned so much from you and your co-founders and so much today. Thank you so much for your time. Thank you. If you enjoyed this episode, visit Colossus.com. You'll find every episode of this podcast complete with hand edited transcripts. You can also subscribe to Colossus, our quarterly print, digital, and private audio publication featuring in-depth profiles of the founders, investors, and companies that we admire most. Learn more at Colossus.com slash subscribe.
1:06:21You know how small advantages compound over time. That's true in investing and just as true in how you run your company. Your spending system is your capital allocation strategy. Ramp makes it smarter by default. Better data, better decisions, better economics over time. See how at ramp.com slash
1:06:56invest. As your business grows, Vanta scales with you, automating compliance and giving you a single source of truth for security and risk. Learn more at vanta.com slash invest. Every investment firm is unique and generic AI doesn't understand your process. Rogo does. It's an AI platform built specifically for Wall Street, connected to your data, understanding your process, and producing real outputs. Check them out at rogo.ai slash invest. The best AI and software companies from OpenAI to Cursor to Perplexity use WorkOS to become enterprise ready overnight, not in months. Visit workos.com to
1:07:27skip the unglamorous infrastructure work and focus on your product. Ridgeline is redefining asset management technology as a true partner, not just a software vendor. They've helped firms 5x in scale, enabling faster growth, smarter operations, and a competitive edge. Visit ridgelineapps.com to see what they can unlock for your firm.