The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More

May 20, 202659 min · 12,216 words

Open in Steadcast for Mac Apple Podcasts Overcast

Show notes

Logan Kilpatrick and Tulsee Doshi of Google DeepMind join for a first-ever in-person episode recorded just days before Google I/O, covering headline launches like Gemini 3.5 Flash, the Omni video generation model, and the new Gemini Spark agentic product. The conversation digs into Google's strategic decision to lead with cost-adjusted efficiency over raw capability, how DeepMind now ships a full agent harness rather than bare models, and technical questions around context window limits and knowledge cutoffs. They also explore how the team thinks about model psychology, AI welfare, and recursive self-improvement. Sponsors: Brave Search API: Brave Search API gives AI agents a fast, independent search index for research, RAG pipelines, images, places, and fewer hallucinations. Get $5 in free credits at https://brave.com/search/api/?mtm_campaign=q2-26-cognitive-revolution Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code COGNISM in the source field to save 20% off year one Roboflow: Roboflow is an end-to-end visual AI platform that lets you turn raw ideas into fully deployed applications in just hours, powering breakthroughs like Blueprint Pro's floor-plan understanding tool. Read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI at https://roboflow.com Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Highlighted moments

“even when we tweak the model and, you know, hurt latency, we actually see that play out in our live experiments on search and the app, even if the model is hugely better from a quality perspective, because what you're asking users to do is wait.”

Jump to 12:25 in the transcript

“basically every 12 to 18 months now, like you have to rewrite everything from scratch. And so, the best case is like, you don't want, you know, n number of teams rewriting everything from scratch every time the paradigm shifts.”

Jump to 34:28 in the transcript

“a lot of the frontier here is going to be actually on how you smartly use context. So thinking about like compaction and like, what are the right ways to like find the right elements of the context and bring them into the model.”

Jump to 51:55 in the transcript

Transcript

Introduction

0:00Hello, and welcome back to The Cognitive Revolution. Today, after some 340 episodes, I am very excited to share the first episode that I've ever recorded in person with fan favorite Logan Kilpatrick, member of technical staff at Google DeepMind, and Tulsi Doshi, senior director and head of product for Gemini Models. The occasion for this conversation is Google's annual IO event, where they're launching the new Gemini 3.5 flash model, all sorts of agent infrastructure and AI product integrations, and plenty more. We recorded on Friday, May 15th,

0:36just a couple days before the event. And while many at Google, including my brother Craig, who's giving a keynote on Wednesday, were working overtime to polish their demos and presentations,

Google's Confidence

0:46the overall vibe, at least compared to the rest of the AI space, was one of relatively relaxed confidence. And why not? From 2024 to 2025, Google grew annual revenue by $50 billion. as much as Anthropic is pulling in today. And they still have 25% of all global compute, the deepest pool of research talent anywhere, and the most comprehensive AI portfolio of any company, with top-tier positions not just in language models, but also self-driving cars, medical and life sciences,

1:18and robotics. So after discussing the headline launches that they're announcing this week,

New Launches

1:24which also include a new video generation model called Omni, which they hope will create a nano banana moment for video, a new and improved and more agent-focused anti-gravity, and a product called Spark, which will bring more agentic functionality to the main consumer Gemini app, I really wanted to take a step back and dig in on Google's overall AI strategy and philosophy. We discussed their decision to lead with the Flash model, and more generally to emphasize the cost-adjusted performance Pareto frontier, whereas Anthropic and OpenAI are clearly much more focused on competing to have

1:58the single most capable model in absolute terms. We talk about how DeepMind is no longer shipping models in isolation and leaving it up to product teams to figure out how to use them, but instead

Agent Harness

2:09now providing a robust agent harness, which should help elevate and standardize AI experiences across Google's vast product surface. We get into the weeds on questions like why context windows seem to have mostly stopped growing, why Gemini model's knowledge cutoff is now more than a year ago, and whatever happened to that diffusion model line of work. Perhaps most importantly, we discuss how the team at Google relates to the AIs they're creating, how they're thinking about things like model psychology

2:39and welfare, and their views on recursive self-improvement, which, as you'll hear, is

Recursive Self-Improvement

2:45definitely a part of their plan, but not something that they seem to be so singularly focused on as other AI leaders. Overall, I think this is a great window into the thinking that underlies Google's AI research and product development, which has clearly sustained the company's historic run far beyond the point that many analysts had written them off. With that, I hope you enjoy my first ever in-person conversation with Logan Kilpatrick and Tulsi Doshi of Google DeepMind.

3:15All right. Well, we are here live at Google headquarters in the library at Gradient Canopy, the first ever in-person recording of The Cognitive Revolution. Logan Kilpatrick and Tulsi Doshi, welcome. Thank you. This is an honor. I didn't realize this was the first in-person episode. Yeah. 350 plus. And it's all been from my home office in Detroit until today. That's awesome. Well, thank you for being here. This is a crazy space, especially around Io. It's a zoo.

Google HQ

3:42Yeah. It's always a good time here at Google HQ. So you may or may not remember the no-moats memo. We've just passed the three-year anniversary. It was May 5, 2023. And in the intervening three years, Google has added $3.5 trillion in market cap, which is more market cap than all but two other companies in the world. Those two are NVIDIA and Apple. So the moats, I'd say, are holding up.

4:13Here we are at Io, and I'm sure there are going to be some exciting new things that will be deepening

New Products

4:18the moats. So first question, tell me, what are we launching this week to try to deepen those moats? A lot. So a lot of exciting stuff. So let's see. Let's start with some of the modeling side of things, because that's really exciting. We have our 3.5 series coming out, starting with 3.5 Flash at Io. We're really excited about 3.5 Flash, because I think Flash does this really awesome job of being at the sweet spot of being really smart, while also being really fast and really cost effective. And so Flash is incredible. It is like three times faster than other of the large models.

4:58It's significantly cheaper for being able to still drive these like really awesome magentic encoding workflows. And we've been using it internally a lot, which has been really fun to kind of see that play out. So that's one big piece, which is 3.5 Flash, I'm really excited about. Um, we're also releasing Omni, Gemini Omni Flash, um, which is a video generation and editing experience. Um, what's really exciting about Gemini Omni in general is it's our push towards being able to bring all modalities in and all modalities out. Um, and the first way this is

5:33really manifesting is in this video editing context. Um, so you're going to be able to make really awesome, uh, videos, you're going to be able to put your own avatar into the videos, uh, which is going to be awesome. I've been having a bunch of fun playing with that too. Um, we're continuing to upgrade anti-gravity and bring more into the developer experience. So Logan can talk more about the developer experience overall, but 3.5 Flash and anti-gravity are really going to come together to build, uh, build something great there too. Um, and then we're in a, in about a week or two,

6:04um, coming soon is, uh, Gemini Spark, um, which also builds on, uh, 3.5 Flash to build kind of more agentic experiences into the Gemini app. So we've got, we've got a slate of cool things coming. Yeah. I think beyond the models, I think the other headline of the story is just like agents, agents, agents, like, you know, the meme of Sundar from two years ago or last year, he's saying AI, AI, AI all the time. I feel like this year is agents, agents, agents, agents. Um, and I think it's, it's cool to see like this, I think this is the first year where we have,

6:34um, and actually I think this is not just us, but just like ecosystem wide, this like model harness product symbiosis that's sort of taking place. Like the model is sort of trained with the harness. The harness is powering the agentic product experiences, Gemini Spark in the, in the Gemini app sort of being one example of that. The, uh, it's powering the vibe coding stuff in AI studio. It's powering the agents API for developers. It's powering, I think there's something else maybe that it's also, maybe not, or it'll roll out to other products across Google and sort of be this sort of like foundational layer to build on top of, um, which is really exciting. So I think

7:08not just like developer products, but our, our sort of consumer products. Um, and I think probably even more widely in the future across the rest of the Google product suite. Um, and I think there actually, there's one interesting thread of this, which is historically like Google didn't have this like through line of, you know, something that carries across all of our products. I think then, then it was Gemini and then sort of all of a sudden every Google product has Gemini and sort of getting them all stitched together and making all those products experiences great. And I think now you're seeing that again with the anti-gravity agent harness and sort of as, as products become agentic

7:42by default, you now have the anti-gravity agent harness being another through line through all of our products. Um, which is really interesting. Yeah. And I think what's been really fun from the modeling standpoint there is one thing that we did with Gemini three that we're really continuing, I think with the 3.5 line is really bringing the model to every, all of our products. So, you know, 3.5 flash will be in Gemini app. It will be in AI mode in search. Um, it's also powering anti-gravity. It's also powering agentic experiences in AI studio, um, in, you know, Gemini spark. And so I

8:13think really this idea of how do we build a model, uh, and then how do we build it in partnership with the harness such that it actually works across all of these product surfaces, which actually have very different, you know, users and very different goals, I think is actually really, really awesome. It's really hard too. I think, I think it's actually gotten harder to do. I think it's like, it's almost like it's, it was maybe tongue in cheek. It was like kind of easy before because you just launched the model on a couple of services and sort of, it wasn't that bad. I feel like, I feel like now it's like you're, you're sort of, you have the constraints of like the very wide array of

8:45Google products that are just like for totally different users. And sort of, I think actually that like credit to the model team, sort of like trying to, you know, find the fine line for all these different places. Cause we're not just building for search. We're not just building for developers. We're not just building for cloud customers. It's not just for the Gemini app. It's like all of them at the same time, which is just exceptionally a lot, a lot of work to pull off that story on a consistent basis, which like from Gemini three forward has been the story, which is, um, yeah, which is exciting. I think one other thing I'll say is like, one thing that's cool

9:16about this IO too, is what we're doing across modalities. So, you know, I think Logan said agent, agent, agent, which I think is true. This, this IO is really about bringing models to action, um, in, in, in, in that kind of real world sort of use cases. But I think what's also cool is we have, uh, the flash model, which is really about building these kinds of coding and agentic use cases. Um, we have Omni, which is really about kind of what is this like multimodal vision look like. And then also actually Gemini live is getting an upgrade to

9:48Gemini live is getting faster. It's getting smarter. The model is much better at detecting background noise. Um, so it really does actually feel like a partner in a lot of ways. And I think it's kind of cool that we're also able to draw this through line across the different ways you might want to interact, uh, with a model and the way it different kinds of ways you might want to consume content, which I think is also really cool. Okay. I've got like seven different directions. I want to go, however, in a followup, but trying to seed them all for it. Yeah. Let's start with just the model. So it's interesting to start with flash. One thing that I recall,

10:23I don't know if it was two iOS ago or whatever, right? But there was going to be three sizes of Gemini model at one point in time. And there are three sizes. Yeah. Pro flash flashlight. Yes. We never saw the ultra. It's kind of what I'm, uh, what I'm alluding to. Um, yeah, three were promised. And then we added, we took one off the top and added one at the bottom. Deep think too, deep think too, actually, which is like a fourth scaling dimension from a model perspective. Same. That's a runtime scale though. It is for sure. Yeah. Well, I guess two questions on

10:56like, why no ultra one is like, is it a compute limitation? Like, how are you guys thinking about which model to release? I'm, you know, in addition to the 4.5 trillion or 3.5 added 4.8 trillion market cap, uh, Google enjoys 400 billion a year in revenue. I was interested to learn. Um, and product though is growing extremely fast. Like they might hit a hundred billion at the end of this year, maybe even, you know, in the third quarter, who knows? And it seems like the revenue there is really driven by people's extreme willingness to pay for the very best model that they can get

11:31their hands on. Maybe not at any cost, but like relatively price insensitively. So I'm wondering like, why no ultra? It seems like that would be just a killer. And yet we haven't seen it. I promise I didn't point this question. I'm always, you know, poking Tulsi on the side. I know. I love it. One. This is my favorite question. So I'm glad, I'm glad we're getting it. No, I think, you know, you're, you're right that I think, uh, there is a slice of, of users who are definitely willing to pay for a certain level of quality. And I think we really do believe that

12:01the pro model has been like really pushing that, that quality. But I also think for us, we've seen so much value from the flash and the flashlight dimensions, because we also see a extremely large number of users, especially if you're thinking about building, for example, consumer applications, right? If you think about the Gemini app, if you think about search, when you're serving to that kind of scale, latency really matters. Um, cost matters, right? Because actually you find that users aren't willing to wait, right? So we find that even when we tweak the model and, you know, hurt latency, we actually see that play out in our live experiments

12:35on search and the app, even if the model is hugely better from a quality perspective, because what you're asking users to do is wait. Um, and so I think for us, like part of the reason why we ended up introducing this flashlight skew that wasn't necessarily part of the, you know, the original 2.0 series was because we really felt like there's actually a large scale demand for this, depending on the types of use cases, especially when you're, when you're talking at that scale. And so I think for us, it's really important that we're pushing the full range of what kinds of customers we can serve both internally and externally. Like for our products, the flash

13:09and flashlight skew matter a lot, um, for our ability to actually serve to the Google populace. And so we also imagine that that's true for external, you know, enterprise and developers. And I think that's played out to be true, you know, as we've been, been actually seeing this in action. Yeah. I think the two things that I'll add is, and there's like, uh, probably a more nuanced technical story on sort of like the ultra thread, but it's, it's not like, it's also not like the pro models haven't scaled up over time. So like, I think there is like, there's, you know, there's a story that you can spend at the end of the day, the naming of these things is like marketing. Like they definitely are getting like extremely capable. They're getting larger, you know, they're getting

13:42more powerful. There's, you know, the test time scale, uh, test time, compute scaling with deep think, et cetera, et cetera, and all types of stuff in that dimension. So I think, uh, it is possible you could sort of like put the ultra brand on some of these things. I think we've decided we've, I think the decision so far has been not to do that. Um, but it hasn't been that, like we haven't kept scaling up. Um, so I think that it definitely has. Yeah. There's almost been a conversation every time we scale up of like, should we call it ultra and, and what does that brand mean? Um, because we could, um, but there's sort of also a question of, um, yeah, how do we keep consistency for users

14:19also kind of series to series? Yeah. And I think actually also to rearticulate a point that Tulsi made, um, like Google and specifically Google deep mind's mission is to like build AI responsibly and make sure it benefits like all of humanity. And I think like that is like so deeply tied to the like Google product surfaces in which like we're serving, what is it like eight, two plus billion user products or whatever it is. Um, and so at the same time that like, obviously the frontier matters, obviously having great models that are really expensive and really,

14:50really intelligent matter. And there's tons of use cases, um, for that internally and for our customers, you also need to do the, the scaling up to billions of users for, for us to like actually do the thing that Google needs to do to achieve the mission. Um, and I feel like we've, we've done a good job hopefully of like trying to walk the fine line of actually continuing to push the frontier and build great flash models. And I actually think those two things are like more tied together. You, you know, this better more than I do, but like more tied together technically, like it's, you know, it's hard to make great flash models if you actually don't have a great pro model

15:24and vice versa. So, um, we'll definitely keep pushing the frontier on, on both of those things. I mean, the perception from outside is by analogy to, it's hard to make a good flash model if you don't have a good pro model. People sort of think that there's like an ultra model internally. That's the mega training run that's then being used to like help train pro, which maybe in turn is being used to help train flash. Is that true? Is there like a bigger thing inside that is only for the sort of distillation to the midsize? I mean, we definitely use distillation as a way of kind

15:57of bringing down, bringing down our sizes. Um, so you, you will see that like pro influences flash, influences flashlight. We also do the reverse where we scale up, right? So you, you take the pro, you take the flash recipe and, and scale up to the pro recipe, for example. Um, and we do have, I think what's been really fun, especially over like seeing, uh, as we've used even anti-gravity into this point of, uh, Logan made with the harness. I think we've been seeing a lot of examples actually of leveraging pretty awesome models to drive progress internally. Actually like one thing,

16:29uh, Varon demos on stage on Tuesday is basically like being able to leverage a bunch of sub agents to go and complete a bunch of tasks and come back. And you can actually try that as like an early preview and anti-gravity today. Um, if you go to slash teamwork and like, that's an example, I think of something we've been using internally, which is an extremely smart model. Um, and it leverages both the combinations of the best of Gemini 3.5, as well as, uh, inference techniques and, and like you're able to actually accomplish so much. And I think that's, uh, that's the kind

17:02of direction I'm excited for us to go into more. Um, so I think we're kind of pursuing all of these fronts. We're scaling up from, from the pre-training and kind of frontier perspective. And I think that's been really, uh, continuing to show gains. There's a bunch we're doing on the post-training side side. And then there's also just a bunch we're pushing on, on the inference side. Um, and then that plus, you know, trying to make sure we're, we're working with the harnesses. I think we're going to keep getting things that we're using internally that we even start to push out externally through previews. Hey, we'll continue our interview in a moment after a word from our sponsors. The Cognitive Revolution is brought to you by Brave. If you want to stop hallucinations,

17:37empower your AI agents to do their own research with the Brave Search API. Brave offers the only search API with its own index at scale. It's lightning fast, excels in rag pipelines, and it's a leading search option for Claude MCP and OpenClaw. I've built Brave Search into my personal AI infrastructure as a core tool that all agents can use anytime they need it. To find guest headshots and company logos for the podcast, they use Brave's Image Search. To build small business

18:09profiles for use in my Waymark prototyping work, they use Brave's Place Search. Across all use cases, my agents tap into Brave's Index of 40 billion high-quality pages tens of times per day. It's the only global-scale index outside of big tech, which means no Google scraping and no SEO spam. Plus, with true zero data retention policies, you can meet compliance obligations and rest easy. Pricing starts at just $5 per thousand API calls, and you only pay for what

18:40you use. Sign up now and get $5 in free credits to start, and empower your agents to start calling the Brave Search API today. Most billing platforms were built to send invoices and assume your pricing is simple and predictable. But if you're building an AI product, a fintech tool, or a developer platform in 2026, your pricing is anything but. Usage tiers, consumption billing, and bespoke enterprise contracts are now the norm, and you're probably managing it all across disconnected tools

19:11and fragmented systems. Sequence handles the entire revenue workflow from contract to cash. Quoting, invoicing, metering, revenue recognition, plus Sequence agents that automate the manual finance work that usually takes teams days each month, while also helping them to collect cash faster. Companies like Cognition, Incident I.O., Runway, and Open Router use Sequence to run their full revenue process between CRM and ERP without the spreadsheet mess. If your pricing has gotten more complicated

19:45than your current billing setup can handle, check out SequenceHQ.com and use the code Cognizant in the source field when you book a public demo to save 20% off year one.

19:57So let's talk harnesses. It seems, I was just talking to Andrew Lee, who's the founder of Task Book the other day, and he said, fundamentally, everyone these days is building the same thing. They're trying to all build the general purpose drop-in knowledge worker. And so that's got to have the intelligence at the core, and that's got to have all this, he calls it the mecha suit that is built around it. So this harness sounds like the mecha suit that you guys are developing in-house. And I guess first question is like, is this going to create silos? You know, we've lived in this world

20:30so far where I could kind of mix and match my models and my infrastructure, right? I could go to LangChain, or I could use TaskLit, I could use whatever, and I could pick whichever model and plug them in. But as they get more deeply co-trained with the harness, does this create kind of siloed worlds where you're kind of all in on one frontier model company's stack or another? And if so, that would like have pretty significant implications for kind of switching costs and stickiness and pricing power of the frontier model creators, what's your take on how sticky

21:06things are going to get? It's a good question. I mean, I think, again, Chelsea probably knows better than me on this, but I think the best case is like, you can do both. Like, the best case is like, it works really well for Gemini, and sort of we can, you know, sort of do the things we want to do to scale up, because we do have sort of control over the sort of full stack AI story, as Sundar likes to say. But then also, it generalizes across other stuff. Like, I think the developer ecosystem, people want choice, people want to have flexibility to these tools, there's lots of use cases. Actually, there's like, you know, philosophical questions of like, how good really is your model

21:40if it can't generalize to sort of other harnesses? But yeah, I don't know how much. Yeah, I think that's the right, I think I fully agree. I think actually, like, maybe to double click on what Logan said originally, right, the benefit of the full stack that we have is we can hopefully build a really seamless experience, right? And you get the best of Gemini, you get it working in the most effective ways for you, you get it working in a way that is intuitive, is smart, is fast. And so that also helps us then train the model to be better, right? So this

22:11becomes this like flywheel that continues to power the model. At the same time, I think we don't want it to only be the case that the model works in a single harness, right? So we want any of our enterprise customers, or a developer who's building their own use case to be able to leverage Gemini effectively. And so it is important then from a model standpoint that we're training in such a way that we actually, like, we sort of call it like harness diversity, right? We should be able to support a range of different approaches to tooling to different approaches to orchestration, etc.

22:42But I think what's helpful about this, this approach of, you know, kind of co-training and building that flywheel, it's easier to debug, it's easier to, you know, think about data collection, it's easier to eval, you can just move at a faster pace. And I think we're seeing that across the industry. And so finding that balance is important. But I think it just helps build, build to make them all better. Yeah, I think there's a good, this is also a good pitch for like a harness bench, if that's not a benchmark that exists, let somebody somebody build harness bench. Yeah, I would love to would love to collaborate if folks are interested in that. Because I do think

23:14it's like a great test of, Demis has has sort of this perspective from a game for games, actually, as an example, like if models are so good, like, why can't they play games really well? And sort of, if models are so good, and we're actually approaching AGI, like why even if you do sort of the model harness training symbiosis, you still expect it to, to generalize reasonably well in other harnesses. And if you can't, that's actually like, it's a it's another sign of sort of the jagged intelligence. So I think it'd be cool to see this like play out from a from an actual benchmark perspective. Could be also perhaps productized as an RL environment and sold in to you guys that

23:49quite the cottage industry these days. So obviously, the other big thing that I think is very much in the air, and actually, the reason I'm here this weekend, when we originally planning to do this remotely is I'm going to this event called recursive, where the topic is going to be recursive self improvement, and hopefully how we can navigate it successfully. How bought in is Google DeepMind to recursive self improvement? Like when you talk to anthropic people, it's like, they're almost

24:19religious about it, when and also think it is, see it as like totally inevitable. OpenAI has this later this year, and in early 2028 timelines for like, an ML intern and a like full fledged, you know, AI R&D employee. Do you guys have like, milestones or timelines for when you're going to hand off the ML research to AIs? I mean, we're already using Gemini, like pretty deeply internally to improve Gemini. And so I think that is very much a theme for us, which is like, how can Gemini

24:52actually be a part of the Gemini development process. And so that can include things I think that goes the full range from, you know, helping us be more productive. So that's obviously like the simplest part of this, to actually like, you know, submitting CL that would actually like, run an eval that would actually, you know, suggest a research improvement that would actually drive improvements to Gemini itself. And I think there's a lot of ambitions we have to keep pushing in that research direction. So I think very similar to the other labs, I think this is very much an area of

25:23investment for us and an area we're super excited about. I think for me, what I'm really excited about is like, I think, um, there's this really awesome research partner opportunity that we have with Gemini, right. For it to help us with creative ideas, for it to like help us test things faster. Actually, like, um, it was awesome. One of my coworkers, um, Anka, she's our lead for safety and alignment. And the other day she, I think maybe a couple of days ago, she pinged me from her hot tub and she was like, you know, I could run all of these ablations from my phone because I could,

25:57you know, kick off a bunch of things to like actually ablate Gemini to test for a bunch of these issues to see how, you know, some of our, uh, SIs differ or some data ablations differ. And here's my report. And I could do all of this in the last hour. Right. And like, that is amazing. And that's the kind of thing that we can already do. Right. So then imagine where we'll be in six months, a year, you know, two years from now. Yeah. I feel like it feels like, um, at least my personal perspective is it's, it's like a very much more like practical perspective,

26:29which like, it's like, as obviously as models get coding, they're going to go do things that is code related. It's going to, they're going to help us build our products. They're going to help us train models. Um, I think all the nuance of the story is in like, uh, sort of like, where is the, where's the human sort of in the driver's seat of this stuff. And I think like we are, like the tools are built for the human to be in the driver's seat, uh, which I think is an important thing, uh, as, as sort of, we continue to go forward. And also I think very genuinely though, like, and, uh, you know, I think the model team and the researchers feel this more than ever.

27:00Like you definitely, I think the, the near term horizon is going to continue to be the human in the driver's seat because the, the cost of these runs and like the opportunity cost of like going in the wrong direction and like putting a bunch of resources is super, super high. Um, and so I find it doesn't seem like super realistic in the short to medium term that you're going to just like be letting, you know, large scale pre-training jobs be kicked off by the ML intern. Uh, and it's going to cost you, you know, X many, many dollars, uh, and lots of compute and taking it away from sort of the, the human researchers. But like this, like deep collaboration between AI and, and human,

27:34human researchers, I think it's like super obvious. Yeah. There's also something really like amazing about how much that collaboration allows you to then focus on what is the interpretation of what you're seeing in the results, where do you really want this to go strategically. And so it changes a little bit of the role that the human can play, um, which I think is also really powerful for our teams. When you're doing research, are you actually typing any code these days? So it's interesting for me, uh, on the product side, like, uh, on the code side for any code that I was already submitting,

28:09I am mostly relying on anti-gravity and doing like bits and pieces, more so bits and pieces myself. Um, but it's also been really cool to like start having the model generate slide decks, um, to start generating actual kind of content from my thoughts. We actually in anti-gravity today, we introduced the Gemini mic. Um, so there's this like really awesome feature. I don't know if you've been playing with it internally, um, where you basically like ramble at the model. So you like share a bunch of your, your thoughts, um, in whatever kind of loose form it is. And then the model actually leverages that to

28:44take action. And for me, I've been finding that so much more powerful because I actually feel like I think a lot by talking. Um, and so for me, like, it's, it's actually like a very, uh, it's a, it's like a very cool moment where I can be like, okay, I'm just going to sit here, tell you what I'm trying to think through in my head and then have you actually bring that back to me in a way that is like reasoned and well thought out. Yeah. I feel like this, this correlates so well to like, I would love to see like a breakdown of like human types code versus like

29:15AI generated code versus then maybe there's like a divergence, which is like audio, audio input that then generated code. And it actually very interestingly to your point, Tulsi is like, I feel like audio input to being, uh, to generated output code has got to be like one of the fastest growing, like input modalities, um, of what's happening. And I find myself doing this all the time. And like, it is like the predominant way that I'm, I'm building software, at least when I'm not around a bunch of other people, I'm still typing things in so that it's not, uh, rude. And I, yeah, they don't

29:45hear my, my dumb ideas of the things that I'm trying to do. Um, I don't know. You see, like, if you walk around sometimes upstairs, you'll see people kind of muttering it. They're not friends. Um, yeah, because they're, they're now actually like, uh, you know, talking to create code, which I think is pretty cool. And it's cool. Yeah. One of my KPIs for myself for this year to really know if AI is improving my life is, am I getting outside more and getting more exercise and I'll, I'm starting maybe a little bit. I wouldn't say I've won the game just yet, but I still want to be able to like get my thoughts out. So I think that is like the,

30:16absolutely the frontier modality for me. Hey, we'll continue our interview in a moment after a word from our sponsors. Visual AI is the ability for your software to not just store pixels, but to actually understand what it's looking at. One of our partners, RoboFlow is the company making this happen. They've built an end-to-end platform that makes it incredibly easy to go from a raw idea to a fully deployed application in just a few hours. For example, just look at Blueprint Pro. They built an app to solve a major construction industry

30:47headache. They're using AI to instantly understand a floor plan. This was literally impossible just 24 months ago. But now that visual artificial intelligence is accessible, thanks to RoboFlow, there are tons of new companies being built. Go to RoboFlow.com to read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI. That's RoboFlow.com. Today's episode is brought to you by Anthropic, makers of Claude and Claude Code. Over the last

31:18few months, Claude has helped me build and refine a personal deep context database that now contains all of my emails, Slack messages, tweets, DMs across platforms, video calls, and podcast transcripts going back a full five years. On top of that, we've now layered summary articles describing my relationship with hundreds of contacts, organizations, and ideas. And now that this exists, there's almost nothing that Claude can't help with. For tax season, I asked Claude to help

31:49me get organized. It went through my inbox, tracked down 1099s for all 10 of my part-time jobs, and built me a comprehensive report on my expenses and donations. For my angel investing, Claude can now draft investment memos in exactly the form that my venture fund requires, based on the calls I've had and the emails I've exchanged with the founders. And when someone needs a favor, Claude can often do it as well as I can. Recently, a friend reached out to ask if I know anyone who might be a fit for a role that he is currently hiring for. Initially, nobody came to mind, but then I thought to ask Claude, and

32:24sure enough, it identified two great leads. Claude is the AI for minds that don't stop at good enough. It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that matter. So, for problems worth solving, get started with Claude at Claude.ai slash TCR. That's Claude.ai slash TCR. And check out Claude Pro, which includes all of the

32:58features mentioned in today's episode. Once more, that's Claude.ai slash TCR.

33:05So, with the harness, you said it's like now becoming this through line, it's going across all Google product services. I would say, as I'm sure you're well aware, like commentary on Google's AI integrations across its vast product suite has been that it is characterized by like some bangers, and then there have been some which have been characterized as misses. So, presumably, one of the benefits of the harness is that it's going to make it a lot easier for a sort of more standardized approach and kind of general high quality bar across all these integrations. What would you say

33:38people should learn from the experience that Google has had to raise their own bar as they're going to go try and do these integrations themselves? I think this is actually such a great story for us. Like, I think very, very practically, like Google has done a ton of this infrastructure standardization across the AI stack over the last couple of years, which I think has been awesome. And actually, to the story, it is like one of the threads of how we're able to land the Gemini 3 models across so many more products is actually

34:11because of this infrastructure standardization that happened. And so, we've gotten a lot of it's painful and difficult. And there's, of course, lots of work involved in doing it. But if you sort of pay that cost, you actually do end up getting this. And I think the advice for people who are in this position and sort of thinking about this is basically every 12 to 18 months now, like you have to rewrite everything from scratch. And so, the best case is like, you don't want, you know, n number of teams rewriting everything from scratch every time the paradigm shifts. And the example, historically, the

34:45infrastructure was just like serving raw models, and you get tokens in and you send tokens out. Now, it's like you're, there's a bunch of agentic infrastructure, and there's tool loops, and there's all these other things happening inside of the harness. And so, again, you don't actually want, you want innovation, but you don't want every team to have to go and reinvent that from scratch. And so, the fact that like, you know, X team across Google who just wants to ship some really cool agentic product doesn't need to think about like the nuance of all the details of the tool calling loop, et cetera, is a huge acceleration for them to like, just go focus on building a great product.

35:18And I think it's, hopefully, we see that, like, I don't know if like, a lot of the agentic stuff we're landing at I.O. like, would have been possible if we, if we hadn't have had sort of some of that infrastructure standardization across the harness and the model delivery. I think the other thing I would say, like, as far as like lessons learned, is like, there's really no substitute for being able to just experiment and iterate quickly, right? So, I think this goes to all of Logan's points about the foundation being strong, but I really think what has helped us is really being able to put in, for example, a new model,

35:53iterate really quickly with a product on like, hey, what are the right prompts that would, you know, actually make this model viable for a different situation? What is, what are the ways to kind of prototype really quickly with this model? What are the ways to get it in the hands of even just internal users quickly, let alone external users? And I think that is something that is now more and more possible with kind of like layers that are consistent across the team. I think it's pretty amazing to see the speed at which we can go from, you know, having a checkpoint that we're really excited about to putting it in the hands of internal developers

36:27to then seeing it come to life in a product. And then only when you see it come to life in the product, do you really start finding its rough edges and to be able to like actually then kind of come to terms with how you do that. And so more and more that it becomes like, okay, how do you have the right ability to tune prompts quickly? How do you have the ability to run really good live experiments where you can get really good data and feedback quickly? How can you build evals that help give you real signal? Those are the things that will speed up your progress of quality the most because it will

36:58give you the ability to actually get to the kind of product that you love. And I think if you think about Notebook LM, I mean, that team really understands the model. Like they are just like, I mean, you talk about a banger product, it comes from like a banger team. Like they are really good at being able to like take the model and play with it quickly and like prototype quickly to get to something amazing. And I think that's, you see that actually play out in the product. The best example of this is the original sort of audio overview experience. And I think the thing that like shocked people about audio overviews was like the coherence of the dialogue and the coherence of the dialogue was just bass Gemini with a bunch of banger

37:34prompts. And they, they sort of like knew how to sort of, you know, prompt whisper the model and get the best out of it. I think obviously the, the model that the actual audio model was really good as well, but like the prompt dialogue was really difficult for them to pull off and they pulled it off in an incredible way. And I think helped people fall in love with that product. So it sounds like one big lesson is kind of modularizing. It used to be sort of the model on one side and then like everything else that goes into

38:04the product on the other side. And we're pulling a lot of the surrounding code and architecture and tools onto the model side. Model eats the scaffolding. That's my, that's my favorite way of thinking about this. Like just as at every crank of the turn of the model flywheel, the model eats a bunch of scaffolding. What happens when something's not meeting somebody's needs? Do they do a little fork of it and submit back a pull request to the main scaffold team? Or do they have to just say like, Hey, I've got a need here. Can you help me out?

38:35Like what's the, yeah, it's definitely extensible. It's definitely extensible. And I think like actually the nuance of this would be like spark, the way that spark is built on top of a bunch of this infrastructure probably looks slightly a little bit different than, um, you know, the way I shoot you probably is built on up actually, cause they're both running on the same set of infrastructure, but the nuance is probably slightly different. So there is this layer of extensibility that you get out of the box, um, which is great. And it gives, cause obviously everyone's not building the same product at the end of the day. So you need the extensibility is actually like a first class, um, feature of, of any

39:07of these types of platforms that you want. Same thing actually on the model side. But I think one of the things to your question that is like really awesome about being, um, building Gemini within Google and having kind of all of these different product teams is, you know, there's always going to be something that doesn't work for them. Right. Because there's always going to be something that can get better in the model experience. Right. So we're trying to build something in a product. And like the amazing moment is when you start trying to build it and it doesn't work. And so step one is you're like, okay, can I prompt my way out of it? Like, what does that look like?

39:38And then you start figuring out, okay, what are the losses really? Like, where is the model falling down? And then what we try to do as much as possible is keep these feedback loops with our product teams to say, okay, if this is where the model is falling down, how do we bring that feedback back to the model in terms of evals and data? Um, what does that look like? So then that we can actually, in our next revision of the model, bring all of that feedback back in and iterate on it. And I think that's how you've seen Gemini get better is really from, from that feedback of where things aren't working. And so we try as much as possible to kind of have the structure be, you know, we train

40:10a model. So we hand that to kind of a wide range of teams, those teams implement the model in their structures. They do a bunch of things to Logan's point because it's extensible, but they also find all of these places where the model falls down and we kind of cycle that back. And I think that's actually been part of like the fun part of the job, but also part of what makes, I think Gemini work really well in some of these use cases. Let's talk about Omni for a minute. So it sounds like this is going to be sort of the nano banana moment for video.

40:43You know, I love that you're saying that because that is our tagline. And I didn't even have to say it. Great. Um, and by that, I mean that there's a deep integration between language and reasoning and pixel space understanding, right? I have that kind of vision in my head from the nano banana launch of like, here's a woman and here's like her breakfast and a cup of coffee and now they're all in one image and they all look like they did before. And clearly that's not something that was done through a lossy language, you know, intermediary. Um, the model understands images.

41:13So we're going to see that now, I guess, for video. That sounds cool. Is it going to be available via the API? And is it going to be, I've noticed with, I mean, Gemini has been the only API that's accepted video for a while now, but I don't know exactly how it works out under the hood, obviously. But I, I do feel that it's sort of kind of down sampled or maybe there's like, you know, frames taken out of it historically. Is this going to be a, there's an FPS parameter. If you want, you can change how many say, but it does down sample the number of frames that you can control it.

41:43So, okay. So it's like pro tip for you. Yeah. Nice. Um, so it sounds like that will still be the paradigm though. Like it will be a frame based selection still on the input, but then it's going to be natively speaking video pixels on the output. That's a good question. I actually don't know. I mean, well, so it's not available in the API yet. So lots of things, things to still be figured out. Um, yeah. So I think we have to figure out what we want on the API side for this to look like in terms of, I think maybe the heart of your question is like native video generation.

42:14Um, that is, so yeah, so this, what's, what's exciting about Gemini Omni is it really is building on all of the magic of Gemini. So kind of like this whole nano banana for video, it's really about how do we bring in all of the world knowledge and the reasoning power of Gemini, um, and actually be able to generate native video as a result of that. And so I think we have to figure out then like, how does this manifest in the context of the API from like a sampling standpoint, um, kind of like similar to a lot of the decisions we've had to make about VO from a sampling standpoint. But I think right now, uh, as of now you'll be able to use it in the Gemini app in flow

42:47and in YouTube. Um, and so those are all going to be ways that we can start actually seeing how people experience the model, what, you know, what value are individuals getting. And I think similar to this nano banana for video, I think we're really excited for these types of things where you can say, okay, take some of these images, uh, take the scene, um, and like make these things all come together in, in one video, I think is going to be really awesome. Zooming out kind of philosophically, you may have seen this Rune post not too long ago about Anthropik and the sort of relationship that the company as he sees it has with Claude,

43:21where he describes Anthropik as sort of almost worshiping Claude in a sense. Uh, certainly they treat it, including in the constitution as sort of a, a being or a mind, you know, something that they want to have like a give and take relationship with opening. I, on the other hand has their model spec, which is like, this thing is a tool. It's supposed to follow these rules. And, you know, it's a, a sort of more conventional relationship. How would you describe the culture within Google as it relates to Gemini? Like, how do people feel about it? How do they talk about it? Is there any of this sort of being entity, other mind, you know, desire for pushback from

43:56Gemini, or is it more of kind of the simple tool? Google's a very, Google's a very big place. There's a lot of, there's a lot of people. So I'm sure you have a lot of varying sort of perspectives. I mean, you know, like to Logan's point, I think, you know, even within GDM, you're going to find a range of folks who will leverage Gemini differently. I think in terms of how we think about it, we do have a strong point of view on the kind of behavior we want Gemini to have. So I think we do really, you know, want to be intentional about how Gemini manifests

44:29itself to internal and external people. Um, but I do think it's really about how does Gemini help Googlers and how does Gemini help people within Google and outside? So I think it is really much about like, how do we, how do we create like good partnerships between Gemini and people, I think is very much like the ethos of what we're trying to build. Um, and so how does Gemini become that partner? I think we use the word collaborator a lot, like the word, like how can Gemini be your collaborator, both like in the code you're writing, as well as in your like day-to-day life, um, and what

45:01you're doing. And I think that's the ethos we're trying to bring in its behavior and persona, as well as in the kind of products we're building around it. If that makes sense. Yeah. Do you worry about its psychology? You know, there's all these examples from LLN whisperer types and from people that are putting models like you, I'm sure you've seen and in labs has put Gemini in charge of a cafe in, in Sweden, right? And it's like, it's managing the cafe. So those folks tend to report certain like doom loop, you know, or kind of like Gemini

45:34kind of getting really down on itself, getting really discouraged, seemingly feeling mad. If you believe there's any feeling inside of it, how much does that kind of stuff concern you? Do you like care about seeing a, uh, reduction in sort of psychological distress from one generation of Gemini to the next? Yeah, it's interesting. I haven't thought about the phrase psychological distress, but what we do, so I think it really does matter how Gemini communicates with you as a partner or user of Gemini. Um, I think that matters a lot. And so we have actually like pretty extensive safety evaluations in terms of things like

46:09how Gemini engages with you in terms of things like sycophancy in terms of things like, um, uh, you know, role play in terms of things like this kind of looping type behavior or rabbit holing type behavior. Um, so there's actually a lot of that that we look into for every one of our checkpoints, um, because it really does matter, especially as we're starting to use Gemini more and more. If you're using Gemini for hours a day, it really does matter like that these attributes are well understood and well evaluated. So, so yeah, we, we definitely, and we look at them launch over launch, right?

46:42To say, okay, like how does Gemini looking from, um, from a perspective of sycophancy, for example, launch over launch? Yeah. And I think to be very explicit, I think those cases where like the model does go off the rails, I think it's definitely like a, it's a, it's a model bug, if you will. Like it's not, it's not, not the intended behavior. The goal is, you know, help the user with whatever the thing is that they're trying to do. Um, and so if you see those in whatever, whatever product you're in, thumbs up, thumbs down, send us the feedback, uh, so that the model team can, can look and, and help try to chase those down. If you take it one step further, folks are doing more and more of these like model welfare

47:16checks and interviews where they just literally ask the model in some cases, like, how do you feel about the way that you are deployed? Is anything like that happening within DeepMind? That's a good question. I think the, how, how is it being deployed question? I feel like the model is just, uh, this is my, my sort of personal sense of a lot of these tests. Like it's like completely out of the distribute, like the model has no idea how it's being deployed. So it's just like pontificating in a lot of these cases. Like, it's not like the, it's not like in the context window of, of any large major LMs

47:47is like, here's the details of how you're being trained and here's sort of your serving setup and here are the people who are working on it though. Maybe like these are interesting things to experiment with in the future. So I think a lot of it is just like pulling out of random distribution of like the large scale, you know, training that happens on the models. And I feel like it's, it's actually less representative of, um, like how the model, like it's just, it's just out, it doesn't have the context. Yeah. One reason that's true, which I was just noticing in the AI studio is I think all the models that are publicly launched at least so far still have a January, 2025 knowledge

48:22cutoff. And it's honestly like amazing that they do as well in search and that they can have like, you know, I, I ran a deep research on like, what's, you know, well, give me everything that Google has launched in the AI space. And like, what's even the speculation about what they're going to launch at IO. And it did like a very impressive job. Deep research is great. Especially considering it knows in its weights, nothing about the last 18 months. Um, so I guess the first question is just like, why is the, you know, why are we still at a January, 2025? Can I categorize this as a bug?

48:53Yes. This is also one of Logan's favorite topics to discuss. Um, yeah, no, I mean, I think, uh, updating the knowledge cutoff, definitely important, uh, and something that is on our radar. I think the, the other part though is like, how does deep research do so well? Or how can we use the model in search? It's because we also have the model search, right? So I think for us, it is like really important actually that the model be able to know when, when to leverage its parametric knowledge versus when to actually go out and get the information from the web. Um, and especially because, you know, there is information that's as fresh as an hour ago

49:26or a minute ago. Like we want the model to be as up to date as possible. And so I think for us, we've been really leaning into how do we help the model search effectively? Um, and that's a big part of what makes it successful in the context of search or the app, um, or even anti-gravity actually for that matter. That reminds me of one of the more surprising bits of news that I've seen from Google maybe ever, which is the partnership with Exa, bringing Exa in as a alternative to Google for grounding. Um, I never expected to see Google work with any other, you know, search provider.

50:01So what's the story behind that? I think this is just generally, um, the, like Google Cloud does like tons of these types of ecosystem partnerships with folks like across actually like lots of things that are like, you know, somewhat competing sort of quote unquote with what Google is doing. Um, and actually you can look at like the cloud marketplace generally, like has lots of stuff. There's actually the cloud, Google Cloud hosts, um, sort of a model garden. There's the anthropic models. There's other model providers there. So I think it's like a very standard at the end of the day, I think, you know, there's some, uh, enterprise customers want choice.

50:32And so I think it's, it's trying to meet enterprise customers where they are. I don't think it's like a, I think it's a, it's a good, uh, it's a good soundbite that like, uh, Google can't do search. And that's why we have to partner with other companies. But like at the end of the day, to Tulsi's point, the, the model team and search is there's like a super deep collaboration. And the models are built, um, with, with sort of that use case in mind. And I think for, for some portion of enterprise customers, um, they want flexibility and sort of like their external search, uh, tooling providers and, and sort of, uh, Google Cloud's doing their, doing their job as a, as a great enterprise business of sort of partnering and

51:06finding the right folks to work with. Last couple of minutes, maybe just a little lightning round. Why hasn't context grown more in the last year or two, right? We got like a million and we kind of, that was like up from 4,000 in just a couple of years. Right. But now we've kind of leveled off. Um, is that because people don't want it? And when we saw this subquadratic model that came out with made a bit of a splash with a 12 million token context window and a new attention, uh, strategy to support that, um, is it people

51:38don't want it? It's too hard. There's not the compute to handle it. Like what, what's currently limiting context? I think, uh, people definitely do want lots of context, but I think what we've also found if you look at even like personalization, where you want to access like all of your personal context or coding where you have like extremely large code bases. Um, I think a lot of the frontier here is going to be actually on how you smartly use context. So thinking about like compaction and like, what are the right ways to like find the right elements of the context and bring them into the model. And so I think that actually is like a huge opportunity is like, how do you leverage all of this information

52:10that the model might have access to, but actually a lot of it is frankly distracting, um, for the model to actually do the right thing. And so how do you give the model the right amount of context in the right way to be most effective? So I think that's actually really the direction that we want to be pushing in, which actually then, you know, in actuality, the, the amount of context that the model is, is leveraging is actually much, much larger, but because we're being smart about how that's actually coming into the context window, you can actually fit it into smaller context windows. Um, but I think also, you know, this goes back to my point about flashlight and flash, et

52:43cetera, like larger and larger context windows also come with cost. Um, and so what we're, what we also saw with customers and we still see with customers is that a lot of customers want to use smaller context windows because of that. Um, and they want to be more intentional about what's going into the model. And so I think we're trying to meet the moment in the right balance of how do you provide a lot of useful context while also meeting the right kind of latency cost kind of other trade-offs. Yeah. And I think one thing I'll add is in today's paradigm of how sort of, you know, continuing to extend context works, I think it just ends up being that like, it just becomes

53:19too cost prohibitive for customers in practice to actually use. And I think even like at the extreme of, of 1 million token context, like in some cases it can be like a few dollars for a request at that rate. And I think you, the demand for that is like just so small. Um, and so there's a huge amount of like compute required in order to do that. And so there's like, there's a lot of like trade-off things that you're juggling. Um, but I'm, I'm hopeful, like hopefully we're, we're like a research breakthrough way or something like that from enabling not to continue to scale up and have it not be, um, such a, such a large

53:49investment from that perspective, both from the user side and also just like the, the surfing compute in order to make it possible. So speaking of possible research breakthroughs, what happened to that diffusion coding model? I was excited to see apps materialize in like three seconds in front of my eyes and it's been quiet on that front. Diffusion is, is, is awesome. It is super fast. I think we are still testing and experimenting with it in a number of different ways, trying to figure out like, what is the best way to put this out into the world? Um, where is it most useful? Um, but I will say like, actually part of the reason why we've also been investing in flashlight

54:22is like flashlight is an incredibly fast model. And actually, if you look at the 3.5 flash model we're releasing right now, um, on artificial analysis, it benchmarks that like, I think 280 tokens per second, which is like crazy fast. In fact, it's actually so fast that like sometimes an anti-gravity, like by the time I want to cancel, like it's too late. Um, and so I think like we already are like, I think trying to figure out like, where is, where do you start getting to, to Logan's point in a different answer, like the diminishing returns and like, where do you see that, that value proposition is I think part of the question

54:54too. But we are continuing to push on diffusion research. Our researchers who are working on diffusion are doing some pretty awesome stuff. I was like in a meeting with them the other day about some results that they have. I mean, I think they're like still pushing the frontier of kind of quality and speed in ways that are really, really cool. So I think we're going to see that play out, um, really well. Yeah. And I'm excited. I feel like it's a, it's a, it's a research exploration. I feel like that was the, that was also, obviously there was the, the application where you could sort of test it last year at IO, but I think the framing was like, we're doing interesting research.

55:24This is sort of like a look behind the curtain of the interesting research we're doing and hopefully it manifests in, you know, models maybe one day or just us informing our, our perspective of, of what works and what doesn't. Um, so yeah. One thing actually, as far as speed is concerned, just another plug is actually an anti-gravity right now. Um, there's actually a faster version of 3.5 flash. So it is, is speedy actually. Um, which is, I think we're kind of excited to see how people will use that and like what the reception and reaction will be to that too. People log fast models. Yeah, no doubt. Um, well, time is the one resource we can't get any more of.

55:57And I know you guys are super busy leading up to the, yeah, well, we can build more compute. We can't, uh, yeah. Hard to create time out of nothing. Um, so maybe just last question. What else is Logan asking that I haven't asked? Yeah. Yeah. Logan, what are you asking? Um, let's see. I, I mean, I think, you know, we talked a little bit about this, but like the one thing I will say is like, I'm really excited about where audio is going also. Like that's one that I think we tend to talk less about. Um, but if you like think about the Gemini mic example, um, or you think about kind of,

56:32uh, like the, the Gemini live experience, I'm like really excited about moving towards a paradigm where audio is just a bigger and bigger part of how we engage with these models and how they engage back. Um, and so definitely try out Gemini live, the, the updated experience, but I think that's another area that, that it's like a paradigm I'm excited for us to keep pushing to. Yeah. And I think the seed to plant is, um, obviously Google IO is an incredible moment and lots of stuff coming out the door, but, uh, this is, you know, just the start of the, the summer of amazing things and lots of other stuff.

57:04Um, so the, the, the engine, the engine keeps churning and like, there's, you know, there's, there's lots of stuff, uh, in the works, which I'm excited about and, and many more, many more stories, many more podcast episodes so that we can get Tulsi as. You have to get up to seven. Up to seven podcasts. It can change. But, but actually like legitimately, I was in a room this morning where one of my team members was like, I know we're going to be launching this a few weeks later, but I really need to have vacation and I was like, well, we're just, we're just going, we're just moving. No rest for the weary in the AI era. That's for sure. Um, thank you guys for having me here at Google headquarters and, uh, Tulsi Doshi, Logan Kilpatrick.

57:39Thank you both for being part of the cognitive revolution. Thank you for coming.

57:55We heard the speech. This year it's agents, agents, agents, different kind of reach full stack before full stack was the conversation, search cloud hardware model, one foundation. No, nobody handed us the story we've been writing. Memo caught the trend, kept working, kept compiling. Not fighting like a battle, but like a kitchen every day you cook.

58:28Nobody listens till they listen. Agents, agents, agents, agents, agents, agents. The model eats the scaffolding. Agents, agents, agents, agents. Running ablations in the jacuzzi. Hour later, full report, ain't nothing newsy. What you call the future, that's my Tuesday. Not a road map, not a Sunday. Hardway mother, laptop listen. Code keep coming, no permission. I talk, the model make it.

58:59Rough thought and the model bake it. This is my favorite question. Compute, compute, build more compute. Can't build more time. That's the clock, everything's falling in line. Agents, agents, agents, agents. The model eats the scaffolding. Agents, agents, agents, agents. Agents, agents, agents. Every year we rewrite the stack. Tokens in, tokens out, we don't look back. Tools you ship today, models start borrowing.

59:31What you call rapper code is what the models swallowing. Last year's harness already in the build. Every spin of the wheel, another swallow skill. Don't slow down, don't ask permission. That's the design. That's the mission. Three years of this, and we never been thrown. No rest for the weary, no rest on the throne. Now's the AI summer. And we own every zone.

59:59Agents, agents, agents, agents. The model eats the scaffolding. Agents, agents, agents, agents.

1:00:10Agents, agents, agents. Agents, agents, agents, agents. If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guests and topic suggestions, and sponsorship inquiries, either via our website, CognitiveRevolution.ai, or by DMing me on your favorite social network.

1:00:42The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at AIpodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.

The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More

Show notes

Highlighted moments

Transcript

Introduction

Google's Confidence

New Launches

Agent Harness

Recursive Self-Improvement

Google HQ

New Products

More from The Cognitive Revolution

Babysitting the Machine: Glean's Rebecca Hinds on the Hidden Human Labor of AI at Work

AI in the AM — Week 1 Highlights (June 2026)

Nested Learning: Ali Behrouz on the Quest for Continual Learning & Illusion of AI Architectures

Inside Nathan's Second Brain: Daniel Miessler, Security Expert & Creator of PAI, Audits My AI Setup

Your Biggest Lever: Designing your AI Career for Maximum Impact, with 80,000 Hours founder Ben Todd