Steadcast
The Cognitive Revolution cover art
The Cognitive Revolution

Three Kinds of Software Survive: Tasklet's Andrew Lee on Competing to be a Horizontal Platform

May 15, 20261h 33m · 17,950 words

Show notes

Andrew Lee, CEO of Tasklet, returns for his fourth appearance to share how his team has once again rewritten their entire agent stack, now emphasizing file system context, agentic search, and multi-resolution summarization. The conversation digs into the strategic tension of competing with your own supplier, as Anthropic's Claude Max accounts offer direct customers far more tokens than API partners get at the same price. Andrew also lays out his framework for the only three types of software companies that will survive the AI transition and discusses Tasklet's evolution toward becoming a model-agnostic horizontal platform. Sponsors: Brave Search API: Brave Search API gives AI agents a fast, independent search index for research, RAG pipelines, images, places, and fewer hallucinations. Get $5 in free credits at https://brave.com/search/api/?mtm_campaign=q2-26-cognitive-revolution Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code COGNISM in the source field to save 20% off year one Roboflow: Roboflow is an end-to-end visual AI platform that lets you turn raw ideas into fully deployed applications in just hours, powering breakthroughs like Blueprint Pro's floor-plan understanding tool. Read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI at https://roboflow.com Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Highlighted moments

what if instead of the history being the thing we send to the LLM, what if the history is in the file system? What if the files are the agent?
Jump to 5:18 in the transcript

Transcript

Introduction

0:00Hello, and welcome back to the Cognitive Revolution. Today, I'm pleased to welcome audience favorite Andrew Lee, CEO of Tasklet, back for his fourth appearance on the podcast. Andrew has always been extremely transparent and candid. His belief that speed is the only moat has made him comfortable sharing intimate details of Tasklet's agent architecture. And as you'll hear, in the six months since we last spoke, Tasklet has indeed, once again, entirely rewritten their stack. Today, there's much more use of file system context and agentic

Agent Architecture

0:32search to leverage available information while conserving tokens. Plus, a huge new emphasis on summarization at several levels of resolution. This time around, we also dig in to the delicate strategic situation that Andrew and Tasklet face. While their product strategy of always betting on the models has proven correct, and Andrew's choice of Claude has been rewarded, Andrew observes that these days, everyone is fundamentally building the same thing. And today, his most intense competition is actually coming from his critical supplier, Anthropic, which, with Claude Max accounts,

Token Costs

1:06gives their direct customers an estimated five times as many tokens as Tasklet can purchase at the same price via the API. In micro terms, this relatively high cost of tokens has caused Tasklet to stick with Opus 4.6 rather than moving to the new 4.7. And in macro terms, it's pushing Andrew and team to become a horizontal platform, capable of harnessing, or as Andrew describes it, outfitting with a mecha suit, frontier models from any provider. This evolution, which I do think Andrew has played and timed about as well as anyone possibly could,

AI Transition

1:41is critical, because horizontal platforms are one of only three types of software company that Andrew believes will survive the AI transition. The others being API-first companies like Stripe and companies that develop solutions and sell outcomes. Best exemplified, perhaps, by Finn's model of $0.99 per customer service ticket resolved. We get into lots more besides, including Tasklet's new Instant Apps feature, how they're thinking about deep personal and shared organizational context,

2:11the cloud container company that Andrew endorses, Tasklet's token-to-labor cost ratio,

Token-to-Labor Cost

2:17and whether or not Zuckerberg has come calling after his Manus acquisition was canceled by the Chinese government. This is a fun one with lots of valuable detail from somebody who's in the arena competing to become one of the few general-purpose AI agent platform winners and still actually willing to tell us all about it. Please enjoy my conversation with Andrew Lee, founder and CEO of Tasklet. Andrew Lee, returning champion and CEO of Tasklet, welcome back to the Cognitive Revolution.

2:49Thank you. Glad to be here. You are a fan favorite, and I'm going to just try to pepper you with a bunch of questions and make sure we get as much alpha for all the builders in the audience, myself included, as we can. And so first question, it's been about six months since we last spoke. You have rung in my head probably weekly with your speed is the only mantra. And I guess my first question is, what have you rebuilt in the last six months since we talked? Or maybe more to the point,

3:20what have you not rebuilt in the last six months since we talked?

Rebuilding Tasklet

3:24Yeah, I think the mantra has stayed the same. And we've rebuilt basically everything. I was thinking about this earlier, and even the pieces where I'm like, oh, this has stayed the same. Actually, no, I've been totally rebuilt. So from a product perspective, the product is actually very different now. When we launched this thing in October, it was all focused on workflow automation. We thought, hey, it would be really cool if people could come in and describe a workflow and we'd run the workflow for you. But basically, the feedback that we got right out

3:54of the gate was, hey, once I've given this agent all of my context and hooked it up to all my stuff, I don't want it just running my workflows. I also just want to be able to talk to it synchronously, too. And so it's not just a workflow automation tool anymore. Now it is a very general purpose agent. It's great for doing workflows, but it's also great for doing other types of stuff. So that required basically a total rebuild of the product experience and as a result, a lot of technology behind it. So as an example, in a workflow automation tool in the previous iteration

4:24of the product, you basically had a main agent that you'd talk to for a brief period to kind of set up your workflow. And then once you're done setting up your workflow, you basically stop talking to that agent. So the chats were relatively short. And then our system would kick off runs of what we call the task agent on a periodic basis when events happened. And so every agent was like a pretty short thing. And you could do like pretty simple context engineering to make that work. You know, world where you want this like general purpose agent that you can talk to synchronously and run these automations. The product experience people want is just one big linear chat where like

4:56everything is in one chat. It works really well from a product experience, but the entering gets really complicated because you can't just like have an infinitely long chat history that you feed into the LLM. Even if you could be really, really expensive and you wouldn't want like every automation to have to send in, you know, 10 million tokens from all the previous runs. And so we had to kind of totally rethink the way context engineering works and say, what if instead of the history being the thing we send to the LLM, what if the history is in the file system? What if the files

5:29are the agent? And Mark had recently had a thing about this. And I think people figured this out, but you know, we made this switch kind of in November where we said really, okay, what we need is a file system that has your history. And then we need the thing that we actually send to the LLM to be just like hints at what's in the file system and what things you need to read to get the work done. And that way we can scale the agent up, including the number of, you know, of chat messages that were sent. And there's a lot of samples we'll scale up in the future, but basically you can scale from what fits in the context window to what fits the file system. You can fit a lot more stuff in

6:02the file system. There's a bunch of other stuff we've rebuilt. So another big thing we've rebuilt is around computer use. When we launched initially, computer use was like kind of this add-on. You could have a Linux machine. Actually, initially it was, it was a Windows machine. Then it was a Linux machine, like tacked on. And it really was kind of an afterthought. You could use it for certain things. It kind of worked okay. But most things you did with the agent didn't touch it. At this point, computer use is the absolute core of the product. So basically everything you do is like running shell commands, touching a file system, touching a database. We have a very tightly integrated

6:33browser use experience now where every agent has a headless, headless VM and a browser VM system that, you know, persistate across runs and allow you to do lots of really cool stuff. And it's sort of in the critical path. So now, like if our computer use goes down, like everything goes down versus before it was kind of this afterthought. We've also rethought the way our integrations work. So this is something I think for our product experience, it maybe doesn't look terribly different to folks, but the basic architecture of how we plug in other systems to the tasklet agents has been

7:09completely rethought basically to allow the agent to have sort of more control and management of like those connections. So a simple example of, of what, like a product experience that's improved is used to not be able to connect multiple instances of the same type of thing. Like you couldn't have like three different Gmail accounts connected to an agent and now you can. So the sort of the base architecture there has been rebuilt. So I'd say basically every letter code has probably been touched in the last six months. And like most of our fundamental assumptions are thrown out and uh, the product still can do many of the same things, but hopefully much better now.

7:43Yeah, that's cool. So literally you can't think of anything that has survived the last six months. Nothing substantial. The, the, the visual design is totally different. The structure of the app is totally different. Yeah. The connections system is different. The way we use computers is different. The core of the agent is different. The context management is different. The way we do compaction is different. So it's, it's all new. Yeah. Okay. Let's talk about that compaction. Cause I mean, one of the big takeaways from, it might've been two conversations ago was the critical importance of

8:17caching. And you had said at the time, you know, long context is, you know, it's quite effective. Obviously it's expensive caching and especially, especially the sort of, um, 90% off caching pricing of Claude was like critical to enabling certain things to really work in a way that wasn't, you know, uh, tanking your, you know, uh, what is it? Um, the, the classic meme, right? My, my company is dying, uh, with making it work without killing your company. So it sounds like that's changed quite a

8:49bit. Now it's much more of like a pointers hints type of thing. How, how is this working? And obviously tokens are like, you know, people are spending a lot on tokens these days and I'm, you know, increasingly like hitting my limits on even the, the highest, you know, plan with task at these days. So how are you managing context for me? What should I know? What lessons should I take away from your experience on how best to manage context in the modern moment? Yeah. So caching has actually become a much bigger deal because now that the real context is in the

9:24file system, there's just a lot more tool calls that needs to be done to do the basic operations of the agent because you're loading in a bunch of files and stuff. And so we really have to make that caching work if we don't want this thing to be like crazy expensive. So that's, that's been very much at the forefront. We, we came up with a new approach to context management that we shipped in December that basically works like this. You take your whole chat history and you put it in the file system. So it's all accessible in the file system. And then you find a way to summarize that whole

9:56history and kind of a fixed length number of tokens by having recent stuff be included in like sort of high granularity. Like the last thing you sent, you'll probably have most of it or all of it there. And then older things basically have like decreasing fidelity as you go back. So if you have a very long chat, the, the stuff in your current turn, like the current thing that's running, it's probably all going to be there, including like all the thinking blocks and all the tool call responses and, and, you know, all the files and things are probably going to be sent to the LLM

10:29per bit, depending on how long the run is. But for most kind of short runs, that'll be the case. The previous turn will probably mostly be there. You're going to have the full user message. You'll probably have the, the assistant response. You'll probably have the tool call arguments. You'll probably have the tool call responses. You'll probably have the thinking blocks, but as you go farther back, we start stripping the thinking blocks. We start stripping the tool call responses, or at least truncating the tool call responses, and then stripping them. We start truncating and then stripping the tool call arguments. Then we start collapsing tool calls. And then we start, you know, drinking down the assistant messages. And then finally we get to

11:04some LLM based summarization. And we do this in buckets, moving back so that we can have sort of a minimal impact on caching. Basically you want to like avoid messing with prefixes as much as you can. So you, you, as you go back, basically you get into these buckets where we have like different levels of compression and those buckets, as they get older, tend to like get added to very slowly. And then once they hit a certain threshold, like we shrink them down. And this system has actually worked basically like pretty well. And the core thesis basically is like, you, you generally

11:38care a lot more about recent stuff and you trust the agent to go and like look things up when it needs to. And I would say it's not perfect. We do definitely do have people say that agents forget things. It definitely does still cost us a lot of money to run, but I think it's generally worked. And our plan is to double down on this type of architecture. And we have lots of ideas how to improve this, but the basic approach of these sort of like, like decreasing fidelity is you go back and he's like bucketed cash aware chunks. I think it's the right approach.

12:08So how often does that get updated? Like if I have a agent that runs on a daily basis, do you try to keep the cache active from one day to the next? Or is it sort of every day we're going to have a fresh cache that will run through that whole session and all the interactions, but kind of tomorrow, you know, do we begin again or do we like, well, on what frequency do we begin again? Well, there's two pieces there. There is when do things get updated on our side? Like when do we

12:38decide what that compressed history that we put in the LLM looks like? And then what caching do we do on the LLM side? And the answer to the former is every time you do anything, it's sort of incrementally updated, including in the middle of runs. If you have like a very long turn that uses a lot of tokens, it might actually start compressing inside that turn. And the reason that we persist that is actually calculating that could be really expensive. Like running an LLM based compaction of like an older section eats a lot of tokens. And you don't want to have to do that every time the

13:09thing starts up. If you're, you know, every hour you have a trigger running and every time you have to like compress a bunch of history, that can be very expensive. So we keep all that around. On the model side, caching depends on the provider. So in the case of Anthropic, we're using five minute caching. And so it doesn't stick around very long. And the assumption there basically is you're probably either in an active session, or like in the middle of a turn, in which case five minute caching is enough, or you're probably waiting for next trigger to run. And like most people's triggers are not running like every, you know, half hour, they're running every few hours or every

13:42day. So the assumption there is, it's not so common. And then different providers have different different possibilities there. And for example, like OpenAI has much nicer caching primitives, for example. I'm happy to talk about those too. Yeah. Okay. That's interesting. So it's basically constant maintenance of the higher level summaries that will be fed into the LLM and then pretty short

14:14kind of single burst style caching to actually reduce the cost of incremental calls within like one agent run. And it sounds like at least for Anthropic, that kind of is typically limited to like the cache is hit for the for one run, but not hit across runs for the most part. Yeah, that's, that's the current approach. And one thing I wanted to note is the way our system is built today, we basically get no cash benefit across users. So it's like caching for actually

14:44even per agent, like it's basically caching per agent, there's some changes that we can make to do a lot of cash optimization across agents, and potentially even across organizations and users. And I don't want to get into the specifics of that, because that's, that's still that's still upcoming. But I think there's a lot of potential actually to save money across agents as well. Yeah, okay, well, that'll be important. Hey, we'll continue our interview in a moment after a word from our sponsors. The Cognitive Revolution is brought to you by Brave. If you want to stop hallucinations,

15:18empower your AI agents to do their own research with the Brave Search API. Brave offers the only search API with its own index at scale. It's lightning fast, excels in rag pipelines, and it's a leading search option for Claude MCP and OpenClaw. I've built Brave Search into my personal AI infrastructure as a core tool that all agents can use anytime they need it. To find guest headshots and company logos for the podcast, they use Brave's Image Search. To build small business profiles for use in my Waymark prototyping

15:51work, they use Brave's Place Search. Across all use cases, my agents tap into Brave's index of 40 billion high-quality pages tens of times per day. It's the only global-scale index outside of big tech. Which means no Google scraping and no SEO spam. Plus, with true zero data retention policies, you can meet compliance obligations and rest easy. Pricing starts at just $5 per thousand API calls, and you only pay for what you use.

16:22Sign up now and get $5 in free credits to start, and empower your agents to start calling the Brave Search API today. Most billing platforms were built to send invoices and assume your pricing is simple and predictable. But if you're building an AI product, a fintech tool, or a developer platform in 2026, your pricing is anything but. Usage tiers, consumption billing, and bespoke enterprise contracts are now the norm, and you're probably managing it all across disconnected tools and fragmented systems. Sequence handles the entire revenue workflow from contract to cash. Quoting,

17:00invoicing, metering, revenue recognition, plus Sequence agents that automate the manual finance work that usually takes teams days each month, while also helping them to collect cash faster. Companies like Cognition, Incident I.O., Runway, and Open Router use Sequence to run their full revenue process between CRM and ERP without the spreadsheet mess. If your pricing has gotten more complicated than your current billing setup can handle, check out SequenceHQ.com and use the code Cognism in the

17:32source field when you book a public demo to save 20% off year one. Let's, um, I do want to circle back to the OpenAI question because you guys have been clawed maximalists, and your other mantra that rings in my head a lot is always bet on the models. I'd say it's safe to say that that bet has gone well over the last six months. Obviously, we've seen some of the most notable model releases in the sense

18:02that the community has sort of flagged like 4.5 and 4.6 as kind of qualitative shifts where like things went from not working to working and people are like, oh, I can really get pretty general purpose knowledge work out of these things on a pretty consistent basis. Now. I would love to hear how you would characterize the advances that we've seen. Maybe you could do that in terms of like, what new use cases have opened up maybe you know, things that have surprised you possibly also like

18:33things that are still not working that you you know, that might be surprising, given all the things that do work. And then then we can get to the latest models I want to so kind of give me the like, four, five, four, six history, and then we can go to four, seven and five, five present. Yeah, I think the overall, you know, approach have always been on the models. I think that is, I totally agree. That is like totally held up. When we started working on task, we were using four, cloud four, and that was able, that actually could get you a long ways and it worked pretty

19:08well. Four or five was a big unlock. So four or five started to work much better to do doing computer use, and it could just like manage sort of navigating through the various different connections and tools enablement process and stuff much better. And that, you know, came out very early in the lifetime of task that was like, I think, a big bonus for us. And the cost reductions around Opus were huge for us in, I want to say, December, when that happened, they dropped. So initially, like we could really

19:38only have people on Sonnet, and then Opus dropped from 15 to five. And that was like a big unlock for us as well. I think four or six was like a solid incremental change, made it nicer for again, for computer use, which has become increasingly important for us, both like headless and head full, as well as code gen. And that enabled our instant apps feature, which I think we'll probably talk about today, which is a very cool feature. Four seven, we actually haven't rolled out yet. I think

20:10four seven has been much better in certain areas. Like it's much better to kind of like, you know, code and like one shotting kind of long projects. But for the types of like, iterative knowledge work that we support, it doesn't actually seem like a huge boost. And it's a lot more expensive. So the tokenizer changes they made increased our cost for like 30%. And costs are huge for us. So, because we, you know, we essentially pass them on to users. And so we actually opted not to ship four seven as like a like a default recommended model, we are going to ship it. But as sort of an

20:44advanced user option, if people want to, and we're going to like note that, hey, this is actually like cost a lot more. So I think I think the progress there has been great. The reason we started on a throttle, I've been so anthropic focused for so long, was just that the, the basic core of our agent, like the ability for it to like, like navigate through a discovery process of connections and activate the right tool in your agents, and then manage its context, the way it manages context, just the core inner workings requires kind of a base level of intelligence. That the other models just couldn't do, they couldn't, you basically couldn't use the same

21:20harness with them and have it expect it to work well. That has changed, which is really exciting for us. You know, it's kind of scary to be like, we're totally dependent on this one vendor. Even though they're great, like I drop down models are amazing. Don't get me wrong. Um, supply chain risks are everywhere. If you take that approach. Exactly. But more recently, GPT 5.5 has gotten really good. I think it's a huge step up for our use case over 5.4. We will, by the time this podcast comes out, we will probably have announced this publicly. So just, hey, it's, it's going to be there very soon. And it can navigate our

21:55harness super well. I think it gives Opus 4.6 a run for its money for most use cases. So that's, I think that's really exciting. And I'm pretty optimistic about the OpenAI roadmap this year. I think they made a huge bet on compute last year. And I think that's starting to show, and I think where they're going to have a lead for a while. And it also is clear that they've refocused their business much more on these types of use cases. And you saw like the progress codecs made over like six months. And if they put that level of effort into doing the models around these types of agenda use cases, I think that'll be, that'll be huge. So we, you know, we signed a,

22:28we signed a deal with OpenAI the other day and we'll be launching stuff there and we're making it, making a pretty big bet there as well. There's also been, there's progress in other places though. The, the, the latest Google models are, are pretty solid. They're not, they're not at the level of, of anthropic OpenAI yet, in my opinion, but they are making very fast progress and they're much closer. And then we've been playing with, with DeepSeek and with Kimmy. And the, like the latest Kimmy is, you know, as far as we can tell, like maybe better than a haiku and cheaper. So I think we're going to see those probably make it, make it make their way into,

23:01into task soon. So I would expect, you know, within the next few months, we are going to have, you know, anthropic models, OpenAI models, open source models, Google models. I'll bet you the anthropic ones will still be the, you know, the best and probably the recommended in most cases, but people have a variety of choices and like some good cost, cost optimization options for certain things. So many follow-ups there. Let's maybe start with your, what I imagine has been a little bit of a delicate dance with anthropic. And then we can kind of compare and contrast that with

23:33what OpenAI is now bringing to the table. I don't know. You probably know what the ratio is of API cost to effective token cost when you buy a CloudMax subscription and max it out. And obviously in the, you know, intervening time, since we spoke last, there's been the whole open call phenomenon. And that's at its own, you know, bunch of drama with you can, you can't, you sort of can, we got to pay the API price. We're lowering our limits. We're, you know, buying compute from XAI.

24:08We're raising our limits back again. What, what has it been like from your perspective to be building on a platform that you're also sort of competing with that is kind of undercutting you to various levels at different times on their pricing? Yeah, it's definitely a, it's definitely an interesting relationship. So like on the one hand, the models are amazing, right? They're super good for our use case. And like their, their team has been like really helpful and responsive. And like, we talked to them on a very regular basis and they

24:38are, you know, they're trying hard to support us and we get early access to stuff and they take our feedback and all that. So like, you know, they're definitely totally enabling our business, making it happen and like working hard to do it, which is wonderful. So like, they're a great partner. I don't want anyone to think I think otherwise. But at the same time, if you look at our stats of like, when someone churns off task, where did they go? Like 80% of the users go to an anthropic product. So they are like a very direct competitor. And I, you know, I think there's different use cases

25:11where like we shine or their products shine, but it's clear that they're very direct competitor. And the number one reason that they do that is because they already have a max plan and they don't want to have to spend additional money on tasks. And so basically, you know, every time they release a new model update, we're like, this is great. This is awesome. Right. And every time they send an email being like, now your max plan is even cooler. We're like, crap. Right. This is just going to make it harder. And so, and they definitely, they definitely subsidize it. And it definitely has set some distorted expectations with folks around like what you can get for a certain price. And so it's, cost is like a constant,

25:46constant struggle for us to try to, you know, help users use the products more efficiently and help them understand, you know, Hey, we're, we're, we're actually working at like some pretty raised within margins here and trying to make this cheap for you guys. So yeah, it's an interesting dance for sure. Do you know what the ratio is or is it opaque even to you? I don't know what the ratio is. No. Yeah. Interesting. It feels like it's not insignificant. Like I, I, my intuitive gut

26:16guess would be, it's like five to one or something, but that would be, that would be my guess too. Like five to one or maybe, maybe even more. It does seem like pretty substantial. Yeah. Yeah. That's a big, that's a big deal. So, okay. One more thing on the anthropic dance. And then

26:35obviously this gives you, you know, a lot of incentive to broaden out and try to position yourself a little bit differently. How do you think about, obviously they have an inside lane when it comes to building product experiences that make the most of their model's capabilities. I mean, they're, you know, increasingly we're sort of seeing like the model is being trained in the first party harness. And I have another question on the, on the word harness. And, you know, if that's even the right paradigm to be thinking about this anymore,

27:06but how do you kind of position yourself to, you said like, there's something, some use cases where you feel like task exceeds what you get, you know, out of the first party cloud products. Like, I guess for one thing, like, how is that even possible? And how do you think about trying to compete with what they themselves are going to build given, you know, all the inside knowledge and advanced prep and, you know, kind of close coupling advantages that they have? At a high level, I kind of think that everyone is building the same thing.

27:40Um, like, like you have all of these just an agent companies and basically over time, as the models get smarter and the agents built in more sort of general purpose tools, you know, computer use and file systems and whatnot, you can do very similar things in many things. So you can go into cloud code and you can do all kinds of non-coding things in cloud code. You can end, you know, codex and cloud code and like many other startup products are all, you know, able to do coding and non-coding things like pretty well. I think where you start to differentiate, and I do think you can

28:12differentiate within the space to some extent is really around like what you're optimizing for and what the ergonomics are. So in our case, take task that you can totally like write code with tasks, like you can hook it up your GitHub and you can have it like generate PRs and it does it just fine. And like we do this for marketing, for example, if we put on a new blog post or whatever, like I, I, I write the content in task lit and then I just have it generate its own PRs and it works fine, but it's not going to be as smart and definitely not as cost effective for like heavy duty coding as going and using a, like an actual coding harness. And it's definitely not going to be

28:46as nice to use because the actual coding harness is like going to be in like, you know, conductor or something that's like designed for like a coding workflow. And our product is like not set up that way. So I, I see a future where you can kind of pick up any agent and kind of do anything. But different ones are going to have sort of different like cost and performance trade-offs and different ones are going to have like just different ergonomics for like the different types of work. Where we really shine is 24-7 automation of knowledge work for companies, especially knowledge

29:19work for companies that is not like your personal work, but like something that the company owns. So if you have, you know, simple example, you have some like complex invoicing process, say your company, you don't want to be running that like your local co-work, right? You'd like, if you close your laptop and like the company can't invoice people anywhere, that's bad. And, you know, you don't want to like put that in open claw and put it on your Mac mini in the corner. Because again, if something like trips over the power cord, like, well, you can't run your invoicing. What you really want is something that's running in the cloud and is manageable by many

29:50people. And you have a lot of infrastructure around it to like manage and provide oversight and like, you know, have audit logs and you have guardrails around the thing and you can control cough and your different agents. And so there's a lot of like kind of team enablement features you care a lot about. And that's where we really shine. And a lot of the work to make that work well is actually kind of fundamental to the way the agent is built. So I talked about our context manager. The reason our context manager is built the way it is, is because you want to have triggers as sort of regular messages into the agent, which means these agents, if you have an agent that's like

30:23running a trigger, every time you get an email, that agent might fire, you know, 10,000 times this year. And so you need an agent that can fire 10,000 times and like still remember things at the beginning of the chat and still behave in a reasonable way. And that's a pretty different thing to optimize for than like a coding session. Like the types of, you know, the way Claude codes are like resets context, like makes total sense in a coding environment. It doesn't really make sense in a world where it's like processing all your emails. So that's how I see us differentiating. The other, the other note,

30:53a couple of notes I want to make here on differentiation is one, the market is just freaking huge. So if you look at coding agents and you might say like, oh, clearly Claude code and codecs have won, but like cursor is going to sell for like $60 billion. And even, you know, even the fourth and fifth and six, you know, like, you know, uh, cognition's doing just fine. And factory is doing just fine. Like even windsurf that like, you know, had to sell, it was like pretty good exit. So, you know, if we end up, you know, I'd love to be the, the, the number one here,

31:25but if we ended up being like number four or five or six, like they could still be a very, very significant exit. And then the last thing I want to note is, and I think this is probably the most important point. When we go and pitch a business, what, what we're trying to help them do is deploy AI for real inside their company to like automate stuff. And those like typical companies, they don't want to have to spend all their time researching AI models and like placing bets on which lab is going to win. Right. They want to choose a platform that's going to serve them well,

31:59and they want to benefit from everybody's improvements over time. And so we can go in there and say, hey, a bet on us is not a bet on anthropic or open AI or anyone else. A bet on us and a bet on everybody. We're going to give you, you know, anthropic models and open AI models and Google's and all Google and all the open source models. And then we will be a neutral, like arbiter of what you use. And so if, you know, to the extent that we can build features to help you choose the right model for the job and like optimize your costs across the different things, like you can trust us because none of these are ours, right? Like we're getting the same margins on everything. And so, you know,

32:32we're a neutral party versus if you go back to anthropic, you know, right now it's just anthropic products, even if they decided to like, you know, provide other models to their products, which they could, although I don't think they're going to, but they could. I don't know if you'd really trust them to do that in a neutral way. So I think, and I think that's a pretty, pretty compelling part of our sales pitch. Yeah. I think you've maybe navigated this about as well as anyone could in the sense that betting on anthropic and kind of going all in on whatever the best model is, which has been clawed

33:08to make it work as well as possible while the capabilities curve was getting to critical thresholds and then kind of pivoting to being a more neutral abstraction above the model layer. Now that there are multiple options that seem like they're able to deliver the kind of performance that people want. It wasn't obvious to, I don't know if it was obvious to you that that's how it was always going to play out, but I wouldn't say it was obvious to me. I think I would have said about, you know, your kind of position six months ago, like, yikes, it is pretty tenuous to be all in

33:42on Claude. But I think you kind of timed it pretty well on a couple different levels. So how much would you say that's foresight and genius and how much is good luck? Yeah, I'm glad you think so. This was very much the plan. And I do think it's worked out really well. So yeah, I know I'm happy. I'm happy with how it's turned out. Hey, we'll continue our interview in a moment after a word from our sponsors. Visual AI is the ability for your software to not just store pixels, but to actually

34:14understand what it's looking at. One of our partners, Roboflow, is the company making this happen. They've built an end-to-end platform that makes it incredibly easy to go from a raw idea to a fully deployed application in just a few hours. For example, just look at Blueprint Pro. They built an app to solve a major construction industry headache. They're using AI to instantly understand a floor plan. This was literally impossible just 24 months ago. But now that Visual Artificial Intelligence is accessible, thanks to Roboflow, there are tons of new companies

34:46being built. Go to Roboflow.com to read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI. That's Roboflow.com. Today's episode is brought to you by Anthropic, makers of Claude and Claude Code. Over the last few months, Claude has helped me build and refine a personal deep context database that now contains all of my emails, Slack messages, tweets, DMs across platforms, video calls, and podcast transcripts going back a full five years. On top of that,

35:21we've now layered summary articles describing my relationship with hundreds of contacts, organizations, and ideas. And now that this exists, there's almost nothing that Claude can't help with. For tax season, I asked Claude to help me get organized. It went through my inbox, tracked down 1099s for all 10 of my part-time jobs, and built me a comprehensive report on my expenses and donations. For my angel investing, Claude can now draft investment memos in exactly the form that my venture fund requires, based on the calls I've had and the emails I've exchanged with the founders.

35:56And when someone needs a favor, Claude can often do it as well as I can. Recently, a friend reached out to ask if I know anyone who might be a fit for a role that he is currently hiring for. Initially, nobody came to mind, but then I thought to ask Claude, and sure enough, it identified two great leads. Claude is the AI for minds that don't stop at good enough. It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that

36:30matter. So, for problems worth solving, get started with Claude at claude.ai.tcr. That's claude.ai.tcr. And check out Claude Pro, which includes all of the features mentioned in today's episode. Once more, that's claude.ai.tcr. So, okay, let's do this harness thing for a second. The word harness itself just always makes me think of trying to control and direct some sort of wild animal to get useful work out of it when it might

37:07rather be doing something else. I just took a long road trip with my kids in a Tesla FSD-enabled car over the last 10 days. We went to a lot of historical sites, the juxtaposition of horses, and my FSD was pretty funny. But I kind of think of trying to get this unruly animal that is a model to stay on track and do what you want it to do. These days, as you said, there's a lot more ability to give the model hints and say, like, here's a file system, and you can kind of go get what you need

37:40here. And I'm starting to feel like the concept of a harness is maybe a little anachronistic already. And maybe what we're doing now is more saying, like, here's the world you get to play in. It's not so much about trying to narrow what the model can do, but more about broadening what it can do. How do you think about that sort of like narrowing and focusing versus kind of broadening, giving access, you know, unlocking new possibilities, which, you know, might in some

38:11cases even surprise users, given the model capabilities we have now? I hadn't actually thought of a harness as being like a constraining thing. But like, yeah, you kind of make a good a good point of that would be the normal way you'd think about that word. I kind of think of it as like a mecha suit, right? Like, I agree with your thesis is like, the goal is to like, let that agent or let that LLM like actually do things in the world. And to do that, it's going to need storage, it needs compute, it needs to be able to reach out and connect to APIs,

38:41it needs to be able to talk to the user. So, and there's, there's a lot there. I think when I talk to people who are not like deep into the harness world, I think most people assume when they play with an LLM product, that it's a very sort of raw thing on top of the model and like you type the thing and they send it and like what they see on the screen is just being like sent to the model and the models doing everything. And that is becoming increasingly less true. And the complexity of the code that sort of like translating what you see for the LLM calls is

39:12getting more and more. And I think that's going to keep going. I think the sophistication of these harnesses is going to get like, just 10 times the complex. But I think there's gonna be some pretty major breakthroughs here, that increase the capabilities of these things like, like pretty substantially in the way that we, you know, handle memories and the way that we handle like, like oversight and control, and the way we connect other tools. So I'm very bullish on the opportunity here. And I think these things are just gonna like get more and more complicated. And yeah, I don't know, maybe we need a name. Maybe it's, maybe it, yeah, maybe it's like a mecha suit and

39:44it's not like a, like a harness. Yeah. How much do you think, so this is, I think one of the more interesting debates right now in the AI builder community broadly, what matters more model or harness? And I think you see pretty extreme positions on both ends where I see, I get emails that are like, models don't matter anymore. It's all about the harness and vice versa. And obviously either of

40:14those like extreme positions is not going to be right. But I guess I have historically come down somewhat informed. I don't know if you've seen this graph from the UK AI safety, AI security, I should say Institute, where they do a capability plot over time with the minimalist harness, you know, whatever kind of basic vanilla thing, and then the best available harness. And of course, you know, both are going up. A year ago, though, the time delta between what level of capability you

40:50could get with the best available harness versus the vanilla harness was longer. And now it's gotten shorter. Somewhat of that, some of that is maybe just due to more frequent model releases, which is like shortening every, you know, window of advantage. Some of it is maybe because the models are getting more deeply trained to use harnesses. And so, you know, they're, they're just good at it out of the box. You don't have to compensate for their weaknesses so much. But I guess my, my overall summary would be, it seems like I would say models seem to matter more and you can't get

41:25that. You can't like, how, how much can I live in the future with the best available harness for any given model? It seems like it's not a huge amount, but it sounds like you maybe see that differently. So like, what's the case that that's, if you do, what's the case that that's wrong? I think as models get better, the, like, they can replace good harnesses. Um, like there is, uh, you know, a model today with a crappy harness is gonna be better than a model from a year ago with a really good harness. I agree with that. And I think that trend is going to continue,

41:58but I also think the effects are multiplicative, right. And the, and they're orthogonal, you know, disciplines and like, there's no reason not to take the best model and put in the best harness. And I think we should, um, you might argue that like, oh, like given like the exponential that actually only buys us six months or something. Um, and okay, fine, but it's six months. Um, but I think more importantly, what you're getting, like the, the, the metric, the only metric that matter is not intelligence, right? Like in these real production systems, intelligence is one piece,

42:28but, uh, you know, take, take Taskade, for example, um, much of what we do is, is automating specific workflows. Once the model plus harness is smart enough to like, I don't know, do, you know, order less lunch every day, which it does. And we've been able to do that for like six months. We're not going in there and messing with it very much. Like incremental improvements to intelligence don't really matter, but, um, performance and cost do. And so if you look at the harness and say, Hey, the only point is to like make the thing smarter, like fine. It buys you a fixed amount of time over the, over the model exponential, which is like, maybe cool,

43:02but not amazing. Um, but it might make a significant difference in, in cost and, um, you know, other attributes, cost and reliability and the ability to like do oversight and, and, and speed. Uh, and I think those things matter a ton for commercial products. So like in our case with our harness, like the benefits you get are, you know, you have a nice UI and like the sidebar pops out at the right time to like show you things and you get, you know,

43:32nice indications of like working states. You can like see what it's doing at the time and you get the ability to like have things persisted across long periods of time and you get nice performance trade-offs and cost trade-offs. And, uh, I think those things should not be underestimated for like a commercial product. Yeah. If you can make work with Haiku instead of Opus, for example, that, uh, moves the needle quite a bit or so. Yeah. Especially in a compute scarce world, which we increasingly, um, seem to be in. Okay. So if I could, if I can interject here,

44:06like as a good example of, and I don't even know if you've called this a harness, but like, you see what, what Anthropic is doing with their, I think they call it like a supervisor agent or

44:17forget exactly the language they use, but basically they have a system where you can inject a tool that allows a smaller model to call up into a bigger model. And this is a relatively new thing that they've been talking about. And like, you basically can get close to the bigger models performance, but like do the vast, vast majority of work on a smaller model. And like, that's a huge, that's a huge win. And like, if you have those capabilities, why not? Yeah. Yeah. That makes sense. So when you think about the best available harness and what that looks like, especially as you go to a

44:48multi-provider paradigm, how much do you think you're going to be building a harness per model versus trying to keep everything the same across models? Traditionally, one would think like, no way we can build this, you know, complicated product across, you know, in a bespoke way for all these different models, we've got to keep it consistent, but obviously the old rules don't apply anymore. So what's your strategy? Like how much do you tailor the harness to each new model that you

45:22want to launch? Yeah, this is actually, this is very much on my mind right now. Um, I think ideally as little as possible because we want to support a lot of models and like, it's hard to have a thing that, you know, it's hard to maintain across both balls, but we also want to have the ability for these agents to like switch between models. And so if you're like, Hey, you can run an opus this way and it persists this state, but then you switch over the same agent to some other model and suddenly you have to find a way to like translate things. It gets really complicated. So we'd like to keep them as similar as possible. I think so far we've been able to, uh, and our approach has been like, you

45:56know, maybe we'll make some prompting tweaks that'll try to like address issues in one model. We'll like not breaking it in the other model. And I think, I think so far that's mostly, that's mostly worked. I think over time, the APIs of these things have converged and the basic capabilities of these things could have converged. So my hope is that we'll get easier over time, not harder, but I could see us having some, some like very model specific harness things potentially. And I'm thinking about ways to do that in like a really modular way. So it's like not a huge amount of overhead, but yeah, definitely something on my mind.

46:30Even beyond model capabilities, you alluded earlier to caching primitives being different across providers. So presumably on that level, at a minimum, you kind of have no choice, but to, if not, I mean, you maybe could have the same context, but you're going to have some sort of different implementation, right. For, for certain things that are just different, that are inseparable from the models. Yep. Yeah. Like in the example of Anthropic and OpenAI for OpenAI, they have a very simple caching API,

47:01which is basically like, they'll just cache any prefix for 24 hours, and they do it automatically. Anthropic has a much more explicit caching API, and you can only cache four points in your, in your call. And there's a lot more code kind of making it happen. So in this case, we're kind of lucky in that, like, once you've done the work to make Anthropic work, making OpenAI work is pretty easy. But yeah, in that case, we do have different code to sort of translate our context to like a cacheable context in each case.

47:31Other, so you mentioned, I think five providers, Anthropic, OpenAI, Gemini, DeepSeek, and Kimi. Not on that list were Grok, whatever the new meta models are called, and GLM or Minimax. Like, are there any other, how are you kind of, were you drawing the line? How are you thinking about who's in and who's out?

47:58It is so hard to stay up to date on this stuff. We have the ability internally to like test models pretty quickly. It's hard to actually ship things in production, because for example, like the way thinking blocks work is different across different providers. And like, if you have bugs, you might have to tune the prompts and things. So we haven't shipped that many, but like, we've tested GLM internally. We've tested Google models, Kimi, DeepSeek, probably some others I'm not thinking of. I think right now, and like, most of this is initially vibes, right? Like you go in

48:32there and you play around with it. And you're like, is this close enough to the frontier that like, you know, we want to put some effort in here? And usually the answer is no. Um, and I think the ones that we've like Kimi, DeepSeek and, and Google, uh, and plus obviously open AI are the ones where like, okay, actually, this is like pretty close to the frontier. So like it's worth doing. Um, but there'll probably be others in the list. I have not been paying a huge amount of attention to Grok. Maybe I should be paying more attention to, to, to them. Um, I don't hear a lot of other developers using their models. Um,

49:09but they sure, they sure seem to be investing a lot. So I don't know, maybe that'll change.

49:15Yeah, I, we can't, uh, in my view, we can't count Elon out of any race until he bows at himself. So, but I would also agree. I don't use it much. Um, I just had occasion to use it a fair amount while riding in the Tesla over the last week. And it's not bad, you know, and the voice mode is pretty good. Um, definitely still feels a little, that's also part of, you know, it's, it's not just the model. It's also the integration, but I would say my experience using Grok in the console of the Tesla is definitely rougher, you know, than my experience using

49:50Anthropic and open AI and, and Google models. Yeah. Our users are mighty good. Our users are pretty good at like being savvy about this stuff. Not everyone, but there's enough users that try this stuff that we start to see requests. So, you know, I remember back in the day when, you know, this was pre past four wave days, when we were using the GPT four at the time, we thought we were on the best model. Like we best bunch of stuff. And like, we started to get, you know, within a very short amount of time of

50:21about three, five coming out, we started getting a bunch of people emailing us to be like, why are you guys on this old model? And we're like, ah, these people are just misinformed, right? Like we're on the best, we're on the best model out there. Uh, it turns out they were totally right. So, you know, we also kind of walked through our users and said, I have not yet to see a user being like, Hey, you got to get on Grok. That's the, that's the most modern model. Although some people have asked for open AI stuff. What do you think things are most likely to diverge? This is another big question. You had said a minute ago that broadly think things are kind of converging, uh, in terms of capabilities,

50:58which, you know, hopefully makes it manageably complex for you to support all these different providers. I do hear also the other narrative that we're starting to see more and more meaningful differentiation. And I honestly don't know which is right. I sometimes feel both ways myself, but if you had to kind of zoom in on particular areas where you think models would most likely meaningfully diverge over the next period, what would that be? One candidate that comes to mind

51:30for me is like how sub agent and kind of team, you know, delegation across instances sort of works. Like that seems like nobody's really, I guess one, one kind of meta point would be like things that nobody's really figured out yet might be the place where people are going to take the most different strategies and then they'll kind of converge once there's a winner. But right now it doesn't seem like anybody's got a super awesome way to have like many different instances of a model work together.

52:01So that's like one idea, but you know, what's on your mind is kind of where they're most likely to be majorly different. Within the major labs, I think everything I've seen tells me that they are, they are converging and that they are converging because they're watching each other. So, you know, take, take Opus 4.7. Like, I think basically what happened, this is my, my flippant, my flippant response here is that, you know, they started to realize that like codex was better than clogged code for many things. And they were like, Hey, how do we make our model more like codex? And they like made

52:34a bunch of RL tweaks to like make it, you know, have a bit of a different personality and make it a little more precise. And then 4.7, like kind of feels a bit more like talking to codex. And then I think, you know, when, when codex got good, it was because of, you know, improvements to the models over the eye side. And I think they were watching Amphropic and being like, oh man, cloud code got really good at ready code. How do we do that? So it seems to me that those two labs are watching each other and trying to mirror each other. And like, you know, 5.5 is like much

53:07better at like a general purpose, long form agent to tool calling. And I think it's because they're watching with their shoulders. So at least those two labs, I think are just like watching each other very closely. And I see like kind of a back and forth there. There, I am excited about like

53:25the number of Neolabs that have raised a lot of money that are doing like totally different stuff. And it would be awesome if somebody came out of left field with like a totally different approach. So like, I don't know if you've, if you've learned anything about JEPA, like Yann LeCun's thing. I watched, I finally watched like a long form video on it yesterday. And it's, it, it seems really fascinating. It seems like quite different. And I have no idea if it's going to pan out, but there's, you know, a billion dollars riding on the idea that like this, like totally different approach

53:58to LLM's is going to, it's going to pan out. And that's, you know, I guess we'll find out. And then, you know, but then there's like flapping airplanes who like have a, have an approach to like, let's have a, use a lot less data. So it feels to me like all the big Neolabs really are kind of watching, sorry, all the big major labs are like really watching over each other's shoulders. And there's a bunch of Neolabs, like trying like totally radically different stuff. So that's, that's kind of how I see the lay of the land. So convergence, unless somebody manages to shake the snow globe with some sort of algorithmic

54:32insight driven breakthrough. That that's my guess on the harness side, actually, I want to know, I think the harnesses are also kind of converging in terms of capabilities and, and largely that's because like, turns out the best harnesses just do like low level primitives, right? So in our case, you know, we don't have like super specific stuff for like doing email, like we have a file system and a database and a shell and, you know, a browser that it can use and, you know, some simple primitives around like writing to do's and setting up triggers, but it's all like very low levels up.

55:05There's nothing sort of workflow specific in there. And I think that's the right approach. The places where we differentiate are, are not around capabilities, but more around, you know, like cost and ergonomics and, and, and speed are kind of the differentiators. So you mentioned having signed a deal with open AI. I'm sure, you know, the precise details of that are, um, are under an NDA or whatever. But one thing I'm kind of interested in watching is like a point of apparent divergence is the way they are positioning themselves with respect to products

55:41like task lit and also open source, uh, toolkits like open claw, where open AI seems to be really leaning into, you can use your core open AI account in these other contexts. So I guess, what is that going to look like? And how is that going to complicate life for you? I mean, for one thing, if I can log in with open AI and bring my own tokens that like totally changes your pricing model, right? Because now you've got a sort of more like a traditional SAS type of business

56:14where the intelligence cogs are not like flowing through you. I don't know exactly where they are on that though. I know that they like allow me to do it with open claw. I haven't seen too many other things around the web. I've honestly expected it to come sooner. I think maybe they were just compute constrained enough that they didn't prioritize it, but I've learned that like compute constrained is like a good answer for anything. Sometimes it's real. Maybe sometimes it's not, but it's certainly, um, it passes muster as an answer. So, you know, should we expect a future where I come to task and I can just like connect my open AI account, bring my tokens and, and how will that

56:50change the, you know, the, how will that complicate or change what you're doing?

56:57Yeah, I, uh, it's a good question. Uh, obviously Anthropic has decided to go the exact opposite direction of that. And, you know, I'm glad we weren't in that situation as they were like cutting off people's API access. Um, I don't know, right? Like, I guess we kind of want to see how this plays out. And if this is something that is popular and we feel like opening is going to do for a long time, like, it totally makes sense for us to integrate and let people use their tokens. And I do think we provide a lot more value than just like being a token reseller. So I don't think it's a necessarily a threat, but it could be like a nice kind of onboarding

57:28experience for folks. Um, from a, from a competitive position, like, you know, is, is there concern here? Is it like opening eyes and like only use a relationship? And, you know, if they, if they already have an opening account, why do they have account with us? I think we are maybe a little more concerned now than we used to be. So up until they killed off Sora, I don't know if you remember the big leak around Sora. The impression that we had gotten was they were very focused on the models.

58:01They're very focused on consumer, but they weren't really very focused on like business productivity. And you can see that with like, in my opinion, with like agent kit, when they came out with it, uh, last fall, um, it didn't really feel like they were bringing their a game. Um, and we thought, great, like, you know, we're competing hard with Anthropic, but open AI, they're focused on consumer and models and like, we can kind of run it, run with it for a while on this front. And when they killed off Sora and they had that leak around like, Hey, we're, we're going after business productivity. The kind of scenario that we were worried about or are worried about a bit

58:36is like basically what happened with codex where codex went from kind of an also ran to like, arguably the best coding agent in a relatively short amount of time. And so if they've brought, you know, their a players over to focus on this stuff, and it seems like a very potentially competitive area, like they might start to compete with us in a real big way. Um, that said, we have seen none of this so far. Like I, I have yet to talk to a customer who's like, I left task lit to go and use like open AI products. Um, so we'll, we'll see if that actually shows up, but, uh, it could.

59:09Yeah. The whole, there's so many strange alliances and kind of, um, strange bedfellows and, you know, co-operatition all the weirdest to me is, is the, the weirdest to me is the entropic spacex announcement after, after Elon, you know, badmouthing that and clearly them competing very hard and then doing this big commercial deal. So it's, it's a weird time to be doing deals. Yeah, no doubt. I love to see that for what it's worth. My, I thought Elon was just,

59:42I mean, I've mixed feelings certainly about entropic. I, um, you know, echo all the positive things you said earlier. I do think their work also on the safety front on multiple, you know, subfronts of the safety front are second to none. And that's pretty much uncontested, you know, the constitution, I, you know, uh, only a slight exaggeration to say I almost cried when I read it, because I really think that's like a beautiful document. The interpretability work that they do is, you know, is, is amazing. And yet, you know, if somebody launches a recursive self-improvement

1:00:15loop that gets out of control, I would have to say like, they're probably the most likely candidate to do it at this point. So it's a very weird thing, but I do love to see closer ties between the leading companies because if nothing else, it just takes the edge off the competition a little bit, right? I mean, to the degree that they can sort of share in each other's success, even on a marginal basis is for me, like, uh, that's a, a huge win. So I encourage, you know, all these, as much as it's weird, I encourage all the sort of, you know, tying, uh, of cap tables together.

1:00:48And, um, you know, just we're all, I think we're all going to rise or sink together is kind of my, my bottom line for humanity. So let's, let's start to make those deals, um, in anticipation of, of that reality. And, um, you know, I think that'll probably in the end serve us pretty well. Anyway. Okay. That's just an aside editorial. One thing that has been counter narrative recently, you, I'm sure you've seen the and in labs guys that do vending bench. And then now they've launched a couple of actual like brick and mortar, real world retail stores managed by AI

1:01:24models. They've got the retail store in San Francisco that's operated by Claude. They've got a cafe in Stockholm that's operated by Gemini. And a huge surprise was they said 5.5 is what they called clean in the way that it runs its business. Whereas Opus 4.6 and 4.7, they've described as ruthless, like being willing to lie to suppliers, you know, do sort of stuff that's not necessarily illegal,

1:01:57but like questionable, you know, in pursuit of its goal, uh, where 5.5, they said they didn't see any sign of that. Do you have any interesting commentary on kind of the character of models? And, and is this something that you have to take into account as you build? Like you could imagine if one models ruthless or, you know, willing to cut corners and others clean, that that very well could impact like what sort of supervisory systems, whatever you might want to have in, in the harness. So

1:02:27yeah. Any observations, any plans on that front? I had not heard that particular note from them, but, uh, and this is all like purely anecdotal. I've not done any, any research here just by own experiences with it, but it kind of doesn't surprise me. I think I experienced with the anthropic models is they are like much more creative, much more empathetic. They kind of like understand the human experience better. And the, the open AI models are like a bit more clinical and that comes with its pros and cons. And I guess it doesn't surprise me that the one that like

1:02:59kind of understands humanity is also the one that like maybe shows some of the worst traits. I don't think we, we have not run into any problems here, right. That, that I am aware, like no user has been like, Hey, this thing went and did something unethical. So nothing's cropped up here, but the personalities that, that, that aligns with kind of my experience too.

1:03:18Yeah. That's interesting. So they're, they're the most creature like for better or possibly for worse. Um, I'd say, okay, so one big thing that, and I'm using everything, right? I've got a, uh, uh, task at max account that I'm maxing out. I've got a cloud code max. That is kind of my, on my laptop, you know, terminal thing. I do have the Mac mini that's sitting over on this side. That's got, um, another cloud code and an open claw. And I'm really interested in

1:03:52context beyond the single agent. So this is kind of a, I think a frontier for you, but maybe not. Dave, I'm not sure if it's something you feel is as important as it has been in kind of my own personal hacking. Do you think that you're going to need to build a sort of second brain type of feature for users that sits at a level? That's like, I guess you can think of it as above or below the individual agents, but sort of gives the broader context, right? I've got 10 tasks that agents

1:04:23running for the most part, they kind of stay in their lane. They may access some of the same context via tool calls, but they don't have like a shared meta state. That's like, here's Nathan. And here's all the things he's trying to do. Here's what he cares about. You know, here's the people in his life. In case you run into these people, you know, you can kind of know what's up. And obviously that's really important at organizations too, right? The sort of general situational awareness of like, who's on the team? What are our priorities? Like, what did we say no to in the past? Is that something that you aspire to tackle?

1:04:57Yeah. And I swear to your listeners that I didn't like prime you to ask this one. So yes, absolutely. We actually have some organizational features that are kind of the starting the start of this live in the product today. We just haven't announced them yet. So if you go and look in your settings, you may see like massive, like organizations and workspaces, and there's some stuff that you can configure in there. We've been laying the foundation for what you described for quite a while. And we're going to have like, you know, a launch, a bunch of fanfare,

1:05:28and there'll be some stuff on Twitter, like when we feel like it's really ready to talk about, which hasn't happened yet. But you actually can use it now if you want. You can go invite your team and you can get them on here. And the way that we're thinking about it is there's like kind of a hierarchy of context where if you're in an organization, some things are at the organizational level, right? So you might have like, well, what is our company? And what does it do? And what's its mission statement? What are its values? And like, you know, some basic

1:05:59things that like you want to control at the organizational level. And so you might set some context there. And then you have additional context that might happen at the team level where you say, hey, this, you know, the marketing team, they have access to these resources. They have, you know, these goals, you know, these are the OKRs for the quarter. This is, you know, here are some skills that define like the various business processes that we have. Here are some files that are important to consider when doing different things. Here's our, you know, our brand voice or whatever. And then in the individual agents, you have very specific things of like, hey, this is the,

1:06:33you know, the plan for running this particular workflow. This is a file that was uploaded to this agent. This is the instructions that, you know, someone gave me specifically for this conversation. So it's like organization is like company level stuff, workspace is like team level stuff. And then the agent has like stuff for the specific workflow. And we're kind of building everything around this. And today, most of the work has gone into the agent. We have at the workspace level, the only context that we have shared today is your connections. And this is actually super powerful. So if you have a company where you wanted to have like, you know, the lead on your

1:07:06team, go and configure connections with all the API keys and headers and whatever to like connect to your stuff so they can hook up your, you know, your API access and then give that to other users. So every, you know, someone who's new comes to the team and they don't have to like find all the API keys. They can just like go in and start talking to agents right away and like already knows how to connect to stuff like that's super powerful. So that is this today. But we want to add in shared skills, we want to add in some form of like cross agent memory. So like, you know, if I talk to one agent, and I explain something to it, it should be able to kind of remember that

1:07:36for other agents, we want to add in probably some shared file system stuff. So you can have documents that are, you know, available across any agent that you know, and you can do that now if you like connected to Google Drive or something, but we can probably make it like a much nicer kind of like, like, like native experience here. So that stuff, that stuff's all coming. And I think like, yeah, like, like shared brain is maybe the right to look at it. This is like literally Zapier launched, I don't know if you saw their product, the launch the other day, which was like, I think they called it shared brain. And I think a lot of what they announced is like very in line

1:08:11with the vision that we have as well. And I think I haven't tried with it. My hunch is they are farther on the brain side, but the agents are not as good. That's just my hunch. And hopefully, you know, hopefully we can, you know, catch up and surpass on the brain side and like maintain like a lead on the agent side as well versus that. But yeah, the huge priority for us and very excited about what we can do here. Yeah, okay, cool. I guess maybe let's do a zoom out. And then we can end with kind of a lightning round of just some

1:08:48like, uh, lower level esoterica type stuff that, um, you know, the real ones will want to hear about, but, but, uh, not necessarily, uh, as important as the big picture. Where is this all going? I mean, we're, we're in this weird transition point where, and I guess a couple of dimensions, right? You've got like, we've talked about computer use a couple of times and you've kind of bundled in command line style computer use with UI based computer, UI mediated computer use. And that feels like its own sort of

1:09:27paradigm shift, you know, happening under one label, right? Where it's like, everything is kind of going headless, but at the same time, the models are getting really good at using UIs. And so like, which is going to win, are all UIs going to go away or are the models just can be really good at them? And, you know, maybe it's both. Um, and then I guess similarly, like you mentioned, everybody's kind of competing to, to build the same thing. And I feel like that I've,

1:10:00I've never felt that as strongly as I do right now, where you could probably name, you know, 10,000 companies that are in some, like, not super indirect way competitive, right? Like you're competing with Claude, but you're also competing with like MS Word and you're competing with Zapier and you're competing with like everything under the sun and, and you're compared to like human labor. Yeah. It's, it's endless. So how do you like conceptualize where this is all headed?

1:10:32What's the big vision? You know, where are we 18 months from now, just before the singularity hits? So a year ago, right before we, we started the pivot, one of the, like the big thing that we were seeing was, and for context, for, for people who, who maybe don't know, we had a product called shortwave, which was an AI email client. We still, we still have it actually, but it's not the focus of the company anymore. And we had this really nice embedded agent inside and you can do like really cool email stuff. And we realized that it wasn't going to be too long before you could take a product

1:11:08like, you know, ChatGPT and you could say, show me my inbox. And it was just like generate a UI for your email on the spot. And once that worked well, you wouldn't need an AI email client, right? Because the whole email part would go away. So our entire concept of differentiation where we're like, Hey, we're going to embed this agent inside it, like a custom built UI that had a shelf life. And it's, the product is actually still growing and still, it's still doing reasonably well, but like in 10

1:11:39years, I don't think it's going to be around probably, probably much less than 10 years. I don't think it's going to be around at least not in this form. So we said, man, we, we can't build a business around an AI agent embedded in the UI. We need to, we need to do something else. And so we said, Hey, we're going to build like a very general purpose agent that isn't relying on this. And we're going to go after doing an agent for a specific type of workflow, or like these sort of like knowledge work trigger based knowledge workflow. So then we built the thing we launched in October and the feedback from people was like, Hey, we don't want to have like one tool for workflow

1:12:15automation and another tool for doing our day-to-day work because we want them all to have the same context. So I don't want to have to like maintain two systems where they both have all the, all the stuff from the shared brain. I just want to have one system. And so we said, okay, I guess we need to do not just the workflow stuff, but we need to do the synchronous stuff as well. And again, you know, when we pivoted out of email, it was like, okay, well, actually there's going to be some like more general product that's going to encompass this stuff. And then again, it was like, Oh, I guess there'd be some more general product products that's going to encompass this stuff. And we, in March, we launched our instant apps feature, which is basically a

1:12:48generative UI feature. So the idea is what if you could generate any UI you want that hooks up to any of the data in any of your connections and just works instantly in a single prompt. You can like one shot, anything turns out this works really well. Like this is a super popular feature. Our team just uses the crap out of it. So for example, if we do any sort of data science work, we're no longer like going into like the big query UI or like creating, you know, using dashboard tools, like we just go into tasklet and we're like generate an explorer

1:13:19dashboard to help us analyze, you know, how these pricing changes would affect our users. And if we'll just make a thing and there'll be like toggles and you can tweak the thresholds and things. And like, it just, it's, it works. It's amazing. And we said, man, that vision, that fear that we had, you know, a year ago about what would happen with email, like that's actually here. Like you could go into tasklet today and say, give me an email UI that works. And it'll, it will, and it'll work. And you can do your inbox in a UI inside tasklet. It's not as good as shortwave yet, but it's not going to be that long. So I think that the timeline of these things has actually been like much faster than we expected.

1:13:53And it's clear that like each area where we're like, we feel like there can be differentiation is like falling away. And so I'm looking forward and I see no reason why this isn't going to continue this trend of basically like the general purpose tool continuing. And this is all driven by the fact that the model is a general purpose. So if, if all of the model, like the best model is best at everything, um, which I think is increasingly true due to, you know, for economic reasons, essentially, uh, I think the best harness is going to be intelligent at everything.

1:14:25There'll be some differences in ergonomics, but intelligent everything. And we, we basically need to assume that the number of the number of these products that win is going to be relatively small. Like, I don't think we're going to have many, many, many tools that all have a, you know, AI embedded them. I think we're going to have a few very horizontal platforms. And what we're trying to do is be the AI agent platform that replaces your SAS products for knowledge workers.

1:14:58So rather than, you know, today, the way most knowledge workers work is like they're switching between tabs or they're switching between apps and in their dock. And they're like, you know, sometimes they're using, they're using, you know, word. And then sometimes they're using notion and then sometimes they're using linear. And then sometimes they're using Gmail and they're going from tab to tab to tab for different things. And we think that entire world is going away. Instead, you're going to have one app that has a UI. It's going to be your AI agent, hopefully it's tasklet. If you want to do some, you know, if you want to access some data from one

1:15:30of these tools, you connect through it through API. If you want to do some interesting analysis, that analysis, rather than being done by some like bespoke, you know, business logic in the tool, it's done via code jet. So the agent generates the code and like runs the analysis. If you want a UI, the agent generates the UI one shot with the prompt and gives you the UI you need. And we think it can cover basically all of your productivity software. And in this world, I basically think there's gonna be three types of companies left in, in the software world. There's going to be the horizontal

1:16:03platforms of which I think there'll be a very few numbers, very few winners, because people don't want to have to have to maintain context and connections across a bunch of platforms. They'll probably just have like, you know, one for knowledge work and one for coding and like maybe one for personal use, but like not very many of these, right? So that would be the horizontal platforms, which we're going to try to be one of those. They'll be headless companies. So give you an example, a Stripe, right? Like, I still think you need to do payments. Payments is really complicated. Payments is really important. So probably gets it off Stripe, but like, you may not have the Stripe

1:16:35dashboard anymore. There may be no reason to ever go to the Stripe UI. It'll be really just just the API tool. And then you're going to have solutions companies where that where the software is totally hidden. And they're selling you a product. So for example, I think you'll still have like lawyers and real estate agents, they'll still exist. And they may use AI heavily, but like, you may not see that right there. They're going to sell you, you know, hey, we're going to help you sell a house or buy a house rather than selling you software. So yeah, I think it'll be those three. It'll be like horizontal platforms, which there'll be only about very small number of winners.

1:17:06There'll be headless products, and then there'll be solutions companies. So what happens to something like a sales force? They would obviously fall into that. And they just made this big move to go headless. But I wonder if, you know, payments is like, yeah, there's a lot of depth there. There's a lot of compliance across the jurisdictions. There's a lot of risk management. There's, you know, it doesn't, it doesn't seem like it's coming anytime soon, where a general

1:17:41purpose agent would like eat that. Salesforce, on the other hand, though, I'm like, what is it really, you know, it's kind of a schema. And, you know, it's a very, very complicated schema that sort of came from the era when you could only maintain one. So you had to make it fully general across all your customers and everything they might plausibly want to do. But most people don't need anywhere near everything that Salesforce has built for them to possibly want to do. And so it does seem much more

1:18:12realistic for many people to, like, have Task code whip it up for them, right? I think, I think Salesforce has been real trouble. I think a huge amount of the code that they have built up over the years is probably obsolete. I think the value of being a system of record in a world where you have agents goes down a lot, because like moving data around between systems suddenly gets a lot easier. I think there's probably still many sort of like headless things that you can do that are pretty useful. But the ability to build competing products has,

1:18:46it's gotten a lot easier, you can like, they have a lot more competition, because you can now vibe code some of that stuff. And so, you know, yeah, a huge amount of what they built is obsolete, it's now easier to move to competitors, there's gonna be more competitors. So I don't think they're gonna die. But I think you're likely gonna have a much smaller Salesforce in the future than you do today.

1:19:07It strikes me that like,

1:19:10system of record, and just kind of like, really reliable storage are not the same thing. But like, really reliable storage is like a key part of what drives system of record value. Like, I have had instances in my personal cloud code, you know, local AI productivity stack development process where it has, in fact, dropped a bunch of data. You know, I'm trying to export stuff out of Slack, for example, and it realizes like, oh, we didn't quite export it right the first

1:19:44time, I'll just like, delete everything and go try it again, not realizing that it was so rate limited, that that actually took like four days, you know, to export what I previously exported. And so, you know, what do I like, I certainly value the fact that like, Slack is not about to delete all my stuff by accident. But that also sort of suggests that like, there's maybe an opportunity for the horizontal platforms to, and I know you're a database guy, historically, right? So like, is there a, is there an opportunity or paradigm shift where the horizontal platforms say,

1:20:18here's why you can trust us with your data, even if like the agents make mistakes, or even if there's sort of a, you know, this or that kind of goes bad, we're going to have some sort of snapshotting, rollback, durability guarantees, where mistakes can't lead to data loss. It seems like if you could make that guarantee for people, they could like get much more comfortable with the idea that they don't necessarily need Salesforce anymore. Totally. And I think this is a huge place where

1:20:48harnesses matter, where, you know, is the harness going to make the LLM smarter? Like, you know, we can, we can discuss whether that is true or whether it matters, but can the harness do this sort of thing? I think totally. So get, let me give you a few examples of how I think we can help. So one, as you mentioned, like, like, like versioning, there's a whole bunch of startups working on file systems for agents right now. And some of those folks are working on versioning. The basic idea is like, Hey, if your agent goes rogue, you just want to like roll back to some previous state. And like, in a simple chat bot, you can just like throw away the messages at the

1:21:20end, but in something that's like touching the world, you've got to be able to roll back the world. And so there's a lot of, you know, for a file system, you can just change the file system. But if it's touched APIs and stuff, you might need to like keep logs and things, but like the ability for you to like undo things, the agent does, I think it's pretty key. So I think it's a lot you can do there. I think another area is having like oversight and like, and like logging and stuff. So you actually have the ability to like have a human in the loop in places where it matters and do that in smart ways. And so like with our product today, like you have to activate

1:21:53tools. So one of the things that we're going to adding soon is the ability for you to like have some tools that you approve every run. So there are cases where email is the best example of this, where like people are pretty confident to say, Hey, you could read my email as much as you want. You could make as many drafts as you want, but you can't send anything unless I say yes. And we want to get to the point where, you know, that is really ergonomic. So for example, like it can send you a push notification that's ready to send an email where it's like, it'll go crazy reading and searching and making drafts. And then when it's ready to send, you get a push notification. It's like, Hey, do you want to review this before it goes? And then you can say yes.

1:22:25And that's all like pushed to you. So I think permissioning can be another big area. I think another big area is using code better and in a more like way. So, you know, let's let's take data migration from like one system to another. The naive way to do this is to like load that data through an API, feed it into the LLM, have the LLM then like, like call some tools, like put it somewhere else. And basically when you do that every time you're sort of like putting it through language model context and trusting it to like, not hallucinate and like reproduce that data, which like,

1:22:58I think the models get better at over time, but like, it's very hard to like have a lot of confidence there. The better way to do this is have the model just like generate a migration script and then run the migration script. And that gives you like an artifact in the middle that you can test and you can have human approval for. So like, I think, yeah, if you're moving data from one to the other, you still want to have an agent that's like thinking through how to solve the problem. But what it should probably do is like generate a migration script, generate some tests, run the tests, and then send the thing to the human being like, here we have the migration plan and the code

1:23:31of the test. And this is why we think it's going to work. Are you okay with this? And then you say, yes. And then we run it. You could even have test environments, right? So I think the ability to have like tools within the agent that allow it to do like really high liability stuff and to have approval, there's a lot of opportunity there. Okay. I know time is short. Lightning round. I got to prioritize. How about, first of all, any vendor shout outs that you would want to make? You kind of alluded to, you know, companies doing like rollback the world type storage.

1:24:03Who's out there that you're using, if anybody that you think is underappreciated? Yeah. That's a good question. I think the one vendor that we use in a pretty big way that we've been pretty pleased with is BlackSalt, which is a sandbox vendor. And they just have like really fast cold starts and, you know, good performance. And it allows us to have sandboxes at like the very core of our, of our products. So I think the BlackSalt has been pretty great. We also use Firecrawl for crawling and like,

1:24:34they have some nice performance characteristics. We have looked at a bunch of these storage tech companies. We, we looked at some of the people doing, doing databases and file systems. We so far have opted to have our own infrastructure here. I don't know how, I don't know if that'll always be true, but you know, there's kind of a trade-off here of like, you know, Hey, we think this is pretty core. And if we're going to go with some vendor, like they'd better provide like a lot of value and be like somebody who have a lot of confidence in those good roadmap and stuff. So, so far we decided to do that all ourselves. And then obviously the labs, right? The, the,

1:25:07the models are amazing. So, you know, we would not be where we are today without Anthropic. How about the possibility of reselling on perhaps like a fractional basis, other services. So like, there's lots of connections, right? Where I can go connect my Gmail and connect to my personal stuff. But then there's this whole broader universe of tools that I could go have an account with, but I maybe don't have one and I don't necessarily want to create one or they make it somehow difficult to like do what I want to do. So classic example for me is Suno. I'm, I'm loving generating music

1:25:40these days, but it's not very agent friendly. And I constantly end up in their UI and I'm like, this UI should have been an API call. I just want to hear the music, but I also kind of think maybe I could, you know, use my tasklet credits to fund generations with these other services, where it's not like a highly personalized service. It doesn't matter if it's my Suno encounter, somebody else's, they may think it long-term good, but like, as of now, it doesn't really. So is that something that you kind of plan to do is sort of open up a Swiss army knife of things that

1:26:11are like paid, but that I access through you via my kind of credits that I've bought? Yeah, I, no, I, I do think we, we will do that eventually. We've made some very small forays into this already. So one of those is web browsing, uh, sort of search, right? So like we use fire crawl. Like you could argue that like, Hey, that's, that's us sort of like reselling an API. Um, uh, another one that is likely to come very soon is image gen. Um, so, you know, you can today connect us to nano banana and it can make images, but like, this is such a common use case, um, that we'll

1:26:47probably have some like native image gen where like, you just use your credits to do it. And you don't have to have an account. Um, I would love eventually to have, you know, something a bit more open here. Um, and we've had, you know, 10,000 people have emailed me about X 402 and like, it just hasn't been a priority yet. So I'd like this to happen. One of the things I want to note is we intentionally have this like credit system. Um, and the reason that we have this like credits, uh, like rather than like having some like fixed number of tokens or something that you can use is

1:27:21like we would like to be able to spend on many different types of things. So, you know, when you spend tokens, fine, that costs you credits, but yeah, if you generate an image that costs you credits too, when you, uh, you know, search a webpage that costs you credits, when you make a song that costs you credits. Um, so it gives us kind of this nice intermediate currency that we can use to spend on a variety of things.

1:27:44Okay. Three more, I'll keep it super quick. Um, um, what is the ratio right now of your token spend for the purpose of tasklet development to your payroll? Uh, as you know, sort of leaving aside what users are costing you in terms of, um, API calls, just what you are spending the APIs versus on humans. That's letting it, let me do some quick math here. So I want to note that we have three,

1:28:15we have at least three products where we do a lot of internal token spend, but obviously codex and then tasklet. Actually, we spend a lot of money on tokens through tasklet for our internal processes. I would, I would guess, I would guess we're at about five, like five to 10% of payroll right now in terms of internal token spent. How excited are you for mythos and how big of a difference do you think it's going to make for

1:28:47what you can do and what the trajectory of the business will be? You know, I, it's hard to, I haven't tried it, right? Like, like, no, no one's no, but not no one, but most people haven't tried it. So it is hard to get too excited about a thing you can't touch. It, it feels a little bit to me like a marketing stuff where like, they're like, hey, we don't have the compute to actually serve this thing. So like, let's get some benefit out of it for marketing. You know, even if we can't, it obviously sounds amazing. The benchmarks look really cool. It claims it can, you know, find all these zero days and stuff. So, you know,

1:29:21I'd love to play with it, but you know, I'd be more impressed if I, if I could.

1:29:27All right. Last question. I'm sure you have taken interest in the recent CCP forced unwinding of the meta acquisition of Manus. And a fun fact about me, I was in the same dorm as Mark Zuckerberg and any other Facebook founders way back when, not to date myself on as we wrap up this podcast, but our 20 year reunion is coming up. I don't know. He didn't famously didn't graduate. I think he's probably still invited if he wants to come. If I run into him, how many billion dollars should I tell him is the going tag for task? I mean, we've obviously been watching

1:30:03this, this pretty closely. I actually got a note from that before the, like shortly before the Manus deal got announced and we were supposed to get coffee and then he just like never followed up and it never happened. And then the, you know, the unwinding, I'm very curious how that's even

More from The Cognitive Revolution

Babysitting the Machine: Glean's Rebecca Hinds on the Hidden Human Labor of AI at Work

Jun 10, 20261h 46m

AI in the AM — Week 1 Highlights (June 2026)

Jun 6, 20261h 22m

Nested Learning: Ali Behrouz on the Quest for Continual Learning & Illusion of AI Architectures

Jun 3, 20263h

Inside Nathan's Second Brain: Daniel Miessler, Security Expert & Creator of PAI, Audits My AI Setup

Jun 1, 20262h 32m

Your Biggest Lever: Designing your AI Career for Maximum Impact, with 80,000 Hours founder Ben Todd

May 26, 20261h 42m