Harrison Chase of LangChain on Deep Agents, LangSmith, and Earning Trust | NVIDIA AI Podcast Ep. 297

May 6, 2026 · 24 min · 5,627 words


Show notes

LangChain has surpassed 1 billion downloads, and the framework that started as a weekend project is now the harness powering the next generation of production-grade AI agents. In this episode, Harrison Chase, co-founder & CEO of LangChain, breaks down the architecture behind deep agents, explains why systems like Claude Code, Manus, and Deep Research all share the same foundational pattern, and lays out what it actually takes to deploy autonomous agents responsibly in the enterprise.

🔬Topics covered:
What is a "deep agent," and why does architecture matter more than ever?
How enterprises are (and aren't) embracing autonomous agents
LangSmith: observability, tracing, and evaluation-driven development
Mixing frontier and open models (NVIDIA Nemotron) in multi-agent systems
What's next: async subagents, proactive/always-on agents, agent memory, and agent identity

Chapters:
00:00 – LangChain origin story and the deep agent architecture
01:46 – What is a deep agent?
03:31 – Enterprise trust: risk, autonomy, and iteration
04:38 – LangSmith: observability and evaluation-driven development
13:30 – Frontier vs. open models and the Nemotron Coalition
18:10 – What's next: async subagents, agent memory, and agent identity

Highlighted moments

“One common misconception here, by the way, is that you need like a thousand scenarios for it to be effective. You could start with five, you could start with 10.”

(10:19 in the transcript)

Transcript

Introduction to LangChain

0:00 And so I think these always-on, asynchronous, event-driven agents will be a really big productivity unlock. Especially in enterprises, there are so many events that are just triggering, triggering, triggering. So if you can have agents listening to those and firing off, I think that will be a massive gain. Welcome to the NVIDIA AI Podcast. I'm Noah Kravitz. Our guest today is Harrison Chase. Harrison is CEO and co-founder of LangChain, one of the most incredible stories of this whole generative AI era that we're in,

0:32 as Harrison will get into in a minute. LangChain was founded about three years ago and has passed a billion downloads. The whole point is to help developers build applications with LLMs, and now it's getting into agents, agentic frameworks, and all that great stuff. We're going to get into it in a moment. Harrison, thanks so much for joining the AI Podcast, and welcome. Thanks for having me. Excited to be here.

Founding LangChain

0:53 So let's start about three years ago. You started LangChain with this premise of building tools so developers could build apps with LLMs. What did you see back then that others didn't, or, even if they did, what made you think, this is where things are going, this is where I'm headed? What got us really interested was seeing the applications that people were building on top of LLMs and the systems they were building around the LLMs in order to power those applications.

1:24 And those systems had a lot of similarities with each other, even early on. Even early on, we could tell that they would get quite complex over time. So a lot of what we built is tools to help people build these systems around LLMs (what we now call agents), figuring out what the common patterns and the common tooling are, and making it really easy for anyone to do so. Right. And now you've, well, I don't want to say coined this term, but at LangChain you talk about deep agents.

Deep Agents

1:52 Yeah.

1:52 So what is a deep agent? And then maybe we can get into the enterprise: why would an enterprise in particular care about that distinction? Yeah. So about a year ago, we saw a few really interesting things. Maybe even backing up: three years ago, great, you have LLMs, and you want to connect data and these other things to them. Fantastic. How do you do that? Turns out it's really hard, and the best way to do it was actually pretty different for different types of agents. You would build different scaffolding. You would build different workflows around the LLMs. About a year ago, we saw Claude

2:22 Code come out. We saw Manus come out. We saw Deep Research come out. Under the hood, all of these had the same general architecture. They were simple in some ways: an LLM running in a loop, calling tools. But they also shared common patterns, like connecting to a file system, having sub-agents, and doing planning. So about nine months ago, we released the first version of Deep Agents as a library, and we've been building it ever since. We've continued to see this same pattern of giving the LLM more autonomy

2:55 in this environment to interact with. This type of harness is what powers OpenClaw, for example. So Deep Agents is really a new type of agent harness that we think is general purpose and that you can customize to do different things. You're not reinventing the scaffolding each time; you're just customizing it with prompts or tools. So it's way easier to get started with, and also way more powerful, because it's a simple thing under the hood, and simple is really good. Deep Agents is this general-purpose, model-agnostic, open-source agent harness that we've been building for a while, and we're starting to see more and more agents built on

3:30 top of. So when you're working with customers, and enterprises in particular, we're getting into these systems that are becoming so powerful, in large part because they are autonomous to a larger degree. As you said, they can do more now: agents can control the screen and go off and do things with apps and such. What are your conversations like with enterprise

Enterprise Adoption

3:55 leaders? What's the feeling: is it a tension between risk and reward? Is it just excitement about what the systems can do? And how do you build trust in systems that give agents more leeway? What are those conversations like? There are a lot of things. One, not everything needs an autonomous agent. One framework we have, LangGraph, is really good for when you want to combine some of the autonomy of LLMs with more directed workflows and more control.
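The deep-agent pattern described at 2:22 (an LLM running in a loop calling tools, plus a file system, planning, and sub-agents) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the Deep Agents library's API: the tool names (`write_todos`, `glob`, `grep`, `write_file`, `task`), the `AgentState` structure, and the `scripted` stand-in for a real model are all hypothetical. The `grep` and `glob` tools echo the deep-research experiment described at 8:49, where the agent searched a virtual file system the way a coding agent would. A more directed workflow of the kind associated with LangGraph above would instead fix more of the step order in code rather than letting the model pick each next tool.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Callable, Optional


@dataclass
class ToolCall:  # the model asks the harness to run a tool
    name: str
    args: dict


@dataclass
class Final:  # the model decides it is done
    answer: str


@dataclass
class AgentState:
    files: dict = field(default_factory=dict)       # virtual file system: path -> text
    todos: list = field(default_factory=list)       # planning scratchpad
    transcript: list = field(default_factory=list)  # tool results the model has seen


# Hypothetical model interface: transcript in, ToolCall or Final out.
Model = Callable[[list], object]


def make_tools(state: AgentState, subagent: Optional[Model]) -> dict:
    """Hypothetical tool set; not the Deep Agents API."""

    def write_todos(items):
        state.todos = list(items)
        return f"{len(state.todos)} todos"

    def glob(pattern):
        return "\n".join(sorted(p for p in state.files if fnmatch(p, pattern)))

    def grep(needle):
        return "\n".join(
            f"{path}: {line}"
            for path, text in sorted(state.files.items())
            for line in text.splitlines()
            if needle in line
        )

    def write_file(path, content):
        state.files[path] = content
        return f"wrote {path}"

    def task(prompt):
        # Sub-agent: same file system, fresh transcript, no further sub-agents.
        if subagent is None:
            return "error: no sub-agent configured"
        child = AgentState(files=state.files, transcript=[f"task: {prompt}"])
        return run_agent(subagent, child)

    return {"write_todos": write_todos, "glob": glob, "grep": grep,
            "write_file": write_file, "task": task}


def run_agent(model: Model, state: AgentState,
              subagent: Optional[Model] = None, max_steps: int = 10) -> str:
    """Core loop: ask the model for an action, run it, feed the result back."""
    tools = make_tools(state, subagent)
    for _ in range(max_steps):
        action = model(state.transcript)
        if isinstance(action, Final):
            return action.answer
        result = tools[action.name](**action.args)
        state.transcript.append(f"{action.name} -> {result}")
    raise RuntimeError("step budget exhausted")


def scripted(*actions) -> Model:
    """Stand-in for an LLM: replays fixed actions and ignores the transcript."""
    it = iter(actions)
    return lambda transcript: next(it)


# Usage: plan, delegate a listing to a sub-agent, then grep like a coding agent.
state = AgentState(files={"notes/a.md": "GPU budget approved\nmeeting friday",
                          "notes/b.md": "lunch menu"})
lead = scripted(
    ToolCall("write_todos", {"items": ["list notes", "find budget"]}),
    ToolCall("task", {"prompt": "list the notes"}),
    ToolCall("grep", {"needle": "budget"}),
    Final("Budget approved, per notes/a.md"),
)
helper = scripted(ToolCall("glob", {"pattern": "notes/*"}), Final("2 notes found"))
answer = run_agent(lead, state, subagent=helper)
```

In this sketch the sub-agent shares the parent's file system but starts with an empty transcript, so its intermediate steps never enter the lead agent's context; that is one plausible reason the pattern helps, stated here as an assumption.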

4:25 Honestly, when talking with a lot of enterprises about deep agents, some of them are just like, we love LangGraph, LangGraph is better, we're going to stick with LangGraph. And that's fine with us. We think there are different use cases for different things. So that's one component. Another component is: okay, great, the LLM is doing a bunch, but how do we know what's going on? Another thing we work on is LangSmith, which is observability and evals, and that's basically our answer to that. There's this really interesting thing about agents compared to traditional software: the interaction space for agents is way more open-ended. You can ask it

4:59 anything; text is infinite. If you have a UI, there's a set of buttons you can click, so it's much more constrained. And models are not robust at all: they're non-deterministic, and you change one word and the answer changes completely. That's why we think observability is really important, and it's a huge thing that enterprises care about. Very related to observability is evals. Sure, you can see one thing that happens and tell why it went wrong. But what if you want to test how it does on 10 different questions, or a hundred different questions? So building up these eval datasets is a big thing we work with folks on. Right. And so LangSmith is the platform for

5:33 building agents as well as observing and evaluating them? Yeah. The way we think about the agent development lifecycle is build, test, run, manage. Okay. The build part is all the open source. It's like, choose your fighter: choose LangGraph, choose Deep Agents, choose another framework. All of our stuff works modularly. Then test, run, manage: that's LangSmith. We've got a bunch of stuff around testing and evaluating these models, a deployment platform for deploying them at scale, and observability and other things for managing them. Let's talk about skills, or maybe you can talk about skills

6:07 for a minute. When I first started playing with these tools (and I'm not a developer, I'm just kind of a technical layperson, if you will), I loved playing with these things. Back in the day when they first came out, maybe it was BabyAGI, whatever it was, I spun my computer right into the ground in an infinite loop. But when I first discovered skills, there was a moment where it sort of took me aback. I was like, wait, I just describe it and it goes off and builds it? And then, of course it does, because that's how this all works. But can you talk a little bit about skills and about

6:40 that same idea of giving the agent the autonomy to write the tool and run with it? And how do you keep it in check and keep things secure? Yeah. Skills are a great way to package up knowledge, instruction sets, and other tools for an agent to use. They started in coding agents, and a skill would basically involve a markdown file with some instructions and then some scripts that you could run. One of the things that's become clear over the past few months is that coding agents are very general purpose in a lot

7:10 of ways. So the same idea of a skill as a markdown file plus some scripts to run is really, really interesting. We see a bunch of different types of skills. Some skills are purely informational: you want to learn about something, great, go read this markdown file. Other skills do things, and this is where it starts to get really interesting. It could be a Python script that hits a URL. It could be a Python script that runs some GPU-accelerated compute. This also ties into the environment aspect. When we think about agents, we think of a model, a harness (and this is Deep Agents),

7:42 and then an environment it runs in, a runtime for it. Right. NVIDIA just released OpenShell, which is a secure runtime for it. The other thing related to the runtime is where it runs. Does it run on a Mac mini? Does it run in some GPU-accelerated environment? Does it run in the cloud? So those three components, and being able to pick and choose what you need for different jobs, are a big part of customizing your agents. Can I put you on the spot and ask you to think of an aha moment where the idea of deep agents really clicked in a use case, whether it was, you

8:18 know, something internal at LangChain or with a customer? Was there an aha moment where you were like, yeah, this is it? I think so. It really started just by seeing three things: Manus, Deep Research, and Claude Code. That's the same way LangChain started as well: going to early meetups, seeing things that people were building, and seeing patterns. Right. So the first version of Deep Agents, just like the first version of LangChain, I hacked on over a weekend. It was a weekend project. I'd been talking internally with some folks, like, oh, Claude Code's really interesting, and Manus, that's,

8:49 they've got some similarities. Right. So it wasn't until I had time to sit down on a weekend and hack some stuff together that we came up with a few patterns describing what these similarities actually were. Then, using it, I think the first thing we used it for was a deep-research-type thing. We gave it access to a bunch of files, put them in a virtual file system, and had it do some research. It wasn't even really doing RAG. It was just grepping and globbing over these files, like a coding agent would, and it worked fantastically well. So I'd say deep research was the

9:22 first concrete thing, but really the idea came from just seeing a pattern and spending a weekend hacking on it. You mentioned earlier the importance of (maybe not in these words) auditability and traceability: being able to see how the agents did what they did. Can you talk a little bit about evaluation-driven development and how that plays into building trust in what the agents are doing, again in the enterprise? Yeah. If you talk about trust in the enterprise,

9:52 what does that mean? It means the agent is doing what you want it to do. There are a few different ways we see people getting that trust. Part of it is observability and traceability: being able to go into an agent run and see exactly what steps it took and exactly what it did. The other part is having these scenarios, running the agent over them, seeing how it performs, and evaluating that. This is what we talk about as evaluation-driven development. You come up with these scenarios ahead of time. One common misconception here, by the way, is that you need like a thousand

10:24 scenarios for it to be effective. You could start with five, you could start with 10. It really doesn't matter. I think creating these evals is a really good way to do product thinking about how the agent should act. Because this is another thing: agents can do anything, but they shouldn't do everything. They should do what you want them to do. So being forced to come up with, hey, these are 10 questions we expect the agent to get asked, this is what we think a good response is for each of them, this is what a bad response is for each of them, that's a really good mental model for figuring out what these agents should do. And then you can use that to drive all of your

10:58 changes. You change a prompt, great: you can run it against this benchmark. Did it improve? Did it get worse? And this eval dataset is living over time as well. As you release the agent to a small set of users first, you might see them using it in unexpected ways. For some of those ways you might say, okay, maybe they shouldn't be doing that, let's put some guardrails around it. For others you might say, yeah, that's totally legitimate, we had no idea they would use it that way, let's add some data points to our eval dataset. So when we change the prompt in the future, we can make sure it's still good

Evaluation Driven Development

11:26 at these use cases. Are enterprise customers open to rolling with that: oh, we weren't expecting this behavior, but it's good behavior? Is there a sense of experimentation? Obviously in the AI community and the open source community, it's all about experimenting and sharing, and things are moving so fast. Is the enterprise embracing that at all? The best ones do, in limited ways and with a limited blast radius. They might roll it out internally, for example. They might roll it out to a set of alpha

11:58 customers. They might roll it out to 1% of users or something like that. There's definitely way more caution there than with gen-AI-native startups, but building agents is so iterative, and the importance of this iteration really cannot be overstated. I think a failure mode for enterprises is: you have some idea for an agent. You take three months to craft a bunch of examples. You take another three months to build the agent. You take another three months to get humans to look at everything. By then,

12:30 the space has moved so fast. For the whole idea you came up with, there's probably just a better way to do it at that point. So I think you have to ship, you have to learn. And this is another thing, by the way, that no one likes the answer to: at the pace things have been moving with these agent harnesses, you basically have to redo your agent every nine months. If you're using an agent architecture from a year and a half ago, you should very strongly consider rewriting on top of an agent harness or something like that.
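Evaluation-driven development as described from 9:52 to 11:26 (a handful of scenarios written ahead of time, each with what a good and a bad response looks like, rerun whenever a prompt changes) can be sketched as below. This is a minimal, hypothetical harness, not LangSmith's API: the keyword-based grader, the `Scenario` fields, and the canned replies standing in for two prompt versions are all assumptions, and a real grader would likely need to be richer than substring matching.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Scenario:
    question: str
    good_if_contains: list = field(default_factory=list)  # markers of a good response
    bad_if_contains: list = field(default_factory=list)   # markers of a bad response


def run_evals(agent: Callable[[str], str], scenarios: list) -> dict:
    """Run every scenario once; a pass needs all good markers and no bad ones."""
    results = {}
    for s in scenarios:
        reply = agent(s.question).lower()
        good = all(m.lower() in reply for m in s.good_if_contains)
        bad = any(m.lower() in reply for m in s.bad_if_contains)
        results[s.question] = good and not bad
    return results


def compare(before: dict, after: dict):
    """List scenarios that flipped between two versions of the agent."""
    improved = [q for q in after if after[q] and not before.get(q, False)]
    regressed = [q for q in after if before.get(q, False) and not after[q]]
    return improved, regressed


# A starter set of three hypothetical scenarios; five or ten is a fine start.
scenarios = [
    Scenario("How do I reset my password?", ["settings"], ["call support"]),
    Scenario("Can you delete my account?", ["confirm"]),
    Scenario("What is the refund window?", ["30 days"], ["60 days"]),
]

# Two canned "prompt versions" standing in for real agent runs.
V1 = {
    "How do I reset my password?": "Please call support.",
    "Can you delete my account?": "Please confirm and I will delete it.",
    "What is the refund window?": "Refunds are accepted within 30 days.",
}
V2 = {
    "How do I reset my password?": "Go to Settings, then Security, then Reset.",
    "Can you delete my account?": "Done, your account is deleted.",
    "What is the refund window?": "You have 30 days to request a refund.",
}

before = run_evals(lambda q: V1.get(q, ""), scenarios)
after = run_evals(lambda q: V2.get(q, ""), scenarios)
improved, regressed = compare(before, after)
```

Both versions pass two of three scenarios, so a single aggregate score would hide that the second version fixed one case and broke another; comparing per scenario surfaces the regression. Behaviors spotted during a limited rollout can be appended to `scenarios` as the dataset grows.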

13:01 Right. And again, for performance only, or for... yeah. Yeah, so there are two things. There's performance, but also the scope of what the agent can do. If the agent's doing a very small thing, it's not as valuable as if it's doing a big thing. Maybe a year and a half ago you just couldn't get it to do the big thing, so you focused on the small thing, but now you can. So you absolutely need to be reevaluating that and saying, hey, there's this big thing, let's hook up an agent harness and take a stab at it. Yeah. So I want to ask you about models,

13:33 frontier models. I would say everybody, but at least the mainstream AI world, is focused on the latest and greatest and what they can do. Open models have become incredibly important. I mean, they've always been important, but I feel like over the past year or so they've become incredibly important. You spoke earlier about OpenClaw and NVIDIA's OpenShell, and there's the Nemotron family of models. How do you, how does LangChain, and how do your customers approach mixing frontier and open models together to achieve, you know, the right cost-performance

14:09 ratio and all manner of other things? What's your approach to mixing those? Yeah. I think there are a bunch of different ways we combine them. One obvious way, which we worked with NVIDIA on a blueprint for, is deep research: you have a bunch of sub-agents, and those sub-agents might want to be specialized. There might be an orchestrator agent that's using a frontier model, but when it hands off to a sub-agent, that sub-agent might want to use a fine-tuned model or an open source model for cost or latency reasons. So when you have these big agentic systems with sub-agents, it's totally possible that one part could be using a frontier

14:43 model, one part could be using an open source model, and another part could be using a fine-tuned model. The other thing: we've been paying a lot more attention to open source in the past, even just two weeks, I would say for probably two reasons. One, I think they're getting good enough that they can drive this harness. Being able to properly utilize everything in the harness is not super easy, and for a while only the frontier models could do that. They're still a step below the frontier, but we're starting to see that these open source models can drive the harness, which

15:14 is really interesting because this is the most agentic stuff. And then the other thing that's causing us to look really hard at open models... If I could stop you for a second and back up: what qualities does a model need to drive the harness successfully? At the risk of sounding a little broad, it needs to be intelligent. It needs to be good. Another thing that's maybe underappreciated is that it probably needs to be good at coding. Okay. We've actually seen that Qwen Coder is a better general-purpose model than the regular Qwen series of models, because a lot of what makes up this harness looks very similar to a coding agent.
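The mixing strategy described at 14:09 (a frontier model for the orchestrator, with sub-agents on open or fine-tuned models for cost or latency reasons) amounts to a per-role model assignment. The sketch below assumes hypothetical model names and made-up relative cost units purely for illustration; they are not real prices or the configuration of the NVIDIA blueprint mentioned above. It also reproduces the call-count arithmetic behind the always-on cost point at 16:20: an agent firing every 10 minutes runs 144 times a day, versus the roughly 20 manual kickoffs in his example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    name: str             # hypothetical identifier, not a real model endpoint
    relative_cost: float  # made-up cost units per call, for comparison only


FRONTIER = ModelChoice("frontier-large", 10.0)
OPEN = ModelChoice("open-weights-coder", 1.0)
FINE_TUNED = ModelChoice("fine-tuned-extractor", 0.5)

ROUTES = {
    "orchestrator": FRONTIER,  # plans and drives the full harness
    "researcher": OPEN,        # narrower sub-agent; cost and latency matter more
    "extractor": FINE_TUNED,   # specialized task a fine-tune was trained for
}


def model_for(role: str) -> ModelChoice:
    # Unknown roles fall back to the strongest model rather than failing.
    return ROUTES.get(role, FRONTIER)


def run_cost(calls_by_role: dict) -> float:
    """Total cost units for one run, given how many calls each role made."""
    return sum(model_for(role).relative_cost * n for role, n in calls_by_role.items())


def runs_per_day(every_minutes: int) -> int:
    """How many times an always-on agent fires per day at a fixed interval."""
    return 24 * 60 // every_minutes


# One hypothetical deep-research run: mixed routing vs. frontier everywhere.
mixed = run_cost({"orchestrator": 3, "researcher": 12, "extractor": 5})
all_frontier = run_cost({"orchestrator": 20})
always_on = runs_per_day(10)  # versus ~20 manual kickoffs a day
```

Under these made-up numbers the mixed run costs 44.5 units against 200 for frontier-everywhere, and three always-on agents at a 10-minute interval would make 432 runs a day; the point is the shape of the trade-off, not the figures.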

15:48 This harness has a file system. It has a bash tool, right? So if the model knows how to use those, if it's a coding model, that's actually really, really good. So I think models that are better at coding are generally better general-purpose agents. Yeah, that makes sense. And then the sub-agent models you were talking about... Yeah. So a second thing that made us look at open source models even more is OpenClaw. There are a bunch of really interesting things about OpenClaw, but one of them is that it's always on: it's proactive, it's running. If you're using a coding agent and you kick it off, let's say, 20 times a day,

16:20 you're probably okay paying a good amount for that. If it's running every 10 minutes, oh my God, you cannot. And if you're running three of these, you just cannot do that. So I think cost is a really interesting reason for these open models to become popular, especially in these proactive, always-on scenarios. Shifting gears for a second: NVIDIA formed the Nemotron Coalition, and LangChain joined. Can you talk a little bit about

16:50 why, and what it may or may not mean going forward for LangChain users? Yeah, we need open models, and we need harnesses they can run in. We think we can provide the harness, and we want to work with NVIDIA and all the other companies in the coalition to help provide a model that works with that harness and others as well. As we talked about, the open source models are getting good. They're still a little bit behind the frontier models in terms of driving the harness. So great, we can use them in sub-agents, and we can maybe use them for some of

17:24 these kinds of triggers and always-on work. But if they can drive the really expensive workloads, I think that's going to be really transformational in terms of what you can do with open models, which generally means what you can do with more sensitive data, what you can do more cheaply, and just more of what you can offer to customers. So at a really high level, we're excited about the NVIDIA Nemotron Coalition because we want an open model that works really well with open harnesses. And then a third part, which actually

17:54 I don't think was part of the coalition when it started, is the open runtime, which I think is really important as well, and you guys are also doing stuff around that. This is my favorite question to ask, and I'm sure the hardest, but maybe the most fun to answer: what's next? What do you think agents, agentic systems, LangSmith, LangChain, the company for that matter, are going to look like in... and I'll let you go with whatever timeframe makes the most sense. Because when I ask, depending on the guest, it's like, a year? No, that's too long. No, no, no, no.

18:26 But what do you think's coming down the pike as far as agentic systems and all of these things you're working on every day? I'd maybe call out three things that I think are interesting. One's pretty short term, and I think we'll see it in the next month or two, if not by the time this comes out: asynchronous sub-agents. Right now, when an agent kicks off a sub-agent, it basically waits for it to respond. And that's great. But if these sub-agents start to get really long running,

18:57 you want them to run in the background, and you want a manager or orchestrator agent to check in on them and maybe update them. One trend I think we'll see: in coding right now, with coding agents, you interact with the agent that's doing the coding. I think we'll start to see a trend where you interact with an orchestrator agent, and that orchestrator agent spins up a bunch of background coding agents. You just talk to the orchestrator and say, hey, what's going on with this experiment? What's going on with this feature? So I think we'll see asynchronous sub-agents become a bigger and bigger topic. Man, I hate to reduce it to productivity, but is that going to be a step change

19:29 or how much of a difference in terms of what you're able to accomplish? I think this will... well, the only reason asynchronous sub-agents even make sense is if the sub-agents themselves actually run for a while, right? If they just run for one second and then return, you can just make them synchronous. So I think it will be a productivity gain, but it requires these agents to be long running in the first place. I think that's the real productivity gain, and this is just a nice interface on top of them. One thing that wasn't on my list of

20:02 three things, but which I think will be more and more impactful, is these agents being proactive: running in the background, always on, listening to events. I think that will be a massive productivity gain. I have an email agent. It runs in the background and listens to my emails. When it wants to respond, there's still a human in the loop: it flags a draft and says, hey, here's a draft, do you want to approve it? Do you want to change something? That is so much more efficient. There's no way I would take an email, copy and paste it, go to ChatGPT, say, hey, can you draft me a response, and then copy and paste that back. So I think these always-on, asynchronous, event-driven agents,

20:36 that will be a really big productivity unlock. Especially in enterprises, there are so many events that are just triggering, triggering, triggering. So if you can have agents listening to those and firing off, I think that will be a massive gain. The other two things I think are coming down the pike: one is agent memory. We started to see this a little bit with OpenClaw: the idea that it could remember things as you interact with it, that it could actually update its own tools, skills, and description of itself. I think more and more we'll see agents remembering things and learning from their interactions. And that's why human in the loop

21:08 is important as well. That's why I don't think these things will be fully autonomous: they need to learn, and the only way to do that is by interacting with the environment and with humans. So I think that'll be a big piece of it. And then the last thing is agent identity. If there's an agent in an enterprise, and I chat with it and you chat with it, whose credentials does it use? Does it use mine? Does it use yours? Does it use a fixed set? Before OpenClaw, I think basically everyone was using the on-behalf-of model. The agent would act on behalf of me, on behalf of you, on behalf of the end user,

21:40 and it would pass my Slack credentials through, so I might get a different answer than you would get. I think the thing OpenClaw changed is that people started thinking of these agents as identities of their own, as their own things. I think we'll actually see more setups like, hey, Tom is a marketing agent: you can chat with Tom, I can chat with Tom, Tom has a persistent memory, Tom has its own credentials, and Tom can go and do things. Tom is Tom. Tom is not acting on behalf of me or you. Tom has its own accounts with Slack or Gmail. And that's a big thing we need to figure out, which I don't think anyone in

22:11 the industry really knows yet. I was chatting with one SaaS provider. In all the OpenClaw craziness, they were making it really easy for people to create accounts for their agents, but it was still a normal account. So will we just see more and more people create normal accounts? Will there be special agent accounts? I don't know, but I think this idea of agent identity is really interesting. Yeah. There's a whole can of worms on the other side of the words "agent identity," I think, but not for this conversation. So, you mentioned the weekend project you worked on that unlocked things at LangChain for you. OpenClaw,

22:45 another weekend project, went incredibly viral incredibly quickly. What are your thoughts, or how has that impacted the work you do? I'm thinking more about the perception that users, developers, and enterprise customers might have about agents. Was there a rush of people knocking at your door saying, hey, can you build me a claw? How does it change things? A hundred percent. I think Jensen said, what did he say, every enterprise needs a claw strategy, or something like that. And we're absolutely seeing that.
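The asynchronous sub-agent pattern predicted at 18:26 (an orchestrator that starts long-running sub-agents in the background, checks in on them, and can send them updates) maps naturally onto Python's `asyncio`. Everything here is hypothetical: `Orchestrator`, `fake_subagent`, and the per-agent inbox queue are illustrative names, and the short sleeps stand in for model and tool calls that would really run for minutes or hours.

```python
import asyncio


async def fake_subagent(name: str, steps: int, inbox: asyncio.Queue) -> str:
    """Stand-in for a long-running sub-agent that reads updates between steps."""
    notes = []
    for _ in range(steps):
        await asyncio.sleep(0.01)  # placeholder for a slow model or tool call
        while not inbox.empty():
            notes.append(inbox.get_nowait())  # guidance sent mid-run
    suffix = f" with notes {notes}" if notes else ""
    return f"{name} finished {steps} steps{suffix}"


class Orchestrator:
    """Starts sub-agents without waiting on them; reports and forwards updates."""

    def __init__(self):
        self.tasks = {}    # name -> asyncio.Task
        self.inboxes = {}  # name -> asyncio.Queue of updates

    def spawn(self, name: str, steps: int) -> None:
        self.inboxes[name] = asyncio.Queue()
        self.tasks[name] = asyncio.create_task(
            fake_subagent(name, steps, self.inboxes[name]))

    def status(self) -> dict:
        # Non-blocking check-in: answers "what's going on with X?" immediately.
        return {name: f"done: {t.result()}" if t.done() else "running"
                for name, t in self.tasks.items()}

    def update(self, name: str, note: str) -> None:
        self.inboxes[name].put_nowait(note)


async def main():
    orch = Orchestrator()
    orch.spawn("experiment", steps=5)
    orch.spawn("feature", steps=2)
    early = orch.status()  # both still running; spawn() did not wait
    orch.update("experiment", "also log GPU memory")
    await asyncio.gather(*orch.tasks.values())
    return early, orch.status()


early, final = asyncio.run(main())
```

If the sub-agents only ran for a second, awaiting them directly would be simpler, which matches his point that the asynchronous interface pays off only when the sub-agents are long running.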

23:16 I think it's set a North Star. It's set a new objective for what these agents can and should be able to do. Now, there are a lot of things you probably want to do more securely than in OpenClaw. The whole reason it took off is because it can do everything, and that's great for weekend projects and hobbyists. But when you bring it into an enterprise, you're understandably going to want more control. That's why we're thinking about agent identity. That's why we're thinking about observability. But in terms of, did it change the North Star for what we build?

23:47 Absolutely, it did. I think it also made it so much easier to communicate some of the ideas, so that's been fantastic as well. Amazing. Harrison, there's so much we just talked about in a short amount of time, and so much more. As you mentioned, you take three months to scope and three months to build, and all of a sudden it's nine months later. So the next time we cross paths, I'm sure it'll be a different-looking world, but built on these same things. But for folks who've been listening or watching

24:19 and want to learn more about LangChain and the work you're doing, what are the best places to go online? Website, socials, research blog, anything like that? Yeah, we have a great blog: blog.langchain.com. A lot of the stuff we talked about, around context engineering and agent identity, will be in blog posts there, and we update it a lot. And then Twitter; I think everything in AI is happening on Twitter. We're just LangChain on Twitter, so you can find us there. Easy enough. Harrison Chase, thank you so much. It's been an absolute pleasure. Appreciate you taking the time to join the podcast. Thank you for having me.
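The always-on email agent described at 20:02 (it listens for incoming mail, drafts a reply, and waits for a human to approve or edit it before anything is sent) reduces to an event handler plus an approval queue. This is a hypothetical sketch: `InboxAgent`, the rule-based `toy_drafter` standing in for a model call, and the in-memory `sent` list are assumptions, and no real mail API is involved.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Email:
    sender: str
    subject: str
    body: str


@dataclass
class Draft:
    to: str
    subject: str
    body: str
    status: str = "pending"  # pending -> approved | rejected


class InboxAgent:
    """Reacts to events with drafts; only a human review can send anything."""

    def __init__(self, draft_reply: Callable[[Email], Optional[str]]):
        self.draft_reply = draft_reply  # stands in for the model call
        self.pending = []               # drafts waiting for human review
        self.sent = []                  # drafts a human approved

    def on_event(self, email: Email) -> None:
        text = self.draft_reply(email)
        if text is not None:  # None means "no reply needed"
            self.pending.append(Draft(email.sender, "Re: " + email.subject, text))

    def review(self, index: int, approve: bool,
               edited_body: Optional[str] = None) -> Draft:
        draft = self.pending.pop(index)
        if edited_body is not None:
            draft.body = edited_body  # the human's edit wins over the model's text
        draft.status = "approved" if approve else "rejected"
        if approve:
            self.sent.append(draft)  # a real agent would call a mail API here
        return draft


def toy_drafter(email: Email) -> Optional[str]:
    # Hypothetical rule-based stand-in for an LLM drafting step.
    if "newsletter" in email.subject.lower():
        return None
    return f"Hi {email.sender.split('@')[0]}, thanks, I'll take a look."


agent = InboxAgent(toy_drafter)
for incoming in [Email("ana@example.com", "Q3 numbers", "Can you review?"),
                 Email("news@example.com", "Weekly newsletter", "...")]:
    agent.on_event(incoming)  # stand-in for a mail webhook or poller

queued = [d.body for d in agent.pending]
agent.review(0, approve=True, edited_body="Hi Ana, reviewing today.")
```

Keeping the review step also yields the kind of approve-or-edit signal he connects to agent memory at 20:36, though how an agent would learn from that signal is outside this sketch.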
