Episode 7
July 22, 2025
In this episode, Beyang and Thorsten discuss strategies for effective agentic coding, including the 101 of how it's different from coding with chat LLMs, the key constraint of the context window, how and where subagents can help, and the new oracle subagent which combines multiple LLMs.
Transcript
Thorsten: It was a pretty easy-to-parallelize workload, like one for each of the slowest tests, I guess. And it spawned them in parallel. And then it just goes off and does it. So that's the current iteration of subagents. They have full access to tools, full access to MCP servers. There's still, you know, stuff to figure out. Like, what's the best UX? When do you want to use them?
Beyang: So basically, what are we doing here? We are learning about agents. And one of the things that we've realized as we're kind of building Amp and building for agentic coding is that there is kind of like a strategy around using agents. It's a skill that is very learnable, but you have to invest actively in learning it. It's like a power tool, right? It's like a high-ceiling skill, but takes some investment. And so we wanted to just chat with people who've been using it, starting with ourselves, experimenting with coding agents and pushing them as far as they can get them, and then sharing what they found works and what they found does not work with folks, in the spirit of just figuring out how this new world of software engineering works together. So to that end, I got Thorsten here, creator of Amp. And we're going to talk a little bit today about first some basic agent stuff and then some more of the recent features that you've been working on related to subagents and preserving context windows and trying out different models and that sort of thing. So yeah, anything to add to that?
Thorsten: Glad to be here. No, nothing to add. I mean, just to underline what you've been saying: it is a new way to program. And I don't want to say we were right, or we were first. But it feels like when we had conversations three, four months ago, we were like, this is crazy. This is a new way to do this. Now, slowly, not slowly, but in the last, say, four weeks, you've had a lot more blog posts popping up where people were describing, you know, it's context engineering, and an agent does this, and this is what an agent is good for, and this is what it's not good for. And everybody's discovering a lot of tricks right now. Yep. And it's, yeah, it's fascinating.
Beyang: It is fascinating.
Thorsten: It's really fascinating.
Beyang: It's like we're reinventing the whole field in real time, which is kind of cool.
Thorsten: Yeah. And especially, I mean, you know, for me, it's been six, seven years. For you, it's been, I think, more than 10, like, in developer tooling. Yeah. I don't think there's been a bigger change in developer tooling. Like, I think...
Beyang: No.
Thorsten: Last 10 years, maybe language servers, maybe something like TreeSitter, you know, formatters, that, you know, like stuff like this.
Beyang: Yeah, yeah. Nothing like this, though.
Thorsten: Nothing like this in the last 10 years in developer tooling. It's crazy.
Beyang: It's crazy. Okay, so before we get into the kind of like new stuff, just kind of level setting here, I want to talk a little bit about the strategy for invoking agents, because I often think it runs counter to the intuitions both of people who are used to writing code by hand and of people who were using AI a lot in the sort of chat LLM era. Now we've kind of entered the coding agent era, and the strategy is different now and in some ways in direct tension with the habits that people learned over the past 18, 24 months. So talk a little bit about that. What would you say is the overarching strategy, or how best would you initiate someone into the world of agentic coding?
Thorsten: I would say use the mindset of: an agent is like this little robot that you can set on its tracks, and it does something for you. And once you set it on the tracks, it will go and try and find the end of that track. And that means it will edit files, it will self-correct if it runs into errors, it will try and find other relevant files, run tests, run the compiler, sometimes write code to test its own hypothesis or something. And the UI, you know, let's leave aside terminal or not terminal, but the general thing, it's still a chat interface. Like it's still user, assistant, user, assistant. So it looks like what AI-based programming was, say, half a year ago, right? Where you had a Cody, Cursor, or Windsurf with the sidebar and you chat with it. But the difference is, half a year ago, these models were, you know, it's kind of like the ChatGPT or Claude app, except that you had utility stuff on top. Like it would come back with code and you would say, yes, this code, instead of copy and pasting it over here, yes, apply this, and then apply this. And then on top of that, the apps would build stuff like: the user has applied this, the user has applied this, and then you would chat, and so on. But the difference with agents now is, an agent, my personal definition, is an LLM with access to tools that allow it to change its environment outside of the context window, right? Yep. So what we're talking about here is you give an agent a bunch of tools to modify files on disk, like a programmer. It can create files, list files, update files, glob files, run tests, and whatnot. And so instead of this back and forth where at some point the agent waits for the user to say, yep, looks good, or flesh this out some more, the agent can now call tools and do stuff on its own. And when it calls a tool, it gets the result of the tool, and that kicks off this inference call again, and then it does stuff again and again and again. Yeah.
And again, it looks the same. The UI is similar, except you kick this off with an initial prompt, and then it goes and does stuff for you. And I think that's the mindset you should have. Like, okay, I'm kicking off this thing, this tiny robot, this agent, and then it goes off and does stuff on its own until it runs out of tokens or it says, I don't know how to go any further. And I would say the second thing is, people might disagree, but I think previously it was a lot of back and forth.
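The loop just described (the model either answers or requests a tool call, the runtime executes the tool, the result goes back into the conversation, and inference runs again) can be sketched roughly like this. This is an illustrative sketch, not Amp's actual implementation; `runAgent`, the message shapes, and the tool names are all made-up names for the purpose of the example:

```typescript
// A minimal sketch of the agentic loop: the model either answers in
// text or requests a tool call; tool results are appended to the
// conversation and fed back into the next inference call.

type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; name: string; result: string };

type ModelReply =
  | { kind: "text"; content: string }                  // final answer, loop ends
  | { kind: "tool_call"; name: string; args: string }; // agent wants a tool

// Stand-in for a real inference call (e.g. to Sonnet 4).
type Model = (history: Message[]) => ModelReply;

function runAgent(
  model: Model,
  tools: Record<string, (args: string) => string>,
  prompt: string,
  maxSteps = 20,
): string {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);
    if (reply.kind === "text") {
      // No more tool calls: the agent considers itself done.
      history.push({ role: "assistant", content: reply.content });
      return reply.content;
    }
    // Execute the requested tool and append the result; the next
    // inference call sees the whole history, tool results included.
    const result = tools[reply.name](reply.args);
    history.push({ role: "tool", name: reply.name, result });
  }
  return "ran out of steps";
}
```

The key property is that the user only appears once, in the initial prompt: every subsequent turn is the model and its tools talking to each other until the model stops asking for tools.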
Beyang: Yeah, like you and the robot just having an active conversation.
Thorsten: Yeah, and then it would ask you follow-up questions and you would ask follow-up questions and all of that stuff. And there was no problem with underspecifying stuff, because the thing was in its box. It was in the conversation box. And now with agents, you want them to be aggressive. You want them to be agentic and have high agency and the willingness to change stuff. And they're trained to be eager. So that means you have to specify more at the start. So, dumb example: if you say, change the header of my website to make it more colorful, maybe a year ago a model would have said, what color would you prefer? You know, something like this. But the models nowadays, say Sonnet 4, would go and say, yeah, I'm going to make this more colorful. Here you go. It's blue. It's rainbow, gradient, or whatever. So then it costs you tokens to change files. So then you have to go, I actually wanted pink or something. So you want to avoid this. You want to put as much information as possible in that original prompt, I think, and then send it off on its way. And yeah, it's kind of...
Beyang: The analogy in my mind is like with the previous generation of tools, it was almost like you had... It wasn't even a junior engineer. It was like a programming student. And you would ask them something, and they would know some stuff, but then they would immediately ask you, hey, how do I do this? Or what do you think of this code? And then you'd have to go and review it, and you're like, okay, no, change this, no, change that. Now it's more like a full-time engineer of some sort where you don't want to micromanage them. That's an anti-pattern because that way they're not actually getting to do the work. You just want to give them a good spec or give them a good description of what the feature is, enough details that they can go and find the context on their own. Maybe point them at a few places to search or whatnot. And then check back once they're done.
Thorsten: I think the other... I mean, this has been used to death, I think. But the previous generation of assistants is, you know, like in a movie, like you have a little ghost appear on your shoulder and you can, it's like a mentor or whatever. And it can give you tips and tricks or whatever, but it cannot affect the real world because it's a ghost or something. And with agents, it's much more like, you know, Iron Man's, what is it? Jarvis, Jarvis, I think. Yeah, yeah, yeah. Where like, you know, hey, can you do this? And it's connected to all of your systems. And basically, it can affect the real world. And it sounds dumb when, you know, obviously when you talk about ghosts and Iron Man, but there's a big difference. I think there's a big, big difference. And we're only, you know, now we're starting to wake up to it, what it means to give these things tools.
Beyang: Yeah. Do you have a quick example that you can show just to make it concrete?
Thorsten: Yeah. So let me move you over here so I can still see, and then I'll share my entire screen. So here, let me open... We can actually go... Let's close this so it's not distracting. We can also bump the font size, I guess. So, an example from, say, this morning. I don't know if you've seen this. This was fun. So, okay, this is a fun example. But basically, Eric wrote in our Slack that the first three options have a regular cursor, and the last three have the Mickey Mouse cursor pointer thing, right? Yeah. And I think it was like, I don't know, 6:10 or something in the morning. And I was like, "let's try this. A user reported this. Can you fix it," right? Yeah. All you gave it was a screenshot. A screenshot and this message here, which is counter to what I've been saying, right? You should put more information in the first prompt. But to be fair, this was a joke. But what I want to show is: can you fix it? And look how eager it is, right? It doesn't say, "you should probably change the CSS class to whatever." It's like, "yep, I'll research this." So here, we can get to this in a second: it used the search subagent, looked at this, found the issue, changed it. It worked. I thought this was wrongly formatted. I told it to format the file. It was already formatted. I was wrong. That's a really small thing, but you can see how eager it is. You just go, hey, a user reported this. It read the screenshot. It found this thing, and it knew: my job is to go and help the user accomplish the task, and I'm going to go and do it. I'm going to use another agent to find stuff. I'm going to open this file. I'm going to read this other file. You know, this is already mind-blowing, right? Like it looks at this file and then instead of saying, show me the content of this other file, it knows: I can open this file. So it finds this other checkbox item thing that's probably referenced in this one, and then edits it and fixes it and says: fixed.
Beyang: And so in the old world, like with chat LLMs, each one of these like tool uses would have been probably like a, a back and forth between human and AI. Right. And some of this, it would probably be you manually checking stuff.
Thorsten: Yeah, or the app would include it, right? So the app, say Cody, Cursor, Windsurf, whatever: when I say a user reported this, then you would have to include this file, or some RAG or whatever. I don't know if RAG could do this. I don't think it could. Yeah, yeah, yeah. Maybe it could. But basically, you would have to give it that file, because it cannot reach outside of its box. And then it would say, oh, you need to do this, and it would reply back with the diff. And then you would have here somewhere an apply or accept/reject button. And maybe the app does it automatically: it always accepts and only allows you to review stuff. But here it truly is the case that the agent called a function to edit this file. And it changed the, where is it, cursor pointer somewhere, and updated it on its own. And then it wrote: it's done. And another example from this morning was, this was at 6 a.m., the other one was at 9 a.m. Okay, so this was, I was like, help me, how do we start MCP servers, right? And again, I put this into context just to seed the context window, like, hey, this is what I'm talking about. And then it, again, used the search thing. And it found this. And it says, like, oh, it's here, it's here, it's here. And I'm like, "oh, okay, interesting." And then I was like, "can you add some debug statements to print this?" And then I ran the debugger. Yeah. And then I think it already, this is funny, we'll see this in a second. So it does this. It, you know, adds debug print statements. And then I restarted it after it was done. And then I said, this was 6 a.m., these are not the best messages, right? I was like, "the issue is that it looks like the env vars aren't passed along to the MCP server, right?" I don't know why I said "to be clear." But then it's like, oh, the code does this, but the fix I added in the debug logging actually fixes the bug. So what it did up here, the merged env, this is already the fix. Interesting.
Interesting, yeah. So we forgot to merge this. It didn't realize it. It was like, I'm going to merge this and log it: these are the merged env vars. Yeah. And then, you know, I don't know, does it even say this? Yeah, there you go: I noticed there's an issue, I was merging, but I need to check where that comes from. You know? Yeah. So, let me do this. And it went off and did it on its own, and then it worked.
Beyang: And yeah, with minimal intervention and steering from you.
Thorsten: Yeah.
Beyang: I think when we're talking about the differences between this and the old paradigm, it's almost like you have to unlearn the habits that you learned with chat LLM-based tools. Because the tendency with chat LLMs is, with every LLM response there's a user action to take: either you're responding to it, guiding it, or you're doing something manual. And here it's almost like an anti-pattern, because what I've seen people coming from that world do is this micromanaging mentality, where you want it to do something very precise, and you're kind of annoyed or frustrated in the beginning: it's just editing my files? Like, how do I know it's right? And stuff.
Thorsten: There was somebody on... I'll give you another concrete example. So this is also another thing that I fixed. If an MCP server failed to start, this is also again a screenshot. I do love the screenshots. We don't show anything. Standard error: nothing, right? So I'm like, help me make this happen. We need to say why it failed and show this thing. So then again, you know, imagine you were in ChatGPT on the website. It would say, "can you give me the file," so you would paste the files, and then it would say, "here's a diff, apply this," and then you would do this. But what it does here is it tries to find it on its own. And then I say, this part here, this log, I don't know, this kind of sucks. But well... Then, oh, then I canceled it because I was looking at the code, and I said "continue" to kick it off again. Here it just tries to figure out what this class does, and, you know, it gave up, I think. It couldn't find it. Does this. And then it went on on its own. Like, here's the last message, "continue": it looks at this, looks at this, I need to understand this, it uses the search agent, looks at this, fails, does this, does this, does this, and then it changes it, and it actually worked. Like, it works. If now an MCP server fails to start, we show standard error. And here, now I need to update a UI, right? And it goes, and here I think it, I don't know what that was, couldn't replace it. That sometimes happens. Searches for stuff, changes it, creates a new component even, imports this, runs the checks, which are specified in AGENT.md, fix this, let me test this, we don't have any tests, that's fine, blah, blah, blah, summary, here you go. And then I said, I just reviewed the code again, which is also something you now have to do. And I gave it some more information about the interfaces, and it did it, yeah. And then the interesting bit is now I have... So it basically added a buffer that keeps track of standard error output from each MCP server.
If it fails, it puts it in the error. And I'm like, I don't know, we should probably only keep track of the last 50 lines. And everybody listening, not everybody, people might not believe it, but I could do this on my own. It's not a hard problem. I could have done this on my own. But instead, I'm like, "can you change this so that it only keeps track of, say, the last 50 lines? And if it's more, it adds this to the start." So the specificity of this is, it's nearly pseudocode, right? One level higher. It's a pretty specific instruction for how somebody should modify the code. Yes, yes. And why didn't I do it on my own? Because, I don't know, man, like, oh, you have a tendency to mess this stuff up, you know? Yeah. Off by one, who knows? 100%. Slice minus 49. I'm like, how does this work again? Is it negative? What does this do?
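The change being described, keeping only the tail of the stderr buffer and noting the truncation at the start, is roughly this kind of logic. This is a sketch of the behavior discussed, not Amp's actual code; `keepLastLines` and the marker text are made-up:

```typescript
// Keep only the last `max` lines of accumulated output; if anything
// was dropped, prepend a truncation marker so the result is still
// exactly `max` lines. Negative slice indices count from the end,
// which is the "slice minus 49" being puzzled over above.
function keepLastLines(lines: string[], max = 50): string[] {
  if (lines.length <= max) return lines;
  // Reserve one slot for the marker itself.
  const kept = lines.slice(-(max - 1));
  return [
    `[... ${lines.length - kept.length} earlier lines truncated]`,
    ...kept,
  ];
}
```

The off-by-one risk he mentions is real: without reserving a slot for the marker, the buffer ends up at 51 lines, which is exactly the kind of detail that's easy to hand off.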
Beyang: You're down a rabbit hole. It's longer than you anticipated.
Thorsten: And then it runs a diagnostic. I mean, this is a really dumb example, but yeah, it did it. And I think this is something that most people are starting to realize with agents: you have to know what you want them to do. You can, of course, send them off on a wild goose chase, but you get the best results when you have a concrete thing you want them to do. And there was somebody on Twitter who was pretty reflective, and they were saying, I write code by hand because I'm not that good at communicating. So he says he has trouble communicating what he wants; it's easy for him to just go and do it himself. And I think that's already three stages more enlightened than most people who say, like, oh, it doesn't work for me, you know?
Beyang: Yeah. Yeah, I think there's a lot of knee-jerk reactions coming from the chat or even the search world where they type in three or four words and expect it to read their minds. Whereas that works for search because there's enough signal embedded in those three keywords to locate search results. But for actually articulating what you want to be done across a non-trivially sized code base, you need to be more specific because there's otherwise a lot of ways to interpret what exactly you mean.
Thorsten: So this is something from yesterday. We had two Bun build scripts, for dev and prod, and I was kind of poking around, and I realized they're pretty much the same. There's an env var that we could use instead of having two things in the package.json. Yeah. So I'm like, "do this, can't we consolidate them?" And it's like, "yes, we can do this." And then it's just dumb typing, and it's updating JSON, and now you can delete this, and blah blah blah. And then I said, we can also move this out. And again, it's all stuff I could have done.
Beyang: Right
Thorsten: But I don't know, like, there's other... this is easy stuff.
Beyang: you got other things to do
Thorsten: Yeah, yeah. You know, like, I don't know, it's like a...
Beyang: It's like a staff engineer. You could do all the work that a junior engineer does, but why would you? There's other stuff that you could be doing that's, you know, better taking advantage of your talents.
Thorsten: The interesting bit, too, the big lesson, is: everything we just looked at is pretty small, right? It's one commit, whatever, pretty small stuff. And what we've all realized, I think, is when you do bigger stuff and you just shut your brain off, it's not good. It won't end well. It's, "please build me this large architectural thing where it connects this and connects this." Sometimes you get lucky. Like if everything is aligned and a lot of what you want to do is, say, additive to the code base, so there's a lot of examples, then it can go a long way. But if it's something new, a new architecture, and you turn your head off, you end up after 10 minutes and it says, I'm done. And you're like, what did you do? And then you don't even understand what the architecture is.
Beyang: Or it might be broken in some subtle way, and you're like, oh, now I've got to go find the bug in a thousand lines of code.
Thorsten: It's really nasty to review. There are modes where I do this, where I basically say, go and do this, and I know the chance of success is 5%. But it's just, I kind of want to explore and try stuff and see what it does, so I know what not to do. But yeah, it's basically a lot of small changes. It's a lot of, here, there you go. Look at this. I changed this yesterday and I put like the home bar in the package.json. I'm like, is there some escaping in JSON? And I didn't know that it's easy like this. I didn't know. But I was like, do I need to, does this even work? Does package.json even do this? And then it fixed it for me. And then it does a bunch of other stuff with env vars, and I really don't like writing commands in JSON. And what else? Like, what is this? That's pretty long.
Beyang: So there's a bunch of things like this that it's good for. And I think like everything we've covered up until now is like what I would call like agent 101 or maybe like 102, where it's like, okay, unlearn the habits that you've learned with chat LLM-based tools and try to instruct it so that it can get as far as it can without your assistance. And once – so like getting past that hurdle, I would say, is like the first thing that people have to get past. It's almost like a switch that flips in your brain at some point where you're like, oh, like this is how I use it. Once you get past that point, then people start to think like, okay, this is cool. It just did that thing. I basically didn't have to do much. How can I get it to do like something that's like slightly more complex, longer and longer, larger and larger features, or more nuanced bug fixes. And I think this is where everyone's mind starts to go. And one of the limitations that we've encountered there is just the size of the context window. So can you talk a little bit about the context window and how that affects the quality of the agent? And then let's talk a little bit about how we're solving for that.
Thorsten: So, absolute basics. These are large language models based on a transformer architecture. They are, as people dismissively say, text completion engines. They complete text. So that means you send something to this large language model and it will come back with the rest of the text added. And the context window is how much text you can send to the model and how much it can then complete. So if you say, here's the first two chapters of The Power Broker, the book, write the next whatever sentences, it will come back with some sentences. But if you say, here's the first 18 chapters, that might not fit in the context window, because it's too long. So it's kind of like the working memory of this thing. It cannot fit more in its head at the same time. And the thing to know is that everything in that context window matters, in subtle or major ways. It's a transformer architecture, so from its perspective, everything you put in the context window is a sequence of tokens, a sequence of, say, characters for our purposes. And in order to figure out how it should complete this text, it's kind of multiplying all of the text with the rest of the text. It's just multiplying a lot of stuff. And that means everything that's in the context window has an influence on what it comes back with. So that's the basics. When we look at this here, we see a user message, we see a text, a tool use, a text, a tool use, tool use, text, you know, whatever. Under the hood it's the same thing. When I write, say, "remove the 'this is the description' field," and send it back up, what happens is that everything from that point upwards, including the tool calls, including the mentioned files, including the system prompt, gets sent to the model. It gets put in the context window, and the model is basically instructed: how would you complete this conversation?
And then it comes back, and apparently its response is to remove the "this is the description" thing from this text. That's the most likely completion. And that also fills up the context window. So the longer the conversation gets (we can see my scroll bar here, and the dots), everything in here is in the context window. This conversation here is the context window. And you can see down here, we used up 85K of 168K tokens. And yes, for everybody listening, we use Sonnet 4 right now. Its context window is 200K tokens, including the output tokens. So if we want 32K tokens as output, we have to reserve those in the context window and say, well, the input can only be 168K, because at any point in time you want to be able to get 32K tokens back. So the context window is, I think, the most important concept to grasp. Honestly, it's not something I think you can hide or skip. I don't know, maybe I'm too much, you know, my bias is always, let me see what's going on under the hood. But I truly think knowing about the context window is important. And so what you want is...
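The token arithmetic here is simple but worth writing down: the window is shared between input and output, so reserving output tokens shrinks the usable input. A small sketch, using the numbers from the episode (the function names are illustrative, not any real API):

```typescript
// Context-window budget: the window is shared between the input (the
// whole conversation so far) and the reserved output tokens, so the
// usable input space is the total minus the output reservation.
function inputBudget(totalWindow: number, reservedOutput: number): number {
  return totalWindow - reservedOutput;
}

// Sonnet 4 as described above: 200K window, 32K reserved for output.
const budget = inputBudget(200_000, 32_000); // 168_000 usable input tokens

// The "85K of 168K" readout in the UI is just used / budget:
const percentUsed = Math.round((85_000 / budget) * 100); // ~51% used
```

This is also why the hard stop Beyang mentions next is at 168K of conversation, not 200K: the last 32K has to stay free so the model can always finish a full-length reply.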
Beyang: And it's important because, first of all, there's a hard stop at 200K, right? Minus the 32K that you reserve for the output. So any task that's too long to fit within that context window, you can't do it; there's kind of a hard constraint there. The other thing that we've noticed is that even as you approach 60%, 70% of the available context window, there's a certain amount of degradation in model quality, because I think there's not enough training data at that size when the model's trained. And so the rough analogy is it starts to behave more erratically, almost drunkenly, the more stuff is in that context window.
Thorsten: Yeah. And we don't ever do this, I think, or most people don't do this anymore, but it has to be said: you want that context window to be focused, because you want the agent to be focused on one thing. So in this thread, I changed how something works in a Svelte UI thing. If I now were to say, "hey, good morning, can you write a database migration for me?" that wouldn't be good, because in the context window there would now be 50% that's about a Svelte component, and then you change the topic and now it's about database migrations. And remember, everything gets multiplied with everything else, so your conversation about the database migration is now multiplied with Svelte stuff.
Beyang: yeah
Thorsten: It might not lead to the best results, right? Like, I don't know, you leave the party and go to the funeral and you keep your costume on or something. It's just not appropriate, you know? Yeah. And Erica, I think she shared this yesterday, or was it you or Erica, the analogy that you wouldn't cut your vegetables on the same board on which you just cut chicken breast or something. Yeah, it's a different surface. You don't want that stuff to be contaminated. Clean slate. Yeah. Yeah, clean slate. And that's kind of what this is about. For each specific task, make sure that the context window contains the information that you think the agent should have, and not some other stuff, you know? And again, that means the name of the game is context preservation, or context engineering, as they call it as of this week, I think.
Beyang: Yeah, someone coined the term on Twitter.
Thorsten: Yeah, yeah, yeah. Context engineering. Make sure...
Beyang: Prompt engineering is dead. Long live context engineering.
Thorsten: Long, exactly. Yeah, that was it. It's not about prompt engineering. It's about context engineering. But it's the same thing.
Beyang: So there's multiple ways of solving for this, right? The goal is to stretch the context as far as you can and to keep it focused. There's things like compaction and summarization, and we implement some of that. But maybe the newest thing that we're working on here, there's a bunch of things that fall under the umbrella of subagents. And subagents help preserve the context window of the main agent. And they also solve for this focus problem that you were talking about. So talk a little bit about subagents. How do they work in Amp? And what do you find them useful for?
Thorsten: So let's start. We had it here, right? Was it here? There you go. So this is the first subagent we had in Amp. It's this one here. It looks pretty innocent, but this is called the codebase search agent. And again, the definition of an agent is: it's a model that has access to tools, and it can use those tools to achieve a task. So as you can see here, the main agent, this is our main context window, 16%. It's not a lot, but it wants to figure out where some stuff is. So what it does here is it calls another agent, another model, and sends along this instruction. It says, find where the MCP servers are started and how the env vars from the form are used. Look for this, look for that. And this is the result. But if we could go back, we would see that what this agent did is kind of like the thing that we're looking at, but contained in this smaller window. So what this search agent does is it receives this prompt, and it has, I think, five, six tools. It can list files, read files, glob files, find files, or something.
Beyang: So can you zoom in on that, actually? So this is what the main agent is telling the search subagent to do.
Thorsten: Yeah. The main agent says, find where the MCP servers are started and how the env vars are used. And then under the hood, the subagent goes. And it doesn't have anything to start with. It only has what the main agent says to the subagent. And it goes and finds stuff by, again, doing the agentic loop thing and using tools. And the kicker is that if I could point here and ask what the size of the context window was here, it wouldn't be used up a lot more than up here, because this creates a new context window. It's a new, separate conversation. And what we see here is only the input that's in our context window, and the result. And the rest is in a different thing. So imagine it's the 60s and you work in an office that has an archive or something. And you have some interns, and you're like, I need this file. Find me this file. And you don't know whether it's at the front desk, whether it's still in your office, whether it's in the archive, whether it's in your car, whatever. And you send an intern along, and the intern goes, and it's like, oh, it's not in the office. It's not in the archive. I'm sure it's in the car. He goes out to the car, goes back up to the office because he forgot the car key, gets the car key, gets down there, realizes, oh, the car key doesn't work, I've got to get the car key fixed, whatever. Goes on a little tangent.
Beyang: Yeah.
Thorsten: And you, as the person saying, where's that file? You don't have to worry about any of this. It doesn't take up your working memory. It's just that intern's task, and whether they get lunch in between or not doesn't even matter. You only have to send them off and get the result back. And it's kind of the same here. You know that, I don't know if that's been disproven, but the thing where you can only keep seven things in working memory or something.
Beyang: Right, yeah, I've heard that.
Thorsten: You know, something like this, yeah.
Beyang: There's only so many registers in your brain.
Thorsten: Exactly, yeah. And with the context window, it's the same. Imagine this search fails somewhere in the middle and doesn't find anything. Bam, bam, bam, tool calls. Say, 15 tool calls. These would now all be in your context window, multiplied by everything else. So, is that good?
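The mechanism Thorsten describes can be sketched in a few lines of TypeScript. This is a minimal, hypothetical illustration, not Amp's actual implementation: `runSubagent`, `Message`, and the fake search model are made-up names, and the "model" is simulated so the example is runnable. The point it shows is that the subagent's whole tool-call loop lives in a fresh message array, and only two messages ever reach the main agent's context.

```typescript
// Hypothetical sketch: a subagent runs its agentic loop in its own context
// window; the main agent only sees the instruction and the final result.

type Message = { role: "user" | "assistant" | "tool"; content: string };

// A fake model that "searches" by issuing a few tool calls, then answers.
function fakeSearchModel(history: Message[]): Message {
  const toolCalls = history.filter((m) => m.role === "tool").length;
  if (toolCalls < 3) {
    return { role: "assistant", content: `tool_call: grep attempt ${toolCalls + 1}` };
  }
  return { role: "assistant", content: "Found it: MCP servers start in server.ts" };
}

// The subagent's entire loop happens inside a *fresh* message array.
function runSubagent(instruction: string): { result: string; hiddenTurns: number } {
  const history: Message[] = [{ role: "user", content: instruction }];
  for (let i = 0; i < 10; i++) {
    const reply = fakeSearchModel(history);
    history.push(reply);
    if (!reply.content.startsWith("tool_call:")) {
      // Only the final answer leaves this context window.
      return { result: reply.content, hiddenTurns: history.length - 2 };
    }
    history.push({ role: "tool", content: "no match" }); // simulated tool result
  }
  return { result: "gave up", hiddenTurns: history.length - 1 };
}

// The main agent's context grows by exactly two messages.
const mainContext: Message[] = [];
const instruction = "Find where the MCP servers are started";
const { result, hiddenTurns } = runSubagent(instruction);
mainContext.push({ role: "user", content: instruction });
mainContext.push({ role: "assistant", content: result });

console.log(mainContext.length); // 2, no matter how many turns the subagent burned
console.log(hiddenTurns);        // 6 turns of tool churn that never touched the main window
```

Without the subagent, all six of those hidden turns would sit in the main context and get re-sent on every subsequent model call, which is the "multiplied by everything else" cost.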
Beyang: This would be the equivalent of, like, a human going down, like, a rabbit hole. And then, like, who knows how long that rabbit hole leads. At some point, you're like, what the heck was I even doing in the first place?
Thorsten: Exactly. Yeah.
Beyang: Now this is like you're asking someone else to do it so you can remain focused. Yeah.
Thorsten: The customer says, "Oh, can you check this for me?" And you're like, "Yeah, sure, I'll do this real quick." Then you go to it, and it's like: you need to log in. Where's my phone? Oh, my phone needs a key here. My key is invalid. I need to talk to the security team. I need to do this, blah, blah, blah. Twenty minutes later: what was I doing? And basically with subagents, you can say, "Yes, user, let me take care of this for you," send over a subagent, pause your brain and your working memory, and then the subagent comes back. Whatever it was doing is not in your working memory. And that was the first subagent. And then we started to play around with other stuff. I have examples here, let me open those. So this is from Erik, and what he was doing was a migration where they don't want to call some method anymore. And what we ended up doing is we added subagents (this is the first version of the new subagents) that could do a lot more than the search subagent: they could also change files, they could write files. So basically, this is a big, big task. We're doing a big code migration: remove the remaining calls. So here it searches for this, found 41 matches, and it needs to remove them all, I think. And then it decides: okay, let me start by systematically updating the key files, I'll start with the focus task. And what it does is it spawns an agent. So if you expand this, this is the end result. Look at how long this is. Can you zoom in a little bit? Yeah, sorry. You can see this, right? This is a single subagent. Look at the tiny scroll bar here. It says, "Update the GraphQL schema resolver to accept and use gating provider." And I think in this case we have a hidden field where it has more information. So the subagent says, "I'll help you update this." And then look at what it does.
Read file, read file completed, read file. It reads, it searches, it edits a bunch of files, it needs to update this, reads more files, and it goes off on a big, big journey. And all it comes back with is, I would say, this: "I've updated this, blah, blah, blah. These are the changes I made," and then, "I'm done." And then it sends this back up. And look at how early this is in the conversation. Look at this scroll bar here. We're still up here. For the main agent, this basically didn't cost any context window. I'm sure there were some tool calls that failed in there, or something it had to retry, and that's noise in the subagent's context window. Down there, there's another one: "Fix the user button call site." Again, look at this. It goes off and fixes a bunch of stuff. The main agent, in this case, kind of turned into a coordinator, you know, just the thing that dispatches work to other agents. So that was the next iteration of subagents. And then, I think this is the newest one, this was three days ago. What I did was I kind of poked it, right? I'm not sure yet whether this UX will be the thing in the future.
Beyang: Like here you're prompting it to use sub-agents. Because you've developed sort of an intuition for where they'd be helpful.
Thorsten: Yeah, and there's no anthropomorphizing of the agent here, no "can you help me do this?" It's: I know how you're configured, I know that you have access to subagents, please go and use them. And so here I wanted to speed up the test suite. I gave it a link to the Vitest docs, test performance and whatnot. Then it runs the tests, and then I ask it to spawn subagents. And these subagents, they look a little bit better in VS Code now, but these are full-on agents. This is the main agent, just running as a subagent of the other one. And each of these is a separate, full-on context window. And this prompt is written by the main agent: "This test, which shows performance metrics based on this; your task is..." That's written by the main agent to send this off. And if you scroll through this, you can see there's a bunch of stuff happening. And again, another subagent, other instructions, a bunch of stuff happening. And if we come back, you can see the main thread isn't that long. Then it runs the tests, says let me do this, and again spawns a subagent, blah, blah, blah. And I have to admit, it says here, "oh, this is so much faster," but at the end of the day it mocked out a bunch of stuff it shouldn't have mocked out. But it didn't cost me any time, I just had this run. And the interesting thing to mention here, too, is that these run in parallel. In this case it was a pretty easy workload to parallelize: one subagent for each of the slowest tests, I guess. It spawned them in parallel, and then it just goes off and does it. So that's the current iteration of subagents. They have full access to tools, full access to MCP servers. There's still stuff to figure out, like: what's the best UX? When do you want to use them? Should you, as a user, be able to go in and send a message to a subagent, and kind of poke it while the other agent waits for the reply, or not?
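The parallel fan-out described here, one self-contained task per slow test, can be sketched as follows. `spawnSubagent` is a hypothetical stand-in for the real subagent tool; here it only simulates latency and returns a summary so the sketch is runnable.

```typescript
// Hypothetical sketch: dispatch one subagent per slow test file, concurrently.

async function spawnSubagent(task: string): Promise<string> {
  // In the real system this would be a full agent with its own context
  // window and tool access; here we just simulate the work and the summary.
  await new Promise((resolve) => setTimeout(resolve, 50));
  return `Optimized ${task}: slow setup reworked, test runs faster`;
}

async function speedUpTests(slowTests: string[]): Promise<string[]> {
  // One self-contained task per slow test, dispatched in parallel.
  const tasks = slowTests.map((t) => spawnSubagent(t));
  // The main agent only ever sees the summaries, not each subagent's tool churn.
  return Promise.all(tasks);
}

const slowTests = ["mcp.test.ts", "threads.test.ts", "tools.test.ts"];
speedUpTests(slowTests).then((summaries) => {
  for (const s of summaries) console.log(s);
});
```

This works because each slow test is independent; tasks that need to coordinate or course-correct mid-flight are a poor fit for this pattern, which is the UX question raised above.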
This is another example: find all the tool definitions that we have in the repository, make sure they each have a Storybook story. Because I noticed that some of them don't. And I think it used the search agent again. It does this, says let me check this, blah, blah, blah.
Beyang: Question for you. I think the search agent works pretty well in that it's invoked automatically whenever it's needed. With the more generic subagent tool, do you have an intuition for when you prompt it to use them versus not, at the moment? Even, like, a prototypical intuition?
Thorsten: Yeah, my intuition is kind of: are there self-contained tasks that are pretty simple to do? You know.
Beyang: yeah
Thorsten: Because you cannot course-correct them right now. I'm still exploring them. It's a weird... The codebase search agent, I think, is a natural thing, because it's a tool that somebody reaches for: I need to find something, let me use the tool to find stuff. But with subagents, it's like, "Hey, you've got interns. Let's make use of the interns. Don't do this on your own." We could dial it up in the system prompt and say, you should always use subagents for this. But it's also a UX thing of: do you want to hide a lot of stuff in the subagent, or do you want to...
Beyang: Right.
Thorsten: You know, like there's a cost.
Beyang: Sometimes it's good to expose the context.
Thorsten: Like in my case that I showed here. Where was it? In the... where is VS Code? Sorry. Here, this one here, right? "Change this to only log the last 50 lines." It shouldn't use a subagent for this, because the subagent has some overhead. It's another inference call, it's another system prompt that needs to be cached and stored. I don't want it to always use a subagent. So in this case, you know, turn it off, don't even prompt it. But in this case here, where it has multiple files it should update, and it's the same change in every file...
Beyang: You stopped sharing your screen.
Thorsten: Oh, sorry. I hit the button that's right over the other one. Entire screen, there. I was saying, in this case here with the 50 lines, I don't want it to use a subagent. It's a pretty dumb change, I don't need a subagent. And here in Chrome, in this thread where it updated three files and had to make the same change in all three files with slight variations, I said, "You could use subagents for this." And it was like, "Yep, let me do this," and it used subagents, and each subagent says, create this, do this, update this.
Beyang: Okay, so we're almost out of time. Before we go, I want to touch on one final thing, which is: you've implemented a new subagent, a separate tool, that makes use of other models. As kind of the last topic: talk about what that tool is, what it's called, and what's the motivation behind it.
Thorsten: Yeah, okay, so let's start with the motivation. Our main model right now for agentic coding is Sonnet 4. It's pretty good at this. It's an agentic model, it knows how to use tools. In our testing, for example, we found it's much more agentic, meaning a lot more eager to use tools, than Gemini 2.5. But there are other, smarter models, meaning models that find more bugs, or come up with cleverer analysis than the other thing, or do better code review and spot more stuff. And I want to use those models. To use o3, which I like, OpenAI's o3, I have to use the ChatGPT app, which I don't like, because I have to copy and paste stuff over. So what I want is to use that intelligence of o3 in the conversation with the agent. I want to say, use o3 to debug something, and then I want Sonnet 4 to go and fix it. And what I've implemented... let me see if I've got examples here. There you go. This is from 32 hours ago. I had this problem with a SIGINT, an interrupt signal handler: can we do this, but keep the original handler, because this overwrote the handler. And then it came back with this, and I'm like, "Is this really how it should be?" And then I said, where is it? Here: "Ask the Oracle whether there is a better solution." So what's happening here is I added o3 as another subagent and called it the Oracle, and kind of prompted it as a thing that's good for reviewing, for planning, for debugging, all of these higher-level things. Not for editing a test file or running terminal commands, just high-level stuff. But I gave it tools and the ability to look at and read files autonomously. So this is a tool that the main agent can use. And again, I don't prompt the main agent to constantly use this, but I have it available and I can ask it to use it. So I can basically reach through the agent. And in this case, it asks the Oracle to review stuff. And the Oracle, I think, yeah, you can see it here. So this is what's passed to the Oracle.
It's a task, a context, and files, relevant files. And this is written by the main agent. So the main agent says, "Review this handler. Context: the user is concerned. Look at these files." And then o3 goes off and looks at it, and it came back with this prependListener, which we didn't have, I think. It had "once", which wasn't a thing, I think. So it knows about prependListener. It's smarter, it knows more. And you can see here, this is not a nice diff or something, but everything we see in this text box is returned to the main agent. So then the main agent says, "Oh, the Oracle suggests a much better approach. Let's use this." And then it goes, makes the edit, and implements this better solution. And then it fixes the linting issue and blah, blah, blah. And that's nice. It requires some prompting, and of course it would be nice if, whenever the agent knows it doesn't know enough or isn't smart enough, it reached for the Oracle on its own, and maybe we'll get there. But I really like just prompting it here. I think you've also bumped into this: sometimes the models are eager to suggest stuff that they think is good with framework stuff, like Svelte or Effect stuff, and I kind of know this is a weak spot. So in this case I'm like, okay, ask the Oracle why I should do this. So instead of going to o3, pasting a bunch of files in, and saying please come up with a plan for this, I now write this in one place, and the answer is automatically transferred back to the main agent, and then it goes off and fixes it. And it asks it again, and then it makes the change. And it's nice: I only have one text box, only one conversation.
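The Oracle setup Thorsten describes, the same subagent plumbing pointed at a different model with a narrower system prompt and read-only tools, could be sketched like this. The model IDs, tool names, and `SubagentSpec` shape are illustrative assumptions, not Amp's actual configuration.

```typescript
// Hypothetical sketch: an "oracle" subagent routes to a stronger model and is
// scoped to read-only, high-level work (review, planning, debugging).

type ModelId = "sonnet-4" | "o3";

interface SubagentSpec {
  model: ModelId;
  systemPrompt: string;
  tools: string[]; // which tools this subagent may call
}

const subagents: Record<string, SubagentSpec> = {
  // Main coding agent: eager tool user, full write access.
  coder: {
    model: "sonnet-4",
    systemPrompt: "You write and edit code using the tools available.",
    tools: ["read_file", "edit_file", "run_terminal", "search"],
  },
  // Oracle: smarter model, read-only on purpose: it plans, it never edits.
  oracle: {
    model: "o3",
    systemPrompt:
      "You review, plan, and debug at a high level. You may read files " +
      "autonomously, but you never edit files or run commands.",
    tools: ["read_file", "search"],
  },
};

// The main agent invokes the oracle like any other tool: task + context + files.
function buildOracleRequest(task: string, context: string, files: string[]) {
  const spec = subagents["oracle"];
  return {
    model: spec.model,
    system: spec.systemPrompt,
    prompt: `Task: ${task}\nContext: ${context}\nRelevant files: ${files.join(", ")}`,
    allowedTools: spec.tools,
  };
}

const req = buildOracleRequest(
  "Review the interrupt signal handler",
  "The user is concerned the original handler gets overwritten",
  ["src/signals.ts"],
);
console.log(req.model);        // "o3"
console.log(req.allowedTools); // no edit_file, no run_terminal
```

Keeping the Oracle's tool list read-only is what makes the division of labor clean: the smart model analyzes and recommends, and the main agent stays the only one that touches files.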
Beyang: Yep. And I think o3 is especially good at this sort of nuanced reasoning type of thing. And it's interesting: I think you're starting to see model providers run in divergent directions. Everyone's no longer trying to just copycat ChatGPT and find the best chat LLM; there are now different patterns of reasoning and thought that they're optimizing for, just because, you know, they have different users of their APIs and they're building different products.
Thorsten: Yeah.
Beyang: And so I think, you know, we'll probably have more to say about this very soon, maybe in a future edition of this kind of education series. But it's already very exciting, I think, to see the power of combining different models together.
Thorsten: Yeah. The cool thing is that we are not one of the foundational model companies, so we can pick and choose, right? In this case, we can pick and choose between Gemini 2.5 and Anthropic for coding or whatever, whereas Anthropic cannot do this. They won't offer access to Gemini in Claude Code; that doesn't make sense, right? And the other companies, I think what they still rely on is the model selector, where basically, for a given conversation, you switch the model and say, now I'm talking to Gemini, now I'm talking to o3 or whatever. But I think what we're starting to see is the need for people to start multiple threads and then combine them and merge information and knowledge. And with subagents, you can have this. You can say: o3, give me an analysis, and then the coder goes and implements it. And others flip it around: the smart model plans and sends off the implementers. For us right now, I think having the implementer as the main agent is still nice. It's the power tool thing.
Beyang: Oh, yeah, for sure. All right, cool, we're basically at time. So thank you so much for taking the time to explain some of the new things that are shipping, as well as some of the strategies that you've acquired, for those listening. If you liked this, please reach out and let us know. We're thinking about doing more of these, just because there are a lot of different tips and tricks and things that people have discovered, and we just want to make that accessible to everyone out there trying to learn how to do agentic coding.
Thorsten: Yeah. Thank you.
Beyang: All right. Take care. See you.
Thorsten: All right. Bye-bye. Bye.