Episode 3
March 28, 2025
Quinn and Thorsten start by sharing how reviews are still very much needed when using AI to code, and how coding with an agent changes the overall flow you're in. They also talk about a very important question they face: how important is code search, in its current form, in the age of AI agents?
Transcript
Thorsten: The ideal scenario, which every CEO, CIO, COO, CTO would pay a lot of money for is that if somebody has a question like this, they can press a button and a ghost of a senior engineer appears and answers that question perfectly. Like they would pay a lot of money for this. So now we have these agents.
Quinn: Not even perfectly. Senior engineers are not perfect.
Thorsten: They're not perfect, right. But they would pay a lot of money. And now we have these agents that are still slow. They don't know whether they're wrong or right, which, you know, let's leave the philosophical debate about the senior engineers aside. But what do you think? Give me your thoughts on this. How does this relate to code search? How does it relate to search? What will change? What will not change?

I'm Thorsten, software engineer here at Sourcegraph, and with me is Quinn, Sourcegraph CEO. Hi Quinn.
Quinn: Hey Thorsten. We've been writing a lot of code with this thing.
Thorsten: Yes, a lot of code. It's been, I don't know, two weeks since the last episode, and I think you wrote a lot of code on the server side, which basically didn't exist when we recorded the last episode. I was everywhere on the client side, I guess. What do you think after these thousands of lines of code? I feel like there's this culture war going on right now with the vibe coders versus the traditionalists or something. And the traditionalists would say: I'm skeptical of AI because you don't know what it's doing, you don't know the code it's writing. To steel-man their argument: it does something and it might work, but do you truly understand the code as if you had written it yourself? Meaning, when something goes wrong, do you know how to debug it? Or are there subtle bugs that slip in because you didn't write every line? So what's your impression?
Quinn: Well, a lot of things. First, I'm still processing that this change has happened. I've been coding since I was like eight years old. I love coding, I think everyone knows I love coding. And in the last few weeks, it's gone from AI writing like 40% of my code with me still involved, to AI being the first drafter and writing probably 85% or so of my code. And I have not yet processed that forever-change to this hobby that I've had since I was a kid.
Thorsten: Yeah.
Quinn: And I like it. It's so thrilling, but it is a huge change. It's like if you love playing soccer, or football, and then all of a sudden that sport just vanished off the planet. It's weird. But with respect to that concern about "oh, I don't know what the code is doing": you have a choice. You can take the AI's code as the first draft and go and figure out what it's doing. And I've been doing that a lot, because on the server side, as you mentioned, there's a lot of auth-sensitive code, and it deals with more moving pieces, more APIs. And when I find I'm doing other things, this first draft is usually okay. But you know there's always that guy who, whenever you write some code and push it to the repo, goes and rewrites it. That's probably not a good use of that time. And people have done that with human-written code too. So you've always got this choice.
Thorsten: Yeah, I agree. I think the mental model that I came up with is, and also, to go back one step: yes, it completely changed for me. I've become super lazy about writing code. If I have to change five lines in a file, I will ask the agent to do it because I'm lazy. And my whole mental approach to programming has changed, in that I don't see it as "I modify the text in this file." It's more: I guide and instruct this other thing and give it enough information so that it does what I would do. And the mental model is paint by numbers. That phrase has been thrown around a lot, but the way I know paint by numbers is: you have a blank picture with no colors, and there are little areas, and each area has a number. And what I think of with the agent is that it's my job, as someone who knows what he's doing, to draw those lines and then instruct the agent to fill in that number. But I need to be sure what the lines are. In a recent example, I had it write a little front-end component, and I knew exactly: it's going to create a component, one file in this place, to render this little piece. I knew these were the lines, and I watched that it didn't go out of the lines. And as long as it's in there, I don't really care about the code. And then I did something last week where I went through the whole stack for, I think it was cancellation or retrying or something. I go in, I take a look, and I try to figure out what the lines are that I draw in the sand for it to fill out. And then I instruct it to fill it out. That means that before it does anything, I don't go in blind and say, "just do this." I try to think: okay, what information does it need? What do I need to put into the context window? What is roughly the architecture of what I want to do? And then I instruct it to fill out the blanks.
Quinn: I love that metaphor. There are some people who would say, well, if you're going to do all that work, that's more work than actually writing the code, or that means that it doesn't get it. Or there are some people who just struggle to understand how to give it the lines in which to paint, and then they say, "well, this doesn't work." What do you think is the disconnect between what you have realized and the vast majority of the world out there that has not realized it?
Thorsten: Yeah, that's a good question. I think there are a lot of variables that play into this. The big one is: would I really be faster writing it myself than if the agent were to do it? And I think that's a subtle question, because right now we're working in TypeScript. I'm not a TypeScript expert. I'm not an expert in Svelte. So there's a lot of stuff where, even if I know what I would have to write, the import has to do this, or you write a component like this, or, oh, there's a Svelte head tag to put something in the header. This is all stuff that I would need to look up. It's not something that would block me; it would just take time. And for that type of stuff, it's fast if I instruct the agent to do it, because either it has the documentation in the context window, because I put it there and it can look it up faster than I can, or it knows from what's in its weights, basically, what to do. That being said, I do think there are circumstances where, say, you have really large files, or some complex architecture that's hard to express, or some context that's hard to express: oh, don't touch this part, we want to migrate this, don't touch it now, this is all legacy stuff, blah, blah, blah. And sometimes you just know: oh, I need to change this one line to do something. And then it might be faster for me. But I also think the disconnect is that it's something...
Quinn: Actually, even in that case, the funny thing is: even if you're changing a single line, and most of the time you're working on a team, that means you take a ticket, you assign it to yourself, you go check out the branch, you go stash, you go rerun Bazel, whatever. You change that line. You test it. You wait for CI. You write a commit message. You write a pull request. You wait for someone else to review it. When you stack all of that up: telling an agent to go fix it, giving it that one sentence, and it can do all that other stuff, it can wait for the tests, it can all be async. Actually, that sounds pretty damn appealing.
Thorsten: Yes, I agree.
Quinn: It's never just fixing one line. That's never the comparison.
Thorsten: Yes, that's true. And I think people underestimate how much toil, you know, weird busy work you have to do for something. When is it ever: oh, I have this one file where I know all of the dependencies and nothing ever breaks and I can just write some code? I mean, that would be the ideal scenario for an agent too. But in reality, I find there's a lot of weird stuff. To give you another example, just today: I wrote up this whole blog post thing, and it has a lot of markdown blocks in it. And it had these markdown blocks that would show CLI output from a program, so, just text output. But what I wanted was for it to look like it does in the terminal, with colors. So this is, I don't know, six blocks. And I knew exactly what to do: you write a component that takes input, and you say this has this color and this has that color. Then you change that other file, take the markdown block, put it into this component, and render it. I just told the agent to do it, and while it was doing it, I was watching, and it would do it step by step, taking these blocks out, and I realized: oh my god. I'm super fast at Vim, but still: copy this out, take this text block, format it into an array of strings, do this and do that. Yes, I could have done it myself, but I think you overestimate how much formatting and whatnot goes into it. And when you can just watch an agent, you switch into this different mind, where you're not thinking about the keys you're hitting or the commands you're using, but about what's actually going on here: is this the right abstraction? And I find it enjoyable. It's fun. It's lighter. It feels like you don't have to rename files and then reload this and copy from that.
And oh shit, now you deleted the thing that was in your clipboard and blah, blah, blah, blah, blah, all of that.
Quinn: Yeah.
Thorsten: It's just: okay, do this, go. And it does it. And it feels like you have more... I don't know. And I think that's the problem: often people judge this on, oh, will this replace programmers? I don't even think in that direction. I think in terms of how it augments me. It's a tool I want to use, and it augments me, it makes me faster. It lets me take longer strides, basically.
Quinn: Yeah. And so another argument people might make is: well, maybe it makes you feel faster, or maybe it makes you faster in the short term, but you're going to end up with an unmaintainable mess and have no idea how it got there. And this is where I want to get your thoughts on thread sharing, because it actually feels like, with that, we are building up more context about the why and what the intent was behind the code than if you were not using it. Do you want to give a bit of background on this?
Thorsten: Yeah. So the background is: again, I think two weeks ago, right before we recorded the last episode or something, you added the server-side component to this. That means that on the client, you can say you're either in isolated mode, where you're on your own locally and you talk to Anthropic directly, or you're in connected mode, which proxies your requests through us. And then we store your conversations and all of the tool calls that you make on our server, which means you can then say in your UI, give me a URL for this conversation, and you can share it with the rest of your team. And the interesting thing, which I think is what you're alluding to, is that you can then see how something came to be, how a little piece of code came to be. And there were some huge surprises, where somebody was saying: this is the missing feature, I've been missing this the whole time. I was saying it's really good for debugging, but that's another thing. When you share this, you see how others prompt, and you see how they cut up problems: even what's the scope of one thread, what's the size of the problem they throw at the agent. And then you see what constraints the person, the human, puts on the agent: no, it should do this, it should do this.
Quinn: Yeah
Thorsten: Yes, I agree. I think it's super valuable. And this is also something, you know, that was talked about a lot at Zed: that you can only capture so much in git commit messages. It would be ideal if you could see how the code was written, if you could see how it came to be. And then, if we take, I don't know, five steps towards the future: can we make better tools by looking at our code and analyzing how the code came to be? Is this another thing we could do if we stored this in the future?
Quinn: Yeah. And you should have the prompts that were used stapled to the commit message and to the pull request and this context and all of the context that was used in the agent's workflow. And in theory, you could imagine if you change one of the requirements, let's say some new security requirement or something like that, well, you should be able to see what is all the code that was generated downstream of that document. And do we need to update it? And that's really interesting. It's more data that you're bringing and it feels like you get this for free with thread sharing.
Thorsten: Yeah, yeah. I don't know, I think there are ten dimensions to this that are interesting, but this one, that you can see how code came to be, is really interesting. And, thinking out loud here, it feels like it's hitting that same area that people try to solve with stacked diffs or something like that. Or even commits, you know, I don't know if people still care that much about it, but this "go and look through this PR commit by commit to see how it came to be." And obviously we're not there; let's be honest, a thread doesn't yet let you view it like that. People are often careless, and it's not as ordered as a nice, revised commit message. But I think it goes in the same direction, at least.
Quinn: Yeah. Do you remember coding before there was this idea that you should push up a work in progress pull request? I mean, do you remember coding in the subversion world or CVS?
Thorsten: No, I've never done that. I went from no VCS straight to Git: a decade without anything like that, pushing via FTP to production, and then Git.
Quinn: Yeah. Well, even if you're not pushing up commits, there's this idea that you only see your team members' work when it is completely ready for review. You don't have any way of seeing what's coming early. And it was a big transformation: on teams that do push up work-in-progress pull requests, you have all this serendipitous discovery that you don't get otherwise. And this feels like another level of that. And then also the social proof. Going back to what we were talking about at the beginning: there are some really smart people who want to get it and just don't; they struggle to learn how to use these tools. But sometimes this gets you over the hump, because you realize, hey, this other really smart dev that I work with is able to be really successful with it, so there must be something here. And that has, I think, helped it click in the minds of a few people we work with: hey, this person who I don't think of as some AI fanboy is really using it a lot, and it's clicking.
Thorsten: Yeah, I think that's true. And, just to come back to this discussion: the code that we wrote is not slop. It's not bad. And I would say 80%, I don't know, some high number of that code was written by AI. I'm thinking back to your refactoring with the thread work and whatnot. The architecture stuff is still, I think, a human thing, but filling out the files themselves, the components, a lot of that is AI. And isn't that an argument in itself? The thing that you're using was written by itself. It apparently works, and it's maintainable. Otherwise we wouldn't be shipping, whatever, 30 commits a day or something.
Quinn: Yeah. Yeah. Let's talk about the feedback loops.
Thorsten: Yeah.
Quinn: So, you know, we have this hypothesis that getting the agent into a feedback loop is one of the most important things. Making it so that it has really good tests to run, that it can run them fast, that it can run them in a granular way. And going beyond tests: static analysis, diagnostics, looking at the browser, trying out the app, deployment logs, all these things. And when you start a new app, it's an opportunity to reset your tooling, to have simple tooling. We have had the agent get into a lot of these loops with tests, and it can be great, but it still doesn't know how to run the tests exactly. Sometimes it'll run all the tests in a subproject. Sometimes it'll struggle with relative paths or PNPM quoting or things like that. So what have you found to work, and what are your thoughts on how we get more feedback loops?
Thorsten: Yeah. So what we have is our memory file, and in it we say how to run certain tests. And then, just to describe the reality: sometimes it gets it wrong, and it runs the tests in some way that makes it block, because the test runner thinks it's a TTY, and then you have to tell it, do this and do that. Ideally it would always know the best way to run things, how to validate its work. But that's often hard, and even as a human you don't always know. What I found works is basically: when I know that it will run into problems, I tell it beforehand. I say, after each little thing, make sure you run this command. And that's the type checker or, you know, the linter or something. I can't even tell you exactly when I know this will be useful, but, say, if it touches multiple files... Over time you learn where it has these blind spots, where it's super confident in what it's doing, and then I nudge it and say, "no, no, no: run, get the diagnostics for the file, do this." But ideally you wouldn't have to do this. Ideally, thinking out loud, there would be something that watches the agent, something that can always give feedback to the agent and say, what you just did led to this. But then, obviously, if you're in a large monorepo, that means you're waiting four minutes for every edit, which is also not feasible, because the build is usually slow or something. And on the other hand, you often get better results when the agent decides to do something on its own, or knows how to use these tools itself.
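For illustration, an entry in the kind of memory file Thorsten describes might look like this. This is a hypothetical sketch: the sub-project names and commands are assumptions, not the actual repo's setup. The `--run` flag is how Vitest, for example, is told to run once instead of entering interactive watch mode when it detects a TTY, which is the blocking behavior mentioned above:

```markdown
## Running tests

- Server tests: `pnpm -C server test -- --run`
  (always pass `--run`; otherwise the test runner detects a TTY,
  enters interactive watch mode, and blocks)
- After editing multiple files, run the type checker: `pnpm -C server check`
- Never run the full monorepo suite; scope commands to the sub-project you touched.
```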
Quinn: And that component is really tantalizing, because it feels like a component that is just looking at what the agent did and then figuring out the fastest test or build command it can run. It feels like that could be a mostly deterministic component, and it is not that hard to build. It also has one interesting dynamic, in that it is some upfront work, the kind of uncelebrated work that the DevTools teams we love inside companies often do. It's setting up the build, fixing CI. But it's critical work. And for that reason, I think a lot of the AI coding to date has been done by individuals, a small fraction of the devs on a team who get it while the rest don't, and they haven't gotten the official setup from their company. They've been working without these feedback loops. But it is not hard to say: given these two files that changed, what are all the tests that might need to run? And I'm not talking like...
Thorsten: It's not hard? Isn't that so...
Quinn: Look, there's a world in which you're saying, give me the minimal set of tests that must run in a Facebook-size, Meta-size monorepo. That is hard. But what we're talking about is literally: we've got a repo with seven PNPM sub-projects, and if I make a change in the server directory, I should only run the tests in server.
Thorsten: Yeah.
Quinn: Like, that is not hard, and that alone would be significantly better. And then the next step is: take a look at the test file that I changed and pass that to pnpm test. Oh, and make sure to strip off the "server/" prefix, because we're passing the subdirectory to pnpm -C. It's little things like that, which are not hard, and we don't need to solve them in the general case. I'm not saying, hey, let's put down our pencils and wait until the whole world has adopted something perfect. Not saying that.
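The changed-file-to-test-command mapping Quinn describes can be sketched as a small deterministic function. This is a hypothetical sketch in TypeScript; the sub-project names, the `.test.ts` naming convention, and the exact pnpm invocation are assumptions, not the actual repo's configuration:

```typescript
// Pick a fast, targeted test command from a changed file's path in a
// pnpm workspace. Sub-project names are illustrative assumptions.
const subprojects = ["server", "client", "shared"];

function testCommandFor(changedFile: string): string | null {
  const [first, ...rest] = changedFile.split("/");
  if (!subprojects.includes(first)) return null; // not inside a sub-project
  const inner = rest.join("/");
  // If a test file itself changed, run just that file. The sub-project
  // prefix is stripped because `pnpm -C server` already changes into
  // the server directory before running the script.
  if (/\.test\.ts$/.test(inner)) {
    return `pnpm -C ${first} test -- --run ${inner}`;
  }
  // Otherwise, fall back to the whole sub-project's suite.
  return `pnpm -C ${first} test -- --run`;
}
```

For example, `testCommandFor("server/src/auth.test.ts")` would yield `pnpm -C server test -- --run src/auth.test.ts`, while a change to `README.md` maps to no test command at all.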
Thorsten: Yeah. I think there's something practical here. I mean, the evidence is: we have this markdown file that says, here's how you run the tests, and it goes pretty far with just that. You change something, and sometimes it realizes there's a test right next to it, and then, on its own, it says, oh, I need to make sure to run the test, and it already figures out, based on this description of how to run stuff, what these tests do. But, I don't know, I feel like there's this 80/20 thing, where 80% is maybe easy, and then the last 20% will be hard. And the question is...
Quinn: It is okay if it's 80/20 when it's drafting a whole bunch of big new code. But if it's 80/20 on getting into the feedback loop, which it needs to get into on every single change, it needs to be really reliable. And also, we're just doing it for tests. Then you get to browser use, you get to static analysis... there are so many other things. I think this component needs to be better.
Thorsten: Way better than 80/20. Okay, you've been tweeting about this, I've been tweeting about it. Do you think... I mean, I know the answer to this, but: how do you think build systems, and, say, code bases, will change to accommodate this? My hypothesis is that we will bend to the agents. We will not make the agents learn, I don't know, Bazel, whatever. We will change our code bases to make the work better for the agents.
Quinn: Yeah. I think it's about incentives, and I agree. What is the incentive right now to have a really good build system? I think you only see the perfect build system at Google and Meta. And anyone listening who's been at those companies would probably tell me all the things they didn't like about those systems, but it's an incredible amount of work, because you're only benefiting the humans. When you can benefit the agents, and you have a good environment for them to code in, you can get millions of times more output than from any single human. So I think the fundamental incentive to make that environment really good is way larger. I agree: this is going to completely change the calculus there.
Thorsten: Yeah.
Quinn: And as for what that actually looks like? I don't know, because it is so early, and people have not done the basics for the most part. The people you see talking about having stuff run tests: it's coming from these markdown files in the repo. It's very inexact. And we know it. So a lot of people are seeing the very early promise, but when they're tweeting, they're probably not talking about all the times that it fails. Unlike us, who are trying to.
Thorsten: Yes.
Quinn: What do you think?
Thorsten: Nothing more to add to this. I don't have an answer. I think it's truly step by step, and we'll get there. And then in a year, we'll look back and realize a lot has changed.
Quinn: Yeah. Well, one thing that will change in the repo that we're working in: okay, we can run tests, but there's also browser use. The browser works great when it's the Storybook and the unauthenticated page, because it can open that up. That's awesome. But for the whole server component, it always gets blocked at authentication. So one thing I did over the weekend was pull our users into our database rather than keeping them on WorkOS. And that means it's going to be a lot easier for us to have, like, here's the MCP server for Puppeteer, and it can sign into the app in dev mode with special credentials. And then I think we'll be able to get a lot farther with that visual feedback on the server side. So stay tuned.
Thorsten: Yeah. I want to end with... I have two controversial topics, and I'm trying to think of which one we should talk about. Let's talk about code search. As background: when we talked in December or so, I said that, you know, Sourcegraph is not a code search company. I said this to you before I rejoined. And my hypothesis was, and still is, that nobody pays us to give them a search box. What they pay us for is the ability to make sense of their code base, to answer questions about the code base. And the UI is just one way to do this. That's the background of my thinking. So now I built, in a single day, this search agent that our agent uses, which basically has a bunch of tools available to it. And when you ask it a question, for example, which I did: how does authentication work when proxying requests to Anthropic? Then it goes and does what a human would do. It does a keyword search for auth. It then looks at which files seem to have plausible answers to this. Then it opens those files, and it does globbing to find other related files. It lists directories, and all of this. And out comes...
Quinn: It's an agent loop inside of that, which I think is important to be clear about.
Thorsten: Yeah, it's an agent. It's a loop, right? But there's nothing deterministic in there. It's truly the agent: you give it a query, you give it a prompt, it has tools, it goes off and finds stuff. And then it comes back with this report. And I sat there thinking: this is what it's about. This is what people want. Sure, it's still slow, but I'm saying this stuff will get faster and more efficient, and we can still tune it. Basically, if you have a question such as "how do we authenticate requests?", the ideal scenario, which every CEO, CIO, COO, CTO would pay a lot of money for, is that if somebody has a question like this, they can press a button and a ghost of a senior engineer appears and answers that question perfectly. They would pay a lot of money for this. So now we have these agents.
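The loop Thorsten describes can be sketched roughly like this. It's a simplified, synchronous sketch: the tool names and the `llm`/`runTool` callbacks are stand-ins for the real model and tool implementations, which would of course be async and far richer:

```typescript
// The model keeps choosing tools (keyword search, read file, glob, list
// directory) until it has enough context to write its report.
type ToolCall = { name: string; args: string };
type Step = { tool?: ToolCall; report?: string };

function searchAgent(
  question: string,
  llm: (transcript: string) => Step,
  runTool: (call: ToolCall) => string,
): string {
  let transcript = `Question: ${question}`;
  for (let i = 0; i < 20; i++) {         // cap iterations so a lost agent halts
    const step = llm(transcript);
    if (step.report) return step.report; // the model decided it knows enough
    if (!step.tool) break;               // neither tool nor report: give up
    // Tool output is appended to the search agent's own transcript only.
    transcript += `\n[${step.tool.name}(${step.tool.args})]\n${runTool(step.tool)}`;
  }
  return "No confident answer found.";
}
```

The key property is that the loop terminates either when the model volunteers a report or when the iteration cap is hit, so a wandering agent can't run forever.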
Quinn: Not even perfectly. Senior engineers are not perfect.
Thorsten: They're not perfect, right. But they would pay a lot of money. And now we have these agents that are still slow. They don't know whether they're wrong or right; let's leave the philosophical debate about the senior engineers aside. But what do you think? Give me your thoughts on this. How does this relate to code search? How does it relate to search? What will change? What will not change?
Quinn: I have seen people on the team who've been steeped in search for a long time be quite impressed by this and feel like it changes things. I certainly feel that way. And I think it's all about what you mean by code search. Do you mean something that uses postings lists to find terms and runs TF-IDF? Well, no. We decided that's not what we meant by code search. Actually, our initial code search didn't even have an index; it would do a live search. And it turns out, based on the need, you're not usually searching for the needle in the haystack in your code base. You're usually searching for something where it's fast to get the first 10 results back. And then with embeddings, a lot of people now think that search needs to have embeddings, and if it doesn't, then it's not even... So search is ultimately a tool that a human uses. And the human is running this agentic loop themselves: doing seven searches on Sourcegraph, looking over here, trying things. And we're kind of up-leveling search and what it means there. What I love about the code search agent that you built is that it shows its work and produces a report. It's something you can get greater confidence in: you see what it's doing, and you can actually see what it comes back with. Usually it's correct. And it's hard for it to be incorrect, because, and I don't know if it's something you did, it doesn't really take that many leaps. It's looked at so many source documents, and I think it's pretty well grounded in them. And it's clear that this is something that takes longer, which is another constraint we have to relax in changing the definition of code search. But once you see it, you start to want it, in a lot of the cases where you're doing that work yourself, even though deterministic code search takes like 200 milliseconds.
Thorsten: Yeah, okay, I have to ask this. It's nitty-gritty, it's details, but I find it so interesting, because it comes up every day. Every day this question comes up in our channel. When you say it produces this little report, which is good, and it shows you what it did: are you saying this as a developer of the tool or as a user? As a user, do you just want the little magnifying-glass thing, and then it says, "I found some files"? Or are you, as a user, interested in seeing what it did? Because that's also different from classic code search. And I think it's also something that people talk a lot about now with all of the deep research stuff, where a model goes out and fetches stuff for you and has citations, right? And says, "I found this on this website."
Quinn: But, you know, Thorsten, we should just get rid of the UI. Just go make the code change. Let's just hide the code. Actually, we've gotten a lot of really interesting user feedback along the lines of, "hey, I don't want to see everything that it's doing." And we've been very wary. We love that people say that, but we've been very wary, because we do not want to set the wrong expectation that this is magic. A lot of people might have some really good experiences with it and say, "oh, it's a lot of noise," but we need to be really, really honest about where it is. And we feel that we do need to show it. So I'm speaking as a developer there. I think a lot of users would say, "don't even show me that stuff," but I think we need to. In this case, we know better than them.
Thorsten: Yeah, I agree. I don't know, it's an eternal struggle for me. I like the logs. You know how, when your Linux machine boots up, it shows you all of that stuff, and most of the time you don't even know what it is, but you're like, ah, it's doing something?
Quinn: Yeah
Thorsten: It's good to know. So I often fall into that mode, even though it's not good.
Quinn: Yeah. So why did you make the search agent a kind of separate, inner agent? Why not just expose the same tools to the top-level agent?
Thorsten: So, the main... I don't have an answer, you know. I built it, I think, last week, and I ripped out the old thing two days later, so I can't say I've validated this and that it will stay like this for all eternity. But the thought process was that these things are fallible, and they sometimes go off the rails and don't find good stuff. And when you don't have a sub-agent, you only have one context window per agent. So if you talk to your agent and it goes and tries to find stuff for you, everything that it looks at, every file that it opens, every directory it lists, ends up in the context window. And, related to whether we hide abstractions or not: whatever you have in the context window outweighs a lot of the other stuff that the model was trained on. The context window is sacred. So imagine you ask it about database something-something, and it brings up, say, old migrations that are no longer relevant, and it's 15 files with 500 lines each. Even if it says "old migration" in the file name, even if it says "ignore this" in the file name: if you blast the content of those files into the context window, it will do something to that context window and to the bias of the agent. It's like a human. If I put up 15 post-its here that say "don't think of an elephant," guess what I'm going to do? And the goal with the...
Quinn: If I join a company, all the senior devs are like, hey, you don't need this design document. Never look at this document. But every time I walk by their computers, they always have it up. I'm going to say, hey, maybe there's something. I'm going to get confused.
Thorsten: There's something to it, yeah. So the idea with the sub-agent is that it has its own context window and it only comes back with a reply. For the main agent, I instructed it to basically treat the sub-agent as a researcher. So if it has a semantic question, such as "how do we do authentication with Fireworks?" or whatever, which is not a ripgrep thing, where it kind of "knows", in quotes, that it will need more than a single ripgrep, it asks the search agent. And that has its own context window. So it goes off the rails sometimes and comes back with the wrong thing, and then the main agent is not dirtied by it and can ask again, and that gives you a new context window again. I don't know, I don't want to pat myself too much on the back; we should pat the model developers, and Claude, and whoever else, on the back. But it works surprisingly well. It's eerie how it asks questions. Like this morning: I had added markdown rendering to the Svelte thing, and I wanted the page that I added to show different content based on whether the user is logged in or out. I didn't want the redirect thing. And it asked the search agent: "how do we handle authentication and user redirects in the server app?" That's a question I would turn around and ask the engineer sitting next to me: how do we do this? And then he would say, "oh, look at this file." And guess what, the search agent found it and said it's handled here and here. And then the main agent goes, "oh, if you want access to this information and you want no redirect, you have to do this and this and that." And that's mind-blowing.
Quinn: Yeah
Thorsten: I don't know it's crazy to me still...
Quinn: Yeah... there are two things this touches on. One is that we're really trying to figure out when we know better than the model and when we don't. In this case, we think we know better than the model and we give it this inner loop, but you could almost say that we're letting the model pick which failed steps to omit from its own transcript. So we're not being too "oh, we know better than the model" about it. And the second thing is: what is the right kind of tool to expose? You look at a lot of the MCP servers out there, like the GitHub one, which exposes like 20 different tools to the model that feel like they're at the wrong level of granularity. But, you know, ripgrep and a code search agent, those are two different levels; I think both are probably useful as tools, but we're trying to figure that out. And when it gets back to feedback loops: do we want to expose a "run terminal command" tool where we have the model construct the command? Or do we want to give it a, I don't know, "check my work" command based on the previous transcript? Probably something in between, and we're trying to figure that out.
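The granularity question can be made concrete with a toy tool registry (invented names and data, not any real agent's API): the same search capability exposed at two levels, with a small curated set that the model picks from rather than hundreds of entries.

```python
# Toy sketch of exposing tools at two levels of granularity:
# a fine-grained ripgrep-style search and a coarse "ask a question"
# research tool, both in one small curated registry.

TOOLS = {}

def tool(fn):
    """Register a function as a tool the model may call."""
    TOOLS[fn.__name__] = fn
    return fn

# Stand-in repo contents for the example.
CORPUS = {"hooks.server.ts": "export function handleAuthRedirect() {}"}

@tool
def ripgrep(pattern):
    """Fine-grained: one literal search, raw matching files back."""
    return [path for path, src in CORPUS.items() if pattern in src]

@tool
def code_search_agent(question):
    """Coarse-grained: a semantic question answered by an inner agent."""
    return "Auth redirects are handled in hooks.server.ts"

# The model sees only this small, described set of tools:
print(sorted(TOOLS))  # ['code_search_agent', 'ripgrep']
```

The design choice Quinn describes is exactly which of these levels (or both) to register, rather than dumping dozens of narrow endpoints into the registry.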
Thorsten: Yeah, I don't think more is better here with these tools. I don't think you end up with a good experience if you have a thousand tools available. I mean, I've said this fifteen times in every episode we've recorded, but would you as a human do well if, with every choice you make, I presented you with a dropdown of 50 different things you could do? Probably not, you know? That's my bet: a curated set of tuned tools will lead to more success, not more tools. I saw something earlier today, another agent framework, and they were like, "thousands of tools for your agent." I'm like, okay, sure, that's cool if you can enable them, but don't enable all of them by default. I don't think that's a good idea.
Quinn: Why stop at a thousand? Give it a million.
Thorsten: Give it all of them, yeah. A generic tool to do everything. But yeah, exciting times, man. A lot to do.
Quinn: Yeah yeah this is so fun
Thorsten: All right then, let's wrap it up, and I'll see you all next week.
Quinn: Happy coding
Thorsten: Bye bye