sethcronin's comments

sethcronin · 2026-03-13T20:08:16 1773432496

I guess I'm skeptical that this actually improves performance. I'm worried that a middleman compressing the tool outputs can strip useful context that the agent actually needs to diagnose the problem.

ivzak · 2026-03-13T23:15:00 1773443700

You’re right - poor compression can cause that. But skipping compression altogether is also risky: once context gets too large, models can fail to use it properly even if the needed information is there. So the way to go is to compress without stripping useful context, and that’s what we are doing.

backscratches · 2026-03-13T23:22:08 1773444128

Edit your LLM-generated comment, or at least make it output in a less annoying LLM tone. It wastes our time.

thebeas · 2026-03-13T20:18:37 1773433117

That's why we give the model the chance to call expand() if it needs more context. We know it's counterintuitive, so we will add the benchmarks to the repo soon.

From what we've observed, performance depends on the task and the model itself, and the difference is most visible on long-running tasks.

fcarraldo · 2026-03-13T20:33:30 1773434010

How does the model know it needs more context?

thebeas · 2026-03-13T20:40:16 1773434416

We provide the model with a tool we call expand(), which lets the model retrieve more context if needed.

We state this in a note appended directly to the outputs, so the model knows exactly where lines were removed.
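To make the idea concrete, here is a rough sketch of how a marker plus an expand() call could fit together. It is not the actual implementation: the class name, the head/tail truncation strategy, the marker wording, and the chunk ids are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CompressedOutputStore:
    """Hypothetical store that remembers omitted lines so expand() can return them."""

    keep_head: int = 3  # assumed: number of leading lines kept verbatim
    keep_tail: int = 3  # assumed: number of trailing lines kept verbatim
    _omitted: dict = field(default_factory=dict)

    def compress(self, output: str) -> str:
        lines = output.splitlines()
        if len(lines) <= self.keep_head + self.keep_tail:
            return output  # short enough, nothing is removed

        chunk_id = f"chunk-{len(self._omitted) + 1}"
        start = self.keep_head
        end = len(lines) - self.keep_tail
        self._omitted[chunk_id] = lines[start:end]

        # The marker sits exactly where the lines were removed, so the model
        # can see both the location and how to get the content back.
        marker = (
            f"[{end - start} lines omitted (lines {start + 1}-{end}); "
            f"call expand('{chunk_id}') to see them]"
        )
        return "\n".join(lines[:start] + [marker] + lines[end:])

    def expand(self, chunk_id: str) -> str:
        """Return the lines that were removed for the given chunk id."""
        return "\n".join(self._omitted[chunk_id])


if __name__ == "__main__":
    store = CompressedOutputStore(keep_head=2, keep_tail=1)
    raw = "\n".join(f"log line {i}" for i in range(1, 11))
    print(store.compress(raw))
    print(store.expand("chunk-1"))
```

A real compressor would presumably choose what to drop more carefully than a fixed head/tail cut; the sketch only shows the marker-and-retrieve pattern.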

kingo55 · 2026-03-13T21:20:57 1773436857

Presumably in much the same way it knows it needs to use tool calls to reach its objective.

Zetaphor · 2026-03-14T21:56:39 1773525399

I'd argue not, as with tool calls it has available to it at all times a description of what each tool can be used for. There's plenty of intermediate but still important information that could be compacted away, and unless there was a logical reason to go looking for it the model doesn't know what it doesn't know.

sethcronin · 2026-03-12T20:47:04 1773348424

I think vibecoding itself is the substance. Enabling people who don't have software engineering or code-writing experience to actually develop software is a value in itself. I don't think there needs to be any other proof of concept than that a layman can hop onto Lovable or Base44. Intermediates can grab Cursor and Claude Code and just rip through potential project lists to test things out and see where they could go. I guess my opinion is that the tool is the product in this case. Most canvases that get sold don't go to professional artists; they go to amateurs. That's mostly what the canvas market is composed of: people who want to be artists, though they'll never sell a painting.

sethcronin · 2026-03-12T20:32:42 1773347562

Cool idea -- the Claude Chrome extension has something like this implemented, but obviously it's restricted to the Chrome browser.

bayes-song · 2026-03-13T01:14:30 1773364470

I really like the Claude Chrome extension, but unfortunately it has too many limitations. Not only is it restricted to Chrome, but even within Chrome some websites, especially financial ones, are blocked.

sethcronin · 2026-03-12T20:31:05 1773347465

Oops, I read vault and thought Obsidian vault haha - but yeah, one of the issues is that if your agent can _execute_ on the secret at all, it can potentially be convinced to use it in a way that does not benefit you, even if it doesn't have access to the secret itself.