I guess I'm skeptical that this actually improves performance. I'm worried that the middleman sitting on the tool outputs can strip useful context that the agent actually needs to diagnose a problem.
You’re right: poor compression can cause that. But skipping compression altogether is also risky: once the context gets too large, models can fail to use it properly even if the needed information is there. So the way to go is to compress without stripping useful context, and that’s what we are doing.
That's why we give the model the chance to call expand() if it needs more context. We know it's counterintuitive, so we will add the benchmarks to the repo soon.
From our observations, performance depends on the task and the model itself, and the difference is most visible on long-running tasks.
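To make the compress-then-expand idea concrete, here is a minimal sketch, not the project's actual implementation. Only the name expand() comes from this thread; the class ToolOutputStore, the method compress(), the max_lines budget, and the head-truncation strategy are all assumptions chosen for illustration. Real compression would presumably be smarter than keeping the first few lines.

```python
import hashlib


class ToolOutputStore:
    """Hypothetical store: keeps full tool outputs so a compressed view can be expanded."""

    def __init__(self, max_lines: int = 5):
        self.max_lines = max_lines       # assumed line budget, not from the thread
        self._full: dict[str, str] = {}  # handle -> original, uncompressed output

    def compress(self, output: str) -> str:
        """Return the output unchanged if short, else its head plus an expand hint."""
        lines = output.splitlines()
        if len(lines) <= self.max_lines:
            return output
        # Short content hash used as a handle the model can pass back later.
        handle = hashlib.sha256(output.encode()).hexdigest()[:8]
        self._full[handle] = output
        omitted = len(lines) - self.max_lines
        marker = f"[{omitted} lines omitted; call expand('{handle}') for the full output]"
        return "\n".join(lines[: self.max_lines] + [marker])

    def expand(self, handle: str) -> str:
        """Return the full output previously stored under this handle."""
        return self._full[handle]


store = ToolOutputStore(max_lines=2)
raw = "\n".join(f"line {i}" for i in range(6))
short = store.compress(raw)  # first two lines plus a marker naming the handle
```

The marker line is the important part of the sketch: it tells the model that something was left out and how to get it back, so nothing is dropped silently.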
I'd argue not: with tool calls, the model has a description of what each tool can be used for available to it at all times. There's plenty of intermediate but still important information that could be compacted away, and unless there is a logical reason to go looking for it, the model doesn't know what it doesn't know.
I think vibecoding itself is the substance. Enabling people who don't have software engineering or coding experience to actually develop software is a value in itself. I don't think there needs to be any other proof of concept than that a layman can hop onto Lovable or Base44. Intermediates can grab Cursor and Claude Code and just rip through lists of potential projects to test things out and see where they could go. I guess my opinion is that the tool is the product in this case. Most canvases that get sold don't go to professional artists; they go to amateurs. That's mostly what the canvas market is made up of: people who want to be artists, though they'll never sell a painting.
I really like the Claude Chrome extension, but unfortunately it has too many limitations. Not only is it restricted to Chrome, but even within Chrome some websites, especially financial ones, are blocked.
Oops, I read vault and thought Obsidian vault haha - but yeah, one of the issues is that if your agent can _execute_ on the secret at all, it can potentially be convinced to use it in a way that does not benefit you, even if it doesn't have access to the secret itself.