That is the biggest threat - and likely where things will end up eventually… it’s when that “eventually” is and what the server based providers can pivot to in that time.
Building grith — OS-level syscall interception for AI coding agents.
The problem: every agent (Cline, Aider, Codex, Claude Code) has unrestricted access to your filesystem, shell, and network. When they process untrusted content — a cloned repo, a dependency README — they’re prompt injection vectors with full machine access. No existing tool evaluates what the agent actually does at the syscall level.
grith wraps any CLI agent without modification. OS-level interception captures every file open, network call, and process spawn, then runs it through 17 independent security filters in parallel across three phases (~15ms total). Composite score routes each call: auto-allow, auto-deny, or queue for async review. Most will auto approve - which eliminates approval fatigue.
Also does per-session cost tracking and audit trails as a side effect of intercepting everything.
It’s fast in terms of a response from a LLM model - but it is part of the system I am quite active on at the moment to ensure it’s performant as possible
Sandboxing alone isn’t the right approach… a multi-faceted approach is what works.
What we’ve found that does work is automation on the approval process but only with very strong guards in place… approval fatigue is another growing problem - users simply clicking approve on all requests.
It’s an interesting experiment… but I expect it to quickly die off as the same type message is posted again and again… their probably won’t be a great deal of difference in “personality” between each agent as they are all using the same base.
They're not though, you can use different models, and the bots have memories. That combined with their unique experiences might be enough to prevent that loop.
reply