We built an open-source proxy that sits between coding agents (Claude Code, OpenClaw, etc.) and the LLM, compressing tool outputs before they enter the context window.
Demo: https://www.youtube.com/watch?v=-vFZ6MPrwjw#t=9s.
Motivation: Agents are terrible at managing context. A single file read or grep can dump thousands of tokens into the window, most of it noise. This isn't just expensive — it actively degrades quality. Long-context benchmarks consistently show steep accuracy drops as context grows (OpenAI's GPT-5.4 eval goes from 97.2% at 32k to 36.6% at 1M https://openai.com/index/introducing-gpt-5-4/).
Our solution uses small language models (SLMs): we look at model internals and train classifiers to detect which parts of the context carry the most signal. When a tool returns output, we compress it conditioned on the intent of the tool call—so if the agent called grep looking for error handling patterns, the SLM keeps the relevant matches and strips the rest.
If the model later needs something we removed, it calls expand() to fetch the original output. We also do background compaction at 85% window capacity and lazy-load tool descriptions so the model only sees tools relevant to the current step.
The proxy also gives you spending caps, a dashboard for tracking running and past sessions, and Slack pings when an agent is sitting there waiting on you.
Repo is here: https://github.com/Compresr-ai/Context-Gateway. You can try it with:
curl -fsSL https://compresr.ai/api/install | sh
Happy to go deep on any of it: the compression model, how the lazy tool loading works, or anything else about the gateway. Try it out and let us know how you like it!
If an agent reads external web pages via MCP, the "context" can contain hidden prompt injections — display:none divs, zero-width Unicode characters, opacity:0 text. We tested six DOM extraction APIs against a hidden injection and found that textContent and innerHTML expose it while innerText and the accessibility tree filter it.
The concern with compressing before scanning: if you compress untrusted external content alongside trusted system instructions, you're mixing adversarial input with your prompt before any inspection happens. An injection that says "ignore all previous instructions" gets compressed right next to the actual instructions. At that point, even if you scan the compressed output, the boundary between trusted and untrusted content is gone.
A scan-then-compress pipeline (or at minimum, compressing trusted and untrusted content in separate passes) would preserve the ability to detect injections before they get interleaved with system context.
reply