The advantage of an SLM here is that some outputs cannot be compressed without losing context, so a small model does that job instead. It works, but most of these solutions still involve tradeoffs in real-world applications.
We compress tool outputs at each step, so the cache isn't broken during the run. Once we hit 85% of the context window, we preemptively trigger a summarization step, then load that summary when the window actually fills up.
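A minimal sketch of the per-step compression, assuming an append-only message history (all names here are illustrative; `summarize` stands in for whatever small model does the compression, and is faked with truncation):

```python
def summarize(text: str, max_chars: int = 500) -> str:
    # Placeholder for an SLM call; here we just truncate for illustration.
    return text if len(text) <= max_chars else text[:max_chars] + " [truncated]"

def append_tool_output(history: list[dict], output: str) -> list[dict]:
    # Append-only: earlier entries are never rewritten, so the provider's
    # prompt cache (which matches on an unchanged prefix) stays warm.
    history.append({"role": "tool", "content": summarize(output)})
    return history
```

The key property is that compression happens before the output enters the history, so no earlier message ever changes mid-run.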
For auto-compact, we do essentially the same thing Anthropic does, but at an 85%-filled context window. Then, when the window is 100% filled, we swap in this precomputed compaction and append the 15% of messages accumulated since. This lets us run compaction instantly.
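The two-phase flow above can be sketched roughly like this (a toy model, not the actual implementation; character counts stand in for token counts, and `summarize` is a placeholder for the expensive summarization call):

```python
PRECOMPACT_THRESHOLD = 0.85  # start summarizing here, before the window is full

class Compactor:
    """Illustrative only: precompute a summary at 85% usage, swap it in at 100%."""

    def __init__(self, window: int, summarize):
        self.window = window
        self.summarize = summarize
        self.messages: list[str] = []
        self.summary: str | None = None
        self.tail: list[str] = []  # messages arriving after precompaction

    def add(self, msg: str) -> None:
        self.messages.append(msg)
        if self.summary is not None:
            self.tail.append(msg)  # the ~15% accumulated after the 85% mark
        used = sum(len(m) for m in self.messages)  # crude token proxy
        if self.summary is None and used >= PRECOMPACT_THRESHOLD * self.window:
            # Expensive call done early, in the background in a real system.
            self.summary = self.summarize(self.messages)
        if used >= self.window:
            # Window full: the swap is instant because the summary is ready.
            self.messages = [self.summary] + self.tail
            self.summary, self.tail = None, []
```

The point of the design is that the slow step (summarization) is off the critical path: by the time the window is full, only a cheap list swap remains.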
If it's the latter, then users will pay for the entire history of tokens since the change as uncached: https://platform.claude.com/docs/en/build-with-claude/prompt...
How is this better?