Nothing super fancy.
For me “proof” just means the agent has to make its intent explicit in a way I can check before running it.
For example:
1) If it wants to delete a file, it has to output the exact path it thinks it’s deleting. I normalize it and make sure it’s inside the project root. If not, I block it.
2) If it proposes a big change, I require a diff first instead of letting it execute directly.
3) After code changes, I run tests or at least a lint/type check before accepting it.
So it’s less about formal proofs and more about forcing the agent to surface assumptions in a structured way, then verifying those assumptions mechanically.
Still hacky, but it reduced the “creative workaround” behavior a lot.
Is this a policy snippet you add to your CLAUDE.md? Do you still maintain a deny list?
I recently added a snippet asking Claude to not try to bypass the deny list. I didn't have an incidence since but Im still nervous... Claude once bypassed the deny list and nuked an important untracked directory which caused me lots of trouble.