On macOS, the proxy is best effort. Programs that ignore HTTPS_PROXY/HTTP_PROXY can connect directly. This is a platform limitation (macOS Seatbelt doesn't support forced proxy routing).
But the default behaviour (no network) is fully enforced at the kernel level. Only domain filtering relies on the program respecting the proxy env vars.
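To illustrate why domain filtering is cooperative rather than enforced, here is a minimal Python sketch (not Zerobox code). Python's standard `urllib` reads `HTTPS_PROXY`/`HTTP_PROXY` from the environment, but a plain socket never looks at them. The proxy address is a hypothetical placeholder.

```python
import os
import socket
import urllib.request
from typing import Optional


def proxy_for(scheme: str) -> Optional[str]:
    """Proxy URL that urllib-based clients would use for `scheme`, taken from *_PROXY env vars."""
    return urllib.request.getproxies_environment().get(scheme)


def connect_directly(host: str, port: int) -> socket.socket:
    """A raw TCP connection: it never consults HTTPS_PROXY, so only the OS sandbox can stop it."""
    return socket.create_connection((host, port), timeout=5)


if __name__ == "__main__":
    os.environ.pop("https_proxy", None)  # urllib gives the lowercase variant precedence; clear it
    os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8080"  # hypothetical local filtering proxy
    print(proxy_for("https"))  # urllib-based clients would route HTTPS via this proxy
```

Under Seatbelt, the first kind of client goes through the proxy and gets filtered; the second only gets blocked when the profile denies network access outright, which is the default.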
It does, but because I'm inheriting the Seatbelt settings from Codex, I'm not resetting it in Zerobox (I thought that was the safer option). Let me look into this; there should be a way to take Codex's profile and safely combine or modify it.
[SwiftLM] Loading model: mlx-community/Qwen3.5-122B-A10B-4bit
[SwiftLM] Enabled Async SSD Streaming on directory: e9c67b08899964be5fdd069bb1b4bc8907fe68f5
[SwiftLM] Memory strategy: FULL GPU (69.6GB model, 133.4GB available)
[SwiftLM] Download: [===================>] 100% ⠋ (66395.4 MB / 66395.4 MB) | Speed: 0.0 MB/s
MLX error: Failed to load the default metallib. library not found library not found library not found library not found at /Users/runner/work/SwiftLM/SwiftLM/LocalPackages/mlx-swift/Source/Cmlx/mlx-c/mlx/c/stream.cpp:115
The Python mlx-metal trick is actually what's crashing it. The mlx.metallib from pip comes from a different version of MLX than the one your Swift binary was built against. It gets past the startup error, but then corrupts the GPU memory allocator at inference time, which shows up as "freed pointer was not the last allocation".
Use the version-matched metallib that's already in the repo:
cp LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib \
.build/release/
.build/release/SwiftLM \
--model mlx-community/Qwen3.5-122B-A10B-4bit \
--stream-experts \
--port 5413
This is the exact metallib that was compiled alongside the Swift code — no version mismatch. Future pre-built releases will bundle it automatically.
git clone https://github.com/SharpAI/SwiftLM # no --recursive needed
cd SwiftLM
swift build -c release
### Please let me know if this fixes the issue:
# Copy metallib next to the binary (one-time step)
cp LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib \
.build/release/
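If you script this step, a check that the copied `default.metallib` is byte-identical to the one in the repo rules out a stale or pip-installed file being picked up instead. A minimal Python sketch, assuming the paths from the `cp` command above and that it runs from the SwiftLM checkout; `install_metallib` is a hypothetical helper, not part of SwiftLM.

```python
import hashlib
import shutil
from pathlib import Path

# Paths from the cp command above, relative to the SwiftLM checkout.
REPO_METALLIB = Path("LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib")
BINARY_DIR = Path(".build/release")


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_metallib(src: Path = REPO_METALLIB, dest_dir: Path = BINARY_DIR) -> Path:
    """Copy the version-matched metallib next to the binary unless an identical copy is already there."""
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; run this from the SwiftLM checkout")
    dest = dest_dir / src.name
    if dest.is_file() and sha256_of(dest) == sha256_of(src):
        return dest  # already the exact file compiled alongside the Swift code
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```

Calling `install_metallib()` after `swift build -c release` does the same job as the `cp` above, and re-running it is harmless.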
Sure, but the problem is when you take that half hour of work and share it with other people without making clear how much effort has gone into it.
Software is valuable if it has been tested and exercised properly by other people. I don't care if you vibe coded it, provided you then put the real work in to verify that it actually works correctly - and then include the proof that you've done that when you start widely sharing it with the world.
Right now it's impossible to tell which of these projects implementing the paper are worth spending time with.
Do you know if there's a widely shared name for this pattern? I've been collecting examples of it recently - it's a really good idea - but I'm not sure if there's good terminology. "Credential injection" is one option I've seen floating around.
simonw, I have been seeing "credential injection" and "credential tokenizing" (a la tokenizer: https://github.com/superfly/tokenizer). I'm also seeing credential "surrogates" mentioned.
I am currently working on a mitm proxy for use with devcontainers to try to implement this pattern, but I'm certainly not the only one!
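A minimal sketch of the credential-injection pattern, assuming a proxy that can see request headers (i.e. one that terminates TLS, as a MITM proxy does): the sandboxed program only ever holds a placeholder, and the proxy swaps in the real secret, but only for the hosts that secret is scoped to. The placeholder name, the secret value and `inject_credentials` are hypothetical; this is not how Zerobox or superfly/tokenizer actually implement it.

```python
from typing import Dict, Mapping, Set, Tuple

# Hypothetical configuration: placeholder seen inside the sandbox -> (real secret, allowed hosts).
Secrets = Mapping[str, Tuple[str, Set[str]]]

SECRETS: Secrets = {
    "PLACEHOLDER_OPENAI_KEY": ("sk-real-value", {"api.openai.com"}),
}


def inject_credentials(host: str, headers: Mapping[str, str], secrets: Secrets = SECRETS) -> Dict[str, str]:
    """Return a copy of `headers` with placeholders replaced by real secrets.

    Raises PermissionError if a placeholder is being sent to a host it is not
    scoped to, so the proxy never forwards the real value to an unapproved host.
    """
    rewritten: Dict[str, str] = {}
    for name, value in headers.items():
        for placeholder, (secret, allowed_hosts) in secrets.items():
            if placeholder in value:
                if host not in allowed_hosts:
                    raise PermissionError(f"{placeholder} may not be sent to {host}")
                value = value.replace(placeholder, secret)
        rewritten[name] = value
    return rewritten
```

For example, `inject_credentials("api.openai.com", {"Authorization": "Bearer PLACEHOLDER_OPENAI_KEY"})` returns `{"Authorization": "Bearer sk-real-value"}`, while the same headers addressed to any other host raise `PermissionError`.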
Not sure. I took this idea from the Deno sandboxing docs. They do the exact same thing, with a different sandboxing mechanism though (I think Deno has its own way of sandboxing subprocesses).
I'd feel safer with default-deny on reads as well, but I know from past experience that this gets tricky fast - tools like Node.js and uv and Python all have a bunch of files they need to be able to read that you might not predict in advance.
Might still be possible to do that in a DX-friendly way though, if you make it easy to manually approve reads the first time and use that to build a profile that can be reused on subsequent command invocations.
That being said, what should the default DX be? Which paths should be denied by default? That's something I've been thinking about, and I'd love to hear your thoughts.
That's a really tough question. I always worry about credentials tucked away in dot-folders in my home directory, like ~/.aws, but you HAVE to provide access to some of those, like ~/.claude, because otherwise Claude Code won't work.
That's why rather than a default set I'm interested in an option where I get to approve things on first run - maybe something like this:
zerobox --build-profile claude-profile.txt -- claude
The above command would create an empty claude-profile.txt file and then give me a bunch of interactive prompts every time Claude tried to access a file, maybe something like:
claude wants to read ~/.claude/config.txt
A) allow that file, D) allow full ~/.claude directory, X) exit
You would then clatter through a bunch of those the first time you run Claude and your decisions would be written to claude-profile.txt - then once that file exists you can start Claude in the future like this:
zerobox --profile claude-profile.txt -- claude
(This is literally the first design I came up with after 30s of thought, I'm certain you could do much better.)
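The first-run approval idea could be prototyped roughly like this. A minimal Python sketch, assuming the sandbox can report each attempted read and wait for a decision; `ProfileBuilder`, the one-path-per-line profile format, and treating "directory" as the file's immediate parent are hypothetical illustrations of the proposal, not an existing Zerobox feature.

```python
from pathlib import Path
from typing import Callable, Set

# A decision function receives the path being read and returns "A", "D" or "X".
Decide = Callable[[Path], str]


class ProfileBuilder:
    """Records approved read paths; a directory entry covers everything beneath it."""

    def __init__(self, profile_file: Path) -> None:
        self.profile_file = profile_file
        self.allowed: Set[Path] = set()
        if profile_file.exists():  # reuse decisions from a previous run
            self.allowed = {Path(line) for line in profile_file.read_text().splitlines() if line}

    def is_allowed(self, path: Path) -> bool:
        return any(path == entry or entry in path.parents for entry in self.allowed)

    def request_read(self, path: Path, decide: Decide) -> bool:
        if self.is_allowed(path):
            return True  # approved earlier, no prompt needed
        choice = decide(path).upper()
        if choice == "A":
            self.allowed.add(path)  # allow just this file
        elif choice == "D":
            self.allowed.add(path.parent)  # allow the file's containing directory
        else:
            raise SystemExit(f"read of {path} denied; exiting")
        self.save()
        return True

    def save(self) -> None:
        self.profile_file.write_text("".join(f"{p}\n" for p in sorted(self.allowed)))
```

An interactive run would pass something like `lambda p: input(f"claude wants to read {p}\nA) allow that file, D) allow full directory, X) exit: ")` as `decide`; later runs load the saved file and only prompt for paths not yet covered.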
Fantastic! I like that idea. I'm also exploring an option to define profiles, and also to have predefined profiles that ship with the binary (e.g. Claude, block all `.env` reads, etc.)
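Predefined profiles could be composed from allow and deny rules, with deny taking precedence so that something like "block all `.env` reads" still applies on top of a permissive Claude profile. A minimal sketch using glob patterns via Python's `fnmatch` (where `*` also matches `/`); the profile names and patterns are hypothetical examples, not profiles Zerobox ships.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List


@dataclass
class Profile:
    allow: List[str] = field(default_factory=list)  # glob patterns for readable paths
    deny: List[str] = field(default_factory=list)   # glob patterns that always win

    def merged(self, other: "Profile") -> "Profile":
        """Combine two profiles, keeping the rules from both."""
        return Profile(self.allow + other.allow, self.deny + other.deny)

    def can_read(self, path: str) -> bool:
        if any(fnmatch(path, pattern) for pattern in self.deny):
            return False  # deny rules take precedence over any allow rule
        return any(fnmatch(path, pattern) for pattern in self.allow)


# Hypothetical built-in profiles.
CLAUDE = Profile(allow=["/home/*/.claude/*", "/home/*/project/*"])
NO_SECRETS = Profile(deny=["*/.env", "*/.env.*", "/home/*/.aws/*"])
```

`CLAUDE.merged(NO_SECRETS)` can read files under `~/.claude` and the project, but not the project's `.env` or anything in `~/.aws`. Making deny win regardless of order keeps composition predictable: adding a profile can only tighten what another profile's deny rules block.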
This looks really good - the CLI interface design is solid, and I especially like the secrets / network proxy pattern - but the thing it needs most is copiously detailed documentation about exactly how the sandbox mechanism works - and how it was tested.
There are dozens of projects like this emerging right now. They all share the same challenge: establishing credibility.
I'm loath to spend time evaluating them unless I've seen robust evidence that the architecture is well thought through and the tool has been extensively tested already.
My ideal sandbox is one that's been used by hundreds of people in a high-stakes environment already. That's a tall order, but if I'm going to spend time evaluating one the next best thing is documentation that teaches me something about sandboxing and demonstrates to me how competent and thorough the process of building this one has been.
UPDATE: On further inspection there's a lot that I like about this one. The CLI design is neat, it builds on a strong underlying library (the OpenAI Codex implementation) and the features it does add - mainly the network proxy being able to modify headers to inject secrets - are genuinely great ideas.
> There are dozens of projects like this emerging right now. They all share the same challenge: establishing credibility.
Care to elaborate on the kind of "credibility" to be established here? All these bazillion sandboxing tools use the same underlying frameworks for isolation (e.g., ebpf, landlock, VMs, cgroups, namespaces) that are already credible.
The problem is that those underlying frameworks can very easily be misconfigured. I need to know that the higher level sandboxing tools were written by people with a deep understanding of the primitives that they are building on, and a very robust approach to testing that their assumptions hold and they don't have any bugs in their layer that affect the security of the overall system.
Most people are building on top of Apple's sandbox-exec which is itself almost entirely undocumented!
The title of this piece differs from the HN title, but the HN title is a lot better. The original title is "The Biggest Con of the 21st Century: Tokens", subhead "How AI Companies Are Charging You More Without You Even Realizing It" - which is an absurd title because tokens are NOT the "biggest con" of anything, and AI companies make it very clear exactly how their pricing works.
I also don't like how this article presents numbers for language differences in the "The Language Tax" section but fails to say which tokenizer was used or where those numbers came from.
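Token counts depend heavily on the tokenizer, which is why the source matters. As a tokenizer-independent illustration only: byte-level BPE tokenizers start from UTF-8 bytes, and many scripts need more bytes per character than English, so the same amount of text begins as a longer byte sequence before any merges are applied. This measures bytes, not tokens; real token ratios depend on each tokenizer's vocabulary.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


samples = {
    "English": "hello",
    "Russian": "привет",
    "Hindi": "\u0928\u092e\u0938\u094d\u0924\u0947",  # "namaste" in Devanagari
}
for language, text in samples.items():
    print(f"{language}: {utf8_bytes_per_char(text):.1f} bytes per code point")
```

These come out at 1, 2 and 3 bytes per code point respectively; how far merges close that gap is exactly the tokenizer-specific detail the article should have reported.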
I'd much rather a system call bwrap than re-implement bwrap, because bwrap has already been extensively tested.
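Calling bwrap from a higher-level tool can be as thin as building its argument list. A minimal Python sketch, assuming bubblewrap is installed on a Linux host; the flags are standard bwrap options, but `bwrap_argv` and this particular policy (read-only root, private /tmp, one writable working directory, no network) are an illustrative choice, not any specific project's configuration.

```python
import shutil
import subprocess
from typing import List, Sequence


def bwrap_argv(command: Sequence[str], workdir: str, allow_network: bool = False) -> List[str]:
    """Build a bwrap command line: read-only root, fresh /dev, /proc and /tmp, writable workdir."""
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",        # whole filesystem visible but read-only
        "--dev", "/dev",              # minimal device nodes
        "--proc", "/proc",            # mount a fresh procfs
        "--tmpfs", "/tmp",            # private scratch space
        "--bind", workdir, workdir,   # the only writable host path
        "--chdir", workdir,
        "--die-with-parent",          # kill the sandbox if the caller exits
    ]
    if not allow_network:
        argv.append("--unshare-net")  # new network namespace with only loopback
    return argv + ["--", *command]


def run_sandboxed(command: Sequence[str], workdir: str) -> int:
    """Run `command` under bwrap and return its exit code."""
    if shutil.which("bwrap") is None:
        raise RuntimeError("bwrap is not installed")
    return subprocess.run(bwrap_argv(command, workdir)).returncode
```

The wrapper only decides policy; the actual isolation stays in bwrap's well-exercised code, which is the point being made here.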