If I'm understanding this glorious hack correctly, this allows Nix to determine if Chrome has an update available:
1) Nix network access is only allowed for things where you guarantee the hashes of what's output. This is generally used to do things like "download http://example.com/release-1.2.tar.xz, it will have SHA1 93f3025c7802a1a11e4f16186089b583ef1095b8"
2) There are known pairs of strings that have equivalent SHA1 hashes.
3) To determine if a fetch would succeed in a "pure" way, write a network-accessing function that will return a "true" or "false" string with identical hashes, then you can use that (supposedly deterministic) string to return a (nondeterministic!) True or False to the caller.
Even after reading this 10 times I still don't understand what this hack is about.
I understand these things independently:
1) SHA1 collision weakness
2) Nix checking package SHA1 when updating packages
3) Chromium returning different SHA1 for each download
Someone made a Nix thing that's supposed to check whether a new version of Chrome is available, and if so, generate an updated package.
While doing so, they found themselves needing some kind of "tryFetch" function that would return true or false depending on whether a certain URL is reachable.
But there's no such function in Nix. Why not? Because you're not really supposed to do stuff like that in the Nix paradigm which is all about determinism.
So... they really wanted to do it anyway, so they invented a clever hack, probably too clever.
What can you do in Nix? Well, you can make a package that downloads a certain URL and uses the downloaded result, provided that all the observable outputs of that package are deterministic. So after the package's build script runs, Nix verifies that the result matches a hash specified in the package definition. If the hash doesn't match, the build fails.
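To make that hash check concrete, here is a minimal Python sketch of the idea. It is not how Nix itself is implemented; `verify_fixed_output` and `HashMismatch` are names made up for illustration, and the "download" is just a bytes value so the example runs offline.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when an output does not match its declared hash (hypothetical name)."""


def verify_fixed_output(output: bytes, declared_sha1: str) -> bytes:
    """Accept the output only if its SHA-1 equals the hash declared up front."""
    actual = hashlib.sha1(output).hexdigest()
    if actual != declared_sha1:
        # Assumption: a mismatch simply aborts; the output is never handed to the caller.
        raise HashMismatch(f"expected {declared_sha1}, got {actual}")
    return output
```

The caller only ever sees outputs whose hash was promised in advance, which is why network access can be tolerated in this one place.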
So this hacker decided to make such a package that tries to download the Chrome update and results in the boolean information about whether the update was available. But the result needs to have the same hash in both cases. That's where a hash collision comes in handy.
So this hacky build script uses a couple of well-known PDF files that both have the same SHA1 hash. If the update exists, it gives PDF 1, otherwise it gives PDF 2.
The update script then depends on that hack package. It "installs" that package, and then checks whether it actually contains PDF 1 or PDF 2, and now it knows whether the Chrome update was available or not.
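Roughly, the trick described above looks like this in Python. This is a sketch under loud assumptions: instead of the real SHA1-colliding PDFs it uses a toy hash (SHA-1 truncated to 8 bits) so that a colliding pair can be found instantly, and every name here (`weak_hash`, `fixed_output_build`, `try_fetch`) is hypothetical.

```python
import hashlib
from typing import Tuple


def weak_hash(data: bytes) -> str:
    # Stand-in for a broken hash: SHA-1 truncated to 8 bits, so collisions are trivial.
    return hashlib.sha1(data).hexdigest()[:2]


def make_colliding_pair() -> Tuple[bytes, bytes]:
    """Brute-force a second blob with the same weak hash as the first."""
    yes = b"update-available"
    target = weak_hash(yes)
    n = 0
    while True:
        no = b"no-update-" + str(n).encode()
        if weak_hash(no) == target:
            return yes, no
        n += 1


YES_BLOB, NO_BLOB = make_colliding_pair()
EXPECTED_HASH = weak_hash(YES_BLOB)  # the single hash declared in the "package"


def fixed_output_build(url_reachable: bool) -> bytes:
    """The 'build': emit one of the two colliding blobs depending on the network."""
    output = YES_BLOB if url_reachable else NO_BLOB
    if weak_hash(output) != EXPECTED_HASH:
        raise RuntimeError("hash mismatch")  # never triggers: both blobs collide
    return output


def try_fetch(url_reachable: bool) -> bool:
    """The caller inspects which blob it got and recovers the boolean."""
    return fixed_output_build(url_reachable) == YES_BLOB
```

The hash check passes in both cases, so the verifier cannot tell the two outcomes apart, but the caller can.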
> Why not? Because you're not really supposed to do stuff like that in the Nix paradigm which is all about determinism.
That's the feeling I always get when looking at FP languages. They sound good on paper and the examples are very tempting, but when reality intrudes on this pure, predictable, perfect world, it turns into a massive pain.
The key point is that the Nix scripting environment (think “a Ports manifest”) is an intentionally restrictive language intended to have deterministic results for every operation. It’s not intended to be a general-purpose programming language with arbitrary side effects; that’s the whole point.
What the author of the script has done here, you’re supposed to do by writing code in some other language that generates a Nix manifest (or just by hand-rolling a Nix manifest.) And yet, the author here managed to get Nix to non-deterministically generate Nix.
Considered from the point of view of Nix’s goals, this is more an “exploit” than a truly-needed feature.
Nix's use of determinism actually has an important purpose. It's not just some arbitrary annoying restriction; it's what makes the whole system work properly. This script is a kind of funny meme that should probably be deleted, and anyway it isn't crucial to the NixOS system at all. It's just a minor convenience, and the hack was probably just fun to make.
Generally speaking these sandbox determinism requirements in Nixpkgs/NixOS are not annoying, they are a crucial feature: you know what you get when you install something. But yeah, it is a constraint, and sometimes when you try to package some weird program where the Makefile does some arbitrary network operations, you might find it annoying -- and you can locally disable the sandbox -- but the whole Nix philosophy is that build scripts should be reproducible, so then you just have to fix it.
What's the point of this? If you can't download something without knowing its hash in advance, then you can never download a new version of Chrome or anything else, so why do you care whether one is available?
Nix tries to achieve deterministic, reproducible builds, but Chrome's update process is non-deterministic because of (3). This hack lets the update check appear deterministic.
Can you explain how? I don't understand how a hash collision of two PDF files would help with this. Surely if it wanted to download the file at 'https://commondatastorage.googleapis.com/chromium-browser-of..., it would need the actual hash of that specific file?
edit: I just read mbrock's comment above which explained it perfectly. Didn't realize it was testing the url to see if there was an actual update available, I thought it was doing the actual update.
The cheap way to return more complex data than a boolean is to return more booleans -- i.e. split the version number up in bits and return them one by one. Not saying you should do this though :p
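For the curious, the bit-by-bit idea can be sketched in Python like this. The `oracle` here is a plain function standing in for one boolean-returning fetch per bit; `leak_integer` and `make_oracle` are hypothetical names, and nothing like this is claimed to exist in nixpkgs.

```python
def make_oracle(secret: int):
    """Stand-in for the hack: answers 'is bit i of the secret set?' with one boolean."""
    return lambda i: bool((secret >> i) & 1)


def leak_integer(oracle, bits: int) -> int:
    """Rebuild an integer from `bits` separate boolean answers, lowest bit first."""
    value = 0
    for i in range(bits):
        if oracle(i):
            value |= 1 << i
    return value
```

For example, `leak_integer(make_oracle(58), 8)` asks eight yes/no questions and reassembles the number 58 from the answers.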
This is funny because it is more the opposite case: they wanted the simplicity of returning a Boolean but the system doesn't allow that in this particular part of the pipeline. So they built a much more complex data structure to implement that simple Boolean.
Allowing broken hashes makes the entire system insecure. If you use broken hashes for security, you might as well not use hashes at all. The entire point of a hash is that collisions are pretty much impossible to generate.
It depends on what you're trying to prevent. Only the provider of the hashed file can set up this type of collision. If you trust them, it's still secure. Third parties can't collide with an innocent file.
Even if you trust the package author is not malicious, you may not trust that they are infallible. They could use this trick to sneak in a bug fix which has an unintended consequence.
One of the great things about Nix is that it keeps packages honest about versioning; you can't sneak in an updated package with the same version number like is possible with other package systems that don't pin to hashes.
> They could use this trick to sneak in a bug fix which has an unintended consequence.
They would have to have set up the hash collision beforehand. It's no longer an 'innocent file'. It's not something you can decide to do at a later point.
There aren't any assumptions about DNS or certificate pinning here at all? That you think it's relevant strongly suggests you haven't the faintest idea what's going on.
Second pre-image attacks aren't practical. (A theoretical pre-image attack exists for MD5 with difficulty just marginally better than brute force; this is a further good reason to stop using MD5 but isn't an immediate problem.) So only the person who made file X1 could have produced file X2 with the same MD5() or SHA1(): they could have produced both with a deliberate collision, whereas anybody else would be obliged to create a second pre-image.
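If the distinction is unclear, a toy experiment in Python shows the two problems side by side. It assumes a deliberately tiny hash (SHA-1 truncated to 16 bits, here called `h16`) so both searches finish quickly; against real MD5 or SHA1 the first search relies on structural weaknesses rather than brute force, and the second remains infeasible. All function names are made up for illustration.

```python
import hashlib
from itertools import count


def h16(data: bytes) -> str:
    # Toy 16-bit hash: the first 4 hex digits of SHA-1.
    return hashlib.sha1(data).hexdigest()[:4]


def find_collision():
    """Collision: the attacker picks BOTH messages, so any repeated digest will do."""
    seen = {}
    for n in count():
        msg = b"m%d" % n
        digest = h16(msg)
        if digest in seen:
            return seen[digest], msg, n + 1  # (first message, second message, tries)
        seen[digest] = msg


def find_second_preimage(target: bytes):
    """Second pre-image: the target is fixed by someone else; one exact digest must be hit."""
    want = h16(target)
    for n in count():
        msg = b"x%d" % n
        if msg != target and h16(msg) == want:
            return msg, n + 1  # (matching message, tries)
```

Running both and comparing the returned try counts usually shows the collision search finishing far sooner, which is the gap the argument above depends on.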
1) Nix network access is only allowed for things where you guarantee the hashes of what's output. This is generally used to do things like "download http://example.com/release-1.2.tar.xz, it will have SHA1 93f3025c7802a1a11e4f16186089b583ef1095b8"
2) There are known pairs of strings that have equivalent SHA1 hashes.
3) To determine if a fetch would succeed in a "pure" way, write a network-accessing function that will return a "true" or "false" string with identical hashes, then you can use that (supposedly deterministic) string to return a (nondeterministic!) True or False to the caller.
4) This is used to run a command like `curl -s -L -f -I https://commondatastorage.googleapis.com/chromium-browser-of... `. If the command succeeds, we know we can use this version of the browser to update.
I don't know how it reads the channel version data without running into the same determinism issues.
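For reference, the check in step 4 amounts to something like the following Python sketch. In the curl command, `-I` sends a HEAD request, `-f` makes curl exit non-zero on HTTP errors, `-L` follows redirects, and `-s` silences progress output. The function name `url_exists` and the injectable `opener` parameter are assumptions added so the example can be exercised without a network; this is not the code the update script actually runs.

```python
import urllib.error
import urllib.request


def url_exists(url: str, opener=urllib.request.urlopen) -> bool:
    """Return True if a HEAD request for `url` succeeds, False on any URL or HTTP error."""
    request = urllib.request.Request(url, method="HEAD")  # like curl -I
    try:
        # urlopen follows redirects by default, roughly matching curl -L.
        response = opener(request, timeout=10)
    except urllib.error.URLError:
        # HTTPError (4xx/5xx) is a subclass of URLError, so this mirrors curl -f.
        return False
    response.close()
    return True
```

Whatever this returns depends on the state of the remote server at that moment, which is exactly the nondeterminism the colliding outputs are used to smuggle past the hash check.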
Here's the latest update to the hack, moving from MD5 to SHA1: https://github.com/NixOS/nixpkgs/commit/ed8f3b5fa3cebfc3662a...