ChaCha20 is a refinement of Salsa20, which is probably Bernstein's best-known crypto design (it survived the eSTREAM contest to become one of the final portfolio ciphers). Bernstein wrote an extremely readable design paper on Salsa20:
Salsa20 is essentially a fast hash function run in a carefully designed counter mode. If you don't care about speed, you can turn any secure hash function into a stream cipher by, for instance, running the HMAC of that hash in counter mode. Here, Bernstein has designed the Formula 1 car of hash cores to run quickly in software without side channels as the basis for a counter-mode stream cipher.
Poly1305 is, like the GHASH construction from GCM, a "polynomial MAC", which is the modern way to say "cryptographic CRC". Poly1305 was designed more carefully for software performance than GHASH. In particular, because it's based on binary fields, for competitive performance GHASH requires either hardware support (such as the Intel CLMUL instructions) or a table-based implementation that potentially leaks secrets from cache timing. Poly1305 is based on prime fields and is fast in software on platforms without instructions tailored to it. It is also mercifully easier to code (though maybe I'm just irrationally biased against binary field polynomial math).
Damn, that is an impressive writeup. I'm staggered by the amount of work needed to enable a new crypto suite, and amazed it got done so quickly. The scale at which Google operates is genuinely mindblowing.
Are there any public docs on the details of how you've implemented the equal preference cipher suites at the server end? Is this simply an extension to the server's cipher selection, or do you use NPN/ALPN for this?
Could ChaCha20 be implemented in hardware, too, and would that provide significant speed improvements still? Intel, AMD and ARM should probably consider it for their future chip designs, if so.
"Poly1305 also saves network bandwidth, since its output is only 16 bytes compared to HMAC-SHA1, which is 20 bytes. This represents a 16% reduction of the TLS network overhead incurred when using older ciphersuites such as RC4-SHA or AES-SHA."
Does that mean 25B->21B per-packet overhead? What percentage overhead are TLS headers?
If you were designing a new transport to replace TLS, would you build a flexible record layer protocol to run it on top of, in the sense that TLS provides?
> Poly1305 also saves network bandwidth, since its output is only 16 bytes compared to HMAC-SHA1, which is 20 bytes
I'm puzzled why a shorter MAC would be considered a good thing in a security application - because theoretically, a 20-byte MAC gives roughly 4 billion times the space of a 16-byte one. It's like shortening an encryption key - hasn't the trend been toward making them longer and thus more resistant to attack?
A difference between encryption keys and MACs is that MACs do not become weaker as the attacker gains more computing power, except by making bruteforcing the authentication key easier, which does not depend on the length of the tag. The attacker is limited by the amount of messages your system can process.
Another difference is that old communication is not at risk. As soon as the client and server is upgraded, they are secure, as opposed to encryption where attackers may still recover sensitive information from things that were encrypted years ago. So it doesn't need the same degree of future proofing.
So while a 80-bit encryption key is on the weak end of the spectrum, even 80 bits of security is pretty conservative for a MAC.
Does anyone know why google doesn't offer a webserver?
I want SPDY, QUIC, and whatever cypher ordering magic is required to make my service faster on android. Unfortunately I probably won't be able to deploy this for at least a year because I have to wait on nginx and openSSL. By the time I could reasonably deploy this, shipping android phones will have the hardware to make this irrelevant.
Maybe google sees their in house webserver as a competitive advantage. Maybe their own internal infrastructure is too complicated to pull out a simple useful webserver.
nginx has ngx_http_spdy_module and Google gives away mod_spdy for Apache 2.2. QUIC's server is in Chromium's code base; they haven't broken it out into a separate project (the protocol is still not stable).
I've used mod_spdy on Ubuntu 12.04 for several months now, and it seems to work pretty well.
Is 16-byte Poly1305 more secure than 16-byte HMAC-SHA1? (Actually curious. Citation: I've done all the matasano crypto challenges, and all of the stripe µctf stuff :).)
I believe so? Isn't there a 2^80 attack on SHA1? I think the bigger issue is that the security proof for a polynomial MAC is much clearer (and the MAC itself is much faster).
On average a bruteforce attack will find a match halfway through the keyspace, so that would be 2^80 for SHA1... and only 2^64 for Poly1305 and anything else with a 128-bit width (that isn't broken in some other manner.)
You can start at the Poly1305 paper and follow the cites all the way back to Wegman and Carter, or (somewhat more modern) to Krawczyk's "cryptographic CRCs". Or, if I may be permitted to once again coattail 'pbsd, start the other way:
I'd love to offer ChaCha20 server-side, but I am currently using the default package of OpenSSL from Debian Wheezy which doesn't support the cipher. Are there already official OpenSSL builds available with ChaCha20 enabled, or does it still require running the patch from the Chromium team? If available, it'd be nice if someone could backport it.
ChaCha20 is a refinement of Salsa20, which is probably Bernstein's best-known crypto design (it survived the eSTREAM contest to become one of the final portfolio ciphers). Bernstein wrote an extremely readable design paper on Salsa20:
http://cr.yp.to/snuffle/salsafamily-20071225.pdf
Salsa20 is essentially a fast hash function run in a carefully designed counter mode. If you don't care about speed, you can turn any secure hash function into a stream cipher by, for instance, running the HMAC of that hash in counter mode. Here, Bernstein has designed the Formula 1 car of hash cores to run quickly in software without side channels as the basis for a counter-mode stream cipher.
Poly1305 is, like the GHASH construction from GCM, a "polynomial MAC", which is the modern way to say "cryptographic CRC". Poly1305 was designed more carefully for software performance than GHASH. In particular, because it's based on binary fields, for competitive performance GHASH requires either hardware support (such as the Intel CLMUL instructions) or a table-based implementation that potentially leaks secrets from cache timing. Poly1305 is based on prime fields and is fast in software on platforms without instructions tailored to it. It is also mercifully easier to code (though maybe I'm just irrationally biased against binary field polynomial math).