> not in the source code, where they would be more easily found. Why do you say ...

KateLawson · on Oct 9, 2013

Yes, that is correct. I founded SourceDNA.com as a way of automating this kind of analysis. We match components found in binaries in order to identify unlicensed use of third-party code, as well as security patches.

Tools like BinDiff have been around for years and take advantage of the fact that compilers don't randomize code generation. Instead, the callgraph and control-flow graphs largely reflect the structure of the original source code. Once you gain a foothold by exact-matching the parts of the binary that are identical or nearly so, you can work up and down the tree of nodes to find the ones that have changed more.

Crypto backdoors can be unbelievably subtle, though. A single branch condition or a single flipped bit can lead to catastrophic failure. For example, a compiler's dead code elimination has caused the zeroization of key material to be skipped. This kind of thing is extremely difficult to find and requires a careful understanding of the underlying code.

I agree with you that most differences can be found, but understanding the ramifications of those differences requires extremely careful analysis. A crypto flaw does not stand out from a mis-optimization.

foobarbazqux · on Oct 10, 2013

> a compiler optimization for dead code elimination led to some zeroization of key material being skipped.

That sounds like a pretty broken compiler, do you have a minimal example?

tedunangst · on Oct 10, 2013

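  #include <stddef.h>
  #include <string.h>

  /* hypothetical helpers, declared so the example compiles */
  void turn_password_into_key(const char *password, char *key, size_t keylen);
  void aes_make_encrypted(void *data, size_t len, const char *key);

  void
  encrypt(void *data, size_t len, char *password)
  {
    char key[32];

    turn_password_into_key(password, key, sizeof key);
    aes_make_encrypted(data, len, key);

    memset(key, 0, sizeof key); /* may be optimized away: key is dead here */
  }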

The compiler knows what memset does. It also knows that a stack variable is dead once the function returns. Therefore, the compiler knows there is no reason to write zeroes to this memory, because the program will never read those zeroes. Hence, the compiler is free to delete the call to memset.

foobarbazqux · on Oct 10, 2013

Oh I get it. You're saying the key is stored on the stack, and if it hasn't been zeroed out you can find it later by inspecting memory. That's really interesting; what a great example of leaning on the language implementation. I guess the correct way to write this is to malloc and free the key. Except, couldn't an attacker see the key anyway while it was live in memory, either on the stack or on the heap (or if it's not "heap", whatever you call the thing that malloc takes memory from)?

tribaal · on Oct 10, 2013

I'm not a specialist, but wouldn't a call to free simply release the memory without zeroing it out? Then you could probably still find the key by inspecting memory. Again, I'm not a good C programmer; I would like to know too :)

foobarbazqux · on Oct 10, 2013

Yeah, you'd still need to call memset before free. Compilers have a much harder time convincing themselves of things about pointers, so it is more likely to survive dead code elimination, though that isn't guaranteed.

kingkilr · on Oct 10, 2013

This is what memset_s is for.

MacsHeadroom · on Oct 10, 2013

So what we have here is the potential for a backdoor caused by a two-character difference in code.

Backdoors in source can be as simple as changing a single == to = or removing a minus sign in some seemingly innocuous place.

tedunangst · on Oct 10, 2013

Not everybody programs against the C11 standard.

Hello71 · on Oct 10, 2013

This is likely a reference to the Debian OpenSSL fiasco.

http://www.debian.org/security/2008/dsa-1571

foobarbazqux · on Oct 10, 2013

Oh I see. That is a lot of fiasco. I guess I'm mostly curious about a minimal example of useful code that a dead code eliminator will incorrectly optimize.

epsylon · on Oct 10, 2013

In the debian fiasco it's mostly a case of "manual code elimination" [1].

However, there are plenty of examples of compilers aggressively removing code that has undefined behaviour. Basically, the compiler is allowed to assume UB never happens, so it can do whatever it wants with code that would trigger it, including removing it entirely. Compilers exploit this a lot, because UB shows up a lot in "normal code". See [2] for examples and explanations; I also can't help but mention John Regehr's blog [3] if you're interested in compilers, security, testing and safety.

[1] http://research.swtch.com/openssl

[2] http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

[3] http://blog.regehr.org/archives/213 is a great article for example.

tedunangst · on Oct 10, 2013

That doesn't seem like a good fit. It wasn't a compiler problem, it wasn't dead code, and the problem wasn't skipping zeroing, it was too much zeroing!

lambada · on Oct 9, 2013

That would require a deterministic/reproducible build process. Such things tend to need the same host OS, library versions, dependency versions, etc., which AFAIK TrueCrypt hasn't documented anywhere. See the efforts of the Tor Project, Debian et al. to start doing reproducible builds; it's not easy. And that's before counting things that inherently make builds non-deterministic: embedding timestamps, for example, is quite common.

https://wiki.debian.org/ReproducibleBuilds

unuthu · on Oct 10, 2013

Umm, I think the person above is actually correct, though I'm by no means an expert. Otherwise, why would it take such a huge effort to achieve a deterministic build process, as Tor seems to have recently done?