
> not in the source code, where they would be more easily found.

Why do you say that? Compiling the source with the same compiler and flags, plus diffing the binaries would quickly show where the differences lie, and if they're hostile. Any half-decent reverse engineer could do this.

That would stand out more than if the source itself was backdoored in a non-glaring way. Open source has taught us that nobody ever reads the source.



Yes, that is correct. I founded SourceDNA.com as a way of automating this kind of analysis. We match components found in binaries in order to identify unlicensed use of third-party code, as well as security patches.

Tools like bindiff have been around for years and take advantage of the fact that compilers don't randomize code generation. Instead, the callgraph and control-flow graphs largely reflect the structure of the original source code. Once you have leverage by exact-matching the parts of the binary that are nearly identical, you can build up and down the tree of nodes to find those that have more changes.

Crypto backdoors can be unbelievably subtle though. A single branch condition, a bit that is flipped, etc. can all lead to catastrophic failures. For example, a compiler optimization for dead code elimination led to some zeroization of key material being skipped. This kind of thing is extremely difficult to find and requires a careful understanding of the underlying code.

I agree with you that most differences can be found, but understanding the ramifications of those differences requires extremely careful analysis. A crypto flaw does not stand out from a mis-optimization.


> a compiler optimization for dead code elimination led to some zeroization of key material being skipped.

That sounds like a pretty broken compiler, do you have a minimal example?


  void
  encrypt(void *data, size_t len, char *password)
  {
    char key[32];

    turn_password_into_key(password, key, sizeof key);
    aes_make_encrypted(data, len, key);

    memset(key, 0, sizeof key); /* optimized away */
  }

The compiler knows what memset does. It also knows that stack variables have no use after the function returns. Therefore, the compiler knows there is no reason to write zeroes to this memory, because the program will never read those zeroes. Hence, the compiler will delete the call to memset.


Oh I get it. You're saying the key is stored on the stack and then you can find it by inspecting memory if it hasn't been zeroed out. That's really interesting, what a great example of leaning on language implementation. I guess the correct way to write this is to malloc and free the key. Except, couldn't an attacker see the key anyway while it was live in memory, either on the stack or on the heap (or if it's not "heap", whatever you call the thing that malloc takes memory from)?


I'm not a specialist, but wouldn't a call to free simply release the memory without zeroing it out? You could then probably still find the key by inspecting memory. Again, I'm not a good C programmer, so I would like to know too :)


Yeah, you'd still need to call memset before free. Compilers have a much harder time convincing themselves of things about pointers, so the call is more likely to survive dead code elimination, though nothing guarantees that it will.


This is what memset_s is for.
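
A minimal sketch of how that might look, assuming a helper named secure_zero (a hypothetical name, not something from the thread). memset_s belongs to the optional Annex K of C11, so the sketch only calls it when the library advertises it through __STDC_LIB_EXT1__. Otherwise it falls back to writing through a volatile pointer, which compilers in practice do not optimize away, although the standard is less explicit about that case.

```c
#define __STDC_WANT_LIB_EXT1__ 1 /* request Annex K functions if the library has them */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: wipe len bytes at buf in a way the optimizer should keep. */
static void secure_zero(void *buf, size_t len)
{
#ifdef __STDC_LIB_EXT1__
    /* C11 Annex K: calls to memset_s must not be elided. */
    memset_s(buf, len, 0, len);
#else
    /* Fallback assumption: stores through a volatile lvalue are treated as
       observable, so they survive dead store elimination in practice. */
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
#endif
}
```

In the encrypt() example above, the last line would then become secure_zero(key, sizeof key) instead of the plain memset.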


So what we have here is the potential for a backdoor caused by a 2 character difference in code.

Backdoors in source can be as simple as changing a single == to = or removing a minus sign in some seemingly innocuous place.
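
To make the == versus = case concrete, here is a small sketch. The struct and function names are made up for illustration. The second function differs from the first by one character, yet it silently turns the caller into uid 0 while still reporting "not root".

```c
/* Hypothetical session record; uid 0 stands for root. */
struct session { int uid; };

/* Intended check: is the caller already root? */
static int is_root(struct session *s)
{
    return s->uid == 0;
}

/* One '=' dropped: this assigns 0 to uid. The expression evaluates to 0,
   so the check still reports false, but the caller is now root. */
static int is_root_backdoored(struct session *s)
{
    return (s->uid = 0);
}
```

The extra parentheses around the assignment are the kind of detail that keeps a compiler from warning about it.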


Not everybody programs against the C11 standard.


This is likely a reference to the Debian OpenSSL fiasco.

http://www.debian.org/security/2008/dsa-1571


Oh I see. That is a lot of fiasco. I guess I'm mostly curious about a minimal example of useful code that a dead code eliminator will incorrectly optimize.


In the Debian fiasco it's mostly a case of "manual code elimination" [1].

However there are plenty of examples of compilers aggressively removing code that causes undefined behaviour. Basically, when the compiler encounters UB, it can do whatever it wants to code that triggers the UB, which means possibly removing it. Compilers exploit this fact a lot because UBs happen a lot in "normal code". See here [2] for examples and explanations; I also can't help but mention John Regehr's blog [3] if you're interested in compilers, security, testing, safety.

[1] http://research.swtch.com/openssl

[2] http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

[3] http://blog.regehr.org/archives/213 is a great article for example.
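
For a small illustration of the UB point, here is a sketch of an overflow check that an optimizer is allowed to delete, next to a well-defined alternative. The function names are hypothetical. Signed overflow is undefined in C, so a compiler may assume x + 1 never wraps and fold the first test to a constant 0.

```c
#include <limits.h>

/* Relies on signed overflow, which is undefined behaviour. An optimizer may
   assume x + 1 cannot wrap and reduce this whole test to 0. */
static int next_wraps_ub(int x)
{
    return x + 1 < x;
}

/* Well-defined: compare against the limit before doing any arithmetic. */
static int next_wraps_checked(int x)
{
    return x == INT_MAX;
}
```

Whether the first version actually gets folded depends on the compiler and the optimization level, which is part of what makes this class of bug hard to spot.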


That doesn't seem like a good fit. It wasn't a compiler problem, it wasn't dead code, and the problem wasn't skipping zeroing, it was too much zeroing!


That would require a deterministic/reproducible build process. Such things tend to need the same host OS, library versions, dependency versions etc., which AFAIK TrueCrypt hasn't documented anywhere. See the efforts of the Tor Project, Debian et al. to start doing reproducible builds. It's not easy. And this is without things that inherently produce non-deterministic builds; embedding timestamps is quite common, for example.

https://wiki.debian.org/ReproducibleBuilds
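
As a sketch of the timestamp problem, assuming a hypothetical version banner: the standard __DATE__ and __TIME__ macros expand to the moment of compilation, so two builds of identical source end up with different bytes in the binary.

```c
#include <stdio.h>

/* Hypothetical banner string. __DATE__ ("Mmm dd yyyy") and __TIME__
   ("hh:mm:ss") change on every compile, so this array differs per build. */
static const char build_stamp[] = __DATE__ " " __TIME__;

static void print_banner(void)
{
    printf("built %s\n", build_stamp);
}
```

A binary diff of two such builds would flag this string even though nothing hostile changed, which is why reproducible-build efforts strip or pin values like this.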


Umm, I think the person above is actually correct, though I'm by no means an expert. Otherwise, why would it take such a huge effort to achieve a deterministic build process, as Tor seems to have recently done?



