Elfcat: Visualize ELF Binaries

PennRobotics · on June 22, 2021

Two simple criticisms or potential misunderstandings:

The three nulls in the load segment (between the code and data) are included in both the code and the data highlights as well as in their own highlight, which is a bit unintuitive, as the start of the string looks to be \0\0\0Hello. It looks like these are supposed to sit between spans like the other non-highlighted nulls, but they are included in a parent span, bin_segment0. (Issue submitted.)

Also, I wish the arrow heads did not opaquely overlap the numbers. Adding opacity="0.3" to the svg tag fixes this for me.

This is cool for a variety of reasons: It makes parsing readelf output a bit easier, it's a nice/small/functional Rust demo (for Rust idiots like me), and the output can be redirected into html-based documentation more easily than a command line tool's output.

higherhalf · on June 23, 2021

> The three nulls in the load segment (between the code and data) are included in [..]

Yes, this is a bug. Thanks for reporting it to the issue tracker.

> Also, I wish the arrow heads did not opaquely overlap the numbers. Adding opacity="0.3" to the svg tag fixes this for me.

Will add this as a toggleable option. The arrows really do hinder comprehension; perhaps the arrowheads could be made smaller too.

ohazi · on June 22, 2021

This is great. I wish every binary format had a visualizer like this. Or maybe even a generic tool that can take in descriptions of binary formats to create new annotations on the fly (like Wireshark).

I've seen similar tools to annotate other binary formats like gpg and asn.1

https://github.com/ConradIrwin/gpg-decoder

gumby · on June 22, 2021

In a pre-graphics era, this is what I wrote objdump for.

The original intent was simply debugging bfd while developing it, though people ended up using it for all sorts of things.

I like Elfcat.

gavinray · on June 22, 2021

Wait, are you insinuating that I am reading a comment by the author of objdump themselves?...

Holy smokes, thank you!

    I like Elfcat.

I can imagine the author is beaming right now. I would frame this comment on my desk were I them, I think.

higherhalf · on June 23, 2021

I sure am! I wish it were more "done", however; right now it's pretty raw.

ohazi · on June 22, 2021

Thanks for writing objdump! I'm probably one of those people.

The most recent thing I did with objdump was use it to patch in a signature that needed to be computed over the text/data/bss segments after compilation. This was for a bare metal embedded system, and being able to do this at the elf level (1) made it easier, and (2) allowed me to use the same setup for multiple targets that each had their own oddball binary/ihex conversion and flashing machinery.

lxe · on June 23, 2021

Wow... you ain't kidding! Here I go down the software history hole: https://en.wikipedia.org/wiki/Cygnus_Solutions

mattgreenrocks · on June 23, 2021

Thank you for objdump. It is a superb tool that I rely on.

jk7tarYZAQNpTQa · on June 22, 2021

> Or maybe even a generic tool that can take in descriptions of binary formats

Kaitai [1] isn't perfect, but maybe suits your needs.

[1] https://kaitai.io/

pabs3 · on June 23, 2021

Veles is a GUI For Kaitai:

https://veles.io/

makapuf · on June 22, 2021

there was hachoir https://hachoir.readthedocs.io/en/latest/metadata.html

AlbertoGP · on June 22, 2021

Just tried it, does what it promises. It’s more than a basic hex viewer because of the extra information being displayed when hovering with the mouse on the different bytes, and the arrows linking the pointers/offsets to their targets.

higherhalf · on June 23, 2021

You can click on their starts and ends (the fields they are referencing) too.

lovasoa · on June 22, 2021

There is also Kaitai Struct, which has a very complete online IDE that I use for this kind of thing: https://ide.kaitai.io/

It works with ELF as well as many many other formats, and the online IDE is only a very small part of what you can do with it.

jcranmer · on June 22, 2021

(I'm surprised this is written in Rust and doesn't use the object crate--did the author do this in part to learn how elf works?)

Speaking of visualizing virtual memory, one of the things that I haven't seen a nice prior tool for is breaking down the memory map of a process on a per-section basis--/proc/pid/maps only tells which libraries are providing which sections. I've built something like that for my own needs, but it's the sort of thing that I would have expected would easily come out of some other tool.
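For anyone who wants to experiment with this, here is a minimal sketch of the first half of the job: parsing one line of /proc/<pid>/maps into a struct. The `Mapping` type and `parse_maps_line` function are made-up names for illustration, not part of any existing tool. A per-section breakdown would additionally need the section headers of each mapped file and the address each file was loaded at, which this sketch does not attempt.

```rust
/// One mapping parsed from a line of /proc/<pid>/maps (hypothetical type).
#[derive(Debug, PartialEq)]
struct Mapping {
    start: u64,
    end: u64,
    perms: String,
    offset: u64,
    path: Option<String>,
}

/// Parses a line such as
/// "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon".
/// Returns None if the line does not have the expected shape.
fn parse_maps_line(line: &str) -> Option<Mapping> {
    let mut fields = line.split_whitespace();
    let (start, end) = fields.next()?.split_once('-')?;
    let perms = fields.next()?.to_string();
    let offset = u64::from_str_radix(fields.next()?, 16).ok()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    // The path is optional (anonymous mappings have none). Rejoining with a
    // single space is a simplification: runs of spaces in a path are collapsed.
    let rest: Vec<&str> = fields.collect();
    let path = if rest.is_empty() { None } else { Some(rest.join(" ")) };
    Some(Mapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms,
        offset,
        path,
    })
}

fn main() {
    let line = "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon";
    println!("{:?}", parse_maps_line(line));
}
```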

derefr · on June 22, 2021

(Not the author) I’ve learned to not get my hopes up about the capabilities of external format-parsing libraries when building tools like this (to the point of often not bothering to evaluate them for fit-for-purpose any more), as they often expose only a high-level fully-decoded representation that’s unsuited to examination of “why” something decoded to what it did.

As such, they often won’t be of any help in a situation where you’re trying to use them to diagnose where exactly a corrupted piece of data is going wrong — which is one of the biggest use-cases for such tooling!

jcranmer · on June 22, 2021

I've been using the object and gimli crates extensively, and I can assure you that "expose only a high-level fully-decoded representation" is the exact opposite of what they do. In fact, my biggest criticism (for gimli in particular) is that they lack sufficient high-level representation.

The object crate exposes all of the ELF types directly. The one thing it doesn't do is give you this from its format-agnostic object::read::File type; you have to start from object::read::elf::ElfFile instead. As a bonus, it also gives you all of the processor-specific defines so you don't have to look up what the values of, say, the x86-64 relocations are: https://docs.rs/object/0.25.3/object/elf/index.html#constant...

derefr · on June 22, 2021

My point wasn't about whether you can get into the nitty-gritty leaf nodes of complex structures in the fully-decoded data; it was about whether the representation it outputs losslessly represents unparseable data elements while still making a best effort to decode what it can, such that you end up with a representation that was decoded "as much as possible", where the decoded parts can be used to figure out why the non-decoded parts didn't decode, while also not obscuring what the non-decoded parts "say".

I'm using "high-level" here to mean "was successfully transformed through all the decoding/lexing/parsing/cross-reference stages", and "low[er]-level" to mean "failed to be transformed by some of those stages." Which is non-normative, I guess, but this sort of "layers of decoded-ness" representation is what you expect from tools like Wireshark or binwalk.
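To make the "layers of decoded-ness" idea concrete, here is a rough sketch of what such a representation could look like. The `Field` enum and `decode_u32` function are invented for this example and are not taken from Wireshark, binwalk, or any crate: a field either decodes to a value, or it keeps its raw bytes along with the reason decoding stopped, so nothing is thrown away.

```rust
/// A field that either decoded cleanly or is kept as raw bytes with a reason.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Field {
    Decoded { name: &'static str, offset: usize, value: u64 },
    Raw { name: &'static str, offset: usize, bytes: Vec<u8>, reason: &'static str },
}

/// Best-effort decode of a little-endian u32 at `offset`.
/// On failure, the bytes that do exist are preserved instead of discarded.
fn decode_u32(name: &'static str, data: &[u8], offset: usize) -> Field {
    let slice = offset
        .checked_add(4)
        .and_then(|end| data.get(offset..end));
    match slice {
        Some(b) => Field::Decoded {
            name,
            offset,
            value: u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as u64,
        },
        None => Field::Raw {
            name,
            offset,
            // Keep whatever is left so a viewer can still display it.
            bytes: data.get(offset..).unwrap_or(&[]).to_vec(),
            reason: "truncated",
        },
    }
}

fn main() {
    let data = [0x01, 0x00, 0x00, 0x00, 0xAA, 0xBB];
    println!("{:?}", decode_u32("first", &data, 0));
    println!("{:?}", decode_u32("second", &data, 4));
}
```

A viewer built on this could render `Decoded` fields normally and flag `Raw` ones, showing both the surviving bytes and why they did not decode.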

jcranmer · on June 22, 2021

Ah, I see what you mean now.

Yes, object/gimli are also this kind of low-level. Basically, when you parse the file, you're not actually parsing the whole file, but parsing each element of the structure one bit at a time.

So parsing FileHeader will make sure a) the data is correctly aligned [since it's UB in Rust to have underaligned data] and b) that the e_ident bits are actually the magic number for an ELF file. Want to list all the sections? That's when it's actually going to check that a) the section header offset actually points to valid data and b) the number of section headers exists and is sane, but again, it doesn't actually verify that the section headers themselves make any sense whatsoever.
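As a rough illustration of that staged checking (this is not the object crate's code, just a hand-written sketch with made-up names), the following assumes a 64-bit little-endian ELF file and the standard header offsets: e_shoff at byte 40, e_shentsize at 58, e_shnum at 60. It ignores the extended numbering used when e_shnum is 0.

```rust
use std::ops::Range;

const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];

#[derive(Debug, PartialEq)]
enum ElfError {
    TooShort,
    BadMagic,
    SectionTableOutOfBounds,
}

fn read_u16_le(data: &[u8], off: usize) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u64_le(data: &[u8], off: usize) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(data.get(off..off.checked_add(8)?)?);
    Some(u64::from_le_bytes(buf))
}

/// Stage 1: only confirm that the file starts with the ELF magic number.
fn check_magic(data: &[u8]) -> Result<(), ElfError> {
    let ident = data.get(..4).ok_or(ElfError::TooShort)?;
    if ident == &ELF_MAGIC[..] {
        Ok(())
    } else {
        Err(ElfError::BadMagic)
    }
}

/// Stage 2, run only when sections are requested: check that the section
/// header table lies inside the file. The headers' contents are not checked.
fn section_table_range(data: &[u8]) -> Result<Range<usize>, ElfError> {
    let shoff = read_u64_le(data, 40).ok_or(ElfError::TooShort)?;
    let shentsize = read_u16_le(data, 58).ok_or(ElfError::TooShort)? as u64;
    let shnum = read_u16_le(data, 60).ok_or(ElfError::TooShort)? as u64;
    let end = shoff
        .checked_add(shentsize * shnum)
        .ok_or(ElfError::SectionTableOutOfBounds)?;
    if end > data.len() as u64 {
        return Err(ElfError::SectionTableOutOfBounds);
    }
    // Both bounds fit in usize because they do not exceed data.len().
    Ok(shoff as usize..end as usize)
}

fn main() {
    let mut elf = vec![0u8; 64 + 2 * 64];
    elf[..4].copy_from_slice(&ELF_MAGIC);
    elf[40..48].copy_from_slice(&64u64.to_le_bytes()); // e_shoff
    elf[58..60].copy_from_slice(&64u16.to_le_bytes()); // e_shentsize
    elf[60..62].copy_from_slice(&2u16.to_le_bytes()); // e_shnum
    println!("{:?}", check_magic(&elf));
    println!("{:?}", section_table_range(&elf));
}
```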

xvilka · on June 22, 2021

You could try to build it on top of Rizin[1][2] library. In particular see the `dm` commands and subcommands. Let us know if something is unclear or missing or doesn't work as you would expect.

[1] https://rizin.re

[2] https://github.com/rizinorg/rizin

higherhalf · on June 23, 2021

> I'm surprised this is written in Rust and doesn't use the object crate--did the author do this in part to learn how elf works?

No. When I started the project I was expecting to just read data into the ELF structs, in the style of C. (Un)fortunately, it's not possible to do that safely, so I started looking into crates for it, and kept stumbling upon data deserialization ones. In particular, the first attempt was in nom. In hindsight, that wasn't particularly smart, and crates specific to object file parsing would have been better. I don't regret implementing reading manually, despite it looking pretty ugly, because attending to NIH syndrome is fun.
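For readers wondering what "reading manually" can look like without unsafe code, here is a small sketch. It is not elfcat's actual implementation, and the `HeaderStart` and `parse_header_start` names are made up. It reads the class and endianness bytes from e_ident (offsets 4 and 5), then decodes e_type and e_machine (offsets 16 and 18) field by field instead of casting the buffer to a struct. The magic number is deliberately not checked here.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// The first few ELF header fields (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
struct HeaderStart {
    is_64bit: bool,
    endian: Endian,
    e_type: u16,
    e_machine: u16,
}

/// Bounds-checked read of a u16 in the file's own byte order.
fn read_u16(data: &[u8], off: usize, endian: Endian) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    let raw = [b[0], b[1]];
    Some(match endian {
        Endian::Little => u16::from_le_bytes(raw),
        Endian::Big => u16::from_be_bytes(raw),
    })
}

fn parse_header_start(data: &[u8]) -> Option<HeaderStart> {
    // EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    let is_64bit = match *data.get(4)? {
        1 => false,
        2 => true,
        _ => return None,
    };
    // EI_DATA: 1 = little-endian, 2 = big-endian.
    let endian = match *data.get(5)? {
        1 => Endian::Little,
        2 => Endian::Big,
        _ => return None,
    };
    Some(HeaderStart {
        is_64bit,
        endian,
        e_type: read_u16(data, 16, endian)?,
        e_machine: read_u16(data, 18, endian)?,
    })
}

fn main() {
    let mut header = vec![0u8; 20];
    header[..4].copy_from_slice(&[0x7f, b'E', b'L', b'F']);
    header[4] = 2; // 64-bit
    header[5] = 1; // little-endian
    header[16] = 2; // e_type = ET_EXEC
    header[18] = 62; // e_machine = EM_X86_64
    println!("{:?}", parse_header_start(&header));
}
```

It is more typing than a C-style cast, but every read is bounds-checked and the byte order is handled explicitly.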

> Speaking of visualizing virtual memory, one of the things that I haven't seen a nice prior tool for is breaking down the memory map of a process on a per-section basis

That is planned. It's noted in the readme, and in issue #3 I go over what it could look like[1].

[1]: https://github.com/ruslashev/elfcat/issues/3#issuecomment-86...

higherhalf · on June 23, 2021

Hello, author here (I edited the bio on github to show). Don't know how I missed it on HN, must have been the grey link.

First I'd like to say that right now this is just the first release, and it's a bit raw so far. That's why I was hesitant to post on HN yet, expecting a more harsh but merited critique. I am reading through the thread for bugs and suggestions. Thanks for that.

setheron · on June 22, 2021

Love it. This is pretty useful to see how patchelf in NixOS works.

Naac · on June 22, 2021

As a visual learner, this is fantastic.

I do wish that there were a key explaining the color coding.

PennRobotics · on June 22, 2021

Coral (#e99) -> Elf Header ID

Medium Purple (#99e) -> Elf Header

Light Salmon (#eb9) -> Program Header

"Violet Orange Gradient" -> Executable and data, I think (It looks like this is copied to address 0x10000+0x80 and executed, but I'm not familiar with x86.)

Violet (#f9f) -> Sections (symbol table, string table, etc.)

Sky Blue (#9be) -> Section Headers

higherhalf · on June 23, 2021

The color legend and some basic help instructions are coming.

rst13 · on June 23, 2021

Great tool! Having used Wireshark for ages, it's refreshing to visualize binaries like this.

ellis0n · on June 22, 2021

Great tool, the kind I have been looking for for decades :)

vyas45 · on June 22, 2021

This is so cool!

MintPaw · on June 22, 2021

A browser based hex viewer? I guess Linux people jump from the command line to the browser because there's no standard GUI on Linux?

Seems a bit complicated, would love to see better binary visualization tools on desktop.

haswell · on June 22, 2021

Another way to frame this is that the primary viewing format is portable html, and can easily be viewed locally or shared with someone else or incorporated into a blog post, etc.

The true value here is the backend and its ability to return data in a form that can be visualized, and the output format/UI can be adapted/enhanced as the project matures.

geraldcombs · on June 22, 2021

You could try Wireshark. Its primary focus is analyzing network packets, but it does support a few file formats and ELF is one.

7373737373 · on June 22, 2021

There's also https://binvis.io

xvilka · on June 22, 2021

Veles[1], which is sadly abandoned nowadays, is a more powerful alternative.

[1] https://github.com/codilime/veles

IshKebab · on June 22, 2021

Yeah, if you just need a single page visualisation like this then HTML is hard to beat. It's easy, cross platform and you don't need any libraries to use it.

Doing this with Qt or GTK would be much much more work. Especially if you're using Rust which doesn't have any really good GUI options yet.

gmadsen · on June 22, 2021

What is the complicated part? Modifying the gui?

0xbadcafebee · on June 23, 2021

Completely unrelated rant. Before Rust really takes off, can someone please, pretty please, for the love of "Bob" and all that is marginally donut-shaped, can somebody pleeeeeease tell the Rust community to adopt inheritance-focused hierarchical package naming conventions???

Way back in the day, in like, 1995, there was this thing called Perl. Perl was awesome. You'd think, "I want to make a custom LDAP client". And you'd look in CPAN for an LDAP module, and it'd be there (https://metacpan.org/pod/Net::LDAP), and you'd use it. A while later you realize you want to manipulate Active Directory SIDs. So you create a new module that inherits Net::LDAP, and publish it (https://metacpan.org/pod/Net::LDAP::SID).

On the plus side: everyone can install your module to get extra functionality with their existing code; people don't need to reinvent the wheel completely to do something a little different; and it's easy to see which module provides what/inherits from what.

On the down side: boring names for modules. (is that a downside?)