
> And compression will never be able to eliminate brackets and commas.

Isn't that the whole point of compression: to minimise repetition of common features in data?



This'd be worth testing. Isn't the way gzip works to build up a dictionary of characters/phrases and then store pointers to the dictionary? I'm not sure you can compress { or } but }); is another story.


I just ran that experiment: https://gist.github.com/ggreer/4eb3ad61e97926e559f6

ASCII is extremely repetitive. 1MB of random braces gzips down to 163KB. That's a compression ratio of 6.1:1. Optimal compression would be close to 8:1 (since each byte really contains only one bit of information).


let buf = ""; for (let i = 0; i < 1024 * 1024; i++) { buf += Math.random() > 0.5 ? "{" : "}"; }

Your sample data is bunk for determining the compression characteristics of braces in a realistic JSON file.


The point was to show that compression works at the level of bits, not characters. Realistic JSON is mostly content, which means your compression ratio depends more on content redundancies than braces or quotes. So too for binary serialization formats.


Gzip uses DEFLATE, which combines LZ77 dictionary matching with Huffman coding.

Huffman coding works by replacing common symbols with shorter codes and uncommon symbols with longer codes, so the codes are variable-length.

The codes are constructed so that no code is the prefix of any other code. That way, the decoder knows when it has reached the end of a code without needing any extra information beyond the Huffman tree, which tells it which codes belong to which symbols.

The Huffman tree can be pre-agreed or, more commonly, included with the compressed data.
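The construction above can be sketched in a few lines (the symbols and frequencies here are illustrative, not DEFLATE's actual alphabet):

```javascript
// Minimal Huffman code construction: repeatedly merge the two
// lightest nodes, then read codes off the tree (left=0, right=1).
function huffmanCodes(freqs) {
  // One leaf node per symbol.
  let nodes = Object.entries(freqs).map(([sym, weight]) => ({ sym, weight }));
  // Merge until a single tree remains.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }
  // Walk the tree to assign prefix-free bit strings.
  const codes = {};
  (function walk(node, prefix) {
    if (node.sym !== undefined) { codes[node.sym] = prefix || "0"; return; }
    walk(node.left, prefix + "0");
    walk(node.right, prefix + "1");
  })(nodes[0], "");
  return codes;
}

const codes = huffmanCodes({ e: 120, t: 90, a: 80, q: 2, z: 1 });
console.log(codes);
```

Frequent symbols (`e`) end up with short codes and rare ones (`z`) with long codes, and because every symbol sits at a leaf, no code can be a prefix of another.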



