I'm doing it with clang. As a bonus I get better performance than I did with msv...

bullen · on Dec 31, 2021

I tried looking at all the other compilers; the only one that fit my size constraints was tcc... that would mean dropping std::string and only using ASCII char*... not a big deal for my game because my .ttf fonts don't have anything else, but for other projects throwing UTF-8 out of the window might be a show stopper!?

Also considered Rust, but that is so bloated/slow! Can't stop anyone from using Rust for the game in the future though, because any .so/.dll will be able to hot-deploy.

dundarious · on Dec 31, 2021

std::string is basically just the following:

    struct std_string {
      union {
        char small[16];
        struct { size_t len; char* buf; } big;
      } value;
      unsigned char is_big;
    };

There is nothing it does regarding support for non-ASCII characters over what you get from buf and len. And for UTF-8, you don't even need len, plain old strlen from the 1980s works fine on valid UTF-8, so plain char* from C works just as well.

And the union is just a performance optimization for small strings (is_big is probably not an additional field in good impls, but I separated it here), it's logically identical to just the buf and len.

Which is all just to say, go for it with tcc!
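To make the union idea concrete, here is a minimal C sketch of the small-string layout described above. It is only an illustration under stated assumptions: the names `sso_string`, `sso_init`, `sso_data`, `sso_len`, `sso_free` and `SMALL_CAP` are made up for this example, and real std::string implementations differ in layout and in how they encode the small/big flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_CAP 16 /* hypothetical inline capacity, including the NUL */

/* Same shape as the struct in the comment above; names are illustrative. */
struct sso_string {
    union {
        char small[SMALL_CAP];                 /* short strings live inline */
        struct { size_t len; char *buf; } big; /* long strings live on the heap */
    } value;
    unsigned char is_big;
};

/* Copy a NUL-terminated string in. Returns 0 on success, -1 if malloc fails. */
int sso_init(struct sso_string *s, const char *src) {
    assert(s != NULL && src != NULL);
    size_t len = strlen(src);
    if (len + 1 <= SMALL_CAP) {
        /* Fits inline: no heap allocation at all. */
        memcpy(s->value.small, src, len + 1);
        s->is_big = 0;
        return 0;
    }
    char *buf = malloc(len + 1);
    if (buf == NULL) return -1;
    memcpy(buf, src, len + 1);
    s->value.big.len = len;
    s->value.big.buf = buf;
    s->is_big = 1;
    return 0;
}

/* Both representations expose the same thing: a pointer and a length. */
const char *sso_data(const struct sso_string *s) {
    return s->is_big ? s->value.big.buf : s->value.small;
}

size_t sso_len(const struct sso_string *s) {
    return s->is_big ? s->value.big.len : strlen(s->value.small);
}

/* Release the heap buffer, if any, and reset to the empty small string. */
void sso_free(struct sso_string *s) {
    if (s->is_big) free(s->value.big.buf);
    s->is_big = 0;
    s->value.small[0] = '\0';
}
```

Callers only ever go through `sso_data` and `sso_len`, which is why the union is logically the same as a plain buf and len: the two members share the same storage, and `is_big` says which one is currently in use.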

bullen · on Jan 1, 2022

Yes, but the UTF-8 logic, how complex is that?

I don't even know what union does, but I can imagine you have many smaller chunks?

dundarious · on Jan 1, 2022

The union has nothing to do with UTF-8, so let's ignore it. If you want more details about it, search for "c++ small string optimization", but the one-sentence version is that it's just a way to avoid a heap allocation for strings <= 16 bytes long (including any NUL terminator).

So ignoring that irrelevant optimization, std::string is basically:

    struct std_string {
      size_t len;
      char* buf;
    };

For storing valid UTF-8, the len is unnecessary, since a 0 byte never appears inside the encoding of any character other than NUL (U+0000) itself. You can still tell how many bytes of UTF-8 you have by using strlen, because when you find a 0 byte, it is never part of a multi-byte character, it's always the NUL terminator. So the len is not strictly necessary for valid UTF-8 -- leaving us with just char*.
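A small C sketch of that point, assuming the input is valid, NUL-terminated UTF-8. strlen gives the number of bytes, not the number of characters; the helper `utf8_codepoints` is a hypothetical name added here only to show the difference, and it relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a valid, NUL-terminated UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx).
   Hypothetical helper, not part of any standard library. */
size_t utf8_codepoints(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For example, with "h\xC3\xA9llo" (the word "héllo" with é written as its two UTF-8 bytes), strlen returns 6 while `utf8_codepoints` returns 5. The byte count is what you need for allocating and copying, so plain strlen is enough for storage.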

And not trying to dodge your question about UTF-8 logic, but my point was you can dodge that whole question, because std::string provides the same amount of UTF-8 logic as char* -- that is, none at all. If you've been getting by on std::string, then you can get by on char*. If you only need to support UTF-8 input and output, and you don't need to manipulate strings (replace characters, truncate them, normalize them for use as keys in a data structure, etc.) or only need to do simple substring searches for ASCII characters, then you can just use char* or std::string. UTF-8 has a great design, which was consciously chosen to make all of that possible.
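As a rough illustration of the "simple searches for ASCII characters" case, here is a C sketch that splits a UTF-8 path on '/' using only strrchr. It works because every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte such as '/' can never show up in the middle of another character. The function name `path_basename` is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the part of a UTF-8 path after the last '/',
   or the whole string if there is no '/'. Hypothetical helper that
   uses nothing but a byte-wise search from the C standard library. */
const char *path_basename(const char *path) {
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

Given "caf\xC3\xA9/\xC3\xA9t\xC3\xA9.txt" (that is, "café/été.txt" as UTF-8 bytes), this returns a pointer to "été.txt" without decoding anything. The same reasoning applies to strchr and to strstr with an ASCII needle.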

jcelerier · on Dec 31, 2021

I don't understand: shipping 50 MB of cl.exe is fine but shipping clang isn't?

bullen · on Jan 1, 2022

Shipping cl.exe is illegal. I tried looking at clang but it was a lot bigger than 50 MB, and it was unclear how to untangle it from installers and other dependencies:

https://stackoverflow.com/questions/65807034/clang-llvm-zip-...

jcelerier · on Jan 1, 2022

I'm shipping it as part of https://ossia.io (statically linked to the app); my installers are between 50 and 100 MB for something that ships clang+llvm, boost, qt, ffmpeg and a lot of other things (in addition to its own code):

https://github.com/ossia/score/releases/tag/v3.0.0-rc7

you just need to build llvm/clang statically and target_link_libraries(<couple stuff>) in cmake (and ship the headers if you want to do useful things, this actually takes much more space uncompressed but it'd be the same whatever the compiler)

bullen · on Jan 1, 2022

That sounds like a lot of work, I just want a zip with the compiler ready to extract into any project folder.

jcelerier · on Jan 1, 2022

you can grab it here: https://github.com/mstorsjo/llvm-mingw/releases/tag/20211002

bullen · on Jan 2, 2022

Thx! So back to MinGW it is!

I'm surprised this is so hard to find, why is there no official redistributable compiler for Windows?! What is Microsoft so afraid of?

This is the only disadvantage Windows has compared to Linux!

I guess you can make this a lot smaller if you remove the cross-compiling parts?

jcelerier · on Jan 13, 2022

it's only "mingw" because it uses the mingw headers. It uses the microsoft modern C runtime (ucrt) and runs just fine under normal cmd.exe shell with c:/windows/formatted/paths, and does not require e.g. MSYS.