
I'm doing it with clang. As a bonus I get better performance than I did with msvc.


I tried looking at all the other compilers; the only one that fit my size constraints was tcc... that would mean dropping std::string and only using ASCII char*... not a big deal for my game because my .ttf fonts don't have anything else, but for other projects throwing UTF-8 out of the window might be a show stopper!?

Also considered Rust, but that is so bloated/slow! Can't stop anyone from using Rust for the game in the future though, because any .so/.dll will be able to hot-deploy.


std::string is basically just the following:

    struct std_string {
      union {
        char small[16];
        struct { size_t len; char* buf; } big;
      } value;
      unsigned char is_big;
    };
There is nothing it does regarding support for non-ASCII characters over what you get from buf and len. And for UTF-8, you don't even need len, plain old strlen from the 1980s works fine on valid UTF-8, so plain char* from C works just as well.

And the union is just a performance optimization for small strings (is_big is probably not an additional field in good impls, but I separated it here), it's logically identical to just the buf and len.
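For anyone who hasn't met the small string optimization before, here is a minimal sketch in plain C of how the struct above could be filled and read. All the names (sso_string, sso_init, sso_data, sso_len, sso_free) are made up for illustration, and real standard libraries lay this out differently (they also track capacity), so treat it as a toy rather than what any std::string actually does:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this mirrors the struct sketched above,
   not the layout of any real standard library. */
struct sso_string {
  union {
    char small[16];                        /* inline storage, NUL included */
    struct { size_t len; char *buf; } big; /* heap storage for longer strings */
  } value;
  unsigned char is_big;
};

/* Copy src into s. Returns 0 on success, -1 if the heap allocation fails. */
int sso_init(struct sso_string *s, const char *src) {
  assert(s != NULL && src != NULL);
  size_t len = strlen(src);
  if (len < sizeof s->value.small) {       /* fits inline, NUL included */
    memcpy(s->value.small, src, len + 1);
    s->is_big = 0;
    return 0;
  }
  char *buf = malloc(len + 1);
  if (buf == NULL) return -1;
  memcpy(buf, src, len + 1);
  s->value.big.len = len;
  s->value.big.buf = buf;
  s->is_big = 1;
  return 0;
}

/* Pointer to the NUL-terminated bytes, wherever they live. */
const char *sso_data(const struct sso_string *s) {
  return s->is_big ? s->value.big.buf : s->value.small;
}

/* Length in bytes, not counting the NUL terminator. */
size_t sso_len(const struct sso_string *s) {
  return s->is_big ? s->value.big.len : strlen(s->value.small);
}

/* Release the heap buffer, if there is one. */
void sso_free(struct sso_string *s) {
  if (s->is_big) free(s->value.big.buf);
}
```

Either way the caller only ever sees a pointer and a length, which is the point: the union changes where the bytes live, not what they mean.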

Which is all just to say, go for it with tcc!


Yes, but the UTF-8 logic, how complex is that?

I don't even know what union does, but I can imagine you have many smaller chunks?


The union has nothing to do with UTF-8, so let's ignore it. If you want more details about it, search for "c++ small string optimization", but the one-sentence version is that it's just a way to avoid a heap allocation for strings <= 16 bytes long (including any NUL terminator).

So ignoring that irrelevant optimization, std::string is basically:

    struct std_string {
      size_t len;
      char* buf;
    };
For storing valid UTF-8, the len is unnecessary as long as the text contains no NUL character (U+0000), since a 0 byte never appears inside the encoding of any other character. You can still tell how many bytes of UTF-8 you have by using strlen, because when you find a 0 byte, it is never part of a multi-byte sequence, it's always the NUL terminator. So the len is not strictly necessary for such UTF-8 -- leaving us with just char*.
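To make that concrete, here is a small C sketch. strlen gives the number of bytes, and if you ever do want the number of code points, you can count the bytes that are not continuation bytes (continuation bytes always have the bit pattern 10xxxxxx). The function name utf8_codepoints is my own, and the sketch assumes the input is already valid, NUL-terminated UTF-8; it does no validation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in valid, NUL-terminated UTF-8 by skipping
   continuation bytes, which always look like 10xxxxxx.
   Hypothetical helper; it does not validate its input. */
size_t utf8_codepoints(const char *s) {
  assert(s != NULL);
  size_t n = 0;
  for (; *s != '\0'; s++) {
    if (((unsigned char)*s & 0xC0) != 0x80) n++;
  }
  return n;
}
```

For example, with "h\xC3\xA9llo" (the word "héllo"), strlen returns 6 because the é takes two bytes, while utf8_codepoints returns 5.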

And not trying to dodge your question about UTF-8 logic, but my point was you can dodge that whole question, because std::string provides the same amount of UTF-8 logic as char* -- that is, none at all. If you've been getting by on std::string, then you can get by on char*. If you only need to support UTF-8 input and output, and you don't need to manipulate strings (replace characters, truncate them, normalize them for use as keys in a data structure, etc.) or only need to do simple substring searches for ASCII characters, then you can just use char* or std::string. UTF-8 has a great design, which was consciously chosen to make all of that possible.
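As a sketch of the "simple substring searches for ASCII characters" case: every byte of a multi-byte UTF-8 sequence has its high bit set, so plain strchr looking for an ASCII delimiter can never match in the middle of a character. The helper name after_ascii_delim is hypothetical, and it assumes valid, NUL-terminated UTF-8:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the text after the first ASCII delimiter in s,
   or NULL if the delimiter is absent. Hypothetical helper; safe on
   valid UTF-8 because bytes below 0x80 only ever encode ASCII. */
const char *after_ascii_delim(const char *s, char delim) {
  assert(s != NULL);
  assert((unsigned char)delim < 0x80 && delim != '\0');
  const char *p = strchr(s, delim);
  return p ? p + 1 : NULL;
}
```

So splitting something like "nom=Zo\xC3\xAB" on '=' hands back the UTF-8 bytes of the value untouched, with no decoding step anywhere.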


I don't understand, shipping 50 MB of cl.exe is fine but shipping clang isn't?


Shipping cl.exe is illegal. I tried looking at clang, but it was a lot bigger than 50 MB, and it was unclear how to untangle it from installers and other dependencies:

https://stackoverflow.com/questions/65807034/clang-llvm-zip-...


I'm shipping it as part of https://ossia.io (statically linked to the app); my installers are between 50 and 100 MB for something that ships clang+llvm, boost, qt, ffmpeg and a lot of other things (in addition to its own code):

https://github.com/ossia/score/releases/tag/v3.0.0-rc7

You just need to build llvm/clang statically and target_link_libraries(<couple stuff>) in cmake (and ship the headers if you want to do useful things; this actually takes much more space uncompressed, but it'd be the same whatever the compiler).


That sounds like a lot of work, I just want a zip with the compiler ready to extract into any project folder.



Thx! So back to MinGW it is!

I'm surprised this is so hard to find, why is there no official redistributable compiler for Windows?! What is Microsoft so afraid of?

This is the only disadvantage Windows has compared to Linux!

I guess you can make this a lot smaller if you remove the cross-compiling parts?


It's only "mingw" because it uses the mingw headers. It uses Microsoft's modern C runtime (ucrt), runs just fine under a normal cmd.exe shell with c:/windows/formatted/paths, and does not require e.g. MSYS.



