Because AI is attacking, plagiarizing, competing with, and destroying the most common profession of people here on HN, it suddenly matters more to people who were previously unaffected.
Some people have been concerned with this kind of politics all along. Some people are realizing they should be now, because of AI. And that's okay; both groups can still work together.
The point is it doesn’t take giant tankers going through the Strait of Hormuz to move this volume. It could be handled by tanker trucks going to Suez….
You should be able to make a killing placing commodity bets right now, because you have such crystal clear vision for the causal chain currently underway
What are your top positions? You will never need to work again!
The first draft of Unicode was in 1988. Thompson and Pike came up with UTF-8 in 1992, made an RFC in 1998. UTF-16 came along in 1996, made an RFC in 2000.
The time machine would've involved Microsoft saying "it's clear now that UCS-2 was a bad idea, so let's start migrating to something genuinely better".
I don't think it was clear at the time that UTF-8 would take off. UCS-2 and then UTF-16 was well established by 2000 in both Microsoft technologies and elsewhere (like Java). Linux, despite the existence of UTF-8, would still take years to get acceptable internationalization support. Developing good and secure internationalization is a hard problem -- it took a long time for everyone.
It's now 2026, everything always looks different in hindsight.
I don’t remember it quite that way. Localization was a giant question, sure. Are we using C or UTF-8 for the default locale? That had lots of screaming matches. But in the network service world, I don’t remember ever hearing more than a token resistance against choosing UTF-8 as the successor to ASCII. It was a huge win, especially since ASCII text is already valid UTF-8 text. Make your browser default to parsing docs with that encoding and you can still parse all existing ASCII docs with zero changes! That was a huge, enormous selling point.
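That backward-compatibility selling point is easy to check: ASCII is a strict byte-for-byte subset of UTF-8, so a pure-ASCII document decodes identically under either decoder. A quick Python sanity check:

```python
# ASCII is a strict subset of UTF-8: the same bytes decode to the
# same text under either codec, with zero changes to the document.
ascii_doc = b"Hello, world!"  # any pure-ASCII byte sequence

assert ascii_doc.decode("utf-8") == ascii_doc.decode("ascii")
# And encoding ASCII-range text as UTF-8 reproduces the original bytes.
assert "Hello, world!".encode("utf-8") == ascii_doc
```

This is exactly why a browser defaulting to UTF-8 could still parse every existing ASCII page: the decoder never even notices the difference until a byte >= 0x80 shows up.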
Windows is far from a niche player, to be sure. Yet it seems like literally every OS but theirs was going with one encoding for everything, while they went in a totally different direction that got complaints even then. I truly believe they thought they’d win that battle and eventually everyone else would move to UTF-16 to join them. Meanwhile, every other OS vendor was like, nah, no way we’re rewriting everything from scratch to work with a non-backward-compatible encoding.
Microsoft did the hard work of supporting Unicode when UTF-8 didn't exist (and mostly when UTF-16 didn't exist).
Any system that continued with only ASCII well into the 2000s could mostly just jump into UTF-8 without issue. Doing nothing for non-English users for almost two decades turned out to be a solid plan long term. Microsoft certainly didn't have that option.
Blame Java - their use of UTF-16 is the sole reason that Microsoft chose it.
Sun sued Microsoft in 1996 for making nonportable extensions to Java (a license violation). Microsoft lost, and created C# in 2000.
At the time, “Starting Java” was the most feared message on the internet. People really thought that in-browser Java would take over the world (yes, Java, not JavaScript).
Sun chose UTF-16 in 1995, believing that Unicode would never need more than 64k characters. In 1996 that changed: UTF-16 got variable-length encoding (surrogate pairs) and became a white elephant.
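The surrogate-pair change is easy to see in practice: a character in the original 64k range still fits one 16-bit code unit, but anything above U+FFFF takes two. A small Python illustration (using U+1F600 as an example astral character):

```python
# UTF-16 was meant to be fixed-width 16-bit, but characters beyond
# U+FFFF need a surrogate pair -- two 16-bit code units.
bmp = "A"              # U+0041, inside the original 64k
astral = "\U0001F600"  # U+1F600, outside the Basic Multilingual Plane

assert len(bmp.encode("utf-16-be")) == 2     # one code unit
assert len(astral.encode("utf-16-be")) == 4  # pair: 0xD83D, 0xDE00
assert astral.encode("utf-16-be") == b"\xd8\x3d\xde\x00"
```

So after 1996, UTF-16 carried the same variable-length complexity as UTF-8 while keeping none of UCS-2's fixed-width simplicity.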
So Microsoft chose UTF-16 knowing full well that it had no advantages. But at least they can say code pages were far worse :)
At the time it was introduced it was understandable, and Microsoft also needed some time to implement it before that of course. But by about 2000 it was clear that UTF-8 was going to win, and Microsoft should have just properly implemented it in NT instead of dithering about for the next almost 20 years. Linux had quite good support of it by then.
It gets worse for UTF-16: Windows will let you name files using unpaired surrogates, so you can end up with a filename that exists on your disk but cannot be represented in UTF-8 (nor in compliant UTF-16, for that matter). Because of that, there's yet another encoding, WTF-8, that can represent arbitrary ill-formed sequences of 16-bit code units.
This is ridiculous. New developers will learn a completely different skill path from what we learned, and they will get where we are faster than we did.
“People asking if AI is going to take their jobs is like an Apache in 1840 asking if white settlers are going to take his buffalo” (Noah Smith on Twitter, I mean X)