Hacker News

Perl has had decent Unicode support longer than most similar languages (years before Ruby and Python, for instance), but Perl 6 is just ridiculously good at it, and I hope other languages follow suit. I'm unaware of any other language that handles Unicode this well... am I missing any languages that do? JavaScript seems to be coming along on this front: ES6 includes support for Unicode regexps, which is progress, so maybe that's the closest mainstream language.


Swift does a pretty good job as well for a first attempt.

https://www.mikeash.com/pyblog/friday-qa-2015-11-06-why-is-s...


> ES6 includes support for Unicode regexps

Will it provide \X, for example? (\X matches an extended grapheme cluster.)
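To show why \X matters, here is a minimal Python sketch. Python is used purely for illustration; it is not one of the languages being compared, and its standard `re` module has no \X. Each sample string displays as a single character (one extended grapheme cluster, the unit \X matches) yet is built from several code points, so anything that counts or matches code points gets the "wrong" answer. The names `SAMPLES` and `lengths` are hypothetical, chosen for this example. NFC normalization is included as a contrast: it folds some combining sequences, but not flags or ZWJ sequences.

```python
import unicodedata

# Each value renders as ONE user-perceived character (one extended
# grapheme cluster, i.e. what a regex \X would match), yet it is made
# of several code points.
SAMPLES = {
    "e + combining acute accent": "e\u0301",
    "US flag (two regional indicators)": "\U0001F1FA\U0001F1F8",
    "family emoji (ZWJ sequence)": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}


def lengths(s: str) -> tuple[int, int]:
    """Return (code points, code points after NFC normalization)."""
    # len() on a Python str counts code points, not grapheme clusters.
    # NFC can fold "e" + U+0301 into the single code point U+00E9, but it
    # leaves regional-indicator pairs and ZWJ sequences untouched, so it
    # is no substitute for grapheme segmentation.
    return len(s), len(unicodedata.normalize("NFC", s))


for name, s in SAMPLES.items():
    raw, nfc = lengths(s)
    print(f"{name}: {raw} code points, {nfc} after NFC")
```

Running it should report 2, 2 and 5 code points, with NFC shrinking only the first sample to 1. Treating each of these as a single unit needs real grapheme segmentation (Unicode UAX #29), which is what \X provides in languages that support it, such as Perl or the third-party Python `regex` module.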

> am I missing any languages that do?

Swift is one notable example. It has built-in, reasonably simple grapheme handling.


I haven't looked at Swift at all. I don't buy Apple products, so I have no familiarity with their ecosystem. But now that it's been open-sourced, I'll have a look at it, though it seems likely to remain predominantly a language for Apple platforms for the foreseeable future (I think?), so it's not something I'd find myself using in production any time soon. I guess we'll see how that shakes out over time now that it is open.

Given how quickly JavaScript is converging on a really nice set of modern features, shedding its warts, and getting faster, I wonder whether any other language will be as relevant long-term.


Swift's Unicode support is so awesome that my web browser had trouble rendering the documentation for its String type because of the epic working examples ;P.


> ...am I missing any languages that do?

Tcl has supported Unicode since 1999, with version 8.1. Much like Zathras, Tcl is the beast of burden that is easy to overlook.


On the downside, even today Tcl can't handle characters outside the Basic Multilingual Plane (BMP). It only does UCS-2 and can't handle UTF-16 surrogate pairs. If you convert an astral-plane code point, such as some popular emoji, from UTF-8, Tcl will turn each UTF-8 byte into a separate Unicode code point. There are similar catches all over; it's just not practical to deal with non-BMP code points in Tcl, even just to round-trip them.
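The encodings involved in the Tcl problem can be sketched in Python (again only as an illustration, not Tcl itself). U+1F600 GRINNING FACE is an astral-plane code point: it takes four bytes in UTF-8 and a surrogate pair in UTF-16, and it cannot fit in a single 16-bit UCS-2 unit. The helper `byte_per_code_point` is a hypothetical model of the behavior described in the comment, assuming each byte becomes the code point with the same numeric value (Latin-1 style); it is not Tcl's actual conversion code.

```python
# U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane.
EMOJI = "\U0001F600"


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split an astral code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    v = cp - 0x10000  # 20 bits remain: 10 for each surrogate
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def utf16_code_units(s: str) -> list[int]:
    """The 16-bit units produced by Python's own UTF-16 encoder."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def byte_per_code_point(data: bytes) -> str:
    """Hypothetical model of the bug: every UTF-8 byte becomes its own
    code point (Latin-1 maps byte 0xNN to U+00NN)."""
    return data.decode("latin-1")


utf8 = EMOJI.encode("utf-8")
garbled = byte_per_code_point(utf8)

print("UTF-8 bytes:  ", utf8.hex(" "))
print("UTF-16 units: ", [hex(u) for u in utf16_code_units(EMOJI)])
print("manual pair:  ", [hex(u) for u in to_surrogate_pair(ord(EMOJI))])
print("fits UCS-2?   ", ord(EMOJI) <= 0xFFFF)
print("byte-per-cp:  ", [hex(ord(c)) for c in garbled])
print("round-trips?  ", garbled == EMOJI)
```

The UTF-8 form is f0 9f 98 80 and the surrogate pair is 0xd83d, 0xde00; the byte-per-code-point result is four unrelated characters that no longer compare equal to the original emoji, which is the round-trip failure described above.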


Rereading the OP's comment, I see I misread it a bit and responded to the wrong part. You're absolutely right: progress on Unicode in Tcl stalled out after the low-hanging fruit of UCS-2 was achieved.


Well, it's not about "supports Unicode" as much as it's about the level of support. IIRC, Tcl had fairly good Unicode support (at least for its time), but I have no idea how it compares to current versions of other languages, or to Perl 6 specifically.



