Why not use UTF8 instead of codepages?

drbits · June 19, 2008

There is a lot of discussion in the forum about mismatched codepages (a Windows problem). Switching to UTF-8, which is an ISO standard encoding, should solve this problem.

UTF-8 can encode every Unicode character, so it supports practically all languages (as far as I know, only some rare glyphs in "Traditional Chinese" are missing). In addition, all of the glyphs in all 15 ISO-approved codepages (the ISO 8859 series) are included. In fact, all characters from [space] to [~] (32-126) have the same values as in the standard Windows codepages, so ASCII strings are passed through with no change (a big advantage for emulators).
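To illustrate the pass-through point, here is a minimal sketch in Python (my choice for the example, nothing to do with uTorrent's own code). It assumes Windows-1252 as the "standard Windows codepage"; the sample strings are made up. ASCII text produces identical bytes either way, while a non-ASCII character such as "µ" does not, which is exactly where mismatches show up.

```python
# Assumption: cp1252 stands in for "the standard Windows codepage".
text = "uTorrent 1.8"  # hypothetical ASCII-only sample string

# Pure ASCII encodes to the same bytes in UTF-8 and in cp1252.
ascii_same = text.encode("utf-8") == text.encode("cp1252")

# A non-ASCII character is one byte in cp1252 but two bytes in UTF-8.
micro = "\u00b5"  # the micro sign
utf8_bytes = micro.encode("utf-8")
cp1252_bytes = micro.encode("cp1252")

# Reading UTF-8 bytes with the wrong codepage gives the familiar garbage.
mismatched = utf8_bytes.decode("cp1252")

print(ascii_same, utf8_bytes, cp1252_bytes, mismatched)
```

The last value is the kind of two-character garbage ("Â" followed by "µ") that people report when the codepage does not match the file.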

Any web page can be displayed in UTF-8 by declaring the encoding in a <meta> entry inside <head> (HTML 3.0 and later, I believe). uTorrent could be changed to specify UTF-8 as its display codepage for Windows.

Those writing translations would then merely save their files as UTF-8 and not worry about codepages at all. In Windows XP and later, Notepad supports UTF-8.
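As a rough sketch of what handling such a file could look like (again in Python, with a made-up file name and a made-up translated string): Notepad puts a 3-byte signature (EF BB BF) at the start of UTF-8 files, so the reading side should be prepared to skip it. Python's "utf-8-sig" codec does that, and it also reads files that have no signature.

```python
import os
import tempfile

# Hypothetical translated string and file name, for illustration only.
translation = "T\u00e9l\u00e9chargement termin\u00e9\n"
path = os.path.join(tempfile.mkdtemp(), "french.txt")

# "utf-8-sig" writes the EF BB BF signature, as Notepad does when saving UTF-8.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write(translation)

# Look at the raw bytes to confirm the signature is there.
with open(path, "rb") as f:
    raw = f.read()

# Reading with "utf-8-sig" strips the signature if it is present.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    loaded = f.read()

print(raw[:3], loaded == translation)
```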

There is also a UTF-16 format (Micro$oft just calls this "Unicode"). There are technical reasons for avoiding UTF-16: it is not compatible with legacy byte-oriented software, and it has the byte-order (big-endian vs. little-endian) problem. In particular, UTF-8 requires no change in basic string handling for C or C++ programming, because null-terminated char strings keep working. On the other hand, .NET and Java/J++ normally use UTF-16 internally.
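A small sketch of why UTF-16 trips up legacy byte-string code while UTF-8 does not. It is written in Python for brevity; c_strlen is a hypothetical helper I made up to mimic what C's strlen does (count bytes up to the first zero byte), not a real library function.

```python
def c_strlen(data: bytes) -> int:
    """Hypothetical stand-in for C strlen: count bytes before the first 0x00."""
    end = data.find(b"\x00")
    return len(data) if end == -1 else end

s = "abc"

utf8 = s.encode("utf-8")         # one byte per ASCII character, no zero bytes
utf16_le = s.encode("utf-16-le")  # low byte first: 'a', 0x00, 'b', 0x00, ...
utf16_be = s.encode("utf-16-be")  # high byte first: 0x00, 'a', 0x00, 'b', ...

# The same text has two different UTF-16 byte layouts (the byte-order problem).
byte_orders_differ = utf16_le != utf16_be

print(c_strlen(utf8), c_strlen(utf16_le), c_strlen(utf16_be))
```

With UTF-8 the byte-oriented length is the full 3; with UTF-16 a strlen-style scan stops at the first embedded zero byte (after 1 byte for little-endian, after 0 bytes for big-endian), so the string looks truncated or empty to old code.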

Martin Katz, Ph.D.

Firon · June 19, 2008

If you'd spent more than 2 minutes looking into it, you'd already know µTorrent is UTF-8. The website, the app, and the translations.

And honestly, it's dumb to mention that you're a Ph.D., considering this is the Internet.
