
Why not use UTF-8 instead of codepages?


drbits


There is a lot of discussion in the forum about mismatched codepages (a Windows problem). Switching to UTF-8, the standard Unicode encoding (defined in ISO/IEC 10646 and RFC 3629), should solve this problem.
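To illustrate the mismatch, here is a minimal Python sketch. The sample word and the cp1251/cp1252 pairing are assumptions chosen for the demonstration. Text saved under one Windows codepage and read back under another comes out garbled without any error, while UTF-8 bytes decode the same way everywhere.

```python
# Hypothetical scenario: a translation saved on a Russian-locale PC
# (Windows codepage 1251) is opened on a Western-European PC (codepage 1252).
word = "Привет"  # "Hello" in Russian

cp1251_bytes = word.encode("cp1251")
# Decoding with the wrong codepage raises no error; it just yields wrong letters.
garbled = cp1251_bytes.decode("cp1252")  # "Ïðèâåò"

# UTF-8 bytes have exactly one interpretation, whatever the reader's locale.
utf8_bytes = word.encode("utf-8")
restored = utf8_bytes.decode("utf-8")  # "Привет"
```

The cp1251/cp1252 pair is only one example. Any two codepages that disagree about byte values 128-255 produce the same kind of garbage.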

UTF-8 can encode every Unicode character, so it supports essentially all written languages. Even the rare "Traditional Chinese" glyphs that lie outside Unicode's 16-bit Basic Multilingual Plane are encodable; they simply take four bytes each. In addition, every glyph in all 15 ISO 8859 codepages is included. In fact, all characters from [space] to [~] (32-126) have the same byte values as in the standard Windows codepages, so ASCII strings pass through unchanged (a big advantage for emulators).
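A short Python check of both claims, offered as a sketch. Windows-1252 stands in for "the standard Windows codepages", and U+20000 is used as an example of a rarely used CJK ideograph outside the Basic Multilingual Plane. Both choices are assumptions for illustration.

```python
# Every printable ASCII character, space (32) through tilde (126).
printable_ascii = "".join(chr(c) for c in range(32, 127))

# Plain ASCII text is byte-for-byte identical in UTF-8 and Windows-1252.
same_bytes = printable_ascii.encode("utf-8") == printable_ascii.encode("cp1252")

# A rare CJK ideograph outside the Basic Multilingual Plane (U+20000)
# is still encodable in UTF-8; it just takes four bytes.
rare_glyph = "\U00020000"
rare_bytes = rare_glyph.encode("utf-8")  # b'\xf0\xa0\x80\x80'

# A single-byte Windows codepage cannot represent it at all.
try:
    rare_glyph.encode("cp1252")
    representable_in_cp1252 = True
except UnicodeEncodeError:
    representable_in_cp1252 = False
```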

Any web page can be displayed as UTF-8 by declaring the character set in a <meta> element inside its <head> (HTML 3.0 and later, I believe). uTorrent could likewise be changed to use UTF-8 (Windows codepage 65001) as its display codepage.
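As a sketch of what that <head> entry looks like, the Python snippet below writes a small page whose `<meta charset="utf-8">` declaration (the HTML5 form; HTML 4 pages use `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`) matches the encoding actually used for the bytes on disk. The file name `page.html` and the temporary directory are hypothetical.

```python
import tempfile
from pathlib import Path

html = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Ελληνικά / Русский / 日本語</title>
</head>
<body><p>One page, many scripts, no codepage switching.</p></body>
</html>
"""

# The declaration only helps if the bytes really are UTF-8, so encode explicitly
# (write_bytes also avoids any newline translation on Windows).
page = Path(tempfile.mkdtemp()) / "page.html"  # hypothetical file name
page.write_bytes(html.encode("utf-8"))

raw = page.read_bytes()
declares_utf8 = b'<meta charset="utf-8">' in raw
round_trips = raw.decode("utf-8") == html
```

The key design point is that the declared charset and the encoding used when writing must agree; a UTF-8 declaration on a cp1252 file reproduces the very mismatch described above.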

Translators could then simply save their files as UTF-8 and never worry about codepages at all. On Windows XP and later, Notepad can save files as UTF-8.
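A minimal sketch of the loading side, assuming a hypothetical `key=value` translation file (the format, keys, and file name are illustrative, not uTorrent's actual format). Notepad's "UTF-8" option on older Windows versions writes a byte-order mark (EF BB BF) at the start of the file, so a loader should strip it. Python's `utf-8-sig` codec does that and also accepts files without the mark.

```python
import tempfile
from pathlib import Path

def load_translation(path):
    """Parse a hypothetical key=value translation file stored as UTF-8."""
    # "utf-8-sig" drops a leading BOM if present; otherwise it acts like "utf-8".
    text = Path(path).read_text(encoding="utf-8-sig")
    strings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        strings[key.strip()] = value.strip()
    return strings

# Simulate a file saved by Notepad: BOM followed by UTF-8 text.
sample = "menu.exit=Beenden\n# sizes\nlabel.size=Größe\n"
path = Path(tempfile.mkdtemp()) / "de.lng"  # hypothetical file name
path.write_bytes(b"\xef\xbb\xbf" + sample.encode("utf-8"))

strings = load_translation(path)
```

Had the file been read with plain `"utf-8"`, the first key would come back as `"\ufeffmenu.exit"` and lookups for `menu.exit` would silently fail.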

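There is also a UTF-16 format (Micro$oft just calls this "Unicode"). There are several technical reasons to avoid UTF-16: it is not compatible with legacy software, and it has a byte-order problem (big-endian vs. little-endian). In particular, UTF-8 needs no change in string handling for C or C++ programming: no character other than NUL ever encodes to a zero byte, so ordinary null-terminated char strings keep working (although strlen() then counts bytes, not characters). On the other hand, .NET and J++ normally use UTF-16 internally.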
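The byte-order and NUL-byte points can be seen directly in a small Python sketch (the sample strings are arbitrary). Python byte strings stand in here for the `char` buffers a C program would hold.

```python
text = "uTorrent"

# UTF-16 comes in two byte orders; the same text yields different bytes.
le = text.encode("utf-16-le")  # b'u\x00T\x00o\x00...'
be = text.encode("utf-16-be")  # b'\x00u\x00T\x00o...'

# UTF-16 of plain ASCII is full of zero bytes, which a C routine such as
# strlen() would treat as the end of the string.
utf16_has_nul = 0 in le

# UTF-8 has a single byte order, and no character other than U+0000
# produces a zero byte, so NUL-terminated char strings keep working.
utf8 = text.encode("utf-8")
utf8_has_nul = 0 in utf8

# Caveat: byte counts differ from character counts once non-ASCII appears.
word = "Größe"
byte_len = len(word.encode("utf-8"))  # 7: 'ö' and 'ß' take two bytes each
char_len = len(word)                  # 5
```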

Martin Katz, Ph.D.

