
µTorrent 1.2.3-beta Unicode


ludde


this is my first time posting and i would like to say well done to the beta 1.2.3 unicode.

though it is not perfect.... it can read torrents made by bitcomet.

here is an example of a torrent i use all the time that can't be read by abc, bittornado, but can be read on BC, and NOW on utorrent.

Torrent File I Use

if anyone here uses bittornado, or abc, you will get errors and the download will not start due to the characters. but now utorrent displays it as traditional chinese and not jumbled words.

well done.

1. torrentspy: [screenshot: uh.JPG]

2. utorrent: [screenshot: per.JPG]

3. bitcomet: [screenshot: bc.JPG]

i see some ppl still have problems, maybe it's because my regional settings are set to Chinese HK SAR.. i havent tried this in English United States settings yet. don't plan to. :P


Unicode is the 'correct' way to label characters on any modern system. Until recently, we were stuck with 256 chars, so each system, in each language, treated characters in its own way. That's why a file with the content "TÉST" won't show up correctly on DOS, on a Mac, on a Russian Windows, etc. Some Asian-language countries have been using double-byte systems for decades, but those have never been fully compatible with our western encodings.
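To make the "TÉST" point concrete, here is a small Python sketch. It assumes the file was saved on a Western European Windows machine (code page 1252); the other code pages stand in for DOS, classic Mac OS and Russian Windows. The function name misread is made up for the illustration.

```python
# Encode "TÉST" once as a Western European Windows machine would (cp1252),
# then decode the very same bytes under other legacy 8-bit code pages.
raw = "TÉST".encode("cp1252")  # the É becomes the single byte 0xC9


def misread(data: bytes) -> dict:
    """Decode the same bytes under several legacy code pages."""
    codepages = ["cp1252", "cp437", "mac_roman", "cp1251"]
    return {cp: data.decode(cp) for cp in codepages}


results = misread(raw)
for cp, text in results.items():
    print(f"{cp:10} {text}")
```

Only the cp1252 line comes back as "TÉST"; every other code page maps byte 0xC9 to a different character.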

However, recently, most systems have abandoned those clumsy encodings ('ansi' and the like) and started using unicode. Unicode is a system that allows 65k (256^2) chars, using two bytes per char. That way, you can have ANYTHING written in the same 'encoding' and still be compatible with any system or computer available. You can write chinese, english, japanese, arabic, and korean in the same phrase and it'll all work.

In µTorrent's case, it's the ability to record correct names I suppose, without having to rename stuff like "クリスマス特集" to "_______" or just crash and burn (like some programs do -- for example, some WinZip version I used to have).

That's unicode, and it's time everybody adapted to it. Also, UTF-8 is a way to write unicode files (instead of using double bytes, it uses 1 byte per char for plain ASCII, and a multi-byte sequence with the high bit set for anything beyond that), so if you see UTF-8 mentioned, it's just another way of storing the same thing.

More info at unicode.org or something.


The 3 main Unicode standards allow up to 4 bytes per character, actually.

The Unicode range itself is currently mapped to 0-10FFFF (1,114,112 code points), and UTF-8, UTF-16, and UTF-32 all cover the same range.

UTF-32 is rare because it loses a lot of space (it takes 4 bytes for any character).

UTF-8 can use 1, 2, 3, or 4 bytes per character, UTF-16 uses 2 or 4. Each of the two has its own advantages.
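A quick way to check those sizes yourself, as a Python sketch. The sample characters are picked for illustration (one ASCII letter, one accented Latin letter, one katakana, one musical symbol outside the 16-bit range), and byte_lengths is a made-up helper name.

```python
samples = ["A", "É", "ク", "𝄞"]  # U+0041, U+00C9, U+30AF, U+1D11E


def byte_lengths(ch: str) -> dict:
    """Return how many bytes one character takes in each UTF encoding."""
    # The "-le" codec names are used so Python does not prepend a byte order mark.
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

UTF-8 grows from 1 to 4 bytes across the samples, UTF-16 stays at 2 until the last character needs a surrogate pair (4 bytes), and UTF-32 is always 4.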


unicode will never work in windows 98.

Windows 98 does not have unicode support.

Is it possible to do all the bug fixes and everything else and release a separate version for 98 that does not have Unicode?

Microsoft has a couple of macros for their Unicode support (e.g. _T("string") instead of L"string"), and if you use them it's possible to make non-Unicode builds without a lot of effort. So although I'm not really sure whether this is what Ludde is doing (maybe for the sake of simplicity or cleanness of the code he won't use that), Unicode does not automatically mean dropping Windows 98 support.

~Grauw


Unicode just allows more languages to be displayed correctly, that's all AFAIK. Too bad everyone doesn't speak/write English, otherwise we wouldn't be having these kinds of problems. :(

Well, no.

Unicode is more than just languages. There are a lot of typographical marks (even in English alone) such as ‘, ’, “, ”, …, ¶, ‴, ※, ⁂, ‱, —, etc. that are used commonly in print but can't be used on a computer that only supports Latin-1. All currencies have their own signs as well, €, £, ¥, ¢, ƒ, ₣, etcetera. There are characters for musical notations as well, e.g. ♩♪♫♬♭♮♯, mathematics: ¬∀∃∏⊂≙÷√∠±⊨⊙, astrology: ♂♃♄♅♆♇♈♉♋♊♌♍♎♏♐♑♒♓, and numerous others: ✔♡♢♣♤♥♦♧♟♞♝♜♛♚♙♘♗♖ ♔♕☀☁☂☛☢☣☪☯☮☭☽✆✂ ❶❷❸➔➞➥➩➲▁▂▃▄▅▆▇█◘◙◚◛ and of course the Swedish sign ⌘ for a tourist attraction.

All those don't fit in just 256 characters.

(If you can't see them well, enlarge your font size. And of course you need to use a system or browser that can understand Unicode, e.g. Firefox on Windows 98 understands Unicode, and of course need to have fonts installed that actually contain those characters :).)

Also, there are signs for old languages like runes, etc. Even if everyone on the whole world would from now on speak English, there would still be an entire legacy of documents which would be lost because we can't store the old documents.

That aside, I'm very glad that the world doesn't speak only English :).

In any case, Unicode is good in all aspects. It solves a lot of problems and makes a lot more things possible.

~Grauw


though it is not perfect.... it can read torrents made by bitcomet.

here is an example of a torrent i use all the time that can't be read by abc, bittornado, but can be read on BC, and NOW on utorrent.

Torrent File I Use

[...]

i see some ppl still have problems, maybe it's because my regional settings are set to Chinese HK SAR.. i havent tried this in English United States settings yet. don't plan to. :P

I just tested the torrent you provided, and it works on my machine as well. I tried reading the .torrent, but I did not see any encoding other than UTF-8. I guess the user that created the .torrent with BitComet had no IME installed? Which would mean the Chinese is 100% Unicode. Just a guess tho, could be wrong.

Digito, I guess some of the Chinese torrents you download might not work with µTorrent just yet. example


digito: the problem is most of the BitComet torrents aren't Unicode. :|

it doesn't use unicode when the user's code page is chinese or japanese, and instead makes a separate .utf-8 key for that (which nothing but BitComet reads), which leads to problems on the other Unicode enabled clients...

not sure if a workaround can be made easily. (the spec says to use utf-8 in name/path after all, not use path.utf-8 and name.utf-8). we'd need testers, probably chinese users, to do that. :P only problem is the difference in timezones, although you guys are a bit closer to ludde's timezone.
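For what it's worth, a reader-side workaround could look roughly like this Python sketch. It assumes the info dictionary has already been bdecoded into a dict with bytes keys and values; display_name is a made-up helper, and the Big5 fallback is only an illustrative guess (a real client would have to guess or ask for the code page).

```python
def display_name(info: dict) -> str:
    """Pick a readable name from an already-decoded info dictionary."""
    # BitComet's non-standard key, always UTF-8 when present.
    if b"name.utf-8" in info:
        return info[b"name.utf-8"].decode("utf-8")
    try:
        # What the spec asks for: the plain key holds UTF-8.
        return info[b"name"].decode("utf-8")
    except UnicodeDecodeError:
        # Assumption for illustration only: treat the bytes as Big5.
        return info[b"name"].decode("big5", errors="replace")


legacy = {b"name": "中文".encode("big5"), b"name.utf-8": "中文".encode("utf-8")}
print(display_name(legacy))
```

The order matters: the .utf-8 key is tried first because the plain name key is the one that may be in the creator's code page.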


not sure if a workaround can be made easily. (the spec says to use utf-8 in name/path after all, not use path.utf-8 and name.utf-8).

would you please give me a link about this.

can't find anything about utf8 in http://www.bittorrent.com/protocol.html and http://wiki.theory.org/BitTorrentSpecification

Azureus makes use of an encoding key to parse strings: when making a torrent in AZ it embeds 8:encoding5:UTF-8 into the .torrent file. uT seems to totally ignore it (because it's not standard? but it did a workaround).

utf8az0ef.gif

the noted [UTF-8] maps to the encoding field in the .torrent. when opening a .torrent that uses Simplified Chinese it shows [GBK], and [BIG5] for Traditional Chinese.

first translate the string value using encoding; if an encoding error is caught then fall back to UTF8, or the reverse.
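A minimal Python sketch of that fallback order, assuming the value of the encoding key has already been read out as a string (or is missing). decode_field is a hypothetical helper, not what Azureus or µTorrent actually run.

```python
from typing import Optional


def decode_field(raw: bytes, declared: Optional[str]) -> str:
    """Decode a string field: declared encoding first, then UTF-8."""
    candidates = []
    if declared:
        candidates.append(declared)
    candidates.append("utf-8")
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers an encoding name Python does not know.
            continue
    # Last resort: keep what is readable and mark the rest.
    return raw.decode("utf-8", errors="replace")


print(decode_field("简体".encode("gbk"), "GBK"))
```

One caveat: a wrong declared encoding can sometimes decode without raising any error and just produce garbage, so the fallback only catches the cases where the bytes are outright invalid.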

on uT torrent making bug, plz check my hex dump http://forum.utorrent.com/viewtopic.php?pid=25759#p25759

as we know BC uses the non-standard name.utf-8/path.utf-8, so uT's path field should equal BC's path.utf-8, both being UTF8 encoded. but if you check the hex value of uT's path field against my hex dump above, it's UTF16LE, and the length of the path field should be 10.

you need a conversion from UTF16LE --> UTF8 for every string value, and the length should be the memSize of the string in bytes, NOT the number of chars (the return value of strlen()). embedding an 8:encoding5:UTF-8 won't hurt anything.
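In Python terms the fix would be something like the sketch below. The name bencode_str is made up, and the sample string is just an example, not the one from the hex dump.

```python
def bencode_str(text: str) -> bytes:
    """Bencode one string: <byte length>:<UTF-8 bytes>."""
    data = text.encode("utf-8")
    # The prefix counts bytes of the UTF-8 form, not characters.
    return str(len(data)).encode("ascii") + b":" + data


# Simulate a name held internally as UTF-16LE, as a Unicode Windows build would.
utf16_name = "クリスマス".encode("utf-16-le")
name = utf16_name.decode("utf-16-le")   # UTF16LE --> text
print(bencode_str(name))                # text --> length-prefixed UTF-8
print(bencode_str("encoding") + bencode_str("UTF-8"))
```

The five katakana characters take 15 bytes in UTF-8, so the prefix has to be 15, not 5 (and not the 10 bytes of the UTF-16LE form either).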

Unicode enabled clients does not mean UTF8 "only" clients. we need backward compatibility for old torrents. as digito mentioned, BitTornado/ABC can't handle UTF8 torrents, so how do other clients do it? AZ/BC do this well, by the non-standard path.utf-8 or the encoding key.

Unicode supports UTF8/UTF16LE/UTF16BE/UTF32/..., so how do you tell them apart without a BOM signature? xml/html still need an encoding declaration too (xml defaults to UTF-8 without one, html traditionally to iso8859-1).
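For reference, BOM sniffing itself is simple when a BOM is there; the trouble is exactly the case where it is missing. A Python sketch using the constants from the standard codecs module (sniff_bom is a made-up name):

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32LE BOM (FF FE 00 00)
# starts with the same two bytes as the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"name"))
print(sniff_bom(b"4:name"))
```

Strings inside a .torrent carry no BOM, so this returns None for them, which is why a separate encoding key (or a fixed convention like "always UTF-8") is needed.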

@digito:

the .torrent you attached is UTF8 encoded, which is strange if it's made by BitComet. BC defaults to systemDefaultCodePage; i can't make a UTF8 torrent with BC, even after switching to HK SAR. how did you do that?


The torrent maker in 356 was broken, please use 358 for your tests. 358 also uses the encoding field now if available.

BitComet only uses default code page if your system's code page is Chinese or Japanese.

Important Notice: In BitComet v0.58 or before, the string is encoded using MBCS (user's code page), and a ".utf-8" key is added for UTF-8 encoded string. In v0.59, the default encoding is changed to UTF-8 if the user's code page is neither Chinese nor Japanese. BitComet will still keep adding a ".utf-8" key for all strings for backward compatibility, e.g. add a "name.utf-8" key to store utf-8 file name. After most of the users upgrade their client a few months later, BitComet may stop adding ".utf-8" key.

Metainfo File Structure

All data in a metainfo file is bencoded. The specification for bencoding is defined above.

The content of a metainfo file (the file ending in ".torrent") is a bencoded dictionary, containing the keys listed below. All character string values are UTF-8 encoded. Keys not marked 'optional' are required fields:
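To show what "a bencoded dictionary" with UTF-8 string values means in practice, here is a minimal decoder sketch in Python. It is an illustration, not any client's parser: bdecode is a made-up name, the sample bytes are constructed by hand, and there is no validation of malformed input beyond what Python raises on its own.

```python
def bdecode(data: bytes, i: int = 0):
    """Return (value, next_index) for the bencoded value starting at data[i]."""
    c = data[i:i + 1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


# Hand-made sample: an encoding key plus an info dictionary with a UTF-8 name.
sample = b"d8:encoding5:UTF-84:infod4:name15:" + "クリスマス".encode("utf-8") + b"ee"
meta, end = bdecode(sample)
print(meta[b"info"][b"name"].decode("utf-8"))
```

Strings come back as raw bytes on purpose: the decoder cannot know the encoding, so turning them into text is left to the caller.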


Are you implying that µTorrent should be coded in Java instead? :P

Pls dont! Java sucks big time. On my ultrafast Pentium II 200mhz pc with a huge amount of ram(32mb!) Azureus crawls at a snail's pace. Yeah java sucks BIG TIME!

:)

Too bad everyone doesn't speak/write English, otherwise we wouldn't be having these kinds of problems.

You dont want that cos the world will be a more boring place to live in.

:)


I am so sorry to bother anyone. My problem is that I discovered the Torrent system yesterday. I have gone all over the internet trying to learn how to use it and finally downloaded utorrent after seeing a lot of positive reviews for it. However, I downloaded a file, and it says that it is 100% downloaded and that I am now a seeder, but when I try to view the file (it is an episode of the tv show "Supernatural") it does not let me. I double clicked on it and tried a bunch of other things and now feel like crying. Can anyone please help me? My email is timarach05@yahoo.com


Archived

This topic is now archived and is closed to further replies.

