
µTorrent 1.2.3-beta Unicode


ludde

That's not what I needed to see...

http://img374.imageshack.us/img374/9289/utf8az0ef.gif

If that specific string is not there, it's not Unicode. Simple as that.

Okay. I see what you mean now. I used each program (Azureus, BitComet, µTorrent) to create a torrent of Chinese files. They automatically create them in UTF-8, so yes, µTorrent could recognize it.

However, as soon as I upload the torrent online (to a forum or newsgroup) and download it again, it automatically changes to GBK (which is a Unicode 1.1 standard). So this includes the Chinese, Japanese and Korean languages. I guess it happens during downloading: once the .torrent passes through a browser or some newsgroup program (I use Thunderbird), it changes into GBK instead of UTF-8.

Downloading the torrent through mIRC works completely fine, because there's no decoding/encoding involved.

So I guess µTorrent's Unicode support must include GBK, or at least go up to Unicode 2.1.

Link to comment
Share on other sites

No, that's impossible, a transfer like that cannot change the torrent's contents.

Neither a browser nor a mail client has any way to change something like that. (They don't understand bencoding, for one.)

I've used Japanese and Hebrew torrents in µTorrent with absolutely no problem.

µTorrent's torrent maker cannot make unicode torrents by the way.

The only Unicode encoding used for torrents is UTF-8. Neither BitComet nor Azureus will use anything besides that when using Unicode.

Link to comment
Share on other sites

Hmm. You're right. Browsers have no way of changing the content of the .torrent. But Azureus and BitComet do support GBK (or GB18030, included in Unicode 2.1) as part of their Unicode support. That's just the nature of those programs. If µTorrent is going to implement Unicode, why doesn't it fully support it? For Azureus, I think the programmer has less work, because it's all done through Java. (If you download the English-only Java, it won't work.) For BitComet? I have no idea what they did.

UTF-8 cannot fully support Traditional/Simplified Chinese because of its limitations. Simply put, Chinese has more characters than Japanese (or even Hebrew, if I might add).

If possible, maybe I can upload a GBK .torrent so you and the developers can test it? If I were to survey the millions of Chinese torrents on the internet, I'd guess 80% of them are in GBK.

Edit: I found the references here:

http://en.wikipedia.org/wiki/CJK

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit encodings, requiring at least a 16-bit fixed width character encoding or multi-byte variable-length encodings. The 16-bit fixed width encodings, such as Unicode up to and including version 2.0, are now deprecated due to the requirement that software in China support the GB18030 character set.

Now, if that holds true, what I don't understand is: why can I create a fully UTF-8 Chinese .torrent locally? I guess it's the mighty developers' job to find out. lol.. :P

Link to comment
Share on other sites

If you have web space or something, please upload one of those torrents. It'd help greatly to test with (since we don't have any testers using GBK OSes, just various other languages).

However, UTF-8 can support all the Chinese characters and other languages, since it can use up to 4 bytes to create one character.

The three Unicode encodings (UTF-8, UTF-16, UTF-32) support every Unicode character. (000000-10FFFF)

(UTF-7 and UTF-EBCDIC are not used much)
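To make the point above concrete, here is a minimal Python sketch (not anything from µTorrent itself) that encodes a few characters in each of the three encodings and counts the bytes. The sample characters and the helper name `encoded_lengths` are chosen purely for illustration.

```python
# Compare how many bytes a character needs in UTF-8, UTF-16 and UTF-32.
# The "-le" variants are used so no byte-order mark is added to the count.
samples = ["A", "µ", "中", "\U00020000"]  # U+20000 is a CJK ideograph outside the 16-bit range

def encoded_lengths(ch):
    """Return (utf8_bytes, utf16_bytes, utf32_bytes) for a single character."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in samples:
    print(f"U+{ord(ch):06X}", encoded_lengths(ch))

# The highest code point, U+10FFFF, still fits in 4 UTF-8 bytes.
print(len(chr(0x10FFFF).encode("utf-8")))
```

Common Chinese characters take 3 bytes in UTF-8, and the rarer ones beyond U+FFFF take 4, so nothing in Chinese is out of reach.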

Link to comment
Share on other sites

Firon, you got PM.

However, UTF-8 can support all the Chinese characters and other languages, since it can use up to 4 bytes to create one character.

The three Unicode encodings (UTF-8, UTF-16, UTF-32) support every Unicode character. (000000-10FFFF)

(UTF-7 and UTF-EBCDIC are not used much)

You're absolutely right! I agree. If you take a look inside the .torrent files I sent you, you can see a UTF-8 tag in both of them. However, if you search [big5].torrent for 'Big5' you'll see "encoding4:Big5", and in [GBK].torrent you'll see "encoding3:GBK4". Neither string exists in the other file.

No clue what these tags do, but they are in there.

Keep up the good work guys!

Link to comment
Share on other sites

Firon,

For what it's worth, I just want to confirm monkey8's findings.

I've got a couple of Chinese Big5 torrent files that don't work properly in the Unicode version of µTorrent: the filenames are created with strange characters instead of the proper Chinese characters.

The same torrent files work fine in BitComet and Azureus.

I sincerely hope µTorrent can fix this problem; this is one feature that I've been waiting for... :-)

Keep up the great work ! Thanks !

Link to comment
Share on other sites

[GBK]Simplified Chinese.torrent:

created by13:BitComet/0.5613:creation datei1131723262e8:encoding12:windows-1252

[big5]Traditional Chinese.torrent:

created by13:BitComet/0.6013:creation datei1133402549e8:encoding12:windows-1252

Neither of these BitComet versions makes UTF-8 torrents for Japanese and Chinese code pages.

The encoding is set to windows-1252, which is a superset of ISO/IEC 8859-1, a Western European (Latin-1) encoding.
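For anyone who wants to check a torrent's declared encoding without a hex editor, here is a rough sketch of a bencode reader in Python. It is only an illustration, not the parser any client actually uses. The sample bytes are the fragment quoted above, with the `d10:` prefix and trailing `e` added as an assumption so that it forms a complete dictionary.

```python
def bdecode(data, i=0):
    """Decode one bencoded value from bytes, starting at index i.

    Returns (value, index_after_value). Strings stay as raw bytes,
    because bencoding itself says nothing about character encoding.
    """
    c = data[i:i + 1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length

# Fragment from the post, wrapped into a complete dictionary (assumed framing).
sample = (b"d10:created by13:BitComet/0.56"
          b"13:creation datei1131723262e"
          b"8:encoding12:windows-1252e")
meta, _ = bdecode(sample)
print(meta[b"encoding"])
```

On a real file you would pass the whole file contents (read in binary mode) and look up the same `encoding` key in the top-level dictionary.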

Azureus displays the same trash µTorrent does on my system and also reports that it is not Unicode, confirming my own findings.

The only reason it shows up in Chinese for you in Azureus is that you set your system to interpret non-Unicode trash as Chinese, via the "Language for non-Unicode programs" setting.

µTorrent actually is Unicode, so it is unaffected by this setting, which is why it shows trash. It's doing exactly what it's supposed to do: displaying the torrent as windows-1252, since that's what the encoding in the torrent says it is.

If I set my PC to Chinese with that setting and rebooted, it'd show up fine on Azureus for me too.
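The effect is easy to reproduce. A small Python sketch, using a made-up two-character file name (not one from the torrents in question): the bytes are written as GBK, then read back once with the declared windows-1252 and once with the encoding the creator's system actually used.

```python
# A hypothetical file name, stored the way a GBK system would store it.
name = "中文"
raw = name.encode("gbk")            # the bytes that end up in the .torrent

as_declared = raw.decode("cp1252")  # what a client sees if it trusts "windows-1252"
as_intended = raw.decode("gbk")     # what a system set to Chinese (PRC) falls back to

print(raw.hex(), as_declared, as_intended)
```

Same bytes, two readings: the "trash" is simply the windows-1252 view of GBK data.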

Neither of these torrents are Unicode. BitComet's torrent maker sucks (what a surprise).

Azureus makes UTF-8 torrents properly...

µTorrent:

uttrash6wl.gif

Azureus:

aztrash0jt.gif

aztrash19yr.gif

aztrash27gp.gif

And yes, I am on a Unicode OS (Windows XP) with all languages and fonts installed, so it's not something missing on my system.

here's proof if you don't believe me...

chineseworks8xa.th.gif

phew..

Also, neither Azureus nor µTorrent reads name.utf-8 or path.utf-8; they're legacy things and unsupported. Both clients read name and path.

Link to comment
Share on other sites

win xp pro sp2.

i closed utorrent to d/l the new update, saved it and then opened the new utorrent.

i get this error "file missing from job. please recheck"

yet when i right click the torrent and go to "open containing folder" all the files are there.

the path of the torrent is "[2001] A Predator's Portrait"

and it seems to be the only torrent with [ and ] in the title, think that could be the problem?

i have several other torrents with various characters in them though..

Link to comment
Share on other sites

The UI is Unicode... it's showing µ for some people on other language OSes (hebrew, japanese).

You sure you're actually running the unicode build?

Um... The "µ" in the main title and on the "About" form are all right now, unlike in previous versions. But the "About" form title and the context menu are still displayed as "礣orrent".

PS: I'm using Windows XP EN, with Regional and Language Option set "Chinese (PRC)".

Link to comment
Share on other sites

Ah~ I changed the "Language for non-Unicode programs" to "English (United States)" just now, and every "µ" is displayed correctly.

Now I see, the UI isn't completely Unicode.

PS: the byte pair for "µT" maps to "礣" in the Chinese encoding, but is maybe not defined as a single character in Hebrew and Japanese.
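Here is a quick Python sketch of what is probably happening, assuming the non-Unicode parts of the UI hand over the title as windows-1252 bytes and the system then reads them with the Chinese (PRC) code page. Python's `gbk` codec stands in for that code page here.

```python
title = "µTorrent"
ansi_bytes = title.encode("cp1252")   # µ is the single byte 0xB5, T is 0x54

# On a Chinese (PRC) system those bytes are read as GBK, where 0xB5 is a
# lead byte that swallows the following "T" to form one double-byte character.
shown = ansi_bytes.decode("gbk")

print(ansi_bytes[:2].hex(), shown, len(title), len(shown))
```

The result is one character shorter than the original title, which matches the "礣orrent" reported above: "µ" and "T" have merged into a single glyph.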

Link to comment
Share on other sites

You mean ASCII. ANSI is an American standardisation organisation.

ANSI is not -only- the American National Standards Institute; it can also mean (quoting Wikipedia):

"In Microsoft Windows, the phrase "ANSI" refers to the Windows ANSI code pages. Most of these are fixed width though there are some variable width ones for ideographic languages. Some of these are very close to the ISO-8859 series leading many to falsely assume that they are identical.

ASCII art which is colorized or animated by way of ANSI terminal control codes (X3.64 sequences) are commonly referred to as "ANSI art" and were predominantly popular on bulletin board systems throughout the 1980s and 1990s."

- http://en.wikipedia.org/wiki/Ansi

acronyms can mean more than 1 thing... I remember in the old dos days when i had to load the ansi.sys file in to make colored screens and fonts in bat scripts :P

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

