Firon Posted December 1, 2005
That's not what I needed to see... If that specific string is not there, it's not Unicode. Simple as that.
Hendrykes Posted December 1, 2005
Edit: keep your request in the other thread, and don't beg/whine for it.
monkey8 Posted December 1, 2005
Quote: That's not what I needed to see... http://img374.imageshack.us/img374/9289/utf8az0ef.gif If that specific string is not there, it's not Unicode. Simple as that.
Okay. I see what you mean now. I used each program (Azureus/BitComet/µTorrent) to create a torrent of Chinese files. They automatically create them in UTF-8, so yes, µTorrent could recognize it. However, as soon as I upload the torrent online (to a forum or newsgroup) and download it again, it automatically changes to GBK (which is a Unicode 1.1 standard). So this includes the Chinese, Japanese and Korean languages. I guess that during downloading, once the .torrent passes through a browser or some newsgroup program (I use Thunderbird), it changes into GBK instead of UTF-8. When downloading the torrent through mIRC, it's completely fine because there's no decoding/encoding involved.
So I guess µTorrent's Unicode must include GBK, or at least up to Unicode 2.1.
Firon Posted December 1, 2005
No, that's impossible, a transfer like that cannot change the torrent's contents.
Neither a browser nor a mail client has any way to change something like that. (They don't understand bencoding, for one.)
I've used Japanese and Hebrew torrents in µTorrent with absolutely no problem.
µTorrent's torrent maker cannot make Unicode torrents, by the way.
The only Unicode encoding used for torrents is UTF-8. Neither BitComet nor Azureus will use anything besides that when using Unicode.
monkey8 Posted December 1, 2005
Hmm. You're right. Browsers have no way of changing the content in the .torrent. But Azureus/BitComet do support GBK (or GB18030, included in Unicode 2.1) as part of their Unicode. That's the reality of their programs. If µTorrent is going to implement Unicode, why doesn't it fully support it? For Azureus, I think the programmer has less work, because it's all done through Java. (If you download English-only Java, it wouldn't work.) For BitComet? I have no idea what they did.
UTF-8 cannot fully support Traditional/Simplified Chinese because of its limitations. Simply put, Chinese has more characters than Japanese (or even Hebrew, if I might add).
If possible, maybe I can upload a .torrent with GBK so you and the developers can test the torrent out? I think if I were to survey the millions of Chinese torrents over the internet, I guess 80% of them would be in GBK.
Edit: I found the references here: http://en.wikipedia.org/wiki/CJK
"The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit encodings, requiring at least a 16-bit fixed width character encoding or multi-byte variable-length encodings. The 16-bit fixed width encodings, such as Unicode up to and including version 2.0, are now deprecated due to the requirement that software in China support the GB18030 character set."
Now, if that holds true, what I don't understand is: why can I create a fully UTF-8 Chinese .torrent locally? I guess it's the mighty developer's job to find out. lol..
Firon Posted December 1, 2005
If you have web space or something, please upload one of those torrents. It'd help greatly to test with (since we don't have any testers using GBK OSes, just various other languages).
However, UTF-8 can support all the Chinese characters and other languages, since it can use up to 4 bytes to create one character. The three Unicode encodings (UTF-8, UTF-16, UTF-32) support every Unicode character (000000-10FFFF). (UTF-7 and UTF-EBCDIC are not used much.)
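The point about UTF-8 needing up to 4 bytes per character can be checked with a short Python sketch. The sample characters and the helper name `encoded_sizes` are chosen here purely for illustration; they do not come from the thread.

```python
# Compare how many bytes each Unicode encoding needs for one character.
# The sample characters below are illustrative assumptions.
samples = {
    "A": "Latin capital letter A",
    "\u00b5": "micro sign",
    "\u4e2d": "common CJK ideograph",
    "\U00020000": "CJK Extension B ideograph (outside the 16-bit range)",
}


def encoded_sizes(ch: str) -> dict:
    """Return the encoded length in bytes of one character, without a BOM."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch, label in samples.items():
    print(f"U+{ord(ch):06X} {label}: {encoded_sizes(ch)}")
```

Every code point up to U+10FFFF is encodable in all three forms; only the byte count differs, which is why a Chinese filename is no obstacle for UTF-8.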
monkey8 Posted December 1, 2005
Firon, you've got a PM.
Quote: However, UTF-8 can support all the Chinese characters and other languages, since it can use up to 4 bytes to create one character. The three Unicode encodings (UTF-8, UTF-16, UTF-32) support every Unicode character (000000-10FFFF). (UTF-7 and UTF-EBCDIC are not used much.)
You're absolutely right! I agree. If you take a look inside the .torrent files I sent you, you can see the UTF-8 tag in both files. However, if you do a search for 'Big5' in [big5].torrent you'd see "encoding4:Big5", and "encoding3:GBK4" in [GBK].torrent. But neither one exists in the other.
No clue what these tags do, but they are in there.
Keep up the good work guys!
Firon Posted December 1, 2005
Can you resend the PM and files please?
I guess those encodings are alternative fallbacks for a torrent that doesn't support UTF-8...
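To make the "fallback" guess concrete, here is a minimal Python sketch of how a client could use a declared encoding when the name bytes are not valid UTF-8. This is an assumption for illustration only: the function `decode_name` and its parameters are hypothetical, and nothing here describes how µTorrent, Azureus or BitComet actually behave.

```python
from typing import Optional


def decode_name(raw: bytes, declared: Optional[str] = None,
                locale_fallback: str = "cp1252") -> str:
    """Hypothetical helper: try UTF-8 first, then the torrent's declared
    encoding, then a locale code page; never raise on bad input."""
    for codec in ("utf-8", declared, locale_fallback):
        if not codec:
            continue
        try:
            return raw.decode(codec)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: keep what is decodable and mark the rest.
    return raw.decode("utf-8", errors="replace")


print(decode_name("中文".encode("gbk"), declared="GBK"))
print(decode_name("中文".encode("big5"), declared="Big5"))
```

The order matters: strict UTF-8 decoding rejects most legacy multi-byte text, so trying it first is a reasonable heuristic, though some legacy byte sequences can pass as valid UTF-8 by accident.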
ulogin01 Posted December 1, 2005
Firon,
For what it's worth, I just want to confirm monkey8's findings.
I've got a couple of Chinese 'Big5' torrent files which are not working properly in the 'Unicode' version of µTorrent, i.e. the filenames were created with strange characters instead of proper Chinese characters.
The same torrent files work fine using BitComet and Azureus.
I sincerely hope that µTorrent can fix this problem; this is one feature that I've been waiting for... :-)
Keep up the great work! Thanks!
monkey8 Posted December 1, 2005
Firon, I have PMed you the files again. If you still can't receive the PM, then I guess you should email me.
magnus33 Posted December 1, 2005
Oddly enough, µTorrent suddenly seems to be working fine with all the trackers now.
Wonder if this was an issue with DHT... hmmmmm.
I'll keep using it, and hopefully this keeps playing happy now with everything.
Firon Posted December 1, 2005
[GBK]Simplified Chinese.torrent:
created by13:BitComet/0.5613:creation datei1131723262e8:encoding12:windows-1252
[big5]Traditional Chinese.torrent:
created by13:BitComet/0.6013:creation datei1133402549e8:encoding12:windows-1252
Neither of these BitComet versions makes UTF-8 torrents for Japanese and Chinese code pages.
The encoding is set to windows-1252, which is a superset of ISO/IEC 8859-1, an English language encoding.
Azureus displays the same trash µTorrent does on my system and also reports that it is not Unicode, confirming my own findings.
The only reason it shows up in Chinese for you in Azureus is because you set your system to interpret non-Unicode trash as Chinese ("Language for non-Unicode programs").
µTorrent actually is Unicode, so it is unaffected by this setting, hence why it shows trash. It's doing exactly what it's supposed to do: displaying the torrent with windows-1252, since that's what the encoding in the torrent says it is.
If I set my PC to Chinese with that setting and rebooted, it'd show up fine in Azureus for me too.
Neither of these torrents is Unicode. BitComet's torrent maker sucks (what a surprise).
Azureus makes UTF-8 torrents properly...
(Screenshots: µTorrent, Azureus)
And yes, I am on a Unicode OS (Windows XP) with all languages and fonts installed, so it's not something missing on my system.
Here's proof if you don't believe me... phew..
Also, neither Azureus nor µTorrent reads name.utf-8 or path.utf-8; they're legacy things and unsupported. They both read name and path.
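The `encoding` field quoted above can be read out of a .torrent with a few lines of Python. What follows is a minimal bencode decoder sketched for illustration, not any client's actual parser. The `sample` bytes are reconstructed from the fields quoted in this post; the leading `d10:` and the trailing `e` are assumptions added to make a complete dictionary, and a real torrent carries more keys (such as `info`).

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i].

    Returns (value, index_after_value). Byte strings stay as bytes,
    because bencoding itself says nothing about text encodings.
    """
    c = data[i:i + 1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


# Reconstructed sample (assumption: wrapped in d...e to form a dictionary).
sample = (b"d10:created by13:BitComet/0.56"
          b"13:creation datei1131723262e"
          b"8:encoding12:windows-1252e")

meta, end = bdecode(sample)
print(meta[b"encoding"].decode("ascii"))
```

Because every string is length-prefixed, a transfer that altered the bytes of a filename would also have to rewrite the length in front of it, which a browser or mail client has no reason to do.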
Joes Posted December 1, 2005
What about support of locale settings for non-Unicode torrents in Unicode µTorrent?
ben Posted December 1, 2005
Win XP Pro SP2.
I closed µTorrent to download the new update, saved it and then opened the new µTorrent.
I get this error: "file missing from job. please recheck"
Yet when I right-click the torrent and go to "open containing folder", all the files are there.
The path of the torrent is "[2001] A Predator's Portrait"
and it seems to be the only torrent with [ and ] in the title. Think that could be the problem?
I have several other torrents with various characters in them though..
Firon Posted December 1, 2005
"Files missing from job" means that some file is missing. Try a force re-check.
If the torrent was Unicode and you had it loaded before, it probably won't work in the Unicode build no matter what you do.
Amelia Posted December 1, 2005
Really Unicode? I think the UI should be Unicode too; the "µ" letter is still displayed incorrectly on my Chinese-coded Windows XP EN.
Firon Posted December 1, 2005
The UI is Unicode... it's showing µ for some people on other language OSes (Hebrew, Japanese).
You sure you're actually running the Unicode build?
Amelia Posted December 1, 2005
Quote: The UI is Unicode... it's showing µ for some people on other language OSes (Hebrew, Japanese). You sure you're actually running the Unicode build?
Um... The "µ" in the main title and on the "About" form is all right, different from the previous versions. But the "About" form title and the context menu are still displayed as "礣orrent".
PS: I'm using Windows XP EN, with the Regional and Language Options set to "Chinese (PRC)".
Amelia Posted December 1, 2005
Ah~ I changed the "Language for non-Unicode programs" to "English (United States)" just now, and all "µ" are displayed correctly.
Now I see, the UI isn't completely Unicode.
PS: the byte pair for "µT" is defined as "礣" in the Chinese encoding, but maybe it is not defined in Hebrew and Japanese.
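The effect described here can be reproduced in Python under one assumption: that the affected UI strings are stored as windows-1252 bytes and handed to the system's code page for non-Unicode programs. The sketch uses Python's `gbk` codec as a stand-in for the Chinese (PRC) code page; the variable names are illustrative.

```python
# "µTorrent" as single-byte windows-1252 text: µ is 0xB5, T is 0x54.
title = "\u00b5Torrent"
ansi_bytes = title.encode("cp1252")

# A Chinese (PRC) system reads 0xB5 as the lead byte of a two-byte
# character, so it swallows the following "T" as the trail byte.
shown_on_prc = ansi_bytes.decode("gbk")

# With "English (United States)" the same bytes decode as intended.
shown_on_us = ansi_bytes.decode("cp1252")

print(ansi_bytes[:2])   # the two bytes that get merged
print(shown_on_prc)     # one ideograph followed by "orrent"
print(shown_on_us)
```

Strings passed through the wide-character (UTF-16) Windows APIs never go through this byte reinterpretation, which fits the observation that only some parts of the UI were affected.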
hin123 Posted December 1, 2005
µTorrent is not fully Unicode.
I'm using Traditional Chinese Windows XP.
Also, I used 1.2.2 and set a file path containing Chinese characters.
In 1.2.3 Unicode the Chinese characters became squares.
ludde (Author) Posted December 1, 2005
I know why the µ doesn't show right in the title. Will be fixed in next beta.
asmodai Posted December 1, 2005
Quote: [snip...] just maybe fill those character strings with Unicode in 2000/XP/... and with ANSI in 95/98/ME. You ain't gonna abandon support on 98, are you?
You mean ASCII. ANSI is an American standardisation organisation.
asmodai Posted December 1, 2005
Quote: I know why the µ doesn't show right in the title. Will be fixed in next beta.
Remind me again, is Windows UTF-8 by default or was it UCS2 or even UCS4?
ludde (Author) Posted December 1, 2005
Windows uses UTF-16.
valur Posted December 1, 2005
Quote: You mean ASCII. ANSI is an American standardisation organisation.
ANSI is not -only- the American National Standards Institute, it can also mean (quoting Wikipedia):
"In Microsoft Windows, the phrase "ANSI" refers to the Windows ANSI code pages. Most of these are fixed width though there are some variable width ones for ideographic languages. Some of these are very close to the ISO-8859 series leading many to falsely assume that they are identical.
ASCII art which is colorized or animated by way of ANSI terminal control codes (X3.64 sequences) are commonly referred to as "ANSI art" and were predominantly popular on bulletin board systems throughout the 1980s and 1990s." - http://en.wikipedia.org/wiki/Ansi
Acronyms can mean more than 1 thing... I remember in the old DOS days when I had to load the ansi.sys file to make colored screens and fonts in bat scripts.
Archived
This topic is now archived and is closed to further replies.