µTorrent 1.2.3-beta Unicode


ludde

@ludde: I don't know if you could do anything with this or if it helps to solve the problem:

Apparently there is a layer over the Win32 API that can be used if needed; it requires a 160 KB DLL (unicows.dll):

The Microsoft Layer for Unicode on Windows 95/98/Me Systems

-DG

@cTn: Older Windows systems were limited to 8 bits per character, so they could only use/show 256 signs, letters and numbers (actually even fewer, since some of them are control characters and thus not printable, like 13 = carriage return).

Unicode also contains the national subsets of letters (like umlauts), and the newer Windows versions (since Windows NT 4.0, if I'm not mistaken) get around the 8-bit limitation. That's basically it...

Cool - thanks much.

Where do I install these files by the way? In \WINDOWS\SYSTEM32 or in some other location?

Thanks

Thank god unicows.dll will reside in your system32 folder, thus not bloating µTorrent (which would be a shame).

So THAT is where it goes. In the actual SYSTEM32 folder, or under the DRIVERS sub folder?

Thanks.

OKAY - UPDATE - even though I installed the Unicode layer (in both drivers and system32) it still crashes exactly as before, so I guess the problem wasn't the unicode. I also tried removing the config files in the Utorrent data folder and that didn't work.

So, the bug is still submitted to that bug submission thing. Hope they can tweak it to work ok.


Back to the 'Unicode torrent' topic: (maybe a new thread for this?.. hehe)

Is it safe to say that Bitcomet is at fault for creating non-UTF-8 .torrents on some users' machines? If so, then is there any way µTorrent developers can tackle this problem and support errors made by Bitcomet?

I wouldn't say it's totally their fault. If you visit the Bitcomet forum, you won't see any language problems after their 0.56 (or maybe later) release. It's because users have different input systems (BIG5/GBK) installed that created this mess. (so it's M$'s fault.. heh) So the problem started even before Bitcomet. As I remember, Bitcomet could not render Traditional Chinese in their older 0.5x versions. I'm not speaking for Bitcomet, but I think they intentionally made their program cater to other languages. Is that bad programming? I don't know, but it does work. That's what the end user sees.

Let's imagine this: (just an estimate)

1 in 10 people creates a (Chinese) non-Unicode torrent in Bitcomet (why? Bitcomet doesn't have super-seed.. lol), 4 in 10 use µTorrent to d/l it, and µTorrent doesn't recognize it. 2 of those 4 might switch to another client. And these two people don't only download Chinese content; they also d/l other English content found on other popular sites. This is what stops a few of my friends from using µTorrent. If the numbers were in the hundreds, that's going to hit DHT. The next part is not an estimate: I have tried downloading the same Chinese torrent using Az, Bitcomet and µTorrent at the same time (ignoring the messed-up letters in µT, and using different ports of coz). Bitcomet won by 1 hr hands down, and Az came in second, 10 mins later. Simply because they have a bigger audience in DHT. (don't get me wrong, µT maxed my d/l on other English torrents! :D)

So is µTorrent going to tackle this? Azureus can deal with it through a simple fix that recognizes what Bitcomet encoded, but Az is a memory beast. You can't force Bitcomet to change their programming. So is it wise to cater to popularity? I'm not saying the Chinese are µT's top market segment. But in reality, a lot of Chinese users download English stuff online too, most use Bitcomet (which creates the torrents), and most of them would try something that gives them an edge, like speed or lower memory use, which µT can give. Word of mouth is powerful in the Chinese community; if µT can deliver what they need, I'm sure µT will gain a lot more popularity.

My lunch break is over, gotta go. ;)


Azureus doesn't read those torrents properly either. It's the non-Unicode language setting that made it work, not anything in Azureus... (Azureus isn't considered unicode by the OS, it's the JVM that handles the processing of Unicode I suppose)

oh, and BitComet and µTorrent are both on the same DHT network.

But it really is a fault with BitComet because the client won't make UTF-8 torrents like they're supposed to (sigh) for Chinese and Japanese codepages, which screws up just about every other client.

on this note, can you come by the IRC channel (irc.freenode.net #utorrent) to help out? we don't have any chinese testers, so we can't really uh, test anything in chinese. maybe a workaround can be made to deal with BitComet's non-standardness with some help.


.. It's because users have different input systems (BIG5/GBK) installed that created this mess. (so it's M$'s fault.. heh) So the problem started even before Bitcomet.

How is it Microsoft's fault? It's the user's fault for inputting non-standard characters using legacy, outdated software (such as cstar, nj etc). Why not use Windows' built-in IME, which just works?


DON'T unleash your anger on µTorrent without proper testing & proof, as µTorrent has nothing to do with playing back your beloved mp3's. Something's fucked up on your PC. :mad:

What he said.

These guys work their butts off getting this going and updated at an amazing rate.

Hell, every time I start it I keep waiting for some sign it's started before I clue in that it's already running, and with so little impact on my system.

Pss... merry xmas guys.


.. It's because users have different input systems (BIG5/GBK) installed that created this mess. (so it's M$'s fault.. heh) So the problem started even before Bitcomet.

How is it Microsoft's fault? It's the user's fault for inputting non-standard characters using legacy, outdated software (such as cstar, nj etc). Why not use Windows' built-in IME, which just works?

Since the IME uses legacy input methods, it uses many (15+) different encodings for different characters, like BIG5 and GBK, as well as Unicode. Not too many users install NJStar nowadays, unless they are on Win9x systems. The IME now comes with the WinXP sold in that particular language package. You're right, it's not entirely Microsoft's fault. But they should somehow force all these encodings into one all-in-one standard. That way, we wouldn't see language issues appear in programs like WinMX, Firefox, IE, and so on. Don't get me wrong, these programs work fine displaying those characters; they just don't work when saving non-Unicode filenames.

reference: http://www.microsoft.com/globaldev/handson/user/IME_Paper.mspx


Is it safe to say that Bitcomet is at fault for creating non-UTF-8 .torrents on some users' machines? If so, then is there any way µTorrent developers can tackle this problem and support errors made by Bitcomet?

Check the BitComet specification: http://www.bitcomet.com/doc/specification.htm

BitComet's UTF-8 extension is different from uTorrent's Unicode support. Make 2 torrents, one in uT and one in BC, from the same files (with Chinese chars), open both torrent files in any hex editor, and you'll see the difference.

Damn, BIG5 uses \ (backslash) and more symbols, which may cause many problems.

e.g. a text file named 許功.txt inside a directory named 許功 (2 Chinese characters whose BIG5 encoding contains the backslash, which is used as an escape character in many apps)

Filename in hex: in BIG5 = B3 5C A5 5C 2E 74 78 74; in UTF-8 without BOM = E8 A8 B1 E5 8A 9F 2E 74 78 74; in UnicodeLE (UTF-16LE) without BOM = 31 8A 9F 52 2E 00 74 00 78 00 74 00

The path and name elements inside the torrents:
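For anyone who wants to reproduce those hex values without a hex editor, here is a small Python sketch. It assumes Python's built-in `big5`, `utf-8` and `utf-16-le` codecs match what the clients wrote; `hex_bytes` is just a helper name made up for this example.

```python
# Encode the example filename from the post in the three encodings discussed.
filename = "許功.txt"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, like a hex editor."""
    return " ".join(f"{b:02X}" for b in data)


big5 = filename.encode("big5")
utf8 = filename.encode("utf-8")
utf16le = filename.encode("utf-16-le")  # no BOM is written by this codec

print("BIG5    :", hex_bytes(big5))     # B3 5C A5 5C 2E 74 78 74
print("UTF-8   :", hex_bytes(utf8))     # E8 A8 B1 E5 8A 9F 2E 74 78 74
print("UTF-16LE:", hex_bytes(utf16le))  # 31 8A 9F 52 2E 00 74 00 78 00 74 00

# Both Chinese characters end in byte 0x5C, the ASCII backslash, in BIG5.
print("backslash bytes in BIG5:", big5.count(b"\\"))  # 2
```

The last line shows the backslash problem: a program that scans the BIG5 bytes for `\` as an escape or path separator will find two of them, while the UTF-8 form contains none.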

BC:

pathl8:(B3 5C A5 5C 2E 74 78 74)

path.utf-8l10:(E8 A8 B1 E5 8A 9F 2E 74 78 74)

name4:(B3 5C A5 5C)

name.utf-86:(E8 A8 B1 E5 8A 9F)

uT:

pathl6:(31 8A 9F 52 2E 00)

name6:(E8 A8 B1 E5 8A 9F)

The name element (length=6) uses UTF-8 w/o BOM (2 Chinese characters = 6 bytes in UTF-8), which is identical to the name.utf-8 element in BC.

The path element (length=6) uses UnicodeLE w/o BOM, and the problem is the string length: 2 Chinese characters plus the file extension (.txt) = 6 characters, but in UTF-16 that's 12 bytes.

Another problem: when uT opens a torrent whose filename contains Chinese, it strips every character outside ASCII (128 and up). You should treat anything in the plain path/name elements (the non-UTF-8-extension ones) as being in the system default code page.

IMHO I'd suggest a BC-like UTF-8 extension: use an extra tag to support UTF-8, and the system default code page in the non-UTF-8 elements.
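To illustrate the length problem: a bencoded string is `<length>:<bytes>`, where the length counts bytes, not characters. The Python sketch below is only a guess at what the beta's torrent maker does, based on the hex dump above; all three function names are made up for this example.

```python
def bencode_str(text: str, encoding: str = "utf-8") -> bytes:
    """Bencode a string correctly: the length prefix counts encoded BYTES."""
    payload = text.encode(encoding)
    return str(len(payload)).encode("ascii") + b":" + payload


def bencode_str_buggy(text: str) -> bytes:
    """Assumed bug: UTF-16LE payload, but a length prefix that counts CHARACTERS."""
    payload = text.encode("utf-16-le")
    return str(len(text)).encode("ascii") + b":" + payload


def read_bencoded_str(data: bytes) -> bytes:
    """Read one bencoded string the way any parser would: trust the length prefix."""
    length, _, rest = data.partition(b":")
    return rest[: int(length)]


name = "許功.txt"

# A parser only sees the first 6 of the 12 UTF-16LE bytes: 31 8A 9F 52 2E 00
truncated = read_bencoded_str(bencode_str_buggy(name))
print(truncated.hex(" "), "->", truncated.decode("utf-16-le"))

# With UTF-8 and a byte-count prefix (10), the whole name survives the round trip.
print(bencode_str(name)[:3], "->", read_bencoded_str(bencode_str(name)).decode("utf-8"))
```

The truncated result is exactly the 6 bytes shown in the uT path element above, which is what you would expect if the character count was written where the byte count belongs.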
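A rough Python sketch of how a client could read names along those lines: prefer the `.utf-8` key when it exists, otherwise try UTF-8, and only then fall back to a code page. The function name and the `fallback_codepage` parameter are made up for this example, and hard-coding `big5` stands in for "system default code page", which a real client would have to look up from the OS.

```python
def decode_torrent_name(info: dict, fallback_codepage: str = "big5") -> str:
    """Pick a display name from an already-bdecoded info dict (bytes keys/values).

    fallback_codepage is an assumption standing in for the system default
    code page of the machine that opens the torrent.
    """
    # 1. BitComet-style extension key, when present, is explicitly UTF-8.
    if b"name.utf-8" in info:
        return info[b"name.utf-8"].decode("utf-8")

    raw = info[b"name"]
    try:
        # 2. Spec-following torrents store the plain key as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Legacy torrents: interpret the bytes in the fallback code page
        #    instead of stripping everything above ASCII.
        return raw.decode(fallback_codepage, errors="replace")


bitcomet_style = {
    b"name": bytes.fromhex("B35CA55C"),            # BIG5
    b"name.utf-8": bytes.fromhex("E8A8B1E58A9F"),  # UTF-8
}
legacy_only = {b"name": bytes.fromhex("B35CA55C")}
spec_style = {b"name": bytes.fromhex("E8A8B1E58A9F")}

for info in (bitcomet_style, legacy_only, spec_style):
    print(decode_torrent_name(info))
```

The weak point is step 3: some code-page byte sequences also happen to be valid UTF-8, so the try/except is a heuristic, and the fallback only gives the right characters when the opener's code page matches the creator's.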


Azureus doesn't read those torrents properly either. It's the non-Unicode language setting that made it work, not anything in Azureus... (Azureus isn't considered unicode by the OS, it's the JVM that handles the processing of Unicode I suppose)

Agreed, Azureus just happens to take a free ride on the JVM.. less programming for de/encoding languages.

oh, and BitComet and µTorrent are both on the same DHT network.

I didn't know that! I'm a complete newbie when it comes to DHT. But why do I get a bigger swarm if I use "apploc"? (apploc makes µTorrent recognize some Chinese torrents) It's something that should be looked into.

But it really is a fault with BitComet because the client won't make UTF-8 torrents like they're supposed to (sigh) for Chinese and Japanese codepages, which screws up just about every other client.

on this note, can you come by the IRC channel (irc.freenode.net #utorrent) to help out? we don't have any chinese testers, so we can't really uh, test anything in chinese. maybe a workaround can be made to deal with BitComet's non-standardness with some help.

I don't mind helping with testing. In fact, I might just have an idea of how to work around this issue. I'd come by IRC, but I guess everyone is sleeping. lol


IMHO I'd suggest a BC-like UTF-8 extension: use an extra tag to support UTF-8, and the system default code page in the non-UTF-8 elements.

This is exactly what I was thinking when I opened up the torrents made by Bitcomet. No doubt, BC uses UTF-8 like other clients do. But they also tag on extra info for BIG5 or other languages, so that even if some clients can't recognize it, it still downloads the file(s) perfectly.

From reading BC's specification on Unicode, I guess what they mean is that it follows the UTF-8 specification, but they also added something else like BIG5/GBK etc. on their own. If it wasn't for Azureus with its multi-language JVM, I guess BC would have covered all the Chinese users.

Can I say, really bad programming on BC's part? Well, not unless they released their own non-Unicode specification to the world beforehand.. :P


idle.newbie: µTorrent 1.2.3-beta's torrent maker is utterly broken, don't use it for any comparison... right now it doesn't make Unicode properly, and it strips the last 6 characters. :|

No, it's really bad programming. The proper standard is to use UTF-8 in name and path (Azureus does this). BitComet uses the user's code page (retarded) and uses a .utf-8 key to make the UTF-8 part. This is legacy and unsupported by anything except BitComet. (yes, even Azureus won't read it, although Azureus does make a .utf-8 key for BitComet to read).

Getting more peers on DHT on BitComet than µTorrent is probably a coincidence and depends highly on the torrent anyway. I've easily gotten 2000 peers from DHT with µTorrent.

Simply put, BitComet doesn't follow the UTF-8 standard for torrents (like so many other things it doesn't do right). UTF-8 should only be used in name and path, not a separate .utf-8 key! ALL STRINGS in the infodict should be UTF-8 (this is according to the BT spec).
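If you want to check a given torrent against that rule, something like the Python sketch below would do. It assumes the info dict has already been bdecoded into nested dicts, lists, ints and bytes (the decoder itself is not shown), and `non_utf8_strings` is a name made up for this example. The `pieces` value is skipped because it holds raw SHA-1 hashes, not text.

```python
def non_utf8_strings(value, path="info"):
    """Return the locations of byte strings that are not valid UTF-8."""
    problems = []
    if isinstance(value, bytes):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(path)
    elif isinstance(value, list):
        for i, item in enumerate(value):
            problems.extend(non_utf8_strings(item, f"{path}[{i}]"))
    elif isinstance(value, dict):
        for key, item in value.items():
            if key == b"pieces":  # binary hash data, never text
                continue
            label = key.decode("ascii", "replace")
            problems.extend(non_utf8_strings(item, f"{path}.{label}"))
    return problems


# A BitComet-style info dict built from the hex dump earlier in the thread.
info = {
    b"name": bytes.fromhex("B35CA55C"),
    b"name.utf-8": bytes.fromhex("E8A8B1E58A9F"),
    b"pieces": b"\xff" * 20,
    b"files": [{b"length": 1, b"path": [bytes.fromhex("B35CA55C2E747874")]}],
}
print(non_utf8_strings(info))  # ['info.name', 'info.files[0].path[0]']
```

Passing this check does not prove a string was meant as UTF-8, only that the bytes are well-formed UTF-8.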

When loading a torrent, µTorrent parses all the strings as UTF-8 I do believe...


µTorrent 1.2.3-beta's torrent maker is utterly broken, don't use it for any comparison... right now it doesn't make Unicode properly, and it strips the last 6 characters. :|

I guess that's 'cause 1.2.3 outputs the path tag as UTF-16LE w/o BOM with the wrong string length; check my previous hex dump. In Java, the length of a UTF-16 string is the number of chars, not bytes; dunno how C++ does it.

I tried hex-editing the .torrent content, replacing the name/path tags with UTF-8. On a finished torrent, the default path/filename display and Force Re-Check are correct. But an unfinished torrent never downloads; the server responds with "invalid url".

Back on the original .torrent, it downloads, but the default path/filename is treated as UTF-8 and every Chinese character is stripped out.

It'd be great if there were an encoding dropdown list on the properties page to force UTF-8/UTF-16/system default code page/an embedded encoding tag (non-standard!?).

switching back to 1.2.2.


win xp pro sp2

Every torrent I try to create comes up with this "file not found" error. I figured the problem came from me trying to create torrents from files that are on network drives, but I moved the files over to this local laptop drive and I still get the same problem. Weird thing is, I opened Azureus to make the torrent and it said "invalid file." Both clients say it for tons of torrents I tried to create, so I'm assuming it's some setting on my computer. I do the "force re-check" option in µTorrent, and it says that I'm missing every single file (even though I'm the one who originally made the torrent).. any ideas? This is suddenly a new thing; I've been using BitTorrent clients for years and have never had this happen. I also haven't installed any virus/etc scanners in months.


Well, this particular beta's torrent maker is totally broken, so you'll have to use 1.2.2 to make torrents (and it won't work on unicode files/directories).

Now, if Azureus' torrent maker can't make the torrent (which IS unicode enabled and working), then you've got some weird problem...


Archived

This topic is now archived and is closed to further replies.

