
torrent creation (Unordered info dictionary)


freeUser15


okay, i'm on freeBSD under remote console. i have a setup with rTorrent (previously bitTornado).

i am a member of private-tracker and dedicate my seedbox only for it.

so the problem is: for about a month now, i (and not only me) have been experiencing strange problems with torrents created by some "stable" versions of uTorrent.

reported versions:

# µTorrent 1.7.7 builds older than 8179

# µTorrent 1.8.0 builds older than 11813

# µTorrent 1.8.1 builds older than 12616

# µTorrent 1.8.2 builds older than 14458

# µTorrent 1.8.4 build 16286

some torrent clients open the torrent file and then say "unregistered torrent" (tested with: bitTornado/Deluge/Transmission/rTorrent 0.8.3)

and rTorrent 0.8.5 does not load the file and says: "Download has unordered info dictionary.".

but uTorrent says "fine" and everything goes okay on this side(1.8.4 and 1.6.1 under WinXP and Vista).

i've asked on various forums (i.e. the official support ones, except bitTornado and Transmission) and i've got a more-or-less good answer:

Even if you remove the error message, you won't like it. It will mean that rtorrent and utorrent disagree about the SHA1 hash of the torrent. Because of a bug in utorrent it uses the incorrectly-ordered keys to compute the hash, instead of the bencode representation that's required by the bittorrent specs.

The bencode representation ALWAYS has keys in alphabetical order. So it's utorrent at fault here for making a torrent with wrongly ordered keys and then using that wrong order to compute the hash. Unfortunately most trackers will agree with utorrent, rather than the standard. I guess, if you just shout a falsehood loud enough people will believe you.

In any case, even if you make rtorrent load the file, you'll be the only peer in the swarm, except for other rtorrent clients. All utorrent versions will use the wrong info hash and so you won't find them, and they won't find you.

The only solution is to get the utorrent people to read the bencode specs again and fix their client, and to reupload fixed versions of the torrent files.

Or maybe hack rtorrent to reproduce the utorrent bug, though I doubt you'll see that become part of the code base..
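To make the key-ordering rule in the quoted answer concrete, here is a minimal Python sketch of a bencode encoder that emits dictionary keys in sorted raw-byte order. The function name `bencode` is an assumption made for illustration; it is not taken from rTorrent, µTorrent or any other client.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts (illustrative sketch only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a bencode type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        # Lists keep whatever order the caller gave them.
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        out = b"d"
        # Dictionary keys are byte strings and are written in sorted order,
        # so the same keys and values always produce the same bytes.
        for key in sorted(value):
            out += bencode(key) + bencode(value[key])
        return out + b"e"
    raise TypeError("unsupported type: %r" % type(value))


# Insertion order of the Python dict does not matter for the output.
print(bencode({b"name": b"a", b"length": 1}))
```

Because the output depends only on the keys and values, any two programs following this rule produce byte-identical dictionaries, and therefore the same SHA-1.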

so i still wonder: is this a "continuation of bitTorrent-spec" or a "conspiracy"? what is the reason for this and who needs to fix "some" problems?

thanks. and sorry, if this was already mentioned.

Link to comment
Share on other sites

The bencode representation ALWAYS has keys in alphabetical order. So it's utorrent at fault here for making a torrent with wrongly ordered keys and then using that wrong order to compute the hash. Unfortunately most trackers will agree with utorrent, rather than the standard. I guess, if you just shout a falsehood loud enough people will believe you.

Have they looked at the .torrent file directly to see if uTorrent is indeed ignoring the spec for this?

Are they instead assuming that the files LIST needs to be sorted before hashing in their bencode library (which is a VERY BAD process)?

Could you post one of the offending torrents here?

Link to comment
Share on other sites

Have they looked at the .torrent file directly to see if uTorrent is indeed ignoring the spec for this?

my quote is by anonymous(unfortunately), so i think he/she "assumes".

but i did some research (before asking) and found the difference between rTorrent 0.8.3 and 0.8.5 (the first one did load the torrent, but failed with "unregistered"). source code here: http://libtorrent.rakshasa.no/changeset/1094

if (b.flags() & Object::flag_unordered)
    throw input_error("Download has unordered info dictionary.");

so i think this IS a problem here.
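For readers who want to test a file themselves, the following is a rough Python sketch of the kind of check the snippet above implies: walk the bencoded bytes and note any dictionary whose keys are not strictly ascending. The names `parse` and `has_unordered_dict` are made up for this example, and it is not a port of the libTorrent code.

```python
def parse(data, pos, unordered):
    """Decode one bencoded value at pos; return (value, next_pos).

    The start offset of every dictionary with misordered keys is appended
    to the `unordered` list.
    """
    c = data[pos:pos + 1]
    if c == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if c == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = parse(data, pos, unordered)
            items.append(item)
        return items, pos + 1
    if c == b"d":
        start = pos
        pos += 1
        result = {}
        prev = None
        while data[pos:pos + 1] != b"e":
            key, pos = parse(data, pos, unordered)
            # Keys must be strictly ascending; this also flags duplicates.
            if prev is not None and key <= prev:
                unordered.append(start)
            prev = key
            value, pos = parse(data, pos, unordered)
            result[key] = value
        return result, pos + 1
    # Anything else is treated as a byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    begin = colon + 1
    end = begin + int(data[pos:colon])
    return data[begin:end], end


def has_unordered_dict(data):
    unordered = []
    parse(data, 0, unordered)
    return bool(unordered)


print(has_unordered_dict(b"d4:name1:a6:lengthi1ee"))
```

Running it over the raw bytes of a suspect .torrent file would show whether any dictionary in it, including `info`, has misordered keys.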

and on Deluge-Forums i've contacted GlobalModerator and he came up with "unregistered" and that's all.. nothing more, so i'm not sure.

Are they instead assuming that the files LIST needs to be sorted before hashing in their bencode library (which is a VERY BAD process)?

i'm not sure that this is a "VERY BAD" idea :) .

yes, it will speed-up Torrent-Creation (difference could be seen on a LARGE number of files, or might fail under low-end computers),

but what's the difference on the "receiver" side? ...

i.e. i'm downloading something and torrent has unordered list.. i need to UNORDER my files (to be consistent with sender-creator) and then download.

btw, i've found a funny thing (i have winXP and uTor 1.8.4) i've got that "newTorrent" and i had 90% of files on my hdd... so i had to reHash and Continue.. so i noticed that at first some files in the middle of season(it's a tv-show) got hashed. then some from start (first ones) and then it ended with some missing parts-of-file on the 3rdEP and on 7thEP, while the last ones are 12th or such.

Could you post one of the offending torrents here?

as i said "private tracker".. but i think: or PM me (frack, there is no PMs here.... okay).. icq:968978 and we'll end with something.(of.. it's Friday here.. so i think icq will be from Monday only)

Link to comment
Share on other sites

I was talking with theshadow, and if the keys of ANY dictionary in a torrent are improperly sorted (current cvs is somewhat more tolerant) then the torrent will be outright rejected and won't even load.

The fact that rtorrent is behaving this way (different infohash) tells me that it's doing something wrong in calculating the infohash.

Link to comment
Share on other sites

If you can, please join our IRC channel and provide us with the .torrent file.

i'm not sure that this is a "VERY BAD" idea

Bear in mind that DWK was talking about lists, which are different from dictionaries. That said, the reason no client should assume any particular sort order about files is that there is no such specification on the way the files lists should be sorted, or that BEncoded lists should have any particular sort order outside of the order given to them by the creator. If a client assumes the files list is (or must be) any order other than specified in the .torrent file's info dictionary, then the pieces key is no longer valid for the .torrent file.
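A small Python sketch may help show why the files list order matters for the pieces key. In a multi-file torrent the piece hashes are taken over the files concatenated in the order listed, so re-sorting the list changes the byte stream being hashed. The file names, contents and piece length below are invented for illustration, and the real pieces key stores the digests as one concatenated byte string rather than a list.

```python
import hashlib


def piece_hashes(files, piece_length):
    """SHA-1 of each fixed-size piece of the files concatenated in list order."""
    stream = b"".join(content for _name, content in files)
    return [
        hashlib.sha1(stream[i:i + piece_length]).hexdigest()
        for i in range(0, len(stream), piece_length)
    ]


# Order chosen by the (hypothetical) torrent creator.
as_listed = [("ep03", b"cccc"), ("ep01", b"aaaa"), ("ep02", b"bbbb")]
# What a client would get if it sorted the list by name first.
resorted = sorted(as_listed)

print(piece_hashes(as_listed, 3) == piece_hashes(resorted, 3))
```

The two orderings cover the same data but produce different piece hashes, so a client that reorders the list can no longer verify what it downloads.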

I guess, if you just shout a falsehood loud enough people will believe you.

Indeed. They can keep repeating whatever they're claiming (wherever it is that they heard it from), but that doesn't make it any more true. Unless they have solid/undeniable/reproducible proof that µTorrent is creating malformed .torrent files, they're spouting uninformed crap out of their asses. I've personally looked at various .torrent files created by µTorrent, and haven't seen any with incorrectly ordered keys in the info dictionary.

Link to comment
Share on other sites

For those not in IRC for the discussion, the difference appears to be that rTorrent is re-ordering the files list before calculating the infohash.

Because of how lists behave in bencode, they must be assumed to be in the correct order in the bencoded file and not re-sorted by clients before doing sensitive calculations such as infohash.

rTorrent is attempting to sort the files list before calculating the infohash, which is a violation of bencode specs.

Torrentspy (from http://torrentspy.sf.net/) has one of the strictest bencode libraries out there and it gets the infohash correct.

This is a bug in rTorrent.

Link to comment
Share on other sites

To be sure, µTorrent doesn't create such a key to begin with. That confirms the fact that the only possible cause is a tracker with a bad bencoder injecting data into the info dictionary incorrectly.

The real question is, should µTorrent even be accepting .torrent files that aren't following the specs? I honestly don't know, although I can certainly see a downside: Accepting such malformed .torrent files means the applications creating malformed .torrent files will be less likely/willing to fix them (even though it'd be a simple fix), which would be a bad thing.

Link to comment
Share on other sites

From a technical standpoint, bencoded dictionaries are indeed supposed to be sorted. From the specifications on the infohash, though, it isn't quite so cut-and-dried.

According to the specs...

"info_hash: urlencoded 20-byte SHA1 hash of the value of the info key from the Metainfo file."[1]

"info_hash: The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file."[2]

If it is a substring of the metainfo file, then it would be wrong for other clients to assume they can just re-encode the dictionary from memory and find the hash from that -- they should be obtaining the hash from the portion of the file itself that belongs to the info dictionary (which may or may not be sorted properly, as we've seen).

Edit: Honestly, after a bit more pondering, taking the hash from a substring of the physical file makes more sense. The point of the metainfo file is that it's supposed to be self-contained and consistent across different platforms/clients. If the infohash is no longer a property of the metainfo file, but a property of an interpretation of the metainfo file, then that portability no longer exists. Seems to me like µTorrent is the one correctly interpreting the specs here as far as infohash calculation is concerned.

Essentially, if a client does not immediately outright reject a .torrent file due to improper dictionary key sorting, then it should still be calculating the infohash based on the .torrent metainfo file, and based on that file only.
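The "substring" reading can be sketched in Python as follows: locate the bytes of the info value inside the metainfo file and hash exactly those bytes, without re-encoding anything. The helper names `skip` and `raw_info_slice` and the tiny sample metainfo are assumptions for illustration, not code from any client.

```python
import hashlib


def skip(data, pos):
    """Return the index just past the bencoded value that starts at pos."""
    c = data[pos:pos + 1]
    if c == b"i":
        return data.index(b"e", pos) + 1
    if c in (b"l", b"d"):
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = skip(data, pos)
        return pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    return colon + 1 + int(data[pos:colon])


def raw_info_slice(metainfo):
    """Return the info value exactly as it is stored in the file."""
    if metainfo[:1] != b"d":
        raise ValueError("metainfo must be a bencoded dictionary")
    pos = 1
    while metainfo[pos:pos + 1] != b"e":
        key_end = skip(metainfo, pos)
        colon = metainfo.index(b":", pos)
        key = metainfo[colon + 1:key_end]
        value_end = skip(metainfo, key_end)
        if key == b"info":
            return metainfo[key_end:value_end]
        pos = value_end
    raise KeyError("no info key")


info_as_stored = b"d4:name1:a6:lengthi1ee"   # keys out of order
info_canonical = b"d6:lengthi1e4:name1:ae"   # same content, keys sorted
metainfo = b"d8:announce3:foo4:info" + info_as_stored + b"e"

print(hashlib.sha1(raw_info_slice(metainfo)).hexdigest())
print(hashlib.sha1(info_canonical).hexdigest())
```

For a correctly sorted file the two hashes coincide; for a malformed one they differ, which is how two clients can end up in different swarms for the same torrent.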

Admittedly, this nuance may be a point in the specs that needs more explication.

Edit: And just for documentation purposes, the quote in freeUser15's first post came from this Rakshasa libTorrent Trac ticket.

Link to comment
Share on other sites

For the record, I'm the one who added the rtorrent code that would reject such broken .torrent files, after some users complained in the rtorrent IRC channel that they couldn't see any peers. Before that change, rtorrent would use the info_hash computed from the correct bencode form, now it will just reject the file outright. From what I understand, utorrent will compute the info_hash from the string in the .torrent file. Please correct me if I'm wrong, I've never run utorrent and don't have a computer that can run it (unless there's a PPC OSX version of it?). And hence our disagreement.

Personally, I believe the broken files should be rejected. This is the only clean way to resolve the problem: force the people generating broken bencode dictionaries to get their act together. This is what rtorrent does now, but it seems hard for the users to accept it, or the ticket you linked to wouldn't have been opened. Maybe this will change if a popular client like utorrent adopts the same stance. Because nothing good can come from subverting the one rule that makes the bencode representation of a dictionary a unique string for any set of keys and values, making it independent of the internal data structure used to hold it.

However, if we argue that such broken files should be accepted, you can of course arrive at two different info_hashes. Both can be explained to comply with the bittorrent specs by bending it a little, in two different ways.

In your quote of the specs, you focused of course on the part that validates the utorrent way. I read the official specification this way (and wiki.theory.org only provides a first interpretation, not the official spec anyway):

"info_hash: The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file [assuming the metainfo file is correctly bencoded -ED]."

This obviously means that to get the info_hash, you have to create the proper bencoded form of the broken form present in the .torrent file. This is what rtorrent used to do.

So you either bend the specs to not require a bencoded form of the info value for the info_hash and just generate the info_hash from any random bytes you may find in the info key in the file. Or you bend the spec to not require the metainfo itself to be in bencoded form when you load the file, even though you can still parse it into keys and values, and create the bencode representation from that.

My feeling is that the better choice would be to allow files to be a little broken, but to internally fix them so that they conform to the specs. Dropping the bencode representation requirement for generating the info_hash is a poorer choice. For instance, it means that if you open the .torrent file in any torrent editor, and save it without any modifications, you would get a new info_hash, which would, incidentally, be the info_hash you would've gotten if you did it the rtorrent way. (This assumes the torrent editor creates the correct bencode representation of course.)

There is only one correct bencode representation for any torrent metainfo set. And if you find an incorrectly bencoded form, you either fix it, or you reject it and tell the user to complain to the tracker admin or something. You can't just use the broken encoding as if nothing was wrong.

Personally, as I said before, I believe just rejecting broken bencode files is the cleanest solution. Otherwise, someone would need to clarify the specs to say how "mostly" conforming files should be treated. That's a matter for the BT mailing list to discuss, I suppose.

Oh, and on the rtorrent ticket discussion you wrote

µTorrent doesn't bencode incorrectly.

Then this might interest you, it's an extension handshake packet as it is created by all recent utorrent versions (IPs blackened to protect the guilty but otherwise unmodified):

d1:ei0e1:md11:upload_onlyi3e12:ut_holepunchi4e11:ut_metadatai2e6:ut_pexi1ee4:ipv44:XXXX4:ipv616:XXXXXXXXXXXXXXXX13:metadata_sizei15408e1:pi16033e4:reqqi255e1:v15:µTorrent 1.8.46:yourip4:XXXXe

I wonder why the ipv4/ipv6 keys come after the m key? Are you sure it shouldn't be

d1:ei0e4:ipv44:XXXX4:ipv616:XXXXXXXXXXXXXXXX1:md11:upload_onlyi3e12:ut_holepunchi4e11:ut_metadatai2e6:ut_pexi1ee13:metadata_sizei15408e1:pi16033e4:reqqi255e1:v15:µTorrent 1.8.46:yourip4:XXXXe

I originally found this when I implemented the broken-bencode-rejection patch. As a result I had to change the patch and limit it only to the info dictionary of torrent files, and so this issue never arose again. Please forgive me that I couldn't resist bringing it up... and for not opening a bug report here at the time.
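The ordering claim about the handshake is easy to check in Python, since byte strings compare byte by byte. The key lists below are copied from the packet above; the function name `first_misordered_key` is made up for this sketch.

```python
def first_misordered_key(keys):
    """Return the first key that is not greater than its predecessor, or None."""
    for earlier, later in zip(keys, keys[1:]):
        if later <= earlier:
            return later
    return None


# Top-level keys in the order they appear in the packet as sent.
as_sent = [b"e", b"m", b"ipv4", b"ipv6", b"metadata_size",
           b"p", b"reqq", b"v", b"yourip"]
# Keys of the nested "m" dictionary, also in the order sent.
m_keys = [b"upload_only", b"ut_holepunch", b"ut_metadata", b"ut_pex"]

print(first_misordered_key(as_sent))   # the key that breaks the ordering
print(sorted(as_sent))                 # the order a sorted encoding would use
print(first_misordered_key(m_keys))
```

Only the top-level dictionary is affected: `ipv4` sorts before `m` because the byte for "i" is smaller than the byte for "m", while the nested `m` dictionary is already in order.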

Link to comment
Share on other sites

jdrexler: thanks for reporting the incorrect encoding of the extension message.

Your message focuses on the fact that bencoded dictionaries should have their keys ordered. I think we're all agreeing about this. What is still missing is any evidence that suggests that a .torrent file generated by uTorrent has a dictionary whose keys are _not_ ordered. Until anybody can produce such evidence, I will assume that the bug is limited to the extension message.

Link to comment
Share on other sites

Personally, I believe the broken files should be rejected. This is the only clean way to resolve the problem: force the people generating broken bencode dictionaries to get their act together. This is what rtorrent does now, but it seems hard for the users to accept it, or the ticket you linked to wouldn't have been opened. Maybe this will change if a popular client like utorrent adopts the same stance. Because nothing good can come from subverting the one rule that makes the bencode representation of a dictionary a unique string for any set of keys and values, making it independent of the internal data structure used to hold it.

That I certainly agree with wholeheartedly. If all clients simply rejected bad .torrent files, we would not need to be worrying about fringe cases as we're discussing now :/ Unfortunately, clients (not just µTorrent) are accepting them, so to that end, the only thing we can do at that point is follow the spec's words. As far as the official specs are concerned, I still think my previous point holds:

The point of the metainfo file is that it's supposed to be self-contained and consistent across different platforms/clients. If the infohash is no longer a property of the metainfo file, but a property of an interpretation of the metainfo file, then that portability no longer exists.

If there's a disagreement about how we should interpret the specs, then indeed, as I mentioned earlier, "this nuance may be a point in the specs that needs more explication." I'm not trying to bend my interpretation of the specs to match µTorrent's behavior -- I'm simply following what was said in the official specs, which is fairly unambiguous about where the infohash should come from. Had I not found that line about substrings in the official specs (and believe you me, I didn't expect to find it -- else I would not have had to think about it again), I would not have come to any conclusion that µTorrent was the one correctly interpreting the spec. It would technically have been a wash as to which side was "correct", even though I still believe the actual .torrent metainfo file itself (not what one thinks the metainfo file should look like) should still remain the undisputed reference.

What is unambiguous (as we're all agreeing on) is how invalid .torrent files should be handled. If all clients can reach a consensus and agree that invalid .torrent files should simply be thrown out, then that would definitely be the best solution going forward. The only problem then becomes that users with b0rked .torrent files will end up stranded if they can't find other sources to replace them with (and many users do keep old torrents lying around).

Regarding my "µTorrent doesn't bencode incorrectly", I'll admit that I phrased it poorly. I'd meant µTorrent does not create malformed .torrent files, as that's all I concerned myself with.

Edit: To further illustrate my point about .torrent file vs interpretation of... If we saw the word "tomato", and one of us pronounced it "to-may-to" while the other pronounced it "to-mah-to", who's correct? Well, that really is inconsequential, because what was written down was t-o-m-a-t-o -- "tomato", and that's what we should be using to decide what we're all talking about.

Link to comment
Share on other sites

jdrexler: thanks for reporting the incorrect encoding of the extension message.

Your message focuses on the fact that bencoded dictionaries should have their keys ordered. I think we're all agreeing about this. What is still missing is any evidence that suggests that a .torrent file generated by uTorrent has a dictionary whose keys are _not_ ordered. Until anybody can produce such evidence, I will assume that the bug is limited to the extension message.

No, I think it's pretty clear that the broken .torrent files were generated by that private tracker. I believe the "bug in utorrent" that the quote in the initial post is referring to, is the fact that utorrent generates the info_hash from a broken bencode representation, instead of the (IMHO) correct info_hash, so that it would end up in a different swarm than rtorrent peers (and some other torrent clients).

@Ultima: I don't agree on the portability point. Portability is ALREADY lost when the .torrent file is malformed. The only way to reinstate portability is to agree on how to fix the brokenness. At best you can argue with the number of torrent applications/trackers that interpret it one way or another. Apparently from the discussion here, there are more torrent clients that use the correct bencode encoding to generate the info_hash. So following suit would be a net increase in portability, no?

I understand the point about legacy torrent files though. However that can be mitigated by loading the .torrent in any torrent editor and saving it again. Maybe utorrent can offer the option of fixing the .torrent file, instead of rejecting it. Or just fix it internally, as rtorrent used to do. When doing it this way, at least all clients and editors will agree on the right encoding. Except for the one tracker admin who spawned the whole issue maybe. If it's really just one tracker doing this, then worrying about old torrent files is probably useless anyway. The tracker admin should just fix the files and the whole issue would go away. And if you reject broken files then it also wouldn't come back again, because the problem would become obvious immediately when making such a stupid mistake again.

Or you could go for broke and just join both swarms at once... heh.

As a side note, I'm surprised that torrentspy doesn't complain about this either, because the file is clearly broken. Looks like they forgot to implement the "obvious" key ordering check. Well, it's obvious now that someone did it wrong for the first time...

Link to comment
Share on other sites

Edit: To further illustrate my point about .torrent file vs interpretation of... If we saw the word "tomato", and one of us pronounced it "to-may-to" while the other pronounced it "to-mah-to", who's correct? Well, that really is inconsequential, because what was written down was t-o-m-a-t-o -- "tomato", and that's what we should be using to decide what we're all talking about.

Precise and unambiguous language is used in specifications for that exact reason, so we know which ways of saying tomato are accepted. Obviously you can support both ways in the spec, but when someone starts saying 'banana-car' it kinda starts getting confusing.

The definition of bencode was terse but precise enough that you can encode and decode as many times as you want without any loss of nor injection of information. It doesn't matter what platform you are on, if you follow the spec there will be no difference between decoded and encoded versions.
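The round-trip property described here can be sketched in Python with a small decoder and encoder. This is an illustrative sketch only; the names `decode`, `encode` and `roundtrip` are assumptions, and error handling for malformed input is left out.

```python
def decode(data, pos=0):
    """Decode one bencoded value at pos; return (value, next_pos)."""
    c = data[pos:pos + 1]
    if c == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if c == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1
    if c == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = decode(data, pos)
            value, pos = decode(data, pos)
            result[key] = value
        return result, pos + 1
    colon = data.index(b":", pos)
    begin = colon + 1
    end = begin + int(data[pos:colon])
    return data[begin:end], end


def encode(value):
    """Encode with dictionary keys in sorted raw-byte order."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(encode(item) for item in value) + b"e"
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(value[k]) for k in sorted(value))
        return b"d" + body + b"e"
    raise TypeError("unsupported type: %r" % type(value))


def roundtrip(data):
    value, _next_pos = decode(data)
    return encode(value)


well_formed = b"d6:lengthi1e4:name1:ae"
malformed = b"d4:name1:a6:lengthi1ee"    # same content, keys out of order

print(roundtrip(well_formed) == well_formed)   # bytes survive unchanged
print(roundtrip(malformed) == malformed)       # bytes change on re-encoding
```

Well-formed input comes back byte for byte, while input with misordered keys is silently rewritten into sorted form. That is the torrent-editor effect mentioned earlier in the thread: loading and saving such a file changes the bytes of the info value and therefore its hash.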

info_hash: The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file.

When the bencode'd stream of data (of which 'physical' files are but one type) is malformed, how does one proceed to interpret such a statement? We do not have a bencoded form of the info value, so we have no way of calculating the hash.

Ordered dictionaries are a much more commonly used data structure than unordered ones, and as such make much more sense in terms of easing the implementation of tools that manipulate bencoded data.

Link to comment
Share on other sites

Fair enough. I'm not clinging onto the spec just to say µTorrent is correct. I'm trying to offer a middle-ground that everyone can agree on, and thought the spec's definition would be good enough -- I guess not. Either way, so long as some kind of reasonable agreement can be reached, it doesn't matter to me what's decided upon.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
