Improved auto-ban system idea (Part 2)!

Lexxington · May 30, 2006

WRT the auto-ban system, which does not work properly, I'd like to suggest an idea that would probably be more effective and easier to implement, not only for uTorrent but for other BT clients too.

The problem is that, because multiple peers can contribute to one particular piece, they can be mistakenly banned along with the REAL corruptors of the torrent when they are in fact completely innocent!

I suggested an idea before, which has obviously been overlooked for whatever reason, but I now have an improved idea which is more elegant anyway.

Yet again this has been prompted by a 7.5 GB torrent using 4 MB pieces (I wish people would NOT do this and would use a maximum of 512 KB or so) with around 25 sources, of which only 3 are seeds. uTorrent banned 2 of the seeds and 8 other peers when in fact (by a process of elimination) only 2 peers were actually corruptors, and, naturally enough, none of those were seeds.

Anyway, my idea is this:

You download as normal until a hash fail occurs.

We now have a suspect list of possible corruptors (those involved in that corrupted piece).
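A minimal sketch of the bookkeeping this step assumes: the client records which peer supplied each 16 KB block of a piece, so that on a hash fail the suspect list is simply the set of contributors. The class and method names (`PieceTracker`, `record_block`, `suspects_for`) are hypothetical and not taken from uTorrent.

```python
from collections import defaultdict

BLOCK_SIZE = 16 * 1024  # usual BitTorrent request size


class PieceTracker:
    """Hypothetical bookkeeping: which peer supplied each block of each piece."""

    def __init__(self):
        # piece index -> {byte offset of block within piece -> peer id}
        self.sources = defaultdict(dict)

    def record_block(self, piece, offset, peer):
        self.sources[piece][offset] = peer

    def suspects_for(self, piece):
        # Every peer that contributed at least one block is a suspect.
        return set(self.sources[piece].values())


tracker = PieceTracker()
tracker.record_block(7, 0, "peerA")
tracker.record_block(7, BLOCK_SIZE, "peerB")
tracker.record_block(7, 2 * BLOCK_SIZE, "peerA")
print(sorted(tracker.suspects_for(7)))  # ['peerA', 'peerB']
```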

Now, unlike my previous idea, we do something very very simple indeed!

We take each of those peers and re-ask them for (say) 4 x 16 KB blocks from a piece we have already downloaded and hash-checked against the meta-info, and which, of course, they also have.

Now, because we know the piece is correct, we can IMMEDIATELY compare the SHA-1 hashes of the re-requested 16 KB blocks, as they arrive, against hashes pre-calculated from our own verified copy, and thus very quickly indeed see who is sending corrupted blocks and who isn't!
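As a rough illustration of the hash-based variant: once a piece has passed its piece hash, per-block SHA-1 values can be computed locally and each re-requested block checked against the entry for its offset. The function names are illustrative only, and the sketch assumes block-aligned offsets.

```python
import hashlib

BLOCK_SIZE = 16 * 1024


def block_hashes(piece_data, block_size=BLOCK_SIZE):
    """SHA-1 of each block of a piece that has already passed its piece hash."""
    return [
        hashlib.sha1(piece_data[i:i + block_size]).digest()
        for i in range(0, len(piece_data), block_size)
    ]


def block_is_good(known_hashes, offset, data, block_size=BLOCK_SIZE):
    """Check a re-requested block (block-aligned offset) against its precomputed hash."""
    return hashlib.sha1(data).digest() == known_hashes[offset // block_size]


piece = bytes(range(256)) * 256          # 64 KB stand-in for a verified piece
hashes = block_hashes(piece)             # 4 block hashes
good = piece[BLOCK_SIZE:2 * BLOCK_SIZE]
bad = bytes([good[0] ^ 0xFF]) + good[1:]  # same block with its first byte flipped
print(block_is_good(hashes, BLOCK_SIZE, good), block_is_good(hashes, BLOCK_SIZE, bad))  # True False
```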

EDIT:

In fact it's not even necessary to do a SHA-1 (or any other hash check) on the re-requested blocks. All we have to do is a binary compare with the already downloaded data as a 100% verification. This makes things even simpler and less CPU intensive!
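The binary-compare version needs no hashing at all: the incoming block is compared byte for byte with the same range of the verified piece we already hold. A minimal sketch, with a made-up function name:

```python
def block_matches(verified_piece, offset, received):
    """Byte-for-byte comparison of a re-requested block with the copy we already
    hold from a piece that passed its hash check. No hashing is needed."""
    expected = verified_piece[offset:offset + len(received)]
    return len(received) > 0 and received == expected


piece = bytes(range(256)) * 256
block = piece[16384:32768]
tampered = bytes([block[0] ^ 0xFF]) + block[1:]
print(block_matches(piece, 16384, block))     # True
print(block_matches(piece, 16384, tampered))  # False
```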

N.B. I'm talking about 16 KB BLOCKS here, not PIECE sizes. In other words, this is an 'extension' of the BT protocol which we apply only in exceptional circumstances.

I suggest the beginning block, the end block, and 2-3 random blocks from a piece?
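One way the suggested selection (first block, last block, a couple of random ones) might look, sketched under the assumption that blocks are 16 KB and that the last block of a piece may be shorter. `pick_test_blocks` is a hypothetical helper.

```python
import random

BLOCK_SIZE = 16 * 1024


def pick_test_blocks(piece_length, n_random=2, block_size=BLOCK_SIZE, rng=random):
    """Byte offsets of the first block, the last block and up to n_random
    distinct blocks in between (the last block may be shorter than block_size)."""
    n_blocks = (piece_length + block_size - 1) // block_size  # ceiling division
    last = n_blocks - 1
    middle = range(1, last)  # block indices strictly between first and last
    chosen = rng.sample(middle, min(n_random, len(middle)))
    return [i * block_size for i in sorted({0, last, *chosen})]


offsets = pick_test_blocks(4 * 1024 * 1024, rng=random.Random(1))
print(len(offsets), offsets[0], offsets[-1])  # 4 0 4177920
```

Using a set means that for very small pieces (one or two blocks) the first and last block are not requested twice.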

The point is that we do NOT need to assign the suspects a whole PIECE each (my previous idea) but only a mere 4-5 16 KB blocks, meaning that we can VERY VERY quickly indeed filter out the corruptors from the innocent peers!
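For scale, taking the 4 MB piece size from the example above and five test blocks per suspect:

```latex
5 \times 16\,\text{KB} = 80\,\text{KB} \quad\text{versus}\quad 4\,\text{MB} = 4096\,\text{KB},
\qquad \frac{80}{4096} \approx 2\%
```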

I'm sure this could be easily incorporated, and it's really just a case of laziness NOT to do something like this, as the current system is, to be perfectly frank, absolute crap when it tends to ban your BEST peers.

Any comments?

µtorrent-Guest · May 30, 2006

The idea sounds interesting!

A thought worth thinking about!

But what would this idea look like in a programming language?!

Edit:

Vote: +1

schnurlos · May 30, 2006

I think it should be ludde's job to get this into uTorrent. But it sounds great and possible!

+1

winMX_67 · May 30, 2006

+1

Switeck · May 30, 2006

"Peer review" -- any chunk that fails hash puts all the peers/seeds that contributed on probation.

They are then checked against an already-complete chunk, as mentioned above.

They could even all be asked for the same chunk for real fun.

Then, corruptors might switch to corrupting only 1 out of every X 16 KB blocks to bypass this check.
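How much that evasion would help can be estimated under a simple assumption: each block from the corruptor is spoiled independently with probability f = 1/X. Then k tested blocks catch at least one bad block with probability 1 - (1 - f)^k. The independence model is an assumption; a real corruptor could pick blocks deliberately.

```python
def detection_probability(corrupt_fraction, blocks_tested):
    """Chance that at least one tested block is corrupt, assuming each block
    is corrupted independently with probability corrupt_fraction."""
    return 1 - (1 - corrupt_fraction) ** blocks_tested


def blocks_needed(corrupt_fraction, target):
    """Smallest number of test blocks reaching the target detection probability."""
    if not (0 < corrupt_fraction <= 1 and 0 < target < 1):
        raise ValueError("need 0 < corrupt_fraction <= 1 and 0 < target < 1")
    k = 1
    while detection_probability(corrupt_fraction, k) < target:
        k += 1
    return k


print(round(detection_probability(0.1, 5), 3))  # 0.41
print(blocks_needed(0.1, 0.95))                 # 29
```

So with five test blocks, a peer spoiling 1 block in 10 would slip through more often than not under this model, and around 29 blocks would be needed for 95% confidence.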

Firon · May 30, 2006

Improving the ban system is difficult to do, and is the reason why basically all the clients do it the same way right now.

Lexxington · May 30, 2006

Improving the ban system is difficult to do, and is the reason why basically all the clients do it the same way right now.

Why is it difficult?

If the idea above is sound, programmatically it should be easy to implement?

As a programmer myself (not Windows though) I don't see the difficulty.

If the beta had more debug options I would be glad to put the idea to the test.

The only problem I envisage is the one Switeck alluded to, i.e. a corruptor not sending corrupt data in every block they send.

Also, by 'corruptor' I mean someone corrupting data either intentionally or unintentionally, as in the case of routers doing weird things.

BitComet doesn't seem to ban at all, at least in the 0.6 version I use. As it stands at present, though, the ban system in uTorrent is worse than none at all. Presumably BitTornado uses the same method? I've never used Azureus (without it crashing), so I have no idea about that.

However, I was pleased to see a 'reset bans list' in the latest beta, but you still have to constantly 'babysit' the program manually to stop it from banning your most productive peers!

Firon · May 30, 2006

The only hashes available are for the entire piece, not the 16K chunks. Adding hashes for every chunk would make the torrent HUGE.
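The size difference can be estimated. A .torrent's `pieces` string holds one 20-byte SHA-1 digest per piece, so hashing every 16 KB block instead of every 4 MB piece multiplies that by 256. The sketch below treats "7.5 GB" as 7.5 GiB, which is an assumption.

```python
SHA1_LEN = 20  # each hash in the .torrent 'pieces' string is a 20-byte SHA-1


def hash_bytes(total_size, unit_size):
    """Bytes of SHA-1 digests needed to cover total_size in chunks of unit_size."""
    units = -(-total_size // unit_size)  # ceiling division
    return units * SHA1_LEN


KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30
size = 15 * GiB // 2                 # the 7.5 GB torrent from the first post
print(hash_bytes(size, 4 * MiB))     # 38400: 1,920 piece hashes
print(hash_bytes(size, 16 * KiB))    # 9830400: 491,520 block hashes, about 9.4 MiB
```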

Lexxington · May 30, 2006

The only hashes available are for the entire piece, not the 16K chunks. Adding hashes for every chunk would make the torrent HUGE.

AHH,

No, you misunderstood the original post Firon!

Yes, of course adding hashes for the 16 kb blocks to the meta-info would be completely impractical as you say, but that's not what my idea is based on.

When a piece has been downloaded fully, whatever its size, and passes the hash check, we can then calculate the SHA-1 hash for any of the 16 KB blocks that make up that piece.

If we then ask a 'suspect' for those blocks, we know for CERTAIN whether they are corrupt as soon as we receive them. We don't have to download a whole PIECE, just a few 16 KB blocks.

You see the difference now?

Obviously we can't do that for pieces we don't already have, as there is no way of knowing (in advance) whether the blocks we are receiving are corrupt or not because we don't have the hashes for them.

More generally, we can calculate SHA-1 hashes for any length of data from the pieces we KNOW are good.

Incidentally, you can extend the above idea to reduce data wastage when a piece involving x peers fails the hash check, by discarding only the blocks from peers known to have caused corruption and keeping the other blocks.

A simple table of peer IPs vs. block numbers should save an awful lot of wasted data once the bad eggs are thrown out, i.e. you'd only need to re-download the missing blocks, NOT the whole piece from scratch.
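A minimal sketch of that table in use, assuming the client kept a record of which peer sent each block of the failed piece. Only blocks from peers identified as corruptors are re-fetched; `blocks_to_refetch` is an illustrative name.

```python
def blocks_to_refetch(block_sources, banned):
    """block_sources: {block offset: peer} for a piece that failed its hash.
    banned: peers found to be sending corrupt data.
    Returns the offsets to download again; all other blocks are kept."""
    return sorted(off for off, peer in block_sources.items() if peer in banned)


sources = {0: "A", 16384: "B", 32768: "A", 49152: "C"}
print(blocks_to_refetch(sources, {"B"}))  # [16384]
```

If the reassembled piece still fails its hash (for example because a corruptor was not identified), the safe fallback would be to discard the whole piece, as clients do today.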

Firon · May 30, 2006

But since we've already told the other peers that we have that block, I don't think you would be able to re-download it. You can't un-advertise the piece either (unless you disconnect and reconnect).

Lexxington · May 31, 2006

But since we've already told the other peers that we have that block, I don't think you would be able to re-download it. You can't un-advertise the piece either (unless you disconnect and reconnect).

If I understand the protocol correctly, advertising that you have a piece using the 'have' message is for the benefit of peers who may want to download FROM you.

There's nothing to stop you from re-requesting blocks for a piece you already have from other peers as far as I can see from looking at the protocol.
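For reference, a re-request would be an ordinary `request` message as defined in the BitTorrent specification (BEP 3): a 4-byte length prefix of 13, message id 6, then the piece index, the byte offset within the piece and the block length, each as a big-endian 32-bit integer. A small encoder:

```python
import struct


def encode_request(index, begin, length=16 * 1024):
    """BitTorrent 'request' message: 4-byte length prefix (13), message id 6,
    then piece index, byte offset within the piece and block length,
    each a big-endian 32-bit integer."""
    return struct.pack(">IBIII", 13, 6, index, begin, length)


print(encode_request(42, 16384).hex())  # 0000000d060000002a0000400000004000
```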

Switeck · May 31, 2006

If hostiles start re-requesting blocks then how do you fight that without fighting your double-checking scheme?

I'm not saying hostiles currently do this...well, besides D-link routers in 'DMZ gaming' mode. :lol:

Dark Shroud · May 31, 2006

BitTornado does have a ban mode for peers that send bad data; I've seen it work.

Firon · May 31, 2006

It blocks after 3 hash fails (compared to µT's more lenient 5).
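The thread only describes the current approach loosely (a ban after 3 hash fails in BitTornado, 5 in µTorrent). A simplified model, assuming every contributor to a failed piece receives one strike, shows how a fast seed that supplies blocks to most pieces can reach the threshold alongside the real corruptor:

```python
from collections import Counter


def strike_based_bans(failed_pieces, threshold):
    """Simplified model: every peer that contributed to a failed piece gets one
    strike, and `threshold` strikes means a ban."""
    strikes = Counter()
    for contributors in failed_pieces:
        strikes.update(set(contributors))
    return {peer for peer, n in strikes.items() if n >= threshold}


# X is the only corruptor; S is a fast seed that supplies blocks to most pieces.
fails = [{"X", "S"}, {"X", "S", "P"}, {"X", "S"}]
print(sorted(strike_based_bans(fails, threshold=3)))  # ['S', 'X']
```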

Lexxington · May 31, 2006

If hostiles start re-requesting blocks then how do you fight that without fighting your double-checking scheme?
I'm not saying hostiles currently do this...well, besides D-link routers in 'DMZ gaming' mode.

Well, one thing at a time eh? Let's get the auto-ban working properly first before worrying about it being used against itself!

Besides, they may already be doing this to waste your bandwidth.

The BT protocol relies a lot on trust anyway, as do most P2P protocols. For example, the 'bitfield' message, sent immediately after the handshake, tells other clients which pieces you do and don't have. The protocol relies on that being 'honest' so that you know which pieces to request blocks from...

BTW, I've seen the auto-ban system work just as badly on very popular torrents with 256 KB pieces, where it has banned lots of innocent peers for no reason other than the flawed method it uses.
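To make the 'bitfield' point concrete: per BEP 3, the payload has one bit per piece, with the high bit of the first byte standing for piece 0, and spare bits at the end are unused. A receiving client simply trusts whatever bits are set. A minimal decoder:

```python
def pieces_in_bitfield(bitfield, num_pieces):
    """Decode a 'bitfield' payload: the high bit of the first byte is piece 0,
    and spare bits at the end of the last byte are ignored."""
    return [i for i in range(num_pieces) if bitfield[i // 8] & (0x80 >> (i % 8))]


print(pieces_in_bitfield(bytes([0b10100000, 0b01000000]), 10))  # [0, 2, 9]
```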

It'd be nice if it was sorted out anyway surely?

So, Firon what's the verdict then?

Other people here seem to like the idea also. Could you maybe have a word with ludde PLEASE?

Kazuaki Shimazaki · May 31, 2006

Yet again this has been prompted by a 7.5 GB torrent using 4 MB pieces (I wish people would NOT do this and would use a maximum of 512 KB or so) with around 25 sources, of which only 3 are seeds. uTorrent banned 2 of the seeds and 8 other peers when in fact (by a process of elimination) only 2 peers were actually corruptors, and, naturally enough, none of those were seeds.

Try manually unbanning the seeds? Maybe a switch can be added so seeds are never banned, no matter what.

Now, unlike my previous idea, we do something very very simple indeed!
We take each of those peers and re-ask them for (say) 4 x 16 KB blocks from a piece we have already downloaded and hash-checked against the meta-info, and which, of course, they also have.

Very likely, your block test will come back clearing every one of your suspects. A good corrupter does not ruin every piece, and not even every block in a piece he decides to ruin. It is too obvious, gets him banned too quickly from everyone, and is hardly necessary. With a program, just ruining any one piece is enough. Even with a video, while one missing piece may only be a jerk on the screen, ruining say 5-10% of the pieces will almost certainly render it totally useless.

Now what do you do when the suspects all pass? Go on a witch hunt and request yet more pieces?

Your investigative method also exacerbates losses. As with the old method, you still have to replace your failed piece, and now you pay extra because you are requesting data you already have. Say the piece is only 512 KB, as you wanted, and say 8 peers contributed to the failed piece. Re-testing each of them with 4 x 16 KB = 64 KB means requesting 64 KB x 8 = 512 KB of data: one whole piece.

I'm not sure how the algorithms in BT programs work right now, but a reasonable thing for a peer to do with a re-request for a piece he knows you have (via the bitfield and have messages) is to put it low in the pile, if it is even accepted at all. That means you may wait a good long time before he gets around to it (I've had days when clients didn't respond to my data requests for over 30 minutes), and what do you do with him during that time?
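The overhead arithmetic in that paragraph, as a one-line helper (the four-blocks-per-suspect default follows the original proposal; the name is illustrative):

```python
BLOCK_SIZE = 16 * 1024


def recheck_cost(num_suspects, blocks_per_suspect=4, block_size=BLOCK_SIZE):
    """Extra bytes downloaded to re-test every suspect, on top of re-fetching
    the failed piece itself."""
    return num_suspects * blocks_per_suspect * block_size


print(recheck_cost(8) // 1024)  # 512 (KB): one whole 512 KB piece
```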

Lexxington · May 31, 2006

Yes, you made some fair points there KS thanks.

Unfortunately, without being able to test the idea, it's impossible to verify that a corruptor is or is not sending corrupted data in every block he sends.

Also, unbanning seeds (if the corruption is deliberate) may not be such a good idea. If a corruptor sends a 'have all' message then he's free to send any old data to you.

OK, as for your other question: if the suspects all come back clear, then you assign EACH of them to a whole piece that you don't have and see what results. Suspects whose pieces pass the hash check are eliminated from the list, and any whose pieces fail get banned, i.e. my original idea. If the corruption comes back again, you resort to block checking first again.
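The escalation described above can be sketched as a two-stage filter. The two test callbacks are hypothetical stand-ins for what would really be asynchronous downloads with timeouts, which this sketch ignores.

```python
def investigate(suspects, passes_block_test, passes_piece_test):
    """Two-stage check: cheap block test first; only if every suspect passes,
    give each one a whole unverified piece. Returns the set of peers to ban."""
    caught = {p for p in suspects if not passes_block_test(p)}
    if caught:
        return caught
    # Everyone passed the block test: fall back to one whole piece per suspect.
    return {p for p in suspects if not passes_piece_test(p)}


suspects = {"A", "B", "C"}
print(investigate(suspects, lambda p: p != "B", lambda p: True))  # {'B'}
print(investigate(suspects, lambda p: True, lambda p: p != "C"))  # {'C'}
```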

What we most definitely should not be doing is using the system as it is NOW, 'cos it just does NOT work effectively!

What I am forced to do now with poisoned torrents is use uTorrent to get an idea of who the corruptors are, then selectively add the suspects to a custom PeerGuardian block list and see what happens.

What I've found is that the 'suspects' with the most downloaded data are rarely the true corruptors, but uTorrent bans them anyway, which makes it as good as useless, as they're the fastest peers!

After doing all this tedious and time-consuming nonsense I go back to downloading in BitComet (which doesn't ban the peers) and see if the corruption is cleared up, or alternatively just stick with uTorrent, but then I have to keep an eye on it to make sure it hasn't banned any useful peers. A real PITA.

Torrents that use 4 MB piece sizes can add up to a HUGE amount of wastage. On my 7.5 GB torrent, over 1 GB was wasted due to hash fails caused by just 2-3 corruptors!

So it's not really fair to compare a 512 KB piece size against redundant block checking, as the amount saved if only the corrupted blocks had to be re-downloaded could be pretty substantial.
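A rough, purely illustrative comparison using this post's figures: over 1 GB of waste at 4 MB per piece means at least 256 failed pieces. If (as an assumption, since the actual number of bad blocks per piece is unknown) each had contained only one corrupt 16 KB block, re-downloading just those blocks would have cost about 4 MB:

```latex
\frac{1\,\text{GB}}{4\,\text{MB}} = \frac{1024\,\text{MB}}{4\,\text{MB}} = 256 \text{ failed pieces},
\qquad 256 \times 16\,\text{KB} = 4096\,\text{KB} = 4\,\text{MB}
```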

Switeck · May 31, 2006

I'm trying to throw out possible scenarios because hostiles are CERTAIN to evolve to fight this improvement. Also, false positives need to be minimized, and so do false negatives!

Doing tests on every suspected corruptor is unnecessary if you're on a torrent with 100+ peers and seeds, so long as seeds are common enough. µTorrent could ban every potential corruptor and still have plenty of sources to work with. There's little reason to use additional bandwidth testing already-downloaded chunks when it's "cheaper" bandwidth-wise to just connect to more peers and seeds.

It is only the torrents that are poorly seeded or have few peers that need this improved auto-ban system.

Lexxington · June 1, 2006

Thanks for the input Switeck.

Well, at least you and some others here actually take the idea seriously, understand the principle and see the possibility for improvement, which is more than I can say for some of the apparently closed-minded attitudes of the moderators!

Whilst you're probably right about torrents with lots of seeds and peers, I don't see the point of unnecessarily penalising innocent peers, who would suffer from being banned by not RECEIVING any data from you either.

Incidentally, I realised today that doing a SHA-1 hash check for the re-requested blocks would be redundant and time-consuming.

As we already KNOW that every byte of the piece is good, when re-requesting blocks we can just do a binary compare against the known good data as the blocks arrive. No need for hash checking there!

Having completed my 7.5 GB torrent, what I'm going to try to do now is let the corruptors back into the swarm, log their traffic in detail and check the blocks against the downloaded data using Hex Workshop, to see whether these peers are corrupting continuously.

The only thing is, I don't know if uTorrent will actually allow that, as I don't know whether, with write caching switched off, incoming data is written to the temp .dat file or straight to the actual file before the piece is complete.

Anyone know?

Firon · June 1, 2006

With write caching off, it writes straight to the file (though it only writes in 16KB chunks).

Lexxington · June 1, 2006

That's exactly what I need to check the idea then.

As soon as I get notification from the program that there's been a hash fail, I can stop the download and check back through the logs and do a binary compare in Hex Workshop with the blocks from the known corruptors to see if they're doing it ALL the time or just some of it.
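The Hex Workshop check could also be scripted. A minimal sketch, assuming a single-file torrent (so piece `index` starts at byte `index * piece_length` of the file) and that the logged block, piece index and offset are already extracted from the logs; multi-file torrents would need the piece offset mapped across file boundaries, which is not handled here. The demo uses a throwaway file in place of a real download.

```python
import os
import tempfile


def compare_logged_block(path, piece_length, index, begin, logged_block):
    """Compare a block captured from a peer with the same bytes in the completed
    file. Assumes a single-file torrent, so piece `index` starts at
    index * piece_length."""
    with open(path, "rb") as f:
        f.seek(index * piece_length + begin)
        on_disk = f.read(len(logged_block))
    return on_disk == logged_block


# Demo: a throwaway file stands in for the finished download.
data = bytes(range(256)) * 512  # 128 KB
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
path = tmp.name
logged = data[49152:65536]                         # piece 1, offset 16384, 32 KB pieces
tampered = bytes([logged[0] ^ 0xFF]) + logged[1:]
ok = compare_logged_block(path, 32768, 1, 16384, logged)
mismatch = compare_logged_block(path, 32768, 1, 16384, tampered)
os.remove(path)
print(ok, mismatch)  # True False
```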

Be interesting to see anyway.

Thanks for the info.

Switeck · June 1, 2006

BTW, don't quote the whole previous post -- it spams up the message thread too quick.

You could probably just pause the download instead of stopping it. You might even retain connections that way, so as to make confirming corruptors easier.

...or you could write down the corruptors' IP+port and try adding them to the torrent to see whether you can reconnect to them. There might even be a way to do that WITHOUT connecting to other peers (block the tracker IP?)

Lexxington · June 1, 2006

Well, disaster.

Torrent has been removed from the tracker and DHT is not picking up the rogue peers!

If anyone can send me links to some alternative rogue torrents, preferably not TOO large, it'd be appreciated.

Merged double post:

Hi, sorry for quoting the whole post.

Good ideas there. I tried adding the known corruptors back manually, but no go.

IIRC, when I was seeding last night I noticed the blocked IP in PeerGuardian was different (the last two digits, I mean), so I suppose they must be on dynamic IPs.

Switeck · June 1, 2006

They may not be dynamic IPs. When it's a company dedicated to corrupting torrents, they often work out of a whole IP range, with each corruptor using a different IP that varies in the last number. I am especially wary of ANYTHING in the 38.x.x.x range, due to my past experiences on the Gnutella network.
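To tell a dynamic IP apart from a corruptor working out of a range, logged addresses can be grouped by network prefix: addresses that differ only in the last number land in the same /24. A small sketch with Python's `ipaddress` module; the addresses are made-up examples and 38.0.0.0/8 simply mirrors the "38.x.x.x" mentioned above.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(ips, prefix_len=24):
    """Group IPv4 addresses by network prefix; /24 means same first three octets."""
    groups = defaultdict(list)
    for ip in ips:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(net)].append(ip)
    return dict(groups)


def in_range(ip, cidr):
    """True if the address falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


print(group_by_prefix(["38.1.2.3", "38.1.2.77", "10.0.0.1"]))
print(in_range("38.99.1.1", "38.0.0.0/8"))  # True
```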

Lexxington · June 1, 2006

Yeah, 38.x.x.x seems to ring a bell somewhere.

Well, I'm still not having much luck here. The 4 MB piece torrent has come alive again on the tracker, but none of the corrupting peers have surfaced since I started logging!

Think I might try an old Battlestar Galactica torrent that was pretty badly corrupted and see how that goes.
