Jump to content

Novel idea for increasing hashing speed


bkman

Recommended Posts

This may well be impossible, but hear me out and give it a think:

Hashing on modern machines is more often than not largely hard-drive limited, and this is quite often due especially to file fragmentation. So my proposal is, if utorrent can look (just as windows system utilities do) at the layout of the fragments, and then hash each piece not in linear order, but in order of which pieces are closer to each other on the HD in order to minimise seek times.

Could this be feasible?

Link to comment
Share on other sites

No, Hashes are order dependant. ie. 1234 is not the same as 2314.

I'm fairly sure that this is not true of Bittorent. Each piece is hashed independantly, hence the order is irrelevant. Although linear hashing would be needed to check against the global torrent hash, I think.

Correct me if I am wrong.

Link to comment
Share on other sites

But wouldn't it at least cut down on the massive and erratic read/writes of the hard disk, paritybit? Or is that better left for NCQ-enabled drives to decide? :? Forgive me, if that is a stupid question...I'm only trying to understand the issue at hand... :)

Link to comment
Share on other sites

Well I assume you mean to hash pieces closer to eachother. Pieces are about say 1 meg each. A lot of the time they will be fairly close to eachother (as they are all part of the same torrent). So you hash pieces that are close to eachother your seek times may be a bit smaller but with modern drives this would be only a few ms at most. It would decrease head movment but chances are these pieces are all near eachother anyway.

Link to comment
Share on other sites

It's an interesting idea, but unfortunately it cannot work.

Yes, it's true that BT hashes are made up of individually hashed pieces. Yes, you can calculate the hash for each piece independently of any other piece.

But there are still two problems that are better shown visually. Let's say your torrent is divided up into quarter-MB (256KB) pieces. If we use a letter to represent 32KB of data, our torrent might look like this:

A1 A2 A3 A4 A5 A6 A7 A8 B1 B2 B3 B4 B5 B6 B7 B8

In order to get the correct hashes for each piece, you must read them in order. Hashing is not commutative like addition, where you can add numbers up in any order and they always come out the same. That is, you must always read A3 before A4 before A5, etc. But since the hashes for each piece are independent, you could do all the B pieces (in order) before you do all of the A pieces (in order).

The first problem is that the piece size and the block size on the hard drive are almost guaranteed to not match. This is a problem because you might end up with several pieces per block (for small pieces and big blocks) or several blocks per piece (for big pieces and small blocks). On the hard drive, our torrent might look like this, grouped by block:

64KB Blocks:  A5A6 B1B2 B7B8 A1A2 A3A4 A7A8 B3B4 B5B6

In this scenario, we'd be jumping all over the place to read the A pieces in order, then the B pieces in order. We could do B without doing A, but it wouldn't gain us anything because we'd still be thrashing the hard drive to do either one.

The second problem only applies to multi-file torrents. Let's break our example torrent up by file:

W.nfo: A1
X.jpg: A1A2A3A4A5
Y.jpg: A5A6A7A8B1B2
Z.jpg: B2B3B4B5B6B7B8

In a multi-file torrent, a single file may span multiple pieces (as our Y file does here) and a piece may span several files (as both our A and B pieces do here). For the purposes of hashing, BitTorrent treats multi-file trrents as one big file, as if each file was appended to the end of the file before it. (Kindof like a ZIP file.) Since the W file is probably not going to

be exactly our piece size (256KB), the X file is appended to it to continue the hashing, and so on.

This becomes a problem because most filesystems still use blocks for files, and start a new block for each file. That means, in this example, that our A1 chunk is split over 2 different blocks, as are our A5 and B2 chunks. This means even more thrashing!

So, having said all of that, here's what I do to help speed up my torrents: · Run a defragger while you sleep or at work. Make sure it is set to not just defrag but to also reorganize to increase free linear space. This will help when you start new torrents because you'll be more likely to get all of your blocks right next to each other. · Set uTorrent to pre-allocate the space for files. (Look under Settings > Torrent Options > Other Settings.) Again, this will increase the chance that your blocks will be side-by-side. It does mean that your torrent will take a little longer to start up, because it's got to allocate all that space, but it means much less thrashing in the long run.HTH, YMMV, etc.

-KB

Link to comment
Share on other sites

knowbuddy: Most of those issues did occur to me, but they could be "overcome" (read: made the best of) with a sufficiently intelligent hash code. Basically, you could do some form of quick analysis to determine which pieces have the lowest "cost" in terms of seek times (even if they can't be done in one read), and prioritise them by that order. Furthermore, with the clever use of a memory buffer situations like the one you describe could be handled efficiently. Heck, if you go all out multi-threading the hashing of two or more buffered chunks simultaneously could be possible.

Of course, I'm starting to realize now that all this amounts is a lot of work from the developer for an uncertain benefit for the purposes of a torrent client. Oh well, I think it is an interesting project to try one day none-the-less. ;)

Link to comment
Share on other sites

A couple of things about this idea I am realising.

1) To know where the blocks are on the disk you would need to do some reading.

2) This reading would amount to a non-trivial amount of time.

3) This time increase would probably incur a larger penalty then just hashing the files in order to begin with.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...