Availability and new torrents

I thought I understood the BitTorrent concept, but µTorrent shows me that I don't. Sorry about the ramble, but...

So far, so good. Obviously, the original uploader needs to upload 100% of the file, and NO OTHER Client has to do this (a good user will upload > 150% of what was downloaded, but that might be 100 copies of the same piece).

Further, Clients should implement a 'rarest first' algorithm, so that they attempt to download the pieces that the fewest peers have, to share the entire Torrent more quickly. At first, this implies that the poor uploader will be hit hard, since all pieces are equally rare, so there are probably back-off rules to allow Clients to download less-rare pieces too.
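
A minimal sketch of the 'rarest first' rule as described above, assuming each connected peer's pieces are known as a set of piece indices. The function name is hypothetical and this is not µTorrent's code; real clients layer other policies on top of this, which are not shown.

```python
import random
from collections import Counter


def pick_rarest_piece(my_pieces, peer_pieces, rng=None):
    """Return the index of a piece we lack that the fewest connected peers have.

    my_pieces:   set of piece indices we already hold.
    peer_pieces: list of sets, one per connected peer (the pieces that peer holds).
    Returns None when no connected peer has a piece we are missing.
    """
    rng = rng or random.Random()
    # Count, for each piece we still need, how many connected peers hold it.
    counts = Counter(
        piece
        for pieces in peer_pieces
        for piece in pieces
        if piece not in my_pieces
    )
    if not counts:
        return None
    fewest = min(counts.values())
    # Break ties randomly so that peers don't all request the same piece.
    candidates = sorted(p for p, c in counts.items() if c == fewest)
    return rng.choice(candidates)


peers = [{0, 1, 2}, {1, 2}, {2, 3}]
print(pick_rarest_piece({2}, peers, random.Random(0)))  # piece 0 or 3 (each held by one peer)
```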

What I don't understand is, why do I see so many Torrents that have only a few seeds (none of which I'm currently connected to), literally 100s of other peers, and an availability of <1? The availability is often marginally greater than the highest peer percentage. If I am connected to seed/s, the availability is often (Num seeds + fraction of highest non-seed percentage).

I'm assuming here that the whole number part of the Availability field for a Torrent is the availability of the rarest piece - that is, how many peers out there have the rarest piece. The fraction part would be some figure on how distributed the remaining pieces are.

An example: The torrent is made up of 100 pieces, there is only one seed (presumably the originator), and there are 200 peers. Assuming purely random selection of pieces (isn't probability fun?), after everyone has downloaded ONE piece, surely availability would be nearly 2 (plus 1 more for the seed)? After everyone has downloaded TWO pieces, availability should be nearly 4? After all, when everyone has finally downloaded 100 pieces, availability should be 200 (all 200 peers have all 100 pieces).

I understand that this doesn't allow for ultra-fast downloaders uploading quickly to others - original seeders would often limit their upload rate. And this example doesn't allow for peers arriving and leaving, but I hope you understand: simply put, when the availability for a Torrent is less than one, why doesn't µTorrent try to connect to the Seed(s) more, where the rarest pieces (by definition) are?

Because when there are few seeds, they have no free connection slots to serve you or any other new peer. Everyone has a finite number of connection slots.

You don't need to be connected to seeds, or to everyone in the swarm, to finish. As long as you're connected to SOMEONE, you'll finish the torrent if the overall availability in the swarm is >1.

Simple as that.

Quote: "Further, Clients should implement a 'rarest first' algorithm, so that they attempt to download those pieces that fewest peers have, to more quickly share the entire Torrent. At first, this implies that the poor uploader will get hit on a lot, since all pieces are equally rare, so there are probably back-off rules to allow Clients to download less-rare pieces too."

Well, clients do have this.

Quote: "An example: The torrent is made up of 100 pieces, there is only one seed (presumably the originator), and there are 200 peers. Assuming purely random selection of pieces (isn't probability fun?), after everyone has downloaded ONE piece, surely availability would be nearly 2 (plus 1 more for the seed)? After everyone has downloaded TWO pieces, availability should be nearly 4? After all, when everyone has finally downloaded 100 pieces, availability should be 200 (all 200 peers have all 100 pieces)."

If there is 1 seed with all pieces (e.g. 100) and 200 peers that all have the same single piece, then availability will be 1.01. If there is 1 seed with all pieces and two peers, one of which has 40% of the pieces and the other the remaining 60% (different from the first peer's), then availability will be 2.00.
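
The two examples above can be checked with a short sketch. It assumes the definition implied by these numbers: the integer part is the number of copies of the rarest piece, and the fraction is the share of pieces that have more copies than that. The function name is hypothetical.

```python
def availability(num_pieces, holders):
    """Swarm availability: copies of the rarest piece, plus the fraction of
    pieces that have more copies than the rarest one.

    holders: list of sets, each the piece indices held by one seed or peer.
    """
    counts = [0] * num_pieces
    for pieces in holders:
        for p in pieces:
            counts[p] += 1
    rarest = min(counts)
    above = sum(1 for c in counts if c > rarest)
    return rarest + above / num_pieces


seed = set(range(100))

# 1 seed + 200 peers that all have the same single piece.
example1 = availability(100, [seed] + [{0}] * 200)

# 1 seed + two peers holding 40% and the other 60% of the pieces, no overlap.
example2 = availability(100, [seed, set(range(40)), set(range(40, 100))])

print(f"{example1:.2f} {example2:.2f}")  # 1.01 2.00
```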

Unfortunately, in real-world conditions the bandwidth required to do this is more than 100% of the torrent's total size.

Quote: "What I don't understand is, why do I see so many Torrents that have only a few seeds (none of which I'm currently connected to), literally 100s of other peers, and an availability of <1? The availability is often marginally greater than the highest peer percentage. If I am connected to seed/s, the availability is often (Num seeds + fraction of highest non-seed percentage)."

You don't need ANY seeds once TOTAL availability across the ENTIRE swarm is >1. You don't need to worry about the availability until no peer you're connected to has a piece you don't.
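
The second point can be expressed as a small check: you only stall when the union of your connected peers' pieces contains nothing you lack (in protocol terms, you are not "interested" in any of them). A sketch, with pieces modelled as sets of indices; the function name is hypothetical.

```python
def fetchable_pieces(my_pieces, connected_peers):
    """Pieces that at least one directly connected peer has and we don't."""
    offered = set().union(*connected_peers)
    return offered - my_pieces


me = {0, 1, 2}
peers = [{0, 1}, {1, 2, 3}, {0, 2}]
print(sorted(fetchable_pieces(me, peers)))  # [3]

# Once we hold piece 3 too, nobody we're connected to has anything new:
print(fetchable_pieces(me | {3}, peers))  # set()
```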

Quote: "I'm assuming here that the whole number part of the Availability field for a Torrent is the availability of the rarest piece - that is, how many peers out there have the rarest piece. The fraction part would be some figure on how distributed the remaining pieces are."

It's not. A whole number is 1 copy of EVERY piece.

Quote: "An example: The torrent is made up of 100 pieces, there is only one seed (presumably the originator), and there are 200 peers. Assuming purely random selection of pieces (isn't probability fun?), after everyone has downloaded ONE piece, surely availability would be nearly 2 (plus 1 more for the seed)? After everyone has downloaded TWO pieces, availability should be nearly 4? After all, when everyone has finally downloaded 100 pieces, availability should be 200 (all 200 peers have all 100 pieces)."

It doesn't work that way, because the instant a peer has a piece, that piece gets shared with the peers it's connected to, so you're looking at a worst-case scenario of 1.02 (the .02 is the 2 pieces, out of 100, given out by the seed; everyone downloaded those from each other). You won't even be close to a full copy, excluding the seed, by the time every peer has downloaded 1. You'll be extremely lucky if there's a full copy by the time every peer has downloaded 2, and there probably won't be 2 distributed copies by the time everyone has downloaded 3.
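
To compare the two mental models, here is a small simulation of the thread's 100-piece, 200-peer example. It assumes the same availability definition as the worked examples earlier in the thread (copies of the rarest piece, plus the share of pieces above that). The "random" model is the original poster's hypothetical; the "shared" model is the worst case described in this reply. The random seed is fixed only to make the run repeatable.

```python
import random

NUM_PIECES, NUM_PEERS = 100, 200


def availability(num_pieces, holders):
    """Copies of the rarest piece, plus the fraction of pieces above that count."""
    counts = [0] * num_pieces
    for pieces in holders:
        for p in pieces:
            counts[p] += 1
    rarest = min(counts)
    return rarest + sum(1 for c in counts if c > rarest) / num_pieces


seed = set(range(NUM_PIECES))

# Original poster's model: every peer independently holds one random piece.
rng = random.Random(1)
random_model = [{rng.randrange(NUM_PIECES)} for _ in range(NUM_PEERS)]

# Worst case from the reply: the seed has handed out 2 pieces and every peer
# copied both from the others.
shared_model = [{0, 1} for _ in range(NUM_PEERS)]

random_avail = availability(NUM_PIECES, [seed] + random_model)
shared_avail = availability(NUM_PIECES, [seed] + shared_model)
print(f"random: {random_avail:.2f}, shared: {shared_avail:.2f}")
```

In the random model the average piece has 3 copies (1 from the seed plus 2 spread over the peers), yet some pieces almost certainly have no peer copy at all, so the integer part stays at 1: availability follows the rarest piece, not the average. The shared model gives 1.02.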

Quote: "simply put, when the availability for a Torrent is less than one, why doesn't µTorrent try to connect to the Seed(s) more, where the rarest pieces (by definition) are?"

Because of the following:

1> It doesn't know which peers in its peer cache are seeds, so it can't target them.

2> Connecting to seeds won't guarantee better speeds.

3> There is no benefit in everyone putting strain on the seeds to get pieces.

4> Seeds aren't necessarily the only source for the "rarest" pieces.

5> Peers will eventually relay the rare pieces outwards towards the younger peers.

Quote: "2> Connecting to seeds won't guarantee better speeds."

In "real-world conditions", like you've said, seeds often give even lower speeds than peers. A two-way (download and upload) connection to a peer often gives the highest speeds.

Quote: "You don't need ANY seeds once TOTAL availability across the ENTIRE swarm is >1. You don't need to worry about the availability until no peer you're connected to has a piece you don't."

I don't believe "until no peer you're connected to" is correct - those peers might in turn be connected to peers that do have the piece (unless you were referring to this indirect connectivity). I often have it that I'm connected to 0 of 3 seeds and the availability is 0.867. That's what prompted this post.

I guess I'm worried that a/the seed will disappear (either be brought offline too early, or through network problems). I waited for over a week at 87% (with roughly nine other peers also at 87%) before someone reseeded.

Quote: "I'm assuming here that the whole number part of the Availability field for a Torrent is the availability of the rarest piece - that is, how many peers out there have the rarest piece. The fraction part would be some figure on how distributed the remaining pieces are."

Quote: "It's not. A whole number is 1 copy of EVERY piece."

I think I said that, only in a different way. If the rarest piece is available 7 times, then by definition every other piece is available 7+ times.

Quote: "Simply put, when the availability for a Torrent is less than one, why doesn't µTorrent try to connect to the Seed(s) more, where the rarest pieces (by definition) are?"

I agree with 2>, 3> and 5>, and with 4> except in the early stages of a Torrent, but I don't understand 1>. I assume this is more than just an implementation detail of µTorrent; is it part of the protocol definition?

I guess I'm just wary of a new Torrent with only a very few seeds, since you're at the mercy of those seeds. It also seems to me that, almost by design, in the early stages of any Torrent all peers are at nearly the same download percentage, because it's quick and easy for them to share amongst themselves those pieces 'grudgingly' doled out by the seed.

What you see as you watch a Torrent download (strangely compelling: as a previous poster said, "almost fun"!) is that all the other peers are slowly growing in percentage, as are you, but the availability of the Torrent doesn't grow exponentially. This must be because of the seed's 'reluctance' to upload compared to the downloaders' 'eagerness' to do so, causing the whole swarm's percentage to grow only as fast as the seed doles it out. Once there's more than one seed, things start getting a move-along. As the Torrent dies off, though, and seeders abandon the Torrent, I'm not sure what happens. Maybe my above-described 87% Torrent was a tail-ender and some re-seeder felt sorry for us!

Quote: "but I don't understand 1>. I assume this is more than just an implementation detail of µTorrent, it's part of the protocol definition?"

Peer lists returned from trackers/DHT don't include information about how complete each peer is (too much overhead). No client knows which of the peers it gets from the tracker are seeds and which are downloaders.

The slow growth of a torrent, a phenomenon I have seen myself many times, is caused by the initial seed having a slow connection. He hands out pieces to others, who then share them amongst themselves and pass them on to other new users. Because his ability to hand out new pieces is slow, the total growth is slow. However, if the users are willing to go to 1:1, the torrent tends to go from an availability of 0.99 to 17.99 all at once, and then continue to grow like that.

You mostly just have to live with it, or only download torrents that have fast initial seeds or that have already moved on past the initial seed.

The final problem is that even though clients prefer rarer pieces, it's not perfect, of course. It's quite common to need to upload 150% or more of the total size of a torrent before a complete set of all the pieces is delivered. Super-seeding was created to get around this (though I won't discuss the technical details here): when the initial seeder is slow, it can usually create new seeds with only 110% upload, and occasionally even less. Of course, when there is more than one seed, or that seed is very fast, super-seeding doesn't help (and can make things worse), but in those cases it doesn't matter, as the torrent will grow anyway.
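
As rough arithmetic for the point above: before the swarm holds one full copy, every piece has to leave the initial seed at least once, so the wait is roughly torrent size × upload overhead ÷ the seed's upload rate. A sketch using the 150% and 110% figures from this post; the 700 MiB torrent size and 25 KiB/s upload rate are made-up example values, and the function name is hypothetical.

```python
def hours_to_first_full_copy(size_mib, upload_kib_per_s, overhead):
    """Rough time for a lone initial seed to get one complete copy into the swarm.

    overhead: total uploaded / torrent size before the swarm holds every piece
    (e.g. 1.5 for 150%, 1.1 for 110%).
    """
    seconds = size_mib * 1024 * overhead / upload_kib_per_s
    return seconds / 3600


# Hypothetical 700 MiB torrent, seeded at 25 KiB/s.
for label, overhead in [("normal seeding (150%)", 1.5), ("super-seeding (110%)", 1.1)]:
    print(f"{label}: {hours_to_first_full_copy(700, 25, overhead):.1f} h")
```

With these example numbers that is about 11.9 hours versus 8.8 hours, which is why a slow initial seed dominates how quickly a new torrent takes off.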

Edit: I've also spent three days waiting at 99.8% completion; it's just the nature of seeds with low upload bandwidth, combined with their not giving out that 'last' piece. As soon as they did, everyone on the torrent finished very quickly.

Part of the problem, I think, is that people start seeding as the initial seed while still downloading other torrents, so only a small part of their bandwidth is available.

Quote: "but I don't understand 1>. I assume this is more than just an implementation detail of µTorrent, it's part of the protocol definition?"

Quote: "Peer lists returned from trackers/DHT don't include information about how complete each peer is (too much overhead). No client knows which of the peers it gets from the tracker are seeds and which are downloaders."

Then how does µTorrent display the peers' percentage complete? If it's done using a different technique, can't the two sources be amalgamated somehow?

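µTorrent gets the percentage complete from the peer itself when it connects: right after the handshake, a peer sends a bitfield message listing the pieces it has, and later 'have' messages keep that list up to date.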
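
Per the specification linked below, the bitfield message (optional; a peer with no pieces may skip it) has one bit per piece, with the high bit of the first byte standing for piece 0 and any spare bits at the end set to zero. A sketch of turning that payload into a percentage; this is presumably how a client derives the figure it displays, not µTorrent's actual code, and the function name is hypothetical.

```python
def percent_complete(bitfield: bytes, num_pieces: int) -> float:
    """Percentage of pieces a peer advertises in its bitfield message payload.

    Bit 7 (the high bit) of byte 0 is piece 0; spare bits at the end are ignored.
    """
    if len(bitfield) != (num_pieces + 7) // 8:
        raise ValueError("bitfield length does not match the number of pieces")
    have = sum(
        1
        for index in range(num_pieces)
        if bitfield[index // 8] & (0x80 >> (index % 8))
    )
    return 100.0 * have / num_pieces


# 10 pieces: the peer has pieces 0-3, 8 and 9.
print(percent_complete(bytes([0b11110000, 0b11000000]), 10))  # 60.0
```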

No, the two can't be amalgamated. That would put far too much strain on the tracker/DHT network.

http://wiki.theory.org/BitTorrentSpecification

Just to clarify what DWK said: the only info provided by the tracker is the peer's IP:port. Once you contact the peer, you get all the other details from them (percentage complete, BT client, etc.).
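
To illustrate how little the tracker sends: in the compact peer-list format (BEP 23) that trackers commonly return, each peer is exactly 6 bytes, a 4-byte IPv4 address followed by a 2-byte port in network byte order, with nothing about how complete the peer is. A minimal decoder sketch; the function name and the sample bytes are made up for illustration.

```python
import socket
import struct


def parse_compact_peers(blob: bytes):
    """Decode a compact IPv4 peer list into (ip, port) tuples."""
    if len(blob) % 6:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for off in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[off:off + 4])
        (port,) = struct.unpack("!H", blob[off + 4:off + 6])
        peers.append((ip, port))
    return peers


# Two made-up peers: 192.168.1.10:6881 and 10.0.0.5:51413.
sample = bytes([192, 168, 1, 10, 0x1A, 0xE1, 10, 0, 0, 5, 0xC8, 0xD5])
print(parse_compact_peers(sample))  # [('192.168.1.10', 6881), ('10.0.0.5', 51413)]
```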