Archived

This topic is now archived and is closed to further replies.

rafi

reduce uTP PPS by Path MTU discovery (for 2.04)

During the recent 2.03 beta development, there was an attempt to minimize fragmentation of uTP packets by forcing the DF flag. The motivation was to reduce the increase in PPS caused by this fragmentation.

This seemed to me a bit problematic and premature (to say the least) w/o some form of PMTUD to go with it.

Since connection-oriented PMTUD for uTP is intended to be added in the next release, 2.04, I'll list some ideas and questions here. I hope they can trigger more thoughts on the issue from others.

Some points were posted in this forum thread: http://forum.bittorrent.org/viewtopic.php?id=119

Some other issues/points:

1) Is using ICMP type 3 code 4 (datagram too big) on Windows possible at all under regular user privileges? I couldn't see those incoming packets in Wireshark at all when they came from the Internet, only from my local router/gateway.

2) Provide a setting for the MTU (default to 0 = auto-detect).

3) On startup, check the MTU to the local gateway and use it as the starting size (if the setting is 0).

4) If using the don't-fragment (DF) flag, start with the above size (as in 3) and modify the uTP data-retransmit mechanism. If you see that the connection itself was OK and only the data wouldn't go through, make the second try with the DF flag cleared (fragmentation allowed). If the peer responds, set DF again and step down from the initial size until you get a response.

5) It would be good to collect stats on the fragmented packet count in 2.03/2.02, so as to be able to compare them with the future performance of 2.04.

What are the numbers right now?
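
A minimal sketch of the retry idea in point 4, in Python. This is only an illustration of the proposal, not how uTP does it: `fallback_send`, the `send(size, df)` callback, the 576-byte lower bound and the 8-byte step are all hypothetical.

```python
def fallback_send(send, size, min_size=576, step=8):
    """Return a size that got through with DF set, or None.

    `send(size, df)` is a hypothetical callback that transmits one packet
    and returns True if the peer acknowledged it.
    """
    if send(size, True):
        return size
    # Second try with DF cleared: if even a fragmentable packet gets no
    # response, the problem is not the MTU.
    if not send(size, False):
        return None
    # The peer is reachable, so set DF again and step down in size.
    while size > min_size:
        size = max(min_size, size - step)
        if send(size, True):
            return size
    return None


def fake_path(size, df):
    # Simulated path: anything passes when fragmentation is allowed,
    # otherwise only packets of at most 1492 bytes.
    return (not df) or size <= 1492


print(fallback_send(fake_path, 1500))  # 1492
```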

I have some more points to add. Regarding ICMP 3-4, Wireshark on Windows is known to have some limitations in what it can capture, so I tested large pings on it and on a Linux box. The Linux box showed that when trying to ping an AT&T DSL node, an intermediate AT&T router sent an ICMP message just as expected, and also included the max supported MTU in the message (1492, which may be non-optimal). Windows ping, on the other hand, simply gave a "request timed out", and Wireshark didn't show any incoming ICMP. There were no software firewalls. Windows is known to perform PMTUD, so this seems to be a failure on the OS's part.
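
For reference, the MTU the router included sits in a fixed place in the ICMP header (RFC 1191 layout: type, code, checksum, two unused bytes, then a 16-bit next-hop MTU). A small sketch of reading it; `next_hop_mtu` is a made-up helper name, and the input is assumed to start at the ICMP header, with any IP header already stripped.

```python
import struct

ICMP_DEST_UNREACHABLE = 3  # ICMP type 3
ICMP_FRAG_NEEDED = 4       # code 4: fragmentation needed and DF set


def next_hop_mtu(icmp_message):
    """Return the next-hop MTU from an ICMP 3/4 message, or None."""
    if len(icmp_message) < 8:
        return None
    # Header: type (1), code (1), checksum (2), unused (2), next-hop MTU (2)
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    # Routers that predate RFC 1191 leave the field at zero.
    return mtu or None


# Hand-built header advertising an MTU of 1492 (checksum left at 0).
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1492)
print(next_hop_mtu(sample))  # 1492
```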

I've heard of plans to use these messages. It seems as though the only way to do this is with ICMP raw sockets, which need admin privileges (and which hopefully actually show the relevant messages). Is this correct? What about Vista/7, where constant admin access is discouraged?

Also, for reporting your MTU to others, can you access Windows PMTUD data somehow? If something more appropriate doesn't exist, I found some code which seems to indicate that a packet-too-big error can be returned when using ping functions in an application. If that does work, you could ping a host like bittorrent but with a TTL of 3 or 4, so that the ping only travels a few hops onto the ISP network, testing whether something like an ISP-supplied modem/router has a low MTU that might get in the way. If that value could then be communicated in uTP, a connection could use the lower of the two values. This is roundabout, but I'm not finding a more appropriate method for Windows in my searching.


We will only implement the ICMP method. It's too difficult to do anything else. There's no known TCP implementation that does anything but the ICMP method (at least by default).


Certainly unclean. I did read on some random website, though, that Microsoft did it at one point on their website (not in the OSes), with a large size reduction for the retry. Grain of salt.

Firon:

We will only implement the ICMP method

More issues/questions to consider:

- Does Wine fully support this functionality?

- Does Mac fully support this functionality?

- Do all Windows versions (7/Vista/XP) support it, and at the same privilege level?

The point is that maybe it's best to stick to the application level for the implementation/debug/test (like #4 above) when aiming for multiple platforms/OSes... :)


All my searches say that admin privs are needed for the ICMP method, which would mean Vista and Win 7 users are screwed by default. I also looked up Wine details: icmp.dll comes up as 75% complete, and raw socket support (needed for the ICMP method, afaict) does not appear to be among the available functions.


There are two main RFCs describing how to do PMTUD in TCP. The original one (RFC 1191), relies entirely on ICMP messages. The second one (RFC 4821) describes how to also take packet loss into account to guess PMTU.

My understanding is that no mainstream TCP implementation implements the latter.

The reason you would want loss-based PMTUD is that there are firewalls that block ICMP messages.

Our implementation in uTP sends one probe packet every RTT. A probe means that the dont-fragment-bit is set. It uses an ICMP socket (SOCK_RAW, IPPROTO_ICMP) to receive packet too big messages, and adjusts the assumed PMTU.

If we experience packet loss specifically for the probe packets, that is taken as a signal to lower the assumed PMTU as well.

If you can receive ICMP messages, the MTU will converge very quickly. If you can't, it will take some loss and a bit longer.
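
To illustrate the privilege question raised earlier in the thread, here is a rough Python sketch of opening the same kind of socket (SOCK_RAW, IPPROTO_ICMP) and falling back to loss-only detection when the OS refuses. It is not the uTP code; `open_icmp_socket` is a hypothetical helper name.

```python
import socket


def open_icmp_socket():
    """Try to open a raw ICMP socket; return None if the OS refuses."""
    try:
        return socket.socket(
            socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP
        )
    except OSError:
        # Typically a PermissionError when not running as root/admin.
        return None


sock = open_icmp_socket()
if sock is None:
    print("no raw ICMP socket; rely on probe loss only")
else:
    print("raw ICMP socket open; packet-too-big messages can be read")
    sock.close()
```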


Will the MTU be dynamically adjusted when the path MTU has changed?

How can you tell if communication breaks temporarily due to an MTU change, or for some other unrelated permanent reason?

What would be the overhead (sending redundant probes/packets)?

More details on the above, and on how/when the DF flag is being changed during the connect period, would be nice.

Will the MTU be dynamically adjusted when the path MTU has changed?

Yes. As specified above, we send probe packets regularly (once every RTT), which means that we'll adjust to changing PMTUs very quickly.

How can you tell if communication breaks temporarily due to an MTU change, or for some other unrelated permanent reason?

We distinguish the case where _only_ the probe packets were lost, and no other packet. If communication breaks for other reasons or just congestive packet loss, chances are that some other packet is lost, or all packets in the RTT are lost.
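
A small sketch of that distinction, under the assumption that we know which sequence numbers were sent and acknowledged in one round trip. The function name and the returned labels are made up for illustration.

```python
def classify_round(sent, acked, probe_seq):
    """Classify one round trip by which packets went missing."""
    lost = set(sent) - set(acked)
    if not lost:
        return "ok"
    if lost == {probe_seq}:
        # Only the DF-marked probe vanished: likely larger than the path MTU.
        return "probe_too_big"
    # Something else was lost too: congestion or a broken path.
    return "other_loss"


print(classify_round([1, 2, 3, 4], [1, 2, 4], probe_seq=3))  # probe_too_big
print(classify_round([1, 2, 3, 4], [1, 4], probe_seq=3))     # other_loss
```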

What would be the overhead (sending redundant probes/packets)?

There's no overhead at all in the stable state. Each probe packet that's lost constitutes some overhead. This is relatively small, and I would imagine that in practice it ends up being a handful of packets per connection. I have not measured this myself.

More details on the above, and on how/when the DF flag is being changed during the connect period, would be nice.

I'm not sure what you mean by "the connect period". We set the DF flag on one packet; once that packet is ACKed, we set it on the next packet we send. When a probe is ACKed, we look at its size and adjust our MTU search range accordingly. If we receive duplicate ACKs, suggesting that a packet was lost, and the packet was a probe, we adjust our MTU search range again by capping it below the size of the probe packet.
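
One way to picture the search range is as a binary search, sketched below in Python. This is a guess at the general shape, not the actual uTP code: the class, the midpoint probing strategy and the 576/1500 starting bounds are all assumptions.

```python
class MtuSearch:
    """Search range for the path MTU (hypothetical helper)."""

    def __init__(self, floor=576, ceiling=1500):
        self.floor = floor      # largest size assumed to get through
        self.ceiling = ceiling  # largest size still worth trying

    def next_probe_size(self):
        # Midpoint, rounded up so the search can reach the ceiling.
        return (self.floor + self.ceiling + 1) // 2

    def on_probe_acked(self, size):
        # An ACKed probe proves this size fits the path.
        self.floor = max(self.floor, size)

    def on_probe_lost(self, size):
        # A lost probe caps the range below the probe's size.
        self.ceiling = min(self.ceiling, size - 1)


def converge(path_mtu, search):
    """Run probes against a simulated path until the range collapses."""
    while search.floor < search.ceiling:
        size = search.next_probe_size()
        if size <= path_mtu:
            search.on_probe_acked(size)
        else:
            search.on_probe_lost(size)
    return search.floor


print(converge(1492, MtuSearch()))  # 1492
```

With these bounds the range collapses after about ten probes; an ICMP message carrying the next-hop MTU would let the ceiling be set directly instead.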


If you can do it ok in the code, probing around specific sizes first may be a nice optimization. 1460 or something for DSL? Also whatever MTU results from various VPN tunnels.
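
Something like the following is what I have in mind, as a rough Python sketch. The function name is made up and the size list is only illustrative: 1500 is plain Ethernet, 1492 is PPPoE, and 1460 is the guess from above.

```python
def first_probe_sizes(floor, ceiling, preferred=(1500, 1492, 1460)):
    """Preferred sizes inside the current search range, largest first."""
    return sorted((s for s in preferred if floor < s <= ceiling), reverse=True)


print(first_probe_sizes(576, 1500))  # [1500, 1492, 1460]
print(first_probe_sizes(576, 1492))  # [1492, 1460]
```

These would be tried before falling back to the regular search over the remaining range.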

That aside, proposal looks pretty good.


@arvid:

What was the end result, compared to w/o PMTUD? I mean:

a. What was the average packet size compared to the fixed max of 1444 we have now?

b. When you ran with and w/o it, how many fragmented packets did you see in each case? (Any noticeable difference in PPS?)


It'll definitely be better with PMTUD: rather than using a fixed length, it'll detect the optimal size.

I don't know if you accept MTU suggestions from ICMP, but AT&T and probably others will recommend MTUs, 1490 in the case of the AT&T DSL network.
