
1.7.7 kills network stack


ukhobo


1.7.6 was fine. I upgraded to 1.7.7 and started having intermittent complete network stack death only solved by a reboot. Reverted back to 1.7.6 and the problem is resolved.

This is a problem with the PC network stack, not my router or other network hardware.

I'm running uTorrent on Windows Vista Home Premium on an Acer Aspire 5630.


I don't run a firewall on the PC; I rely on my router firewall.

I run 1.7.6 and don't have a problem (I'm going to stick with that for now; I might try later releases after 1.7.7). When I run 1.7.7, having changed nothing other than the uTorrent version, the network stack dies sometime later during the uTorrent session.

This only happens when 1.7.7 is running, has never happened before 1.7.7, doesn't happen with 1.7.6 and never happens when uTorrent is not running. Network connections stop responding and any attempt to manually restart the network services fails. Only a reboot revives the network stack.

I hope this helps track any potential issue but as far as I'm concerned I'll stick with 1.7.6 for now because it works for me without any problems.


I have the same kill-the-stack behaviour, although I believe I've had it with previous versions (1.7.5, 1.7.6, and possibly earlier), but never took the time to look into it with Wireshark.

I run uTorrent 1.7.7 on XP Pro SP2, with the firewall enabled on a WRT54GL running Tomato 1.13 and disabled on the PC. The router (Tomato) is configured for NAT loopback (so connecting to my public IP address on a forwarded port forwards the connection back to my LAN). TCPIP.SYS is patched to increase the connection rate to 50/s. Round-trip latency to the router, 192.168.20.1, is just over 0.5 ms.

I have about 100 torrents loaded, with a setting of max 2 active, and max 2 upload. global max num of connections/max connected peers per torrent/upload slots/additional slots if <90% = 10 / 8 / 4 / check. I have a recent copy of ipfilter.dat installed and working.

These are very conservative settings, to the point of being impractical, but even so, I see connections accumulate in the FIN_WAIT_2 state at a rate of about 2 per minute. My system becomes unreachable by other computers on the LAN when the number of FIN_WAIT_2 connections reaches about 8500. All the FIN_WAIT_2 connections belong to the utorrent.exe PID.
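For anyone who wants to watch the same counter on their own machine, here is a minimal sketch that tallies FIN_WAIT_2 connections per PID from the text output of `netstat -ano`. The sample text, the PIDs and the function name are made up for illustration; only the five-column TCP row layout (Proto, Local Address, Foreign Address, State, PID) is assumed.

```python
from collections import Counter

def count_state_by_pid(netstat_output, state="FIN_WAIT_2"):
    """Count TCP connections in `state`, grouped by owning PID.

    Assumes rows shaped like `netstat -ano` prints them on Windows:
    Proto, Local Address, Foreign Address, State, PID.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Header and UDP rows do not have exactly five fields, so they are skipped.
        if len(fields) == 5 and fields[0] == "TCP" and fields[3] == state:
            counts[int(fields[4])] += 1
    return counts

# Hypothetical sample output; the PIDs 1234 and 999 are invented.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    192.168.20.151:15100   192.168.20.1:1229      FIN_WAIT_2      1234
  TCP    192.168.20.151:15100   192.168.20.1:1230      FIN_WAIT_2      1234
  TCP    192.168.20.151:15100   192.168.20.1:1231      ESTABLISHED     1234
  TCP    192.168.20.151:3389    192.168.20.7:50211     FIN_WAIT_2      999
  UDP    0.0.0.0:15100          *:*                                    1234
"""

counts = count_state_by_pid(SAMPLE)
for pid, n in sorted(counts.items()):
    print(pid, n)
```

To run it against a live system, feed it the captured output of `netstat -ano` instead of the sample string. If the rate really stayed constant at about 2 per minute, 8500 stuck connections would take roughly 4250 minutes, or about 71 hours, to build up.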

Normally, my settings are much more liberal, such as 200 global connections, with 16/torrent, and up to 30 active torrents at a time. These settings generate considerable hammering, and the list of FIN_WAIT_2 connections grows much faster.

I have captured some packets with Wireshark and correlated a set of packets with a connection that ends up stuck in FIN_WAIT_2, and the cause and effect are apparent.

It looks like after sending FIN, utorrent does not accept any further data from the remote end, and this causes the TCP stack to be stuck.

At the application level, both sides have to keep reading the socket until they receive EOF from the other side, even after they've closed their end of the connection.

Unfortunately, in Windows, even closing the process that opened the stuck sockets does not cause the data in the kernel to be flushed and the connections to be reset. George Elgin points this out, along with a solution, at

[ http://www.developerweb.net/forum/showthread.php?t=2940 ]

Here's the Wireshark packet trace:

No. Time Source Destination Protocol Info

4203 22.749687 192.168.20.1 192.168.20.151 TCP 1229 > 15100 [SYN] Seq=0 Len=0 MSS=1446 WS=2

4204 22.749748 192.168.20.151 192.168.20.1 TCP 15100 > 1229 [SYN, ACK] Seq=0 Ack=1 Win=262140 Len=0 MSS=1446 WS=2

4208 22.750403 192.168.20.1 192.168.20.151 TCP 1229 > 15100 [ACK] Seq=1 Ack=1 Win=186532 Len=0

4209 22.750609 192.168.20.1 192.168.20.151 BitTorrent Handshake

4210 22.750634 192.168.20.151 192.168.20.1 TCP 15100 > 1229 [FIN, ACK] Seq=0 Ack=0 Win=46616 Len=0

4214 22.751208 192.168.20.1 192.168.20.151 TCP 1229 > 15100 [ACK] Seq=0 Ack=1 Win=46633 Len=0

4215 22.751442 192.168.20.1 192.168.20.151 TCP 1229 > 15100 [FIN, ACK] Seq=0 Ack=1 Win=46633 Len=0

4216 22.751452 192.168.20.151 192.168.20.1 TCP 15100 > 1229 [ACK] Seq=1 Ack=1 Win=46616 Len=0

Note that frames 4209 and 4210 are respectively received and sent at almost the same time (about 25 µs apart), meaning that data packet 4209 arrived at .151 just as it was closing the connection. If the application simply calls close() on the socket and stops the thread, without either waiting for EOF from the remote end _or_ calling shutdown() on both directions of the connection, this frame (4209) will remain in the Windows kernel, and the connection will remain in FIN_WAIT_2. Arguably this is a correct but nearsighted way to guarantee no data loss, which is what TCP is intended to provide.

It is a little more difficult to catch this race condition on connections across the net, but the mechanism is the same. It happens more often with clients that try to open multiple connections from the same IP. Apparently uTorrent closes those immediately after accept(), i.e. at the same time as the first bytes of data arrive from the peer.

Side effects:

Through an interesting combination of side effects, my setup guarantees the very quick accumulation of stuck connections:

- ipfilter.dat lists the 192.168.x.x range, so any connections from the router are automatically closed down

- when connecting to my public IP from the NAT side, the router loops back the connection, but rewrites the source IP to look like the connection is coming from itself.

- Since utorrent blocks 192.168.x.x, it blocks its own outgoing connections to my public address (looped back by the router)

- So utorrent doesn't know what my public address is, but it thinks that the peer at my public address keeps closing connections, so it retries frequently.

- It looks like utorrent PSH'es the first packet of data immediately after receiving SYN-ACK from the peer, so when the peer hangs up due to ipfilter.dat, that data is guaranteed to be left in the OS's kernel, as explained above.

- So, every attempt to connect to itself does nothing but add a FIN_WAIT_2 connection.

- Theoretically, there are 65535 outgoing ports from any IP address, so once all of them are in FIN_WAIT_2, no more connections can be made.

- In practice, Windows craps out way before the theoretical limit (about 8500 in my experience).

Long story short, I believe there are 3 bugs:

1. no shutdown() or read() till EOF on connection teardown

2. Repeatedly trying to connect to itself. "Disconnect: Same ID" should only happen ONCE.

3. The ipfilter.dat list blocks the non-routable IP ranges (192.168.x.x and 10.x.x.x)

Please fix it.

Kimmy

--------------

Incidentally, I now use the following settings, with which I'm having much better luck. For the benefit of others looking for a quick, temporary workaround:

1. delete the 192.168.x.x and 10.x.x.x ranges from your ipfilter.dat if you use one, and/or disable NAT loopback if you have it enabled on your router.

2. Set bt.allow_same_ip to true in Preferences/Advanced.

3. Increase the max global connections to a sane but large number. I am having good luck with max global connections of 2000 and also 2000 per torrent. (The max active torrents should still be the lowest number that still allows you to max out your connection.)
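Step 1 can be done by hand in a text editor, but here is a rough sketch of automating it. It assumes each ipfilter.dat entry starts with `start - end`, optionally followed by comma-separated fields, with possibly zero-padded octets, and that lines starting with `#` are comments. Check your own file first, since these format details are assumptions, as are the function names and the sample entries.

```python
import ipaddress

# The two ranges named in step 1 above.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def parse_ip(text):
    # Convert each octet with int() so zero-padded forms like 192.168.000.000 parse.
    octets = [int(part) for part in text.strip().split(".")]
    return ipaddress.IPv4Address(".".join(str(o) for o in octets))

def overlaps_private(entry):
    """True if the entry's start-end range touches 10.x.x.x or 192.168.x.x."""
    range_part = entry.split(",")[0]
    start_text, end_text = range_part.split("-")
    start, end = parse_ip(start_text), parse_ip(end_text)
    return any(
        start <= net.broadcast_address and end >= net.network_address
        for net in PRIVATE_NETS
    )

def strip_private_ranges(lines):
    """Return the lines with entries covering the private ranges removed."""
    kept = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            try:
                if overlaps_private(entry):
                    continue               # drop this entry
            except ValueError:
                pass                       # unparseable line: leave it untouched
        kept.append(line)
    return kept

# Hypothetical sample entries for illustration only.
SAMPLE = [
    "192.168.000.000 - 192.168.255.255 , 000 , LAN",
    "010.000.000.000 - 010.255.255.255 , 000 , LAN",
    "001.002.003.000 - 001.002.003.255 , 000 , Example",
]

for line in strip_private_ranges(SAMPLE):
    print(line)
```

Note that this drops any entry that overlaps the private ranges at all, so a very wide entry spanning public addresses as well would be removed whole. Keep a backup of the original file.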

It's a little ironic that the workaround looks to minimize the chances of utorrent refusing an incoming connection. Refused connections are the most likely to cause TCP stack instability.

Yes, this will take more resources, but should make the system more stable. It will still accumulate stuck connections, but not as fast as before the wide-open configuration.


