
Possible bug or race condition in rename code (File/path not found)


Lord Kestrel


I've just started to see this bug after upgrading to 2.2.1 from 2.2.0. I never saw this behaviour on 2.2.0; however, I did not run that version for VERY long.

I've run almost every stable release from 1.x to 2.0.x to 2.2 without this happening.

Let's get the usual statement of OS/version/etc out the way first:

The client runs on the same VMware 6.5-hosted XP SP3 instance that it has always run on since I first downloaded and ran uTorrent in 2006. The virtual machine is isolated and exists solely for uTorrent operation. All of the files it hosts/downloads are stored on SMB/CIFS servers located on my GigE (wired) LAN. (The host OS is CentOS 5 on a 2.5GB RAM AMD Sempron bookcase system.)

More details in a prior bug-report post here: http://forum.utorrent.com/viewtopic.php?pid=253367#p253367

uTorrent has about 6000 torrents "loaded" with about 2000 in queued/active status, and about two dozen active at any one time. Nominal RAM usage is about 250MB. The VM has 640MB of physical RAM allocated to it, and XP is configured to NOT use any swap. I get wire-speed downloads on fresh well-seeded torrents, and use both uTorrent's rate-shaping *AND* a global iproute2-based traffic policing setup to clamp that IP's uplink bandwidth to about 60% of my ISP physical uplink capacity. I use a properly configured stateful firewall that does NOT interfere with proper ICMP messages, and my PPPoE clamps MSS to avoid packet fragmentation. There is NO anti-virus or security software whatsoever on the XP guest OS. I rely on old fashioned compartmentalisation, bi-directional stateful packet filtering, and application gateway proxying for outgoing web-based (tracker) requests.

All of the access controls are implemented at the host-level (or by my firewalls). The backing store objects (three servers) for the guest consist of throwaway Linux-based CIFS storage with no critical or sensitive information on them whatsoever. Since they use RAID5, the odd disk failure hasn't caused data loss. I also proactively replace the entire RAID set members every 15 months or so, which has the added bonus of enlarging the storage set. The backing store filesystems are XFS, and are hosted on hardware that's fully protected by a good quality UPS. Even the largest file allocations happen instantly (I don't use "sparse_files"), something that used to choke Win32-based SMB servers for some reason when downloading very large files. (The CreateFile() would take so long, it'd time out the SMB stack on that volume.)

None of the above has changed ever. The house network server has been running for 540 days non-stop, and I almost never make firewalling or routing changes. Inward DNAT rules were defined for uTorrent *YEARS* ago to be "connectable".

The only common context for this bug appearing for me is a torrent with many files. Whether they're 300MB or 10MB doesn't seem to matter. More files seems to increase the likelihood of the problem appearing. Single file torrents don't seem to trigger this bug, or if it does, it's rare.

I suspect there's some kind of assumption going on with the "rename when finished" logic. Perhaps a file open on the new filename is attempted "too quickly", and when it fails, the torrent's marked as bad. I would humbly suggest to the developers that, during this phase of the client's operation, "file not found" errors be retried after a small delay. I'm not sure what the interaction is between the CIFS server's cache and state and the XP guest's SMB stack state (or caching, which I know Windows does for certain operations; to wit, oplocks). I don't think the XP SMB stack is very threaded, so if many transactions queue up, you cannot necessarily assume that any vestigial locks from an immediately preceding operation have been cleared.
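To make the suggestion concrete, here is a minimal Python sketch of the retry I have in mind. I don't have uTorrent's source, so this is purely illustrative: the function name `open_with_retry`, its parameters, and the default attempt count and delay are all my own assumptions, not anything from the client.

```python
import time


def open_with_retry(path, mode="rb", attempts=5, delay=0.2,
                    opener=open, sleep=time.sleep):
    """Open `path`, retrying "file not found" a few times before giving up.

    Hypothetical helper: `opener` and `sleep` are injectable only so the
    behaviour can be exercised without a real network share.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return opener(path, mode)
        except FileNotFoundError as exc:
            # The name may not be visible yet on the share; wait a bit
            # longer on each pass (linear backoff) and try again.
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (attempt + 1))
    # Every attempt failed: only now treat the file as genuinely missing.
    raise last_error
```

The point is only that the failure is declared after several spaced attempts rather than on the first "file not found".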

It's also possible that a rename is attempted "too quickly" after the file's closed for writing: an error occurs on the operation (due to the file being locked), and then a subsequent file open fails because the file still has the old name. I don't have source code, so this is 100% speculation. One other possibility is that the rename is now done completely differently. Previously, you might have issued a FileRenameInformation request (via NtSetInformationFile() or a similar call) on the already opened handle to the file which has just had the last write committed to it, which does not "break" any oplocks for that object.
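The same idea applies to the rename itself. As a rough sketch in Python (again, hypothetical names and defaults of my own, not the client's code): on Windows, Python reports a sharing violation from `os.rename` as `PermissionError`, so that is the error retried here.

```python
import os
import time


def rename_with_retry(src, dst, attempts=5, delay=0.25,
                      rename=os.rename, sleep=time.sleep):
    """Rename `src` to `dst`, retrying while the file still looks locked.

    Returns the number of attempts it took. `rename` and `sleep` are
    injectable purely for testing; both are assumptions of this sketch.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            rename(src, dst)
            return attempt + 1
        except PermissionError:
            # A lock left over from the just-closed write handle may not
            # have been released yet; give the SMB stack time to settle.
            if attempt == attempts - 1:
                raise
            sleep(delay)
```

Only a lock-style failure is retried; any other error (a genuinely missing source file, for instance) propagates immediately.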

This has been a thorny area for M$ and SMB/CIFS implementors for quite some time. I humbly suggest you look past MSDN documentation into how the more subtle aspects of this ACTUALLY work, especially under XP.

See: http://social.msdn.microsoft.com/Forums/en-US/os_fileservices/thread/3ca14dc9-da1f-4786-a8f7-a86e9903db0c

In *EVERY* case where the torrent's halted in this way (dozens of times since upgrading to 2.2.1), a simple re-check restores operations. On at least four occasions, the torrent was complete once the re-check finished, suggesting it was the final rename-and-check that "failed". If the network link to the server genuinely broke, then I'd have gotten "no such file" on hundreds if not thousands of active torrents.

On torrents with dozens of files, this seems to trip uncomfortably often. The type of the file is irrelevant. It happens on naked media files as well as .rar and .zip files with impunity. All three checkboxes in the "When Downloading" sub-area of "GENERAL" are checked: Pre-allocate files, append .ut!, and Prevent standby.

I've not changed any "Advanced" configs, except for MAYBE the sparse_file option (which I do NOT use).

I'll probably downgrade to 2.2.0 and see if it happens on the next big collection. I have one torrent I've needed to restart 8 times already, so maybe I should downgrade now.

SOMETHING's changed in the file handling code semi-recently, even if it's subtle. My countermeasures suggested above should greatly lower the chances of it happening.

Addendum: I just had this happen to me with a single file torrent (downloaded while Force Checking another torrent, so the SMB stack for that share was pretty busy). This time out of curiosity, I just hit "START" instead of "Force Check" (because I'd need to wait anyway), and uTorrent's status icon for that torrent went to green and started seeding it without any further input from me.

I suspect whatever's being checked directly after the rename operation is failing validation (SHARING_VIOLATION perhaps?), and aborting the torrent. The rename actually succeeds, just not quickly enough for the code's liking. I wish I had remembered to check the file from another machine to see what its filename was.
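If that theory is right, the post-rename check could simply wait for the new name to become visible before validating anything. A minimal Python sketch, assuming a hypothetical helper `wait_until_renamed` with a timeout and polling interval I picked arbitrarily:

```python
import os
import time


def wait_until_renamed(old_path, new_path, timeout=2.0, interval=0.1):
    """Poll until `new_path` is visible and `old_path` is gone.

    Returns True once the rename is observable, or False if `timeout`
    seconds pass first. Hypothetical helper, not uTorrent's logic.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(new_path) and not os.path.exists(old_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A False result would then be the point to log which of the two names actually exists, which is exactly the detail I failed to capture.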

Addendum 2: None of these torrents have long pathnames or strangely named files with Unicode characters. The total pathname length rarely exceeds 120 characters.

Now trying uTorrent 3.0 (release). We'll see. The startup sequence seems more sprightly when re-enumerating the several thousand torrents, whereas prior versions just seemed to take a very long time to start up without any GUI becoming active. However, I notice a progressive decline in responsiveness compared to 2.2.x, which itself had lower performance than 1.x. Admittedly, my client "host" is modest, but it's been nimble for most of uTorrent's life until recently.


