reproducible resume failure


drizzle

Posted

I have a system that randomly crashes. It seems to have revealed a *reproducible* sequence that will result in the inability of utorrent to recover after the crash - ie all torrent info is lost and resume.dat is reinit'd.

(in the below 'do stuff' means use utorrent to do downloads, deletes, starts, stops, etc etc)

The sequence is:

-1-start system, start utorrent, do stuff, stop utorrent, and reboot

Now you have a 'clean' setup. Now...

-2-start utorrent, do stuff, crash system; then

-3-reboot, start utorrent, go through recovery (file checks, etc)

Now you have a recovered and running system - you proceed to

-4-do stuff, then crash system again

At the next reboot and restart of utorrent, all torrent info is lost and resume.dat is reinited.

_-VS-_

-3-reboot, start utorrent, go through recovery (file checks, etc)

-4-EXIT/RESTART utorrent, do stuff, then crash system again

At the next reboot and restart of utorrent, utorrent will again go do a successful recovery.

IE, a system crash AFTER a successful utorrent crash recovery seems to result in loss of all torrent info and resume.dat being marked bad at the next utorrent crash recovery attempt, UNLESS you exit / restart utorrent after the recovery. I don't know the particulars of how utorrent determines whether resume.dat is 'ok' to use for recovery, but it seems to fail when the recovery is of a 'recovered' session.

It has taken a long time to piece this together (particularly considering that the time between crashes can be very long), but it is looking to be a pretty consistent behaviour pattern.

Given this system is dedicated to running utorrent, with auto reboot after a crash and auto start of utorrent, you can see that this behaviour results in loss of all torrent info every _other_ crash - UNLESS for some reason I restart utorrent or reboot the system in between crashes (which I don't do very often - it usually just sits in the corner and does its thing).

FYI - system is a Celeron 666, running W2K+SP4, utorrent 1.7.2

If there is interest, I have copies of the .dat and .old files from my latest crash recovery failure.

As far as I can tell, they look ok, but are failing whatever the 'usability' check is.

BTW, is there a way to over-ride the 'usability' check and tell utorrent to use resume.dat anyway?

Posted

As I have said elsewhere, resume.dat corruption is readily reproducible when hard drive space goes to nil, and also when the controller is unable to complete the write that happens every 30 seconds.

The best suggestion I can give you is to use the bencoded file viewer in my sig to try to diagnose the problem yourself.

First, there is a key that tells uT whether a file has been modified. That is "fileguard" or ".fileguard"; I'm not sure when the transition took place.

Also if you want more help with this I'd recommend moving to 1.7.5. I can appreciate all the time you took to piece this together, but it's not uncommon for things like this to be "known" to the higher-ups and dedicated support staff, but not shown in a changelog.

Also, while you may have found a method that reproduces the issue, I would ask that you take at least a cursory glance at some "in progress" resume.dat files (the refresh rate of 30 seconds allows ample opportunity). I'm not sure if you are aware, but as you say, torrent info... is stored in resume.dat. Are you noticing any other corruption in .dat files?

Also, the logging tab messages after restarting uT may help verify what uT thinks happened, in case you know it to be otherwise.

I hope this helps, please respond back with any feedback.

Posted

Thanks for the pointer, but as it so happens after posting the above I went searching and found that tool on my own. ;) I've posted some comments in that thread (about the tool).

My use of the tool left me even more perplexed as to what was "wrong" with the resume.dat (and .old files) - but I've just discovered the problem!

I loaded the resume.dat into that tool and simply resaved it, then did a binary compare of the two. The only difference was the 'original' was slightly longer. Using a HEX editor revealed that after the last "valid" data was a string of null (ie 0x00) bytes.

For some reason, both my resume.dat and resume.dat.old have these extra null bytes. When I load the file into the BEditor and save it back, these null bytes are no longer present. When I pass the file with these trailing null bytes removed back to utorrent, it starts ok and proceeds with the recovery!

IE - the presence of these extra null bytes at the end of the file results in loss of all torrent info.
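
For anyone wanting to repeat the "resave and compare" step without a hex editor, here is a minimal sketch in Python. The function names are made up for illustration, and it relies on the observation that a bencoded dictionary ends with 'e', so any 0x00 bytes after that are not part of the data. It writes a separate cleaned copy and leaves the original file alone.

```python
from typing import Tuple


def strip_trailing_nulls(data: bytes) -> Tuple[bytes, int]:
    """Return (cleaned bytes, number of trailing 0x00 bytes removed)."""
    cleaned = data.rstrip(b"\x00")
    return cleaned, len(data) - len(cleaned)


def clean_copy(src_path: str, dst_path: str) -> int:
    """Write a null-stripped copy of src_path to dst_path.

    The source file is only read, never modified. Returns how many
    trailing null bytes were dropped.
    """
    with open(src_path, "rb") as f:
        data = f.read()
    cleaned, removed = strip_trailing_nulls(data)
    with open(dst_path, "wb") as f:
        f.write(cleaned)
    return removed
```

Something like `clean_copy("resume.dat", "resume.clean.dat")` would report the count of removed bytes; a result of 0 means the file had no trailing nulls to begin with.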

The number of extra bytes is not the same in the two files. It is not a 'pad to block boundary' issue; one file ends at 0x91DD the other at 0x91CF. Utorrent had been running for about a week since the last crash (I didn't stop / restart utorrent after the last crash recovery).

Purely a WAG, but I'd say that for that entire week (since the last crash recovery) utorrent has been generating 'bad' resume.dat files (ie files with extra null bytes at the end) and then chokes on them on the subsequent recovery attempt.

EDIT: Clarification...

In BOTH cases - where I try to simply reuse the "original" resume.dat, and where I try to use the modified resume.dat file with the trailing null bytes removed - I get the SAME message logged by utorrent - "Warning: file integrity check failed (hash doesn't match)". But in the first case I lose ALL torrent info, and in the second case I don't appear to lose ANY.

My further WAG is that utorrent is including the null bytes in its hash computation when it is generating the files (after the 1st crash recovery), but chokes on them when it tries to read the file back in later (after the 2nd crash). But explicitly exiting utorrent after the 1st crash recovery causes it to write out a clean / correct copy of the resume.dat file as part of its exit routine...

Posted

That is interesting. I'll see what I can do about someone investigating this.

As you said this is quite interesting, and I'm glad you didn't lose much data, especially when you've been running without a recent backup ;)

Blah I made a thread before about the procedure I use for my uT backups, but I'm enjoying home-made mac&cheese and I really don't feel like looking. I hope you'll forgive me.

Also, I too have requested that some sort of "verification check" be done on resume.dat to help reduce the risk of (rare) failures, but I feel for the devs who are currently working on making the new 1.8 line stable. At the very most I'd expect a "fix" build to come out soon after 1.8 goes beta, to include various fixes noted since 1.7.5 became the stable.

Again THANK YOU! for the investigative work. You'd have no idea how many people expect tech support to FIX their problem without any information in return.

Edit: I just have to ask, is WAG an acronym for wild @$$ guess?

Posted

OK, I'll check back later in case there are other questions.

I just checked the resume.dat file created as a result of utorrent doing recovery with the modified 'bad' file (ie with the nulls removed) and it does not have trailing nulls. So the 'bug' is not on THAT code path - seems it is only in the case of recovery with a 'good' resume.dat file that things go weird afterwards.

And re: WAG - 'yes'

Posted

Hmm, yeah in all my bad files I hadn't experienced that. Then again my problems had more to do with dying hard drive/controller issues whereby the data was actually not being written.

This NULL byte appending has to be applied somewhere, but I don't think uT would do it. What use would it have for them?

Given that this involves multiple crashes and rechecks, could you try this with a more-or-less blank 1.8 version from here? I don't want you to do it on your production version as you've already sacrificed enough time... Frell, when I crash (unexpectedly, of course) it takes 12 GB of verification an hour. Including 1 empty, 1 completed, and your downloading torrent should suffice.

Note: to run multiple uTorrent instances simultaneously you need to add /RECOVER to the command line.

Please also note, I don't think you need to reboot Windows (i.e. crash IT) for this to happen if your case is correct.

There is a little-known feature: hold down Control and right-click in the logger tab, then select "dump memory info". This will cause uT to segfault and produce a dump file.

Could you see if this procedure creates malformed .dat files?

Posted

(Most of my crash problems stem from a combination of W2K's firewire drivers and some cheap external firewire drives I use on my torrent box. Got no probs with my IDE system disk.)

The fact that simply removing the null bytes results in a hash check failure says to me that they were included in the hash computation, which implies uT thought they were part of the bencoded string to be written to the file. IE - uT did it

The fact that it chokes reading the file with nulls back in is probably some code tripping over C string semantics (null termination) where it doesn't expect to encounter them.
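
The first half of that argument rests on a general property of digests: a hash taken over a buffer that includes padding cannot match a hash of the same data with the padding removed. A minimal sketch of just that property follows. SHA-1 and the payload are picked purely for illustration; how uT actually computes its integrity value is not known here.

```python
import hashlib

# Hypothetical bencoded content, not taken from a real resume.dat.
payload = b"d3:fooi1ee"
# The same content followed by stray null bytes.
padded = payload + b"\x00" * 14

digest_clean = hashlib.sha1(payload).hexdigest()
digest_padded = hashlib.sha1(padded).hexdigest()

# If the writer hashed the padded buffer, stripping the nulls later
# changes the bytes being checked, so the stored digest no longer matches.
hashes_match = digest_clean == digest_padded
print(hashes_match)
```

This says nothing about where the nulls came from; it only shows why a cleaned-up file would still trip a "hash doesn't match" warning.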

"Frell" eh? I know what show _you_ watch :)

I'll run your tests, but it may take a day or two...

Posted

However many arns it takes ;)

Hey, that's another idea. But it would of course depend on who you ask.

What kinds of shibboleth should be used in such a recognition of vicarious sci-fi viewers?

The reason I said it's probably something external is that viewing my own .dat files of all varieties and shapes and crash scenarios and dump containers.. they are all valid bencoded files... start with 'd' end with 'e'. Extra NULLs means some buffer was full and dumped including the extra ones. Why? I have no idea. How can we find out.. well through this method or other experimentation. That's why I asked for the new beta 1.8 because then at least a devoted team would be able to look at it if it persists. On older versions perhaps the patch to fix it will be included in the future... but there's no guarantee when that will be due to the nature of software development.
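
The "starts with 'd', ends with 'e'" test described above is easy to script. A rough sketch, with a made-up function name; it is only a quick sanity check, not a full bencode validation, since a file can pass it and still be malformed in the middle.

```python
def quick_bencode_check(data: bytes) -> dict:
    """Cheap structural check for a file expected to hold a bencoded dict."""
    stripped = data.rstrip(b"\x00")
    return {
        # A bencoded dictionary opens with 'd' ...
        "starts_with_d": data[:1] == b"d",
        # ... and closes with 'e' as the very last byte.
        "ends_with_e": data[-1:] == b"e",
        # Count of 0x00 bytes sitting after the last non-null byte.
        "trailing_nulls": len(data) - len(stripped),
        # Would the file pass the 'e' test once the nulls are ignored?
        "ends_with_e_after_strip": stripped[-1:] == b"e",
    }
```

A file of the kind reported earlier in this thread would show `ends_with_e` as False, a non-zero `trailing_nulls`, and `ends_with_e_after_strip` as True.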

  • 3 weeks later...
Posted

I don't know if I have the same problem, but for some inexplicable reason my Windows froze while I was trying to burn a CD with utorrent running. I rebooted, and all my utorrent settings were gone.

I'm quite bothered. I had something like 70 torrents open (some finished, some unfinished), and my resume.dat and resume.dat.old contain only this:

d10:.fileguard40:600CCD1B71569232D01D110BC63E906BEAB04D8Ce

I use utorrent 1.7.5
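
Decoding that line shows what is left in the file. Below is a minimal bencode decoder sketch in Python (my own illustration, not uT's parser, and `bdecode` is a made-up name). It also returns the offset where parsing stopped, so any bytes left over after the closing 'e' can be spotted.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end].decode("ascii")), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dict: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon].decode("ascii"))
        start = colon + 1
        if start + length > len(data):
            raise ValueError("string runs past end of data")
        return data[start:start + length], start + length
    raise ValueError("unexpected byte at offset %d" % pos)


# The file content quoted in the post above.
sample = b"d10:.fileguard40:600CCD1B71569232D01D110BC63E906BEAB04D8Ce"
value, end = bdecode(sample)
print(sorted(value.keys()), end == len(sample))
```

The decoded dictionary has a single `.fileguard` key and no per-torrent entries, which is consistent with a freshly initialised file rather than a damaged copy of the old one.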

Posted
The fact that simply removing the null bytes results in a hash check failure says to me that they were included in the hash computation, which implies uT thought they were part of the bencoded string to be written to the file. IE - uT did it

Possible... Do you have a guess as to when uT adds those extra bytes?

Posted

Sorry, no I don't (other than it seems to relate to crash recovery). I kept the files, if looking at them might give any clues. I also modified my setup so that in the future I make a copy of the files before starting uT after a crash.
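
In case it is useful to anyone else, a "copy before starting uT" step along those lines could look roughly like the sketch below. The function name and the directory arguments are placeholders of my own; point them at wherever your resume.dat actually lives.

```python
import os
import shutil
import time


def backup_resume_files(data_dir: str, backup_root: str) -> list:
    """Copy resume.dat and resume.dat.old into a timestamped folder.

    Returns the list of destination paths that were written. Files that
    do not exist in data_dir are skipped.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest_dir = os.path.join(backup_root, stamp)
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in ("resume.dat", "resume.dat.old"):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            dst = os.path.join(dest_dir, name)
            shutil.copy2(src, dst)  # copy2 also keeps the file timestamps
            copied.append(dst)
    return copied
```

Running this from the auto-start script before launching uT keeps the post-crash files untouched for later inspection.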

Have not had any crashes since the last one (knock on wood) but I may get some time in the next few days to poke at this some more, including trying out the 1.8 beta with some forced crashes.

Out of curiosity, how are those two files (.dat and .old) handled with respect to the 30s updates? Are they being overwritten, or is it a file create / rename / delete sequence?
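
To make the question concrete, this is roughly what a "create / rename" style save looks like, as opposed to rewriting the file in place. It is only a sketch of the general technique with a made-up function name and a made-up ".new" temp suffix; I am not claiming uT does it this way.

```python
import os


def save_with_rename(path: str, data: bytes) -> None:
    """Write data to a temp file, then rotate it into place.

    The previous file (if any) becomes path + '.old'.
    """
    tmp = path + ".new"  # hypothetical temp name
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to disk
    if os.path.exists(path):
        os.replace(path, path + ".old")
    os.replace(tmp, path)
```

With this pattern a crash during the write damages only the temp file, while an in-place overwrite can leave a half-written resume.dat behind.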

Archived

This topic is now archived and is closed to further replies.
