Red Hat Bugzilla – Bug 476372
wait and retry doesn't wait
Last modified: 2009-09-09 06:01:27 EDT
Description of problem:
A couple of my upgrades have failed, and I think I have tracked down the cause. It appears that if you get one of the errors that asks whether you want to abort or retry, and you have not responded after an hour or two, the installer aborts and reboots instead of waiting until a human notices and takes care of the problem.
Like a lot of people, I do many of my installs and upgrades in the background while doing other things, since most of the process is a "hurry up and wait" situation. So if I get involved with another project and don't get back immediately, or if I go run an errand, and something happens that would be simple to fix had the installer waited, it instead appears to act on its own and leaves a mess that takes longer to clean up.
I am wondering if this has something to do with kickstart. All of the systems on which I have experienced this were using kickstart files because of preupgrade. As I understand it, preupgrade automatically sets a reboot at the end. I am wondering whether the two are related, and the end-of-install reboot is being triggered when a retry-style error occurs.
I should probably specify how I know what is going on here. Several times I have started an upgrade or install and gone about other things. Later I would pass by the screen, notice the error message, and decide I didn't have time at that minute to figure out the problem and the appropriate response. Then, when I came back with the time to deal with it, the system seemed to have proceeded on its own.
As I said, not good.
This is really weird. I've never seen this behavior before, though the error paths here don't get the most testing. If you are able to reproduce this, could you please attach /tmp/anaconda.log from the trouble machine to this bug report? Grab the log after you've waited a long time, come back, and anaconda has proceeded on without you. Thanks.
I think you are missing the point here. The problem is that this generally happens in a setting where the "continue" is a reboot mid-install. I have seen this on several machines.
The main reason I see this is a dumbly configured upstream firewall, or at least one that is not well configured for what I am doing. I am doing network-based installs. The firewall treats a lot of FTP traffic from a single machine as a problem and imposes a 10-minute "cool out" on that machine, blocking all FTP. This causes an error on the installing machine, which then offers a choice of retry or abort and reboot. If you don't retry soon enough, it seems to automatically select the other option, which is the default when this error comes up. (That should probably also be fixed so that retry is the default, as this frankly makes more sense.) So it then reboots and the log goes away.
So the suggestion to continue and grab /tmp/anaconda.log will not work. For debugging purposes, it would be really useful to have an option to duplicate this file somewhere that doesn't disappear when you reboot. I have had several issues where the transient nature of this file has been a problem.
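To illustrate the kind of option being asked for, here is a minimal Python sketch. It is not anaconda code: the function name preserve_log and the destination directory are hypothetical, and the only path taken from this report is /tmp/anaconda.log.

```python
import shutil
import time
from pathlib import Path


def preserve_log(log_path, dest_dir):
    """Copy a transient installer log to a directory that survives a reboot.

    Hypothetical helper for illustration only. Returns the path of the
    copy, or None if the log file does not exist.
    """
    src = Path(log_path)
    if not src.is_file():
        return None
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Timestamp the copy so repeated failures do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"{src.name}.{stamp}"
    shutil.copy2(src, target)
    return target
```

Something like preserve_log("/tmp/anaconda.log", some_persistent_dir) would have to run before the abort path reboots, where some_persistent_dir stands for any location on disk or removable media that is not wiped at reboot.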
As a matter of fact, thinking about this problem, defaulting to retry would entirely fix it. If it retries and fails again, no problem. It is the automatic dump out to a reboot that is the problem. How about fixing this by setting the default on all of these retry/exit-and-reboot prompts to retry? This would be more convenient and logical on a lot of levels.
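The behavior proposed here can be sketched roughly as follows. This is only an illustration under stated assumptions, not how anaconda implements its error dialogs: fetch_with_retry, fetch, and ask are hypothetical names, and the 600-second wait simply mirrors the 10-minute firewall block described above.

```python
import time


def fetch_with_retry(fetch, ask, wait_seconds=600, sleep=time.sleep):
    """Call fetch() until it succeeds or the user explicitly aborts.

    Hypothetical sketch. ask(error) should return "retry", "abort", or
    None when nobody answered the prompt. No answer is treated as
    "retry", so an unattended install keeps waiting instead of rebooting.
    """
    while True:
        try:
            return fetch()
        except OSError as err:
            choice = ask(err)
            if choice == "abort":
                # Only an explicit abort ends the loop with an error.
                raise
            # "retry" or no answer: wait out the block, then try again.
            sleep(wait_seconds)
```

With retry as the default, a temporary block like the firewall cool-out only costs time: the loop waits and tries again, and a reboot can only follow an explicit abort.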
Well, this seems to be fixed. Now, instead of rebooting without waiting on a retry, the system waits for you to select, but then reboots instead of retrying either way. I don't know if we can call this "fixed", but I have filed a new bug report on the new behavior, so I guess this one can be closed.
Definitely some more testing needed here.
Closing based on comment #6.