Red Hat Bugzilla – Bug 1255858
kickstart guided install reboots at|before preparing for install
Last modified: 2016-11-03 19:09:54 EDT
Description of problem:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Can you try running the install with inst.nokill on the boot command line, and attach the logs from /tmp after the install fails?
Welp, so many buttons, so little time...
That first comment was supposed to read as follows:
A GUI based kickstart driven install fails by orderly shutdown just around the time the display switches from the options (filled in by kickstart) and the "preparing for install" progress bar.
I have tried inst.text, serial redirect, inst.vnc and all have been successful - even with the kickstart.
A manual install, no kickstart, also succeeds.
Finally, I tried 'console=ttyS1 console=tty0 inst.graphical', which fails.
NOTE: the failure is not 100%. I was able to slip past the failure mode 2 out of 35 tries.
There is no dialog box or indication of failure other than a blank screen. When I manage to get to tty1 and poke around, I am able to see an orderly shutdown/reboot. When running with console redirected to both serial and tty0 then I also see just an orderly shutdown. No kernel messages, no error messages.
I have tried shunting our kickstart by removing the storage specification. The machine sits happily in this state with both GUI (tty6) and tmux (tty1) available. I'm digging through the environment now for signs of abnormality.
inst.nokill - coming right up.
Created attachment 1065685 [details]
results of inst.nokill install
Using inst.nokill install failed the second try.
I ran what we call a "customer" install for which we provide a kickstart file on separate media - inst.ks=cdrom:ks.cfg - and do the disk swap to get started. In this instance the GUI popped before getting to the progress bar screen.
I have compared 7.1-GA install logs with 7.2 successful and unsuccessful. Nothing stands out as obvious.
Things I've tried:
- inst.text is always successful
- inst.nokill makes a graphical install slightly more successful
- inst.debug=1 makes no difference in output, that I could tell
I think my next step is to dig into anaconda sources and try to enable more debugging. I haven't seen a way to increase the debug level from the kernel command line.
We have also noticed that specifying an install group '@Stratus ftAUL --optional' when using a graphical install fails to install any of the group. A text install happily installs the group. Probably another bug... What d'ya think? I'll be happy to add another issue.
FYI, I've now seen the graphics install anaconda crash with 7.2-Beta-1. Still no evidence in the log files that indicates what blew up.
I don't see anything in the logs that indicate any kind of problem. It was in the middle of installing packages it looks like, the last thing in rpm-script.log is: ghostscript-9.07-18.el7.x86_64 (840/1330) and syslog doesn't have any hardware problems logged.
When this fails with inst.nokill are the anaconda or anaconda-yum processes still running? If anaconda is, could you do this:
kill -USR2 `cat /var/run/anaconda.pid`
And attach the /tmp/anaconda-tb-* file as a text/plain attachment.
Created attachment 1070153 [details]
Another inst.nokill + graphics pop
Anaconda was still running, no trace files in /tmp.
Issued SIGUSR2 to anaconda, generating the attached trace file.
Can you run this with a simplified kickstart? One without any of your custom packages? I can't see any reason for it to be quitting, but it is possible that a problem in one of your packages could cause anaconda-yum to exit unexpectedly.
Also, was just anaconda running or was anaconda-yum also running (that's the process that handles the actual package installation).
The answer to the second question is yes, anaconda-yum is running at the time of the event.
Do you want ALL the logs from a less inclusive kickstart?
This latest repro only adds prerequisites for our custom packages. Nothing fancy with repos and all Red Hat software.
(In reply to David Bulkow from comment #12)
> Do you want ALL the logs from a less inclusive kickstart?
Yes, if it crashes. Thanks.
Created attachment 1071457 [details]
trace 2 - X.log
Created attachment 1071458 [details]
trace 2 - anaconda.log
Created attachment 1071459 [details]
trace 2 - anaconda-tb-HxM7mp
Created attachment 1071460 [details]
trace 2 - anaconda-tb-all.log
Created attachment 1071461 [details]
trace 2 - anaconda-yum.conf
Created attachment 1071462 [details]
trace 2 - ifcfg.log
Created attachment 1071463 [details]
trace 2 - ks-script-PRFjKQ
Created attachment 1071464 [details]
trace 2 - ks-script-n_VNqK
Created attachment 1071465 [details]
trace 2 - libuser.9wfl83
Created attachment 1071466 [details]
trace 2 - packages
Created attachment 1071467 [details]
trace 2 - packages_required
Created attachment 1071468 [details]
trace 2 - packaging.log
Created attachment 1071469 [details]
trace 2 - partition-map
Created attachment 1071470 [details]
trace 2 - program.log
Created attachment 1071471 [details]
trace 2 - rpm-script.log
Created attachment 1071472 [details]
trace 2 - storage.log
Created attachment 1071473 [details]
trace 2 - storage.state
Created attachment 1071474 [details]
trace 2 - syslog
The following were zero-length:
Sorry, I should have remembered to suggest running with inst.loglevel=debug, could you give that a try, with the limited package set. Thanks.
Do you want every file separately again?
There still isn't much in these logs that sticks out :-(
You only need to include the anaconda-tb-* file, it has everything we need in it.
Created attachment 1071973 [details]
trace 3 - anaconda trace file
This is from a simplified kickstart, @core (I removed 'Server with GUI' for this run) plus packages required for our software.
14:22:19,722 INFO anaconda: Installing boot loader
14:22:19,723 INFO anaconda: Performing post-installation setup tasks
14:22:19,747 INFO anaconda: Performing post-installation setup tasks
14:22:19,747 INFO anaconda: Thread Done: AnaInstallThread (139932915861248)
This was actually a successful run. Can you get this exact same kickstart to fail?
It hadn't quite reached the point where one presses the reboot button. The graphics popped and I was left with the tmux session and anaconda still running.
There hasn't been anything interesting in these logs from the beginning. I wonder if having your hands on one of these machines could help?
In the mean time I'm giving 7.2-Snap-1 a try this morning.
That's correct behavior when using nokill, anaconda stops in a loop. But the installation was finished in this case. In the others it wasn't done since anaconda-yum was still running and it hadn't logged anything about bootloader installation.
7.2-Snap-1 is far less likely to reproduce this issue - 1 in 10?
7.2-Alpha-1 was more like 9 in 10.
I just finished round-tripping other install issues and will try to get another trace. Unfortunately I don't see anything useful coming from these logs. I've been comparing successful versus failed runs and nothing jumps right out. It feels like something graphics related is blowing up, but not leaving any traces in logs.
As for the reboot button - I have been having to first press Reboot, then Ctrl-alt-delete to reboot from an install when things go right.
Is there more I can do in the graphics interface to catch faults - presuming there are ways to fail that don't get logged?
There's inst.loglevel=lock which will log a pile of things about yum locking, but at the point where it fails it is past all of that. When it starts up anaconda-yum it is supposed to be sitting in a loop watching its output and it shouldn't escape with anaconda-yum still installing things.
I may have hit this on Friday while doing a dvd + kickstart install. I only saw it once, but I'll keep looking.
I revisited this today, and still cannot get it to reproduce here. I've used your kickstart (with the slight modification of just doing autopart) in a KVM. I've grabbed it from http and from a swapped in cdrom image, I've inserted failures into anaconda-yum and anaconda's processing of the anaconda-yum output, I've stared at the code and dug into the underlying yum code a bit too.
I cannot find any code paths that would result in it jumping to the exit code without logging something. In anaconda-yum it will always pass a QUIT: back to anaconda. In anaconda it will always log what it gets from anaconda-yum, and if anaconda-yum quits unexpectedly the traceback will be logged in anaconda.log and a dialog displayed.
There must be something very different about how you are using this, or as a last resort some kind of hardware problem that's not showing up in syslog. eg. a out of memory problem would usually show up as a oom-killer entry.
I'm pretty much stuck. About all I can recommend is trying a different method of installation and see if it works better.
As an example of how puzzling this is, in the packaging.log there is this:
14:33:52,738 INFO packaging: populate transaction set
which comes from anaconda-yum. That should be immediately followed by "check transaction set". The only way it can escape would result in either an error being logged, or a progress updates about retrying the populate transaction set.
In the anaconda code, where it processes the output from anaconda-yum, it logs everything, even unrecognized text. It can't hit either of the sys.exit() calls without something getting logged.
Which isn't to say I don't believe this is happening, it clearly is, but all the code paths should result in more detail than what we are seeing which makes me suspect that there is some other serious unknown issue going on.
This isn't statistically significant, but I have not yet seen this fail when I add inst.text to the command line.
The downside to a text install is that graphics.target (sp?) isn't enabled, leaving a problem we have to clean up if this install is being performed in manufacturing. Some folks here have had their install automagically downgraded to a text install - related?
The graphics on this platform is provided by the on-board BMC, using some old spec Matrox design. I have no doubt there could be trouble in this space, but I'm not sure how to dig that out of the data we've been getting. Could it be possible that instead of an error exit the process just got sigKilled?
Also not statistically significant - I have not had this fail when no kickstart is specified/used. This would indicate we've done something untoward in the kickstart, and we did believe this for a while, leading us to reduce the kickstart to it's minimum - still had this event occur.
If there would be value in seeing this first hand, I'd be glad to host a field trip - we're down the road about 20 mins from Westford. Unfortunately we don't have this hardware in a Red Hat lab.
(In reply to David Bulkow from comment #46)
> This isn't statistically significant, but I have not yet seen this fail when
> I add inst.text to the command line.
> The downside to a text install is that graphics.target (sp?) isn't enabled,
> leaving a problem we have to clean up if this install is being performed in
> manufacturing. Some folks here have had their install automagically
> downgraded to a text install - related?
If you add 'xconfig --startxonboot' to the kickstart you should be able to do a text installation of a GUI system.
> The graphics on this platform is provided by the on-board BMC, using some
> old spec Matrox design. I have no doubt there could be trouble in this
> space, but I'm not sure how to dig that out of the data we've been getting.
> Could it be possible that instead of an error exit the process just got
I don't think so, I tried killing things from outside (and inside) the processes and it always resulted in a clear reason being logged.
> Also not statistically significant - I have not had this fail when no
> kickstart is specified/used. This would indicate we've done something
> untoward in the kickstart, and we did believe this for a while, leading us
> to reduce the kickstart to it's minimum - still had this event occur.
Or that there's some timing issue involved.
A funny thing just happened. I saw this in a kvm with the ks pulled from network (no cd swapping). But since I didn't have nokill it just went click and rebooted. So I've probably seen this twice in all my attempts -- but something clearly went wrong.
I also just tried killing metacity (results in an error dialog box from anaconda) and Xorg (results in anaconda exiting but no system reboot).
I may have found it, give this updates.img a try by adding this to the cmdline (or copying it to a local http server):
The problem may be a race at the switch from the summary hub to the progress hub, there are 2 '_autoContinue clicking continue button' which was a problem fixed in the upstream version of Anaconda.
Sorry this took so long. We've had runs where these install issues go away but then come back weeks later. It was important to us that we test this thoroughly, particularly on machines where this seems to happen more frequently.
We have managed 18 successful installs plus 9 more on another machine using this patch.
I think this can be considered fixed.
Thank you for your efforts in digging this one out.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.