Red Hat Bugzilla – Bug 857076
reboot after Live installation hangs
Last modified: 2012-10-17 04:06:14 EDT
Created attachment 612464 [details]
Description of problem:
After finishing a Live installation and hitting Reboot, the system hangs at "Reached target shutdown". See screenshot. Multiple people reproduced this on multiple bare-metal machines. In a VM it seems to work OK.
I waited several minutes and it did not reboot. You can reboot by pressing Ctrl+Alt+Del.
Version-Release number of selected component (if applicable):
F18 Alpha RC3
How reproducible:
Very often on bare-metal machines
Proposing for Alpha blocker discussion.
Note the 'Dependency failed for Reboot.' line, I think that's the actual problem here. Not sure exactly how to debug it, though.
Moving to systemd for some insight.
(In reply to comment #2)
> Note the 'Dependency failed for Reboot.' line, I think that's the actual
> problem here. Not sure exactly how to debug it, though.
Usually the failed dependency should be shown a bit further up.
Other than that:
From what I have seen, this only happens on i386, but not on x86_64.
Scratch that, it also happens on x86_64. So far, I have seen the hang when installing from Live using optical media or PXE, but not when using a USB stick (created by any conversion method).
Unfortunately, if you just boot the LiveCD and reboot, everything seems OK. You have to perform the whole installation, and only then does it hang on reboot.
Investigation goes on.
Created attachment 623535 [details]
systemd debug messages during reboot hang
This is the best picture quality I was able to get. Because it doesn't happen with VM, just with LiveCD, I can't easily save the text through debug.sh, as linked by Lennart. Any other ideas for retrieving more (and more readable) info?
(In reply to comment #7)
> Created attachment 623535 [details]
> systemd debug messages during reboot hang
> This is the best picture quality I was able to get. Because it doesn't
> happen with VM, just with LiveCD, I can't easily save the text through
I think you can mount some other partition in debug.sh, just before calling dmesg, and write the log to a partition other than /. I'm not 100% sure it will work, but I can't think of a reason why it shouldn't, so give it a try.
Created attachment 624082 [details]
systemd reboot log
Great call. Here's the log. I saved it as /mnt/test/log.txt (there is an SELinux warning in the log). It was stuck as usual, and I had to hit Ctrl+Alt+Del to reboot the system.
Discussed at 2012-10-11 NTH review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-10-11/f18beta-blocker-review-3.1.2012-10-11-16.04.log.txt . Accepted as NTH, it's a very visible bug and could theoretically cause data loss as partitions are not unmounted prior to reboot (though there's no report yet that it has actually done so).
Clarification: Partitions are not unmounted if user resorts to hard reboot, because the system appears to be "stuck". Using Ctrl+Alt+Del unmounts partitions correctly (but we can't expect general users to do this).
The log shows /tmp being unmounted at first:
[ 675.039359] systemd: tmp.mount mount process exited, code=exited status=0
[ 675.039388] systemd: tmp.mount changed unmounting -> dead
[ 675.057084] systemd: Job tmp.mount/stop finished, result=done
[ 675.057162] systemd: Unmounted Temporary Directory.
... but later suddenly tmp.mount becomes active again:
[ 675.092326] systemd: tmp.mount changed dead -> mounted
Then its dependencies are retroactively applied, i.e. its conflicting units are stopped:
[ 675.092337] systemd: Trying to enqueue job umount.target/stop/replace
[ 675.092362] systemd: Installed new job umount.target/stop as 1171
[ 675.092367] systemd: Job systemd-reboot.service/start finished, result=canceled
[ 675.092410] systemd: Job reboot.target/start finished, result=dependency
[ 675.092444] systemd: Dependency failed for Reboot.
[ 675.092450] systemd: Job reboot.target/start failed with result 'dependency'.
[ 675.092456] systemd: Installed new job systemd-reboot.service/stop as 1172
[ 675.092459] systemd: Installed new job reboot.target/stop as 1173
[ 675.092463] systemd: Enqueued job umount.target/stop as 1171
[ 675.092967] systemd: Job reboot.target/stop finished, result=done
[ 675.092997] systemd: Stopped target Reboot.
[ 675.093017] systemd: Job systemd-reboot.service/stop finished, result=done
[ 675.093042] systemd: Stopped Reboot.
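The pattern in the excerpt above (a unit reaches "dead" and then leaves it again during shutdown) can be searched for mechanically in a longer log. The following is only a rough sketch: the function name `find_revived_units` and the regular expression are my own assumptions, modelled on the line format seen in this particular log, and are not part of systemd or any of its tools.

```python
import re

# Assumed line format, taken from the excerpt in this bug:
#   [ 675.039388] systemd: tmp.mount changed unmounting -> dead
# The optional "[pid]" after "systemd" is a guess to tolerate other log styles.
TRANSITION = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+systemd(?:\[\d+\])?:\s+"
    r"(?P<unit>\S+) changed (?P<old>\S+) -> (?P<new>\S+)"
)


def find_revived_units(log_text):
    """Return (timestamp, unit, new_state) for units that leave 'dead'
    after having entered it earlier in the same log."""
    went_dead = set()
    revived = []
    for line in log_text.splitlines():
        m = TRANSITION.search(line)
        if not m:
            continue  # not a state-change line
        unit, old, new = m["unit"], m["old"], m["new"]
        if new == "dead":
            went_dead.add(unit)
        elif old == "dead" and unit in went_dead:
            revived.append((float(m["ts"]), unit, new))
    return revived


# Lines copied from the log excerpt quoted in this comment.
SAMPLE = """\
[ 675.039359] systemd: tmp.mount mount process exited, code=exited status=0
[ 675.039388] systemd: tmp.mount changed unmounting -> dead
[ 675.057084] systemd: Job tmp.mount/stop finished, result=done
[ 675.092326] systemd: tmp.mount changed dead -> mounted
"""

if __name__ == "__main__":
    for ts, unit, state in find_revived_units(SAMPLE):
        print(f"{ts:.6f}: {unit} came back as '{state}' after being stopped")
```

Run against the full attached log, this would only point at candidate units; it says nothing about why a unit became active again.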
I see that the "livesys" service mounts a tmpfs on /tmp. This is in addition to the tmp.mount that systemd puts there, so /tmp is over-mounted. Perhaps this confuses systemd here.
Does it help if you umount /tmp manually before initiating the reboot?
FYI, I'm pushing a change to spin-kickstarts to not mount a tmpfs over a tmpfs, because that's kind of silly.
systemd is currently not dealing nicely with multiple mounts on the same dir. We should probably fix that (it has been on the TODO for a while). Not sure what the best approach would be, though. One option might be to repeatedly invoke umount in the "stop" method of .mount units, until the path is not a mount point anymore. But that's quite hard to do nicely and cleanly: invoking /bin/umount is the official API to unmount things, but it will complain if we invoke it on a dir that isn't a mount point, and we can't really filter that away. The other option is to invoke path_is_mount_point() after each attempt and then redo the umount, but I am a bit concerned about retriggering foreign automounts with that, or ending up accessing a dead fs we shouldn't have accessed... Which only leaves checking /proc/self/mounts in a loop. Which is ugly, and string-based, but should work.
Anyway, there are two things to fix here:
a) make systemd deal better with multiple overmounted mount points
b) teach the livesys stuff not to mount things multiple times on the same dir
And b) should be the beta blocker, not a). And Bill, I assume #13 means you fixed b)?
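For anyone who wants to check whether a directory such as /tmp is over-mounted before rebooting, the string-based /proc/self/mounts check mentioned above can be sketched in a few lines. This is purely an illustration, not how systemd does or will do it: the helper names `mount_counts` and `overmounted` are made up, and the sample table is invented for the example rather than taken from an affected machine.

```python
from collections import Counter


def _unescape(field):
    # /proc/self/mounts escapes space, tab, newline and backslash as octal.
    # The backslash sequence is handled last so it cannot create new escapes.
    for seq, ch in (("\\040", " "), ("\\011", "\t"),
                    ("\\012", "\n"), ("\\134", "\\")):
        field = field.replace(seq, ch)
    return field


def mount_counts(mounts_text):
    """Count how many mount table entries share each mount point."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # device, mount point, fstype, options, ...
            counts[_unescape(fields[1])] += 1
    return counts


def overmounted(mounts_text):
    """Return {mount_point: count} for paths listed more than once."""
    return {path: n for path, n in mount_counts(mounts_text).items() if n > 1}


# Invented example table: a tmpfs mounted twice on /tmp.
SAMPLE = """\
/dev/sda1 / ext4 rw 0 0
tmpfs /tmp tmpfs rw,seclabel 0 0
tmpfs /tmp tmpfs rw,seclabel 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/self/mounts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the invented table off Linux
    print(overmounted(text))
```

Counting entries per path is only a heuristic: it reports stacked mounts, but it cannot tell which one is on top or which unit created it.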
b) should be fixed in spin-kickstarts git; images would need to be remade, of course.
if this was fixed in spin-kickstarts git on 10-12 it should probably be fixed in TC4. does someone want to check?
I tried with F18 Beta TC4, installed three times in a row, all reboots were fine. Issue fixed.
Bill, which version of spin-kickstarts? Is it in stable yet?
Kamil - I just fixed git; it may not be in a particular build.
Changing component to spin-kickstarts, let's close when the fix is in stable updates. (But I guess RelEng use git checkout, so it's fine for the moment).
kamil: yeah, we use git for composes, so for all intents and purposes this is fixed. we don't do spin-kickstarts package builds very often, and we usually only make a special effort to make sure they're in sync for Final release, not Alpha/Beta. so we could probably just close this.
Adam: Thanks, let's close then.