Bug 857076 - reboot after Live installation hangs
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: spin-kickstarts
Version: 18
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Assigned To: Jeroen van Meeuwen
QA Contact: Fedora Extras Quality Assurance
https://fedoraproject.org/wiki/Common...
Keywords: CommonBugs
Depends On:
Blocks: F18Beta-accepted/F18BetaFreezeExcept
Reported: 2012-09-13 10:11 EDT by Kamil Páral
Modified: 2012-10-17 04:06 EDT

Doc Type: Bug Fix
Last Closed: 2012-10-17 04:06:14 EDT
Type: Bug

Attachments
reboot hang (751.21 KB, image/jpeg)
2012-09-13 10:11 EDT, Kamil Páral
systemd debug messages during reboot hang (684.40 KB, image/jpeg)
2012-10-08 12:16 EDT, Kamil Páral
systemd reboot log (428.69 KB, text/plain)
2012-10-09 09:04 EDT, Kamil Páral

Description Kamil Páral 2012-09-13 10:11:17 EDT
Created attachment 612464 [details]
reboot hang

Description of problem:
If you finish a Live installation and hit Reboot, the system hangs at "Reached target shutdown" (see screenshot). Multiple people reproduced this on multiple bare-metal machines. In a VM it seems to work OK.

I waited several minutes and it did not reboot. You can reboot by pressing Ctrl+Alt+Del.

Version-Release number of selected component (if applicable):
F18 Alpha RC3

How reproducible:
very often on bare metal machines
Comment 1 Kamil Páral 2012-09-13 10:12:06 EDT
Proposing for Alpha blocker discussion.
Comment 2 Adam Williamson 2012-09-17 22:52:57 EDT
Note the 'Dependency failed for Reboot.' line, I think that's the actual problem here. Not sure exactly how to debug it, though.
Comment 3 Matthias Clasen 2012-09-26 19:54:00 EDT
Moving to systemd for some insight.
Comment 4 Lennart Poettering 2012-09-28 06:28:24 EDT
(In reply to comment #2)
> Note the 'Dependency failed for Reboot.' line, I think that's the actual
> problem here. Not sure exactly how to debug it, though.

Usually the failed dependency should be shown a bit further up.

Other than that:

http://www.freedesktop.org/wiki/Software/systemd/Debugging#Diagnosing_Shutdown_Problems
Comment 5 Kamil Páral 2012-10-03 09:39:29 EDT
From what I have seen, this only happens on i386, but not on x86_64.
Comment 6 Kamil Páral 2012-10-08 11:38:54 EDT
Scratch that, it also happens on x86_64. Until now, I have seen the hang when installing from Live using optical media or PXE, but not when using a USB stick (created by any conversion method).

Unfortunately, if you just boot the LiveCD and reboot, everything seems OK. You have to perform the whole installation, and only then does it hang on reboot.

Investigation goes on.
Comment 7 Kamil Páral 2012-10-08 12:16:52 EDT
Created attachment 623535 [details]
systemd debug messages during reboot hang

This is the best picture quality I was able to get. Because it doesn't happen in a VM, just with the LiveCD, I can't easily save the text through debug.sh, as linked by Lennart. Any other ideas for retrieving more (and more readable) info?
Comment 8 Michal Sekletar 2012-10-09 03:47:59 EDT
(In reply to comment #7)
> Created attachment 623535 [details]
> systemd debug messages during reboot hang
> 
> This is the best picture quality I was able to get. Because it doesn't
> happen with VM, just with LiveCD, I can't easily save the text through

I think you can mount some other partition in debug.sh, just before calling dmesg, and write the log to a partition other than /. I'm not 100% sure it will work, but right now I can't think of a reason why it shouldn't, so give it a try.
Comment 9 Kamil Páral 2012-10-09 09:04:36 EDT
Created attachment 624082 [details]
systemd reboot log

Great call. Here's the log. I saved it as /mnt/test/log.txt (it has some selinux warning in the log). It was stuck as usual, I had to hit Ctrl+Alt+Del to reboot the system, as usual.
Comment 10 Adam Williamson 2012-10-11 14:41:39 EDT
Discussed at 2012-10-11 NTH review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-10-11/f18beta-blocker-review-3.1.2012-10-11-16.04.log.txt . Accepted as NTH, it's a very visible bug and could theoretically cause data loss as partitions are not unmounted prior to reboot (though there's no report yet that it has actually done so).
Comment 11 Kamil Páral 2012-10-11 14:46:49 EDT
Clarification: Partitions are not unmounted if the user resorts to a hard reboot because the system appears to be "stuck". Using Ctrl+Alt+Del unmounts partitions correctly (but we can't expect general users to do this).
Comment 12 Michal Schmidt 2012-10-12 06:28:23 EDT
The log shows /tmp being unmounted at first:

[  675.039359] systemd[1]: tmp.mount mount process exited, code=exited status=0
[  675.039388] systemd[1]: tmp.mount changed unmounting -> dead
[  675.057084] systemd[1]: Job tmp.mount/stop finished, result=done
[  675.057162] systemd[1]: Unmounted Temporary Directory.

... but later suddenly tmp.mount becomes active again:

[  675.092326] systemd[1]: tmp.mount changed dead -> mounted

Then its dependencies are retroactively applied, i.e. its conflicting units are stopped:

[  675.092337] systemd[1]: Trying to enqueue job umount.target/stop/replace
[  675.092362] systemd[1]: Installed new job umount.target/stop as 1171
[  675.092367] systemd[1]: Job systemd-reboot.service/start finished, result=canceled
[  675.092410] systemd[1]: Job reboot.target/start finished, result=dependency
[  675.092444] systemd[1]: Dependency failed for Reboot.
[  675.092450] systemd[1]: Job reboot.target/start failed with result 'dependency'.
[  675.092456] systemd[1]: Installed new job systemd-reboot.service/stop as 1172
[  675.092459] systemd[1]: Installed new job reboot.target/stop as 1173
[  675.092463] systemd[1]: Enqueued job umount.target/stop as 1171
[  675.092967] systemd[1]: Job reboot.target/stop finished, result=done
[  675.092997] systemd[1]: Stopped target Reboot.
[  675.093017] systemd[1]: Job systemd-reboot.service/stop finished, result=done
[  675.093042] systemd[1]: Stopped Reboot.
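
The pattern described above (a unit reaches "dead" and later leaves it again) can be searched for mechanically in a saved log. The following is only a minimal sketch in Python, not part of the original report: the function name find_revived_units and the SAMPLE text are illustrative, and the regular expression assumes the "changed OLD -> NEW" line format seen in the excerpts above.

```python
import re

# Matches lines such as:
# [  675.039388] systemd[1]: tmp.mount changed unmounting -> dead
STATE_CHANGE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] systemd\[1\]: "
    r"(?P<unit>\S+) changed (?P<old>\S+) -> (?P<new>\S+)"
)


def find_revived_units(log_text):
    """Return (unit, timestamp) pairs for units that left 'dead' after reaching it."""
    stopped = set()   # units that have been seen entering the 'dead' state
    revived = []
    for line in log_text.splitlines():
        match = STATE_CHANGE.search(line)
        if not match:
            continue
        unit, old, new = match["unit"], match["old"], match["new"]
        if new == "dead":
            stopped.add(unit)
        elif old == "dead" and unit in stopped:
            revived.append((unit, float(match["ts"])))
    return revived


# Three lines copied from the log excerpts in this comment.
SAMPLE = """\
[  675.039388] systemd[1]: tmp.mount changed unmounting -> dead
[  675.057084] systemd[1]: Job tmp.mount/stop finished, result=done
[  675.092326] systemd[1]: tmp.mount changed dead -> mounted
"""

print(find_revived_units(SAMPLE))  # [('tmp.mount', 675.092326)]
```
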


I see that the "livesys" service mounts a tmpfs on /tmp. This is in addition to the tmp.mount that systemd puts there, so /tmp is over-mounted. Perhaps this confuses systemd here.

Does it help if you umount /tmp manually before initiating the reboot?
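
One way to check for the over-mount suspected above is to look for directories that appear more than once in the mount table. This is a hedged sketch, not something from the original thread: find_overmounts is an illustrative name, the SAMPLE table is made up, and the code assumes the whitespace-separated "source target fstype options ..." layout of /proc/self/mounts.

```python
from collections import defaultdict


def find_overmounts(mounts_text):
    """Map each mount point listed more than once to its stacked (source, fstype) entries."""
    stacks = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target, fstype = fields[0], fields[1], fields[2]
        stacks[target].append((source, fstype))
    return {target: entries for target, entries in stacks.items() if len(entries) > 1}


# Hypothetical mount table with a tmpfs mounted over another tmpfs on /tmp.
SAMPLE = """\
/dev/mapper/live-rw / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw 0 0
tmpfs /tmp tmpfs rw,relatime 0 0
"""

print(find_overmounts(SAMPLE))
# On a live system one could instead pass open("/proc/self/mounts").read().
```
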
Comment 13 Bill Nottingham 2012-10-12 10:41:43 EDT
FYI, I'm pushing a change to spin-kickstarts to not mount a tmpfs over a tmpfs, because that's kind of silly.
Comment 14 Lennart Poettering 2012-10-12 11:31:34 EDT
systemd currently does not deal nicely with multiple mounts on the same dir. We should probably fix that (it has been on the TODO for a while). Not sure what the best approach would be, though. One option might be to repeatedly invoke umount in the "stop" method of .mount units until the path is not a mount point anymore. But that's quite hard to do nicely and cleanly: invoking /bin/umount is the official API to unmount things, but it will complain if we invoke it on a dir that isn't a mount point, and we can't really filter that away. The other option is to invoke path_is_mount_point() after each attempt and then redo the umount, but I am a bit concerned about retriggering foreign automounts with that, or ending up accessing a dead fs we had better not access... Which only leaves checking /proc/self/mounts in a loop, which is ugly and string-based, but should work.

Anyway, there are two things to fix here:

a) make systemd deal nicer with multiple overmounted mount points

b) teach the livesys stuff not to mount things multiple times on the same dir

And b) should be the beta blocker, not a). And Bill, I assume #13 means you fixed b)?
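
The "check /proc/self/mounts in a loop" idea from this comment can be illustrated roughly as follows. This is a sketch in Python under stated assumptions, not systemd's implementation (systemd is written in C): count_mounts, unmount_all_layers, and the max_attempts limit are illustrative names, and the mount-table reader and unmount action are passed in as functions so the loop can be exercised against a simulated table.

```python
def count_mounts(mounts_text, path):
    """Count mount-table entries whose target (second field) equals path."""
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == path:
            count += 1
    return count


def unmount_all_layers(path, read_mounts, unmount, max_attempts=8):
    """Call unmount(path) until read_mounts() no longer lists path; return the call count."""
    calls = 0
    while count_mounts(read_mounts(), path) > 0:
        if calls >= max_attempts:
            raise RuntimeError(f"{path} still mounted after {calls} attempts")
        unmount(path)
        calls += 1
    return calls


# Simulated mount table: /tmp is mounted twice (hypothetical data).
table = [
    "tmpfs /tmp tmpfs rw 0 0",
    "tmpfs /tmp tmpfs rw 0 0",
    "proc /proc proc rw 0 0",
]


def fake_read():
    return "\n".join(table)


def fake_unmount(path):
    # Remove the most recently added entry for path, like peeling off the top mount.
    for index in range(len(table) - 1, -1, -1):
        if table[index].split()[1] == path:
            del table[index]
            return


calls = unmount_all_layers("/tmp", fake_read, fake_unmount)
print(calls)  # 2
```

On a real system the two callables could plausibly be a function that reads /proc/self/mounts and one that runs umount via subprocess.run; the string matching avoids touching the path itself, which is the concern raised above about automounts and dead filesystems.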
Comment 15 Bill Nottingham 2012-10-12 11:55:40 EDT
b) should be fixed in spin-kickstarts git; images would need to be remade, of course.
Comment 16 Adam Williamson 2012-10-15 17:16:30 EDT
if this was fixed in spin-kickstarts git on 10-12 it should probably be fixed in TC4. does someone want to check?
Comment 17 Kamil Páral 2012-10-16 05:51:48 EDT
I tried with F18 Beta TC4, installed three times in a row, all reboots were fine. Issue fixed.

Bill, which version of spin-kickstarts? Is it in stable yet?
Comment 18 Bill Nottingham 2012-10-16 08:25:56 EDT
Kamil - I just fixed git; it may not be in a particular build.
Comment 19 Kamil Páral 2012-10-16 09:03:38 EDT
Changing component to spin-kickstarts, let's close when the fix is in stable updates. (But I guess RelEng use git checkout, so it's fine for the moment).
Comment 20 Adam Williamson 2012-10-16 15:44:51 EDT
kamil: yeah, we use git for composes, so for all intents and purposes this is fixed. we don't do spin-kickstarts package builds very often, and we usually only make a special effort to make sure they're in sync for Final release, not Alpha/Beta. so we could probably just close this.
Comment 21 Kamil Páral 2012-10-17 04:06:14 EDT
Adam: Thanks, let's close then.
