1209941 – upgraded system does not reboot automatically, ctrl+alt+del is needed: Failed unmounting /sysroot/proc

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	fedup-dracut
Sub Component:
Version:	22
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Will Woods
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	AcceptedBlocker https://fedoraproject...
Depends On:
Blocks:	F22FinalBlocker

Reported:	2015-04-08 13:32 UTC by Kamil Páral
Modified:	2015-06-30 02:04 UTC (History)
CC List:	20 users (show)
Fixed In Version:	fedup-dracut-0.9.2-1.fc22
Clone Of:
Environment:
Last Closed:	2015-05-29 13:47:36 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
screenshot of fedup hanging and not rebooting (24.90 KB, image/png) 2015-04-08 13:34 UTC, Kamil Páral	no flags	Details
fedup.log (160.64 KB, text/plain) 2015-04-08 13:34 UTC, Kamil Páral	no flags	Details
upgrade.log (308.04 KB, text/plain) 2015-04-08 13:34 UTC, Kamil Páral	no flags	Details
journal from the upgrade run (80.68 KB, text/plain) 2015-04-08 13:35 UTC, Kamil Páral	no flags	Details
rpm -qa on F21 before upgrade (7.92 KB, text/plain) 2015-04-08 13:35 UTC, Kamil Páral	no flags	Details
rpm -qa on F22 after upgrade (8.80 KB, text/plain) 2015-04-08 13:35 UTC, Kamil Páral	no flags	Details
/var/log/fedup.log (844.32 KB, text/plain) 2015-04-22 17:39 UTC, Alberto Chiusole	no flags	Details
journal systemd debug (636.31 KB, text/plain) 2015-05-06 02:23 UTC, Chris Murphy	no flags	Details
fedup-dracut patch to call systemctl reboot (1.29 KB, patch) 2015-05-18 04:13 UTC, Zbigniew Jędrzejewski-Szmek	no flags	Details | Diff
fedup.log in response to comment 45 (1.44 MB, text/plain) 2015-06-30 02:02 UTC, Jonathan Baron	no flags	Details
dnf.log in response to comment 45 (1.66 MB, text/plain) 2015-06-30 02:03 UTC, Jonathan Baron	no flags	Details
selection from /var/log/messages in response to comment 45 (139.80 KB, text/plain) 2015-06-30 02:04 UTC, Jonathan Baron	no flags	Details

Description Kamil Páral 2015-04-08 13:32:57 UTC

Description of problem:
I have installed a minimal F21 installation from Server netinst, then upgraded it using fedup:

# fedup --network 22 --instrepo http://dl.fedoraproject.org/pub/alt/stage/22_Beta_RC1/Server/x86_64/os/

At the very end of the upgrade, the system fails to reboot. It just waits. I have waited more than 10 minutes and it is still hanging. Hitting Ctrl+Alt+Delete reboots the system.

The only error message I see is:
> Failed unmounting /sysroot/proc.
I don't know if it's relevant.

Version-Release number of selected component (if applicable):
fedup-0.9.1-1.fc21.noarch
fedup-0.9.1-1.fc22.noarch
22_Beta_RC1

How reproducible:
100% (3 of 3 attempts)

Steps to Reproduce:
1. install F21 minimal from Server netinst
2. run fedup with Beta RC1 Server x86_64 instrepo
3. see that system does not reboot after upgrade

Additional info:
I suspected this to be a duplicate of bug 1205344, but plymouth seems to be installed on F21 minimal system. So this is probably a different bug.

Comment 1 Kamil Páral 2015-04-08 13:34:18 UTC

Created attachment 1012241 [details]
screenshot of fedup hanging and not rebooting

Comment 2 Kamil Páral 2015-04-08 13:34:46 UTC

Created attachment 1012242 [details]
fedup.log

Comment 3 Kamil Páral 2015-04-08 13:34:52 UTC

Created attachment 1012243 [details]
upgrade.log

Comment 4 Kamil Páral 2015-04-08 13:35:17 UTC

Created attachment 1012244 [details]
journal from the upgrade run

Comment 5 Kamil Páral 2015-04-08 13:35:34 UTC

Created attachment 1012245 [details]
rpm -qa on F21 before upgrade

Comment 6 Kamil Páral 2015-04-08 13:35:44 UTC

Created attachment 1012246 [details]
rpm -qa on F22 after upgrade

Comment 7 Kamil Páral 2015-04-08 13:40:20 UTC

This seems to violate our criteria:
https://fedoraproject.org/wiki/Fedora_22_Beta_Release_Criteria#Upgrade_requirements
"For each one of the release-blocking package sets, it must be possible to successfully complete an upgrade from a fully updated installation of the previous stable Fedora release with that package set installed. The release-blocking package sets are the minimal set, and the sets for each one of the release-blocking desktops. The upgraded system must meet all release criteria."

If the system doesn't reboot automatically, it makes upgrading servers and other remote machines very problematic. And that is exactly the common use case for minimal installs. OTOH, this might be too strict for Beta.

Comment 8 Kamil Páral 2015-04-08 14:02:55 UTC

I have reproduced the same problem with F21 Server install, so this does not affect just minimal install.

Comment 9 Adam Williamson 2015-04-08 14:09:55 UTC

Personally I think there may be a case for Final blocker here, but it doesn't seem serious enough for Beta. I really wouldn't expect people to be doing mass unattended upgrades to Beta. The point of requiring upgrades to basically work at Beta is so the mechanism can be tested in a range of cases; it does work well enough for that.

Comment 10 Lukas Brabec 2015-04-08 14:15:18 UTC

Tried fedup twice (workstation and minimal) on UEFI machine, 21->22:

Workstation fedup process went as expected.

Minimal fedup process updated the system, but before the final reboot, it hung on "Failed unmounting /sysroot/proc" as kparal described.

I also encountered this while testing bug 1208214 comment 22

Comment 11 Kamil Páral 2015-04-08 14:22:43 UTC

(In reply to awilliam from comment #9)
> Personally I think there may be a case for Final blocker here, but it
> doesn't seem serious enough for Beta. I really wouldn't expect people to be
> doing mass unattended upgrades to Beta. The point of requiring upgrades to
> basically work at Beta is so the mechanism can be tested in a range of
> cases; it does work well enough for that.

I agree. I'm dropping Beta proposal and moving it to Final.

Comment 12 Kamil Páral 2015-04-08 15:18:47 UTC

I have tried this with encrypted Workstation install and fedup rebooted just fine. This seems to affect only Server or minimal installations.

Comment 13 Kamil Páral 2015-04-09 09:23:02 UTC

I have just reproduced this with an unencrypted Workstation upgrade. So it seems this is probably some kind of race condition (and maybe it's more frequent for Server/minimal, or we just had more bad luck with it).

Comment 14 Dan Mossor [danofsatx] 2015-04-10 16:13:54 UTC

I hit this last night with a fedup of F21 with Plasma5 from dvratil-copr to F22 Plasma. If it is a hardware race condition, I have an Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 24GB RAM, and an encrypted 128GB SATAIII SSD and 1TB spinner. Since it was my laptop, the only log I could grab was a picture of the screen with my phone.

Comment 15 Itaru Kitayama 2015-04-15 00:04:58 UTC

Seen a couple of times on Mustang in the Linaro server lab.

Comment 16 Itaru Kitayama 2015-04-19 23:38:19 UTC

I just saw this during my update to Fedora 22 Beta RC1.

Comment 17 Dan Mossor [danofsatx] 2015-04-20 17:06:05 UTC

Ran into this again Saturday night. The system in question was a Fedora 20 KDE install, fedup'd two weeks prior to F21 non-product KDE. Once fedup to F22 Plasma was finished, it stalled at "Failed to unmount /sysroot/proc" and "Failed to unmount /dev/hugepages".

It isn't just minimal or server installs.

Comment 18 Petr Schindler 2015-04-20 17:09:47 UTC

Discussed at today's blocker review meeting [1].

This bug was accepted as Final Blocker - it's clear we don't have all the information here, but upgrade apparently fairly commonly failing to reboot is considered a conditional violation of the 'upgrade requirements' criterion, due to the remote system scenario cited in comment #7

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2015-04-20/

Comment 19 Petr Schindler 2015-04-21 10:29:35 UTC

I just tested this with fedup-0.9.2-1.fc21 when upgrading f21 Server. System wasn't rebooted after upgrade was finished. (system failed to unmount /sysroot/proc).

Comment 20 Langdon White 2015-04-21 16:02:25 UTC

I received a very similar issue. I did not actually get an error, but the system did hang during the unmounting process. ctrl-alt-del let it continue with a couple messages I missed. Screenshot (from my phone) of lead in to hang.

Screenshot: http://imgur.com/6wX1Jwd (not seeing attachment button)

Comment 21 Matthew Bunt 2015-04-21 17:19:22 UTC

I had the very same experience, only from a Workstation 21 install. This was a real world machine with many packages installed. I used fedup-0.9.2-1.fc21 as per the Test Day:2015-04-21 FedUp wiki page. I ran into this bug at the end of the upgrade saying it could not unmount /sysroot/proc, but after ctrl-alt-del the system rebooted fine and I'm now on Workstation 22 Beta.

Comment 22 Giulio 'juliuxpigface' 2015-04-21 17:38:14 UTC

While following 'Test Day:2015-04-21 FedUp', I encountered the same behavior as Matthew's machine (comment #21). While not being too different from a standard Workstation, my Fedora is a 'Non-Product' one.

Comment 23 Alberto Chiusole 2015-04-22 17:34:38 UTC

Same issue for me as occurred on Matthew's and Giulio's machine (comments #21 and #22), "solved" with a ctrl+alt+del.
Like Giulio, Fedora on my computer is a 'Non-Product' one (lxde+openbox+...).

If needed, I can provide the logs (fedup for sure because I saved it, and the upgrade log), but I don't think they would be much more useful than the others attached above.

I encountered some warnings about packages without updates before rebooting. I'll attach them.

Comment 24 Alberto Chiusole 2015-04-22 17:39:35 UTC

Created attachment 1017543 [details]
/var/log/fedup.log

log I saved by passing --debuglog to fedup.

Comment 25 David Gay 2015-04-28 23:06:16 UTC

Discussed at the 2015-04-28 blocker review meeting.[1]

Will, has there been any news on this bug?

[1]: http://meetbot.fedoraproject.org/fedora-blocker-review/2015-04-28/

Comment 26 Dan Mossor [danofsatx] 2015-05-04 17:03:52 UTC

Discussed at the 2015-05-04 blocker review meeting.[0]

We are still awaiting an update to this blocker with Final Freeze being 8 days away.

[0] http://meetbot.fedoraproject.org/meetbot/fedora-blocker-review/2015-05-04/f22-blocker-review.2015-05-04-16.00.log.txt

Comment 27 Chris Murphy 2015-05-06 01:51:57 UTC

Reproduced this by starting with a clean Fedora 21 netinstall using standard partitions + XFS, nothing fancy.

Comment 28 Chris Murphy 2015-05-06 02:23:27 UTC

Created attachment 1022395 [details]
journal systemd debug

Reproduced in a (virt-manager) VM, set systemd.log_level=debug and captured the console output to this file.

Comment 29 Chris Murphy 2015-05-07 03:48:59 UTC

This could be a systemd bug since it's mainly responsible for reboot/shutdown, adding Zbigniew. Maybe a suggestion for getting more information than the serial console capture has?

Comment 30 Will Woods 2015-05-14 16:43:43 UTC

Okay, so here's what's going on, according to that log.

We'll start where reboot gets initiated:

Executing: /usr/bin/systemctl --no-block isolate reboot.target
Calling manager for StartUnit on reboot.target, isolate

[...lots of shutdown stuff is scheduled, including these stop jobs...]

Installed new job systemd-journald-audit.socket/stop as 391
Installed new job systemd-journald.service/stop as 394

[...and later that job kills off systemd-journal...]

Child 767 (systemd-journal) died (code=exited, status=0/SUCCESS)
Child 767 belongs to systemd-journald.service
systemd-journald.service: main process exited, code=exited, status=0/SUCCESS
systemd-journald.service changed stop-sigterm -> dead
Job systemd-journald.service/stop finished, result=done
[ OK ] Stopped Journal Service.
Releasing all resources for systemd-journald.service
systemd-journald-dev-log.socket changed running -> listening
systemd-journald-audit.socket changed running -> listening
systemd-journald.socket changed running -> listening

[...huh, two other sockets? a moment later, we see:]

Cannot find unit for notify message of PID 767.
Got auxiliary fds with notification message, closing all.
Closing left-over fd 21
Cannot find unit for notify message of PID 767.
Received SIGCHLD from PID 767 (n/a).

[...I thought PID 767 was already dead? but whatever...]

Accepted new private connection.
Incoming traffic on systemd-journald-audit.socket
Suppressing connection request on systemd-journald-audit.socket since unit stop is scheduled.

[...that socket gets ignored because we're shutting it down, but...]

Incoming traffic on systemd-journald.socket
Trying to enqueue job systemd-journald.service/start/replace
Job shutdown.target/start finished, result=canceled
Installed new job shutdown.target/stop as 398
Installed new job systemd-journald.service/start as 395
Job reboot.target/start finished, result=canceled

And there you have it: reboot cancelled.

So the problem is that journald starts back up, and journald starts because systemd-journald.socket is still active.

So the question is: why isn't systemd-journald.socket being stopped?
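The sequence traced above can be checked mechanically in a captured console log. The following is a minimal sketch, assuming the debug output has been saved as plain text. The function name find_reboot_cancel and the temporary sample file are illustrative only; the sample lines are copied from the excerpt in this comment.

```shell
#!/bin/sh
# Hypothetical helper: report whether a systemd debug log shows the reboot
# job being canceled after a socket-activated restart of journald.
find_reboot_cancel() {
    log="$1"
    # Line number of the first attempt to start journald again.
    start_line=$(grep -n 'Trying to enqueue job systemd-journald.service/start' "$log" | head -n 1 | cut -d: -f1)
    # Line number of the first canceled reboot.target start job.
    cancel_line=$(grep -n 'Job reboot.target/start finished, result=canceled' "$log" | head -n 1 | cut -d: -f1)
    if [ -n "$start_line" ] && [ -n "$cancel_line" ] && [ "$cancel_line" -gt "$start_line" ]; then
        echo "reboot canceled after journald restart (lines $start_line -> $cancel_line)"
        return 0
    fi
    echo "pattern not found"
    return 1
}

# Sample input: lines copied from the log excerpt in this comment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Incoming traffic on systemd-journald.socket
Trying to enqueue job systemd-journald.service/start/replace
Job shutdown.target/start finished, result=canceled
Installed new job shutdown.target/stop as 398
Installed new job systemd-journald.service/start as 395
Job reboot.target/start finished, result=canceled
EOF

find_reboot_cancel "$sample"
rm -f "$sample"
```

Run against a full capture such as the attached "journal systemd debug" file, a match would suggest the same failure mode; the check only looks at ordering of the two messages and does not prove causation.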

Comment 31 Marcin Juszkiewicz 2015-05-14 19:27:00 UTC

APM Mustang (aarch64). Installed Fedora 21 Server and then used fedup to upgrade to F22 TC2. Had to reset machine by hand. After reset landed in properly installed F22.

Comment 32 Zbigniew Jędrzejewski-Szmek 2015-05-18 04:13:29 UTC

Created attachment 1026552 [details]
fedup-dracut patch to call systemctl reboot

Please try this patch. I think it should do the trick.

Comment 33 Will Woods 2015-05-18 15:43:52 UTC

I don't think that's going to work; we're already isolating reboot.target (see the top of the log in comment #30).

Oh, and here's why those two sockets aren't being shut down when we isolate reboot:

  systemd-journald-dev-log.socket:IgnoreOnIsolate=yes
  systemd-journald.socket:IgnoreOnIsolate=yes

Which makes the new question: how does normal system shutdown avoid this?
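To list which units an isolate request would leave alone, the IgnoreOnIsolate property can be filtered per unit. This is a rough sketch: the function name ignored_on_isolate is hypothetical, and the input format (unit:Property=value) simply mirrors the two lines quoted above. On a live system, `systemctl show -p IgnoreOnIsolate <unit>` should print the Property=value part for a given unit.

```shell
#!/bin/sh
# Hypothetical helper: read "unit:Property=value" lines on stdin and print
# the units that an isolate request would leave running.
ignored_on_isolate() {
    awk -F: '$2 == "IgnoreOnIsolate=yes" { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Sample input mirrors the two lines quoted in this comment, plus one
# made-up counterexample (example.service) for contrast.
ignored_on_isolate <<'EOF'
  systemd-journald-dev-log.socket:IgnoreOnIsolate=yes
  systemd-journald.socket:IgnoreOnIsolate=yes
  example.service:IgnoreOnIsolate=no
EOF
```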

Comment 34 Zbigniew Jędrzejewski-Szmek 2015-05-18 17:33:06 UTC

Please look at the description of the patch (not shown if you just click on the "Created attachment ..." link; you need the "raw" version https://bugzilla.redhat.com/attachment.cgi?id=1026552). With the proposed change a different job-mode is used and the transition is irreversible.

Comment 35 Will Woods 2015-05-18 18:30:23 UTC

Heh! Yep, I understand now - I just came to the same conclusion after debugging a normal reboot sequence, which does:

  Trying to enqueue job reboot.target/start/replace-irreversibly

instead of:

  Trying to enqueue job reboot.target/start/isolate

which appears in the logs for the upgrade.

So yeah, the root cause here is that fedup-dracut is manually isolating reboot.target when it should just 'systemctl reboot ...' and let systemd worry about the rest.

Which means your patch is totally correct! I've applied it upstream:

  https://github.com/rhinstaller/fedup-dracut/commit/60caf4c

There should be a fixed fedup-dracut build shortly. (Note that the F22 boot images will need to be rebuilt to pick up this fix.)
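The difference between the two log lines is the job mode, the last field of the enqueue message. As an illustrative sketch (the function names job_mode and describe_mode are made up here, not part of fedup-dracut or systemd), the mode can be pulled out of a debug line to tell a replaceable isolate request from an irreversible reboot request:

```shell
#!/bin/sh
# Hypothetical helper: print the job mode from a systemd debug line of the
# form "Trying to enqueue job <unit>/<type>/<mode>".
job_mode() {
    printf '%s\n' "$1" | sed -n 's|^ *Trying to enqueue job .*/\([^/]*\)$|\1|p'
}

# Hypothetical helper: say whether a later conflicting transaction can
# replace the job, based only on its mode.
describe_mode() {
    case "$(job_mode "$1")" in
        replace-irreversibly) echo "irreversible: conflicting transactions cannot replace it" ;;
        "") echo "not an enqueue line" ;;
        *) echo "reversible: a conflicting transaction can replace it" ;;
    esac
}

# The two lines quoted in this comment.
describe_mode '  Trying to enqueue job reboot.target/start/replace-irreversibly'
describe_mode '  Trying to enqueue job reboot.target/start/isolate'
```

In terms of commands, the first line corresponds to a plain 'systemctl reboot' and the second to the 'systemctl --no-block isolate reboot.target' call seen at the top of the log in comment #30.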

Comment 36 Will Woods 2015-05-18 21:36:44 UTC

fedup-dracut-0.9.2-1.fc22 now building:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=637739

Comment 37 Fedora Update System 2015-05-19 14:58:17 UTC

fedup-dracut-0.9.2-1.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/fedup-dracut-0.9.2-1.fc22

Comment 38 Fedora Update System 2015-05-20 02:52:33 UTC

Package fedup-dracut-0.9.2-1.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing fedup-dracut-0.9.2-1.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-8532/fedup-dracut-0.9.2-1.fc22
then log in and leave karma (feedback).

Comment 39 Kamil Páral 2015-05-20 10:52:09 UTC

This seems to be fixed in RC1, system reboots properly after fedup.

Comment 40 Fedora Update System 2015-05-21 18:44:25 UTC

fedup-dracut-0.9.2-1.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Germano Massullo (Thetra) 2015-05-29 10:19:28 UTC

*** Bug 1224985 has been marked as a duplicate of this bug. ***

Comment 42 Germano Massullo (Thetra) 2015-05-29 10:24:03 UTC

Confirming the bug on fedup-0.9.2-1.fc21 (F21->F22 upgrade)

Comment 43 Kamil Páral 2015-05-29 13:47:36 UTC

Germano, your screenshot does not contain the "Failed unmounting /sysroot/proc." message, so it's likely to be a different bug. Please reopen your original bug and attach /var/log/upgrade/upgrade.log there. Thanks.

Comment 44 Jonathan Baron 2015-06-08 20:56:04 UTC

I'm not sure that this is the same bug, but here is the story. I upgraded a server (using nonproduct, because it was already carefully configured) remotely from F20 to F22. (I had done this before on my laptop, but not remotely.) I did the download part on June 7. At 6:30 AM on June 8 I rebooted the server remotely. Based on previous experience, I expected it to go through "System Upgrade" and take about 2 hours and then reboot by itself. At 11 AM it was still inaccessible so I went in to my office where the server is. The monitor (which had been turned off) showed nothing but a couple of lines. The keyboard and mouse did nothing. The ethernet light was flashing, but I could not even ping the server. And the fan was running at full blast. I figured, "Well, I'd better let it do its thing," so I let it run until 12:30. Finally I gave up and did a hard reboot (pressing the power button). It rebooted correctly into F22 and all was well.

I could find nothing that made any sense to me in the logs, but I saved them in their most recent state: fedup.log, dnf.log, and messages. I gave the times above so someone can track them in the messages file.

It may be that the instructions should say "Don't do this remotely." But my other server doesn't even have a monitor, although I could connect one.

Comment 45 Adam Williamson 2015-06-30 00:17:31 UTC

No, there's usually no reason not to do it remotely. It's expected to work. It's hard to tell what happened in your case without logs, though.

Comment 46 Jonathan Baron 2015-06-30 02:02:59 UTC

Created attachment 1044520 [details]
fedup.log in response to comment 45

Comment 47 Jonathan Baron 2015-06-30 02:03:55 UTC

Created attachment 1044521 [details]
dnf.log in response to comment 45

Comment 48 Jonathan Baron 2015-06-30 02:04:38 UTC

Created attachment 1044522 [details]
selection from /var/log/messages in response to comment 45

