Bug 957783 - fedup from F18 -> F19 hangs at reboot
Summary: fedup from F18 -> F19 hangs at reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 19
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-04-29 14:14 UTC by Will Woods
Modified: 2013-11-13 02:24 UTC (History)
25 users (show)

Fixed In Version: systemd-201-2.fc18.9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-13 02:24:45 UTC
Type: Bug


Attachments (Terms of Use)
complete console output from fedup run (350.59 KB, text/plain)
2013-04-29 14:43 UTC, Will Woods
no flags Details
fedup-0.7.3-3.fc18.noarch screenshot at "freeze" (17.54 KB, image/png)
2013-04-30 07:12 UTC, Robert Lightfoot
no flags Details
`systemctl --full --all` output (16.08 KB, text/plain)
2013-04-30 19:59 UTC, Will Woods
no flags Details
systemctl --full output (5.36 KB, text/plain)
2013-04-30 20:11 UTC, Will Woods
no flags Details
systemd core dump (13.24 MB, application/octet-stream)
2013-04-30 21:46 UTC, Will Woods
no flags Details
'bt full' output (5.28 KB, text/plain)
2013-04-30 22:19 UTC, Will Woods
no flags Details
theute fedup.log (1014.07 KB, text/plain)
2013-07-02 13:52 UTC, Thomas Heute
no flags Details
theute upgrade.log (585.64 KB, text/x-log)
2013-07-02 13:53 UTC, Thomas Heute
no flags Details

Description Will Woods 2013-04-29 14:14:26 UTC
At the end of an upgrade from F18 to F19, the system will hang when it should reboot.

(Note: fedup runs from inside an initramfs, with the user's system mounted at /sysroot.)

fedup-dracut's upgrade-post.service has:

  ExecStopPost=-/usr/bin/systemctl --no-block reboot

Running with systemd.log_level=debug systemd.log_target=console systemd.journald.forward_to_console=true reveals the following:

* The reboot command runs as expected.

* mounts all get unmounted, except:

  sysroot.mount mount process exited, code=exited status=32
  sysroot.mount changed unmounting -> mounted
  Job sysroot.mount/stop finished, result=failed

* Following that, umount.target and final.target are reached:

  umount.target changed dead -> active
  Job umount.target/start finished, result=done
  final.target changed dead -> active
  Job final.target/start finished, result=done

And this is where it seems to be stuck.

There's a bunch of noise about D-Bus requests, some mounts get GC'd, and this is the last thing that systemd emits:

  upgrade-switch-root.service stopping timed out. Killing.
  upgrade-switch-root.service changed stop-sigterm -> stop-sigkill
  upgrade-switch-root.service still around after SIGKILL. Ignoring.
  upgrade-switch-root.service changed stop-sigkill -> final-sigterm
  upgrade-switch-root.service stopping timed out (2). Killing.
  upgrade-switch-root.service changed final-sigterm -> failed
  Job upgrade-switch-root.service/stop finished, result=done
  Unit upgrade-switch-root.service entered failed state

systemd is still running here - Ctrl-Alt-Del yields:

  Received SIGINT.
  Activating special unit ctrl-alt-del.target
  Trying to enqueue job reboot.target/start/replace
  Merged into installed job reboot.target/start as 375
  Merged into installed job systemd-reboot.service/start as 376
  Merged into installed job plymouth-reboot.service/start as 447
  Enqueued job reboot.target/start as 375

So I guess systemd-reboot.service is still waiting for something, so the 'systemd-shutdown' binary never runs, which would unmount /sysroot and reboot.

upgrade-switch-root.service is the service that switches from the user's system to the fedup initramfs:

  [Service]
  Type=oneshot
  EnvironmentFile=/run/initramfs/upgrade.conf
  ExecStart=-/usr/bin/systemctl --no-block --force switch-root ${UPGRADEROOT}
  KillMode=none

I'm not sure how/if it's still running, or what process systemd is trying to kill.

Comment 1 Will Woods 2013-04-29 14:43:24 UTC
Created attachment 741516 [details]
complete console output from fedup run

Comment 2 Robert Lightfoot 2013-04-29 22:50:07 UTC
Tested fedup-0.7.3-3.fc18.noarch from koji build today with Minimal and Gnome Installs of i686 and x86_64 using both qemu/kvm and virtualbox.  The need to force power off still exists in fedup-0.7.3-3.fc18.noarch.  Otherwise a successful F18 --> f19 upgrade was achieved for both arches with fedup.

Comment 3 Robert Lightfoot 2013-04-30 07:12:33 UTC
Created attachment 741785 [details]
fedup-0.7.3-3.fc18.noarch screenshot at "freeze"

Comment 4 Will Woods 2013-04-30 19:58:13 UTC
Adding 'rd.upgrade.debugshell' (enables something like systemd-debug-shell.service) lets me debug this further.

shutdown.target, umount.target, final.target are all 'active'.
reboot.target and systemd-reboot.service are dead but 'start'-ing.

'systemctl dump' causes systemd to segfault and dump core:
[   70.625258] systemd[1]: segfault at 8 ip 00007f4f99e9e205 sp 00007fff4a230b40 error 4 in systemd[7f4f99e66000+f3000]
Caught <SEGV>, dumped core as pid 652.
Freezing execution.

Comment 5 Will Woods 2013-04-30 19:59:55 UTC
Created attachment 741966 [details]
`systemctl --full --all` output

Comment 6 Will Woods 2013-04-30 20:11:38 UTC
Created attachment 741967 [details]
systemctl --full output

This is just 'systemctl --full', which is much easier to read.

Here's the short version:
UNIT                          LOAD   ACTIVE       SUB          JOB  
-.mount                       loaded active       mounted      stop 
mnt.mount                     loaded active       mounted           
sysroot-proc.mount            loaded active       mounted           
sysroot.mount                 loaded active       mounted           
initrd-switch-root.service    loaded deactivating stop-sigterm stop 
plymouth-reboot.service       loaded inactive     dead         start
plymouth-start.service        loaded active       running           
systemd-modules-load.service  loaded failed       failed            
systemd-reboot.service        loaded inactive     dead         start
systemd-udev-trigger.service  loaded active       exited            
systemd-udevd.service         loaded active       running           
upgrade-debug-shell.service   loaded active       running           
upgrade-switch-root.service   error  deactivating stop-sigkill stop
systemd-journald.socket       loaded active       listening    stop 
systemd-udevd-kernel.socket   loaded active       listening         
final.target                  loaded active       active            
reboot.target                 loaded inactive     dead         start
shutdown.target               loaded active       active            
umount.target                 loaded active       active            

Note that 'systemctl --force --no-block reboot' does cause the system to reboot as expected.

Comment 7 Zbigniew Jędrzejewski-Szmek 2013-04-30 20:51:29 UTC
Any chance you could post the bt from the coredump?

Comment 8 Will Woods 2013-04-30 21:46:05 UTC
Created attachment 742001 [details]
systemd core dump

Comment 9 Will Woods 2013-04-30 22:19:55 UTC
Created attachment 742026 [details]
'bt full' output

Core was generated by `/usr/lib/systemd/systemd --switched-root --system --deserialize 25'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fabe4966e7b in raise () from /lib64/libpthread.so.0
#0  0x00007fabe4966e7b in raise () from /lib64/libpthread.so.0
#1  0x00007fabe64a42f4 in crash (sig=11) at src/core/main.c:140
#2  <signal handler called>
#3  mount_dump (u=0x7fabe7edc920, f=0x7fabe7ee8f40, prefix=0x7fabe7ca3d00 "\t") at src/core/mount.c:828
#4  0x00007fabe650f840 in unit_dump (u=0x7fabe7edc920, f=f@entry=0x7fabe7ee8f40, prefix=0x7fabe654d12d "", prefix@entry=0x0) at src/core/unit.c:763
#5  0x00007fabe64a6c06 in manager_dump_units (s=s@entry=0x7fabe7e47650, f=f@entry=0x7fabe7ee8f40, prefix=prefix@entry=0x0) at src/core/manager.c:1108
#6  0x00007fabe64d8337 in bus_manager_message_handler (connection=0x7fabe7f5de40, message=0x7fabe7f5f620, data=0x7fabe7e47650) at src/core/dbus-manager.c:1231
#7  0x00007fabe4faee66 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#8  0x00007fabe4fa1a31 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#9  0x00007fabe64d549a in bus_dispatch (m=m@entry=0x7fabe7e47650) at src/core/dbus.c:528
#10 0x00007fabe64a9c30 in manager_loop (m=0x7fabe7e47650) at src/core/manager.c:1743
#11 0x00007fabe64a1c55 in main (argc=5, argv=0x7fff5cbbe798) at src/core/main.c:1764

'bt full' is attached.

Comment 10 Will Woods 2013-05-01 17:48:12 UTC
I'm not 100% sure that the crash and the hang are the same problem, but:

The crash occurs in mount_dump() for /sys/kernel/config.

m->parameters_proc_self_mountinfo is filled in, but m->from_proc_self_mountinfo is false (as is m->from_fragment). So get_mount_parameters() returns p=NULL,
and the subsequent 'p->what' is a NULL dereference.

Not sure how this happens, but the debugging continues...

Comment 11 Brent Baude 2013-05-21 15:43:17 UTC
Seeing same behavior consistently on ppc64 between f18 and f19.  I think this is a beta blocker right?

Comment 13 Adam Williamson 2013-05-23 14:36:23 UTC
if the upgrade basically works and the upgraded system is okay, then I'd say not a blocker.

Comment 14 Kamil Páral 2013-05-23 15:25:06 UTC
I agree this is inconvenient, but doesn't have to block Beta release. If this is rejected for Beta, please re-propose as Final blocker.

Comment 15 Jóhann B. Guðmundsson 2013-05-23 15:43:17 UTC
-1 blocker +1 FE and Final blocker

Comment 16 Adam Williamson 2013-05-23 17:34:57 UTC
Karsten: note that the key determining factor in whether a bug is a blocker is the criteria, not the test cases. A failure of a validation test case isn't automatically and invariably a blocker; you can get a 'fail' result for a test case but the problem can not violate the criteria. You can call these 'non-blocking failures' or something. We try to make the criteria and test cases line up as closely as possible, but it's never perfect.

Discussed at 2013-05-23 Fedora 19 Beta Go/No-Go meeting, acting as a blocker review meeting: http://meetbot.fedoraproject.org/fedora-meeting-2/2013-05-23/f19_beta_gono-go_meeting.2013-05-23-17.01.log.txt . As this bug does not appear to make the actual upgrade process fail - the upgrade succeeds, and the upgraded system works, you just get an ugly crash in the middle - this does not violate the criterion "For each one of the release-blocking package sets, it must be possible to successfully complete an upgrade from a fully updated installation of the previous stable Fedora release with that package set installed." and hence is not a blocker.

Comment 17 Adam Williamson 2013-05-28 00:56:26 UTC
clearing rejectedblocker as this may be proposed as blocker for final.

Comment 18 Adam Williamson 2013-07-01 03:08:15 UTC
Is this still the case with Final? I don't recall seeing it in my Final upgrade tests.

Comment 19 Phil Knirsch 2013-07-01 12:07:26 UTC
I've done 2 fedup tests using RC2 on ppc64 (Gnome & minimal install F18) and both succeeded without a problem and rebooted automatically.

So i'm not sure if this got automagically fixed by some updates between the report and RC2.

Just my $0.02.

Thanks & regards, Phil

Comment 20 Adam Williamson 2013-07-01 23:06:33 UTC
OK. Since, as I said, others tested and no-one seems to see this any more, let's figure that it's fixed unless other reports come in.

Comment 21 Kamil Páral 2013-07-02 08:54:59 UTC
I have done a few upgrades and I haven't seen this either.

Comment 22 Thomas Heute 2013-07-02 11:17:47 UTC
I just experienced it by upgrading to F19 with fedup.

Same as https://bugzilla.redhat.com/show_bug.cgi?id=957783#c3

The upgrade was stuck, pressed escape and I got the message: "Reached target shutdown". I pressed the reset button of my computer and it booted correctly.

People may wait long unless they know about the "escape" key to see the logs.

Comment 23 Kamil Páral 2013-07-02 11:38:14 UTC
Thomas, how long have you waited?

Comment 24 Thomas Heute 2013-07-02 11:40:45 UTC
For several minutes there was 0 activity for a while (no disk spinning) and the message was displayed, time to search the web on a different device, bump into this issue and pressed reset. I would say at least 4-5 minutes in total

Comment 25 Kamil Páral 2013-07-02 12:21:36 UTC
Reopening then. Thomas, can you provide your logs? They should be at /var/log/fedup.log or somewhere around (maybe also upgrade.log?).

Comment 26 Thomas Heute 2013-07-02 13:52:55 UTC
Created attachment 767729 [details]
theute fedup.log

Comment 27 Thomas Heute 2013-07-02 13:53:42 UTC
Created attachment 767730 [details]
theute upgrade.log

Comment 28 Richard Keech 2013-07-02 23:58:35 UTC
I'm seeing this problem too.

I'm upgrading F18->F19 on 32-bit on an HP Mini.

Initial symptoms are a hang on first re-boot into upgrade boot with the progress bar about 20% across.

After disabling plymouth, and turning off 'rhgb' and 'quiet', I can see that the boot process is stalling at the point:

  systemd-journald[56]: Received SIGTERM

  Welcome to Fedora 18 (Spherical Cow)!

  systemd-cgroups-agent[394]: Failed to get D-Bus connection: Failed to connect to socket /org/freedesktop/systemd1/private: connection refused

Comment 29 Adam Williamson 2013-07-03 00:05:35 UTC
That is not this bug. Please file it separately. This bug is "At the end of an upgrade from F18 to F19", i.e., not a failure to boot half way through.

Please, if your symptom does not precisely match that in the first comment, file your bug separately.

Comment 30 Richard Keech 2013-07-03 01:09:13 UTC
OK thanks Adam,  I've opened this as 980663.

Comment 31 Thomas Heute 2013-07-03 12:37:08 UTC
I am in the process to upgrade another machine with similar environment, anything you would like me to do/check if that happens again that would help you ?

Comment 32 Adam Williamson 2013-07-03 19:06:38 UTC
Will, Harald? Running fedup with --debuglog couldn't hurt, I guess.

Comment 33 Will Woods 2013-07-03 21:50:17 UTC
See comment #1 through comment #4 for debugging info. If 'systemctl dump' in the debug shell makes systemd segfault and/or systemd is mysteriously stuck at 'final.target' then this is the same bug.

If systemd isn't stuck / crashing then you have a different bug; please close this and file a new bug.

If it is stuck, then this bug is still in the same place it was before. I lack the time and expertise to debug systemd any deeper than I already have, but the backtrace and core dump should help.

Comment 34 Thomas Heute 2013-07-04 08:39:00 UTC
It didn't fail on the other machine

Comment 35 Harald Hoyer 2013-07-04 09:14:26 UTC
Some debug commands to issue before rebooting.

# systemctl status upgrade-switch-root.service
# systemctl show upgrade-switch-root.service
# systemctl show sys-kernel-config.mount

Comment 36 Harald Hoyer 2013-07-04 10:02:46 UTC
(In reply to Will Woods from comment #9)
> Created attachment 742026 [details]
> 'bt full' output
> 
> Core was generated by `/usr/lib/systemd/systemd --switched-root --system
> --deserialize 25'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007fabe4966e7b in raise () from /lib64/libpthread.so.0
> #0  0x00007fabe4966e7b in raise () from /lib64/libpthread.so.0
> #1  0x00007fabe64a42f4 in crash (sig=11) at src/core/main.c:140
> #2  <signal handler called>
> #3  mount_dump (u=0x7fabe7edc920, f=0x7fabe7ee8f40, prefix=0x7fabe7ca3d00
> "\t") at src/core/mount.c:828
> #4  0x00007fabe650f840 in unit_dump (u=0x7fabe7edc920,
> f=f@entry=0x7fabe7ee8f40, prefix=0x7fabe654d12d "", prefix@entry=0x0) at
> src/core/unit.c:763
> #5  0x00007fabe64a6c06 in manager_dump_units (s=s@entry=0x7fabe7e47650,
> f=f@entry=0x7fabe7ee8f40, prefix=prefix@entry=0x0) at src/core/manager.c:1108
> #6  0x00007fabe64d8337 in bus_manager_message_handler
> (connection=0x7fabe7f5de40, message=0x7fabe7f5f620, data=0x7fabe7e47650) at
> src/core/dbus-manager.c:1231
> #7  0x00007fabe4faee66 in _dbus_object_tree_dispatch_and_unlock () from
> /lib64/libdbus-1.so.3
> #8  0x00007fabe4fa1a31 in dbus_connection_dispatch () from
> /lib64/libdbus-1.so.3
> #9  0x00007fabe64d549a in bus_dispatch (m=m@entry=0x7fabe7e47650) at
> src/core/dbus.c:528
> #10 0x00007fabe64a9c30 in manager_loop (m=0x7fabe7e47650) at
> src/core/manager.c:1743
> #11 0x00007fabe64a1c55 in main (argc=5, argv=0x7fff5cbbe798) at
> src/core/main.c:1764
> 
> 'bt full' is attached.

commit 1e4fc9b

Comment 37 Jeff Chillemi 2013-08-19 03:30:59 UTC
I found my way to this page after having this same problem upon upgrading from f17 to f19 using FedUp.  I was still running f16, so upgraded to f17 first with Preupgrade before using FedUp.

The machine froze with the Fedora logo and barely full progress bar.  I hit the escape key (would not have known that without coming to this page) and it dropped me to the text screen, which ended with the "Reached target shutdown" message.

Ctrl-Alt-Del had no effect, neither did Ctrl-Alt-F# to try and switch shells.

After powering down the laptop with the power button and rebooting, everything seemed to be fine.

I'll be happy to provide the upgrade logs or any other info if it would help.

Jeff

Comment 38 The Old Man 2013-09-10 13:48:38 UTC
Yep, same here.

Upgrading from F17 to F19 with FedUp, upgrading ends (monitored in console) at:

[OK] Reached target Shutdown.

Hitting [esc] key gives Fedora logo with progress bar -> changing in Beefy Miracle picture.
Hitting it again gives fast scrolling screen of previous upgrading messages ending in  [OK] Reached target Shutdown.

Reboot after 10 minutes by turning off/on machine:
Starts till GDM
Loggin in: "Something went wrong, please log out"

Emptying one of the users home dir gave a perfect login for that user.
Must be a config thingie of F17, working on it..

Comment 39 The Old Man 2013-09-10 14:24:09 UTC
Log in problem was the same as bug 979946

Comment 40 Philip 2013-10-12 13:03:55 UTC
I've used fedup (F18 -> F19) three times now (on different computers) and every time it froze when trying to reboot during the upgrade.
Much like in this attached screenshot, the last line on the screen was "Reached target Shutdown.":
https://bugzilla.redhat.com/attachment.cgi?id=741785

It seems like a power off / power on can't do any harm (since the filesystems are already unmounted at this point), but it's still a bug.

I do have a snapshot of the old state (F18), so I could theoretically rerun the upgrade if there's something to try.

Comment 41 Fedora Update System 2013-10-29 16:05:22 UTC
systemd-201-2.fc18.9 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/systemd-201-2.fc18.9

Comment 42 Fedora Update System 2013-10-30 01:53:06 UTC
Package systemd-201-2.fc18.9:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-201-2.fc18.9'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20267/systemd-201-2.fc18.9
then log in and leave karma (feedback).

Comment 43 Fedora Update System 2013-11-13 02:24:45 UTC
systemd-201-2.fc18.9 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.