Bug 1674045 - Upgrade from F29 to Rawhide (F30) hangs at end, showing "Upgrade complete! Cleaning up and rebooting..."
Summary: Upgrade from F29 to Rawhide (F30) hangs at end, showing "Upgrade complete! Cl...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa AcceptedBlocker https://fedora...
: 1680042 1685455 1692617 (view as bug list)
Depends On:
Blocks: F30FinalBlocker
TreeView+ depends on / blocked
 
Reported: 2019-02-08 20:57 UTC by Adam Williamson
Modified: 2019-04-11 02:14 UTC (History)
19 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-11 02:14:14 UTC


Attachments (Terms of Use)
screenshot from installer with no text in "ad" (89.52 KB, image/png)
2019-02-10 18:24 UTC, Zbigniew Jędrzejewski-Szmek
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1674281 None CLOSED Fonts are not rendering for SVG files used as banners at the bottom of the installer 2019-08-05 09:11:56 UTC

Description Adam Williamson 2019-02-08 20:57:43 UTC
openQA F29 -> Rawhide (F30) upgrade tests are often failing at present. The upgrade process hangs at the end, showing the message "Upgrade complete! Cleaning up and rebooting...", e.g.:

https://openqa.fedoraproject.org/tests/350591#step/_graphical_wait_login/24

I have reproduced this manually in a VM with an upgrade from Fedora 29 x86_64 Server network install to current Rawhide. If you force a reboot at this point, the upgraded system appears to boot fine.

If you then look at the logs from the upgrade boot, the end looks like this:

Feb 08 12:13:11 localhost.localdomain dnf[721]: Complete!
Feb 08 12:13:11 localhost.localdomain python3[721]: Upgrade complete! Cleaning up and rebooting...
Feb 08 12:13:11 localhost.localdomain dnf[721]: Cleaning up downloaded data...
Feb 08 12:13:11 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-system-upgrade comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 08 12:13:11 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-system-upgrade comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 08 12:13:11 localhost.localdomain systemd[1]: dnf-system-upgrade.service: Succeeded.
Feb 08 12:13:11 localhost.localdomain systemd[1]: Started System Upgrade using DNF.
Feb 08 12:13:11 localhost.localdomain systemd[1]: Reached target Offline System Update.
Feb 08 12:13:11 localhost.localdomain systemd[1]: Condition check resulted in Remove the Offline System Updates symlink being skipped.
Feb 08 12:13:11 localhost.localdomain systemd[1]: Startup finished in 1.731s (kernel) + 1.676s (initrd) + 5min 57.756s (userspace) = 6min 1.164s.
Feb 08 12:13:36 localhost.localdomain systemd-logind[750]: Failed to get load state of reboot.target: Connection timed out
Feb 08 12:13:36 localhost.localdomain dbus-daemon[748]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

Not sure what the cause is. This appears to affect upgrades of Workstation, KDE and Server, but not upgrade of a minimal install.

Comment 1 David Bauer 2019-02-09 13:15:58 UTC
I have exactly the same issue and behaviour.

Fresh Fedora-Server-netinst-x86_64-29-1.2 install. Then I have done all steps in https://fedoraproject.org/wiki/QA:Testcase_upgrade_dnf_current_server

Feb 09 11:10:46 localhost.localdomain dnf[767]:   systemd-rpm-macros-240-6.gitf02b547.fc30.noarch
Feb 09 11:10:46 localhost.localdomain dnf[767]:   whois-nls-5.4.1-1.fc30.noarch
Feb 09 11:10:46 localhost.localdomain dnf[767]: Fertig.
Feb 09 11:10:46 localhost.localdomain python3[767]: Upgrade complete! Cleaning up and rebooting...
Feb 09 11:10:46 localhost.localdomain dnf[767]: Cleaning up downloaded data...
Feb 09 11:10:47 localhost.localdomain systemd[1]: dnf-system-upgrade.service: Succeeded.
Feb 09 11:10:47 localhost.localdomain systemd[1]: Started System Upgrade using DNF.
Feb 09 11:10:47 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-system-upgrade comm="systemd" exe="/usr/lib/systemd/system$
Feb 09 11:10:47 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-system-upgrade comm="systemd" exe="/usr/lib/systemd/systemd$
Feb 09 11:10:47 localhost.localdomain systemd[1]: Reached target Offline System Update.
Feb 09 11:10:47 localhost.localdomain systemd[1]: Condition check resulted in Remove the Offline System Updates symlink being skipped.
Feb 09 11:10:47 localhost.localdomain systemd[1]: Startup finished in 2.088s (kernel) + 2.047s (initrd) + 6min 57.670s (userspace) = 7min 1.807s.
Feb 09 11:11:11 localhost.localdomain systemd-logind[773]: Failed to get load state of reboot.target: Connection timed out
Feb 09 11:11:11 localhost.localdomain dbus-daemon[771]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

Comment 2 Adam Williamson 2019-02-09 15:07:38 UTC
I'm gonna propose this as a Final blocker under Beta criterion "For each one of the release-blocking package sets, it must be possible to successfully complete a direct upgrade from a fully updated, clean default installation of each of the last two stable Fedora releases with that package set installed" - https://fedoraproject.org/wiki/Fedora_30_Beta_Release_Criteria#Upgrade_requirements . Reasoning: for Beta it's probably OK to tell people "oh if it gets stuck there just hit the reset button", but for Final that seems a bad story to be trying to sell.

Comment 3 Zbigniew Jędrzejewski-Szmek 2019-02-10 18:24:36 UTC
Created attachment 1528791 [details]
screenshot from installer with no text in "ad"

I'm testing this now.
BTW, is something off with the "ads" during installation? I see no text.

Comment 4 Zbigniew Jędrzejewski-Szmek 2019-02-10 19:45:29 UTC
We have three mechanisms which should trigger reboot:
1. the upgrade service should do it after the installation is finished
2. after the update is done, system-update-cleanup.service will remove the symlink and reboot the machine
3. dnf-system-upgrade.service has an internal dnf-system-upgrade-cleanup.service which does an unconditional reboot

Re 1: it seems dnf-system-upgrade fails to reboot, but still exits with success error code.
So we have two bugs here: one is that the reboot request fails (no idea why yet), and
the other is that the return code is wrong. The error is actually from systemd-logind,
so it's possible that d-s-u makes the request correctly, and systemd-logind just fails
to execute it. In that case, the return code from d-s-u might be OK.

I see "dbus-daemon: ... Failed to activate service 'org.freedesktop.systemd1", which
looks like dbus-daemon is trying to do activation on its own which would be very
wrong. So the issue might be that something is starting dbus-daemon in a wrong mode.
Maybe it's something to do with the handover from dbus-daemon to dbus-broker?

Re 2: system-update-cleanup.service has ConditionPathExists=|/system-update, because
it was mostly designed to prevent a reboot-into-system-update loop when the
upgrade service is broken. We *could* remove this condition, but then 
system-update-cleanup.service could step on the toes of the upgrade service
by scheduling a reboot. The upgrade service could for example schedule a shutdown
not a reboot, or a reboot with some special parameters, so scheduling a reboot
unconditionally from system-update-cleanup.service seems to limit some possible
uses of system-update.target quite a bit. So I don't think we want to do this.

Re 3: d-s-u-cleanup.service is only triggered if d-s-u.service fails, so it's not
relevant here.

Comment 5 Zbigniew Jędrzejewski-Szmek 2019-02-10 22:04:34 UTC
'systemctl reboot' also waits a long time, but then reboots.
'busctl call org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager Reboot "b" false' times out.
'systemctl restart systemd-logind' also times out, and there's a job active: systemd-logind.service/start.
But 'systemctl status systemd-logind' shows Stutus: "Processing requests".
This means that systemd received READY=1 from logind, but for some reason did not mark it as started.
So right now this looks as a bug systemd itself.

Comment 6 Daniel Mach 2019-02-11 12:49:26 UTC
Reassigning to systemd based on comment#5.
Also the original report and comment#1 indicate that the problem is probably elsewhere than in DNF:
Feb 08 12:13:36 localhost.localdomain systemd-logind[750]: Failed to get load state of reboot.target: Connection timed out
Feb 08 12:13:36 localhost.localdomain dbus-daemon[748]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

Comment 7 František Zatloukal 2019-02-12 14:29:06 UTC
Discussed during the 2019-02-11 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made:

"accepted as violation of "For each one of the release-blocking package sets, it must be possible to successfully complete a direct upgrade from a fully updated, clean default installation of each of the last two stable Fedora releases with that package set installed", reasoning per comment #2"

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2019-02-11/f30-blocker-review.2019-02-11-17.13.log.txt

Comment 8 Adam Williamson 2019-02-22 16:44:05 UTC
*** Bug 1680042 has been marked as a duplicate of this bug. ***

Comment 9 Juha Tuomala 2019-03-23 10:40:47 UTC
Last night upgraded system from f29 -> f30 and got bitten by this. Now system doesn't boot, gives only:

  [FAILED] Failed to start Switch Root.

Comment 10 Adam Williamson 2019-03-23 14:06:09 UTC
That's almost certainly a different bug; no-one else has reported an actual system issue caused by this bug, it's purely a failure to reboot automatically.

Comment 11 Juha Tuomala 2019-03-23 15:15:20 UTC
At every restart it ends up into 

grub> 

and that's it.

Comment 12 Adam Williamson 2019-03-23 18:42:10 UTC
that may possibly be https://bugzilla.redhat.com/show_bug.cgi?id=1661955 , although I think that was supposed to be fixed...

Comment 13 Paul DeStefano 2019-03-23 19:11:18 UTC
Yeah, Adam, I did get beyond that issue and I think it is fixed...unless Juha's found a different code path that leads to the same symptom.

Juha, I thought that bug got fixed.  But, that doesn't help you much.

I'm now stuck at bug 1686059, which is closed, but I haven't had a chance to test the solution myself.  But, that also doesn't help you.

FYI, I now have two test VMs that I set up like my two main computers, just so that I can test upgrades *every week* because I have been burned so many times, recently, by upgrade problems.  I've been using VM testing much longer for system-upgrades, though, and I highly recommend that.  Again, that doesn't help you, but maybe next time?

Comment 14 Juha Tuomala 2019-03-23 20:05:10 UTC
(In reply to Paul DeStefano from comment #13)
> Juha, I thought that bug got fixed.  But, that doesn't help you much.
> 
> I'm now stuck at bug 1686059, which is closed, but I haven't had a chance to
> test the solution myself.  But, that also doesn't help you.

My symptoms are the same as people report there, grub.cfg has no Fedora/Linux entries, that's why it pops the shell open. Something went wrong in dnf system-upgrade final stage.

Comment 15 Adam Williamson 2019-03-25 15:16:22 UTC
Juha, can you please check the versions of the relevant packages on each end of the upgrade, and comment on the bug if your upgrade used the versions which were supposed to be fixed but still failed? Thanks!

Comment 16 Juha Tuomala 2019-03-28 21:56:07 UTC
(In reply to Adam Williamson from comment #15)
> Juha, can you please check the versions of the relevant packages 

I don't know which are the pkgs.

> on each end of the upgrade, 

Each end? This is my workstation that I use daily basis. I needed to upgrade to test bug #1634658, that 'other end' is gone.

> and comment on the bug if your upgrade used the versions
> which were supposed to be fixed but still failed? Thanks!

There were other issues too, kdm failed to start since sddm.kcm was not installed and systemd targets / runlevels were screwed somehow. Not sure how it's supposed to be but I got screen up for now.

Comment 17 Chris Murphy 2019-04-01 16:31:27 UTC
*** Bug 1692617 has been marked as a duplicate of this bug. ***

Comment 18 Yann Droneaud 2019-04-02 15:02:06 UTC
*** Bug 1685455 has been marked as a duplicate of this bug. ***

Comment 19 Zbigniew Jędrzejewski-Szmek 2019-04-09 14:01:03 UTC
It seems something goes wrong in the triangle dbus-daemon — dbus-broker — systemd. I'm upgrading
those packages to F30 individually.

c='sudo dnf upgrade --releasever=30 --setopt=module_platform_id=platform:f30 -y'
First "$c systemd": no problem.
Second "$c dbus-daemon": hangs!

Just "$c dbus-daemon": also hangs.

"$c fedora-repos": OK.
"$c dbus-daemon": fails.

"$c fedora-repos": OK
"$c dbus-broker": fails

My hypothesis at this point is that when dbus-broker changes dbus.{socket,service} to point to its
units, systemd checks if dbus.{socket,service} are running, realizes that they are not,
and closes the dbus connection.

Comment 20 Zbigniew Jędrzejewski-Szmek 2019-04-09 14:07:48 UTC
Yep:
# kill -SIGUSR1 1
# journalctl -f
Apr 09 16:03:32 fedora systemd[1]: Received SIGUSR1 from PID 1138 (bash).
Apr 09 16:03:32 fedora systemd[1]: Starting D-Bus service...
Apr 09 16:03:32 fedora systemd[1]: Starting D-Bus System Message Bus...
Apr 09 16:03:32 fedora dbus-broker-launch[2873]: Looking up NSS user entry for 'gdm'...
Apr 09 16:03:42 fedora dbus-broker-launch[2873]: NSS returned no entry for 'gdm'
Apr 09 16:03:42 fedora systemd[1]: Started D-Bus System Message Bus.
Apr 09 16:03:42 fedora audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 09 16:03:42 fedora dbus-daemon[789]: [system] Successfully activated service 'org.freedesktop.systemd1'

So systemd things dbus was not running, and then we have both dbus-daemon
and dbus-broker active, and surprisingly, things seem to work again.
I think we need to handle this switcheroo with dbus.{socket,service} better somehow in systemd.

Comment 21 Fedora Update System 2019-04-09 16:19:20 UTC
dbus-broker-19-2.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-be8e006363

Comment 22 Fedora Update System 2019-04-10 14:37:58 UTC
dbus-broker-19-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-be8e006363

Comment 23 Fedora Update System 2019-04-11 02:14:14 UTC
dbus-broker-19-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.