Bug 1564462

Summary: offline update performed unclean shutdown
Product: [Fedora] Fedora Reporter: Alan Jenkins <alan.christopher.jenkins>
Component: PackageKitAssignee: Richard Hughes <rhughes>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 27CC: jonathan, klember, rdieter, rhughes, smparrish, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-30 20:51:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
proof of fsck
none
journal showing offline update and forced shutdown none

Description Alan Jenkins 2018-04-06 11:05:49 UTC
Created attachment 1418057 [details]
proof of fsck

Description of problem:

Something just caused a force shutdown (systemctl poweroff --force), right at the end of the offline update.

This actually *isn't* supposed to cause an unclean unmount, however for whatever reason I noticed it because all my filesystems had to be checked on startup.  See attached.


Version-Release number of selected component (if applicable):
PackageKit-1.1.8-1.fc27.x86_64

How reproducible: don't know

Steps:
1. Offline update, by ticking the update tickbox when I shut down in Gnome.

Actual results:
Force shutdown, plus unclean shutdown

Expected results:
Offline update should normally end with an orderly shutdown (or reboot).  I don't think we can justify using a forced shutdown when we're updating a bunch of critical system files.

(And if there's some strange error condition that made it necessary, that should have been logged).


Additional info:

Notice the lack of "Reached target Unmount All Filesystems.", etc in the log.


Apr 06 09:46:01.908746 alan-laptop PackageKit[797]: update-packages transaction /17001_edadbdbc from uid 0 finished with success after 112114ms
Apr 06 09:46:02.067544 alan-laptop pk-offline-update[770]: shutting down
Apr 06 09:46:02.069017 alan-laptop pk-offline-update[770]: sent mode to plymouth 'shutdown'
Apr 06 09:46:02.070544 alan-laptop pk-offline-update[770]: sent msg to plymouth 'Shutting down after installing updates…'
Apr 06 09:46:02.072926 alan-laptop systemd[1]: Shutting down.
Apr 06 09:46:03.579270 alan-laptop systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Apr 06 09:46:03.580224 alan-laptop bluetoothd[774]: Terminating
Apr 06 09:46:03.580303 alan-laptop alsactl[772]: alsactl daemon stopped
Apr 06 09:46:03.580586 alan-laptop bluetoothd[774]: Disconnected from D-Bus. Exiting.
Apr 06 09:46:03.580996 alan-laptop bluetoothd[774]: Stopping SDP server
Apr 06 09:46:03.581022 alan-laptop bluetoothd[774]: Exit
Apr 06 09:46:03.581470 alan-laptop PackageKit[797]: daemon quit
Apr 06 09:46:03.619684 alan-laptop systemd-journald[563]: Journal stopped

Comment 1 Alan Jenkins 2018-04-06 11:06:28 UTC
Created attachment 1418058 [details]
journal showing offline update and forced shutdown

Comment 2 Alan Jenkins 2018-04-06 11:22:02 UTC
Checking historical logs, it looks like this has always been the behaviour, so is fully reproducible.

I only noticed it, because of the unknown system failure that caused an unclean unmount.

Note that one advantage of an orderly shutdown is that I would have had logs for the unmount of every FS other than /, and /var.  I.e. if there was some mysterious reason that my /boot filesystem (/dev/sda6) was in use and could not be unmounted, I would have a failure logged at that unmount step.

Orderly shutdowns give you better logs, and they're more widely used i.e. better tested.

Comment 3 Alan Jenkins 2018-04-06 11:38:05 UTC
See also https://www.freedesktop.org/software/systemd/man/systemd.offline-updates.html

"After completion (regardless whether the update succeeded or failed) the machine must be rebooted, for example by calling systemctl reboot." i.e. not `systemctl reboot --force`.

I regret to inform you that this the wrong dbus call to achieve this aim :)


	val = g_dbus_connection_call_sync (connection,
					   "org.freedesktop.systemd1",
					   "/org/freedesktop/systemd1",
					   "org.freedesktop.systemd1.Manager",
					   "Reboot",
					   NULL,
					   NULL,
					   G_DBUS_CALL_FLAGS_NONE,
					   -1,
					   NULL,
					   &error);


What you need to do is call the StartUnit method instead, passing "reboot.target" as the `unit`, and "replace-irreversibly" as the `mode`.

I don't think any extra error handling is necessary, you don't want to wait for the `job` that StartUnit returns so you can just ignore that.  And I think the existing error handling looks ok; just return to systemd, and as the offline-updates doc says, the units are set up so that systemd will reboot anyway.

(I think this means a failure to *power off* the machine instead falls back to a reboot.  Not a new thing, and I'm not worrying about it).

Comment 4 Alan Jenkins 2018-04-06 11:51:50 UTC
Note, although `man systemd.special` points to the Reboot method on "org.freedesktop.login1.Manager", I think systemd-logind is not currently started during offline updates.  (system-update.target pulls in sysinit.target, but I think not basic.target, which is what normally pulls in systemd-logind).  And systemd-logind is not bus-activated.

My instinct is to stick with talking to systemd directly.

Comment 5 Zbigniew Jędrzejewski-Szmek 2018-04-06 16:20:51 UTC
Comment #c3 is accurate: 'systemctl reboot' results in a call to org.freedesktop.systemd1.Manager.StartUnit with replace-irreversibly (see start_unit() called from https://github.com/systemd/systemd/blob/master/src/systemctl/systemctl.c#L3656), and 'systemctl reboot --force' results in a call to org.freedesktop.systemd1.Manager.Reboot.

org.freedesktop.systemd1.Manager.Reboot is unnecessarily harsh, and shutting down normally through StartUnit is much better. 

> My instinct is to stick with talking to systemd directly.

Yes, this seems reasonable. Going through logind is mostly about providing access to reboot for non-root users, which does not apply in this case. It should be fine to talk directly to systemd.

Comment 6 Ben Cotton 2018-11-27 16:18:01 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30  Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora  'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Ben Cotton 2018-11-30 20:51:10 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 8 Alan Jenkins 2021-03-29 09:16:45 UTC
For the record, I confirm this bug is now fixed.

Git commit: https://github.com/hughsie/PackageKit/commit/7f7006a91359
Discussion: https://github.com/hughsie/PackageKit/pull/251