Bug 1350120
Summary: | fwupd causes crash after sudo dnf system-upgrade reboot | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ricky <zhangchaowang> | ||||||
Component: | fwupd | Assignee: | Richard Hughes <rhughes> | ||||||
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 24 | CC: | rhughes, wwoods, zbyszek | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2016-06-28 07:38:35 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Ricky
2016-06-25 18:22:43 UTC
I uses journalctl -b -0. It doesn't show last boot -- the boot after executing 'sudo dnf system-upgrade reboot' I manually browse boot log in journalctl with the help from time stamp. But I don't find anything fishy. I did see it trigger reboot process in the boot -- umounting /var, /, /home and etc. Could you suggest me how to dig deeper? It is nice to reference python script in https://github.com/rpm-software-management/dnf-plugin-system-upgrade. But I have no clues why it reboots. Could you please suggest me any way to debug it? Created attachment 1172540 [details]
boot log
This is boot log after executing 'sudo dnf system-upgrade reboot'
Created attachment 1172541 [details]
/var/lib/dnf/system-upgrade.json
The issue is resolved. It turns out that I have installed different version of libusb in /usr/local/lib for kinect, which caused fwupd service failed. The failure of fwupd caused system upgrade service terminates. Therefore, dnf-system-upgrade plugin didn't finish. I spent time to read dnf-system-upgrade plugin. I thought run_upgrade function has started because symlink folder got wiped out. But it is actually wiped out by systmed in dnf-system-upgrade.service post exec. I'd suggest that add a new internal state in dnf-system-upgrade plugin to indicate that run_upgrade didn't get launch because of interruption by other system update service. Also, suggest users to dig into boot log, eg, 'journalctl -b -1'. Somehow, running 'dnf system-upgrade log -1' didn't show last boot log. This problem occurs because of other services that use /system-updates but don't implement the behavior correctly. Especially this part: As the first step, the update script should check if the /system-update symlink points to the the location used by that update script. In case it does not exists or points to a different location, the script must exit without error. It is possible for multiple update services to be installed, and for multiple update scripts to be launched in parallel, and only the one that corresponds to the tool that created the symlink before reboot should perform any actions. It is unsafe to run multiple updates in parallel. (from https://www.freedesktop.org/software/systemd/man/systemd.offline-updates.html, item 5 under "Implementing Offline System Updates") So either fwupd isn't implementing this correctly, or the libusb problem you mentioned caused it to crash before it could check the /system-update symlink. I'd recommend that the fwupdmgr binary should (somehow) be called upon to write out a flag file somewhere when fwupd prepares an update, and then fwupd-offline-update.service should have ConditionPathExists=<flagfile>. This way, the service will not be started unless a) fwupd itself prepared the update, and b) fwupdmgr is at least *minimally* functional - i.e. it's not just gonna crash on startup because of mismatched libraries or similar problems. @Will Thanks for your quick response. fwupd does item 5 properly: https://github.com/hughsie/fwupd/blob/0e3aae27f86956843da889d88c8bc1048e010771/src/fu-util.c#L398 But the program just crashed early due to library dependencies issues. It caused reboot without telling me who/what/why caused reboot. The logic from item 5 should be implemented by systemd.offline-updates service, instead of individual update script. The systemd.offline-updates service should determine who can run and make sure it runs one update script at a time under its control. I'd prefer the old fedup way to create a new entry in grub and do explicitly upgrade without any interferences. (In reply to Ricky from comment #6) > @Will > Thanks for your quick response. > > fwupd does item 5 properly: > > https://github.com/hughsie/fwupd/blob/ > 0e3aae27f86956843da889d88c8bc1048e010771/src/fu-util.c#L398 Except it doesn't protect against the case you encountered, which a flag file would. > But the program just crashed early due to library dependencies issues. It > caused reboot without telling me who/what/why caused reboot. If there was a way for system upgrade tools to notify the user about failures, I'm sure everyone would be happy to use it. But nobody has designed a good facility for that yet. (Note that would need to handle a lot of interesting cases - remote upgrades over ssh, systems without plymouth, systems without graphical displays, etc.) > The logic from item 5 should be implemented by systemd.offline-updates > service, instead of individual update script. The systemd.offline-updates > service should determine who can run and make sure it runs one update script > at a time under its control. The systemd team seems to disagree, but feel free to discuss better solutions with them. > I'd prefer the old fedup way to create a new entry in grub and do explicitly > upgrade without any interferences. Editing the grub configuration is much less reliable, and the failure mode is an infinite reboot loop or a totally unbootable system. With the current method your system is completely unharmed in the event of an unexpected upgrade failure, which is much safer. The downside is (as you note) that it's a lot harder to tell something went wrong when we don't break your system in the process. I think not including a new incompatible libusb is the real error here. We;ll use something better for offline updates error reporting if it becomes available. I guess we should have a reporting mechanism. I'd could be useful also for other things which need to perform a display a message just before rebooting. I'll try to propose something upstream in systemd. On the other hand, the fact that a broken fwupd binary causes also other upgrade methods to fail is still an issue. It *would* be nice to implement an additional safety like proposed in second half of comment #5. (In reply to Richard Hughes from comment #8) > I think not including a new incompatible libusb is the real error here. In *this* case, yes, but we have seen other instances of fwupd trying to start during a system upgrade, then failing and rebooting the system before the upgrade can start. See Radek's report in bug 1317215: https://bugzilla.redhat.com/show_bug.cgi?id=1317215#c9 I'm not certain why fwupd is failing there - there's no clear error messages, other than this line: Jun 27 12:39:35 dhcp-24-124.brq.redhat.com org.freedesktop.fwupd[836]: (fwupd:846): Fu-WARNING **: FuMain: failed to load AppStream data: AppStream metadata name /var/cache/app-info/xmls/.goutputstream-T2UMHY not valid, expected .xml[.*] or .yml[.*] But nonetheless, we end up here: Jun 27 12:39:59 dhcp-24-124.brq.redhat.com dbus[836]: [system] Failed to activate service 'org.freedesktop.fwupd': timed out Jun 27 12:39:59 dhcp-24-124.brq.redhat.com fwupdmgr[829]: Failed to connect to fwupd: Error calling StartServiceByName for org.freedesktop.fwupd: GDBus.Error:org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.freedesktop.fwupd': timed out Jun 27 12:39:59 dhcp-24-124.brq.redhat.com systemd[1]: fwupd-offline-update.service: Main process exited, code=exited, status=1/FAILURE Jun 27 12:39:59 dhcp-24-124.brq.redhat.com systemd[1]: fwupd-offline-update.service: Unit entered failed state. Jun 27 12:39:59 dhcp-24-124.brq.redhat.com systemd[1]: fwupd-offline-update.service: Triggering OnFailure= dependencies. Jun 27 12:39:59 dhcp-24-124.brq.redhat.com systemd[1]: fwupd-offline-update.service: Failed with result 'exit-code'. ...and then the system reboots. (His workaround was "uninstall fwupd", which is probably not what you want users to do..) So, yeah. What to do about this is up to you, of course, but having an extra Condition to keep the service from starting when it's not needed would have prevented the problem in this case as well. |