Description of problem:
Running "dnf upgrade" on a fresh FC 24 install also upgrades systemd-udev (updated version: 229-13.fc24, x86_64). Upgrading systemd-udev restarts the running X session (the user has to log in again) and breaks the dnf transaction: the install is left in an invalid state, with many packages marked as installed in both the old and new versions (as shown by "dnf repoquery --duplicated"), although the new version is in fact not installed (e.g. kernel).
Steps to Reproduce:
1. install FC 24 (incl. setting up wifi networking)
2. "dnf upgrade"
3. packages not updated, but marked as duplicated
Actual results:
Install broken; packages marked as installed (and duplicated), but in fact not updated.
Expected results:
"dnf update" updates packages (possibly without restarting the X session, or with a restart, but with all packages updated)
Additional info:
Unfortunately DNF can't do much about this. I guess something in packaging is wrong, though I'm not sure what exactly.
It sounds a lot like #1367766, but that bug was only present in F25+. I'm not aware of anything which would cause this in F24. Can you attach the logs from around the upgrade?
Sorry, the logs are gone - I reinstalled the system. However, after reinstall, I updated systemd-udev alone (before updating all the rest), and issues similar to those described in #1367766 are now present in the newly installed system. So if some log related to this scenario would be helpful, let me know which one to attach.
*** Bug 1341327 has been marked as a duplicate of this bug. ***
There's some more info in 1341327, which we're leaving open to track the X end of this problem. Basically, restarting systemd-udev-trigger.service causes (we think) systemd-logind to pull the graphics adapter out from under X and immediately give it back again. This service is (currently) restarted in %postun of systemd-udev. From reports we've received so far, it seems that on systems with hybrid graphics, this causes X to crash. On systems with dedicated graphics, it doesn't.
This bug is for the systemd end of the problem: the spurious graphics adapter 'replug' probably just shouldn't happen at all. The other bug is for making X not crash if it *does* happen, if X folks want to do that.
I'm proposing this bug as a Beta freeze exception. Since the restart is in %postun, if we ship F25 Beta with the current systemd package, then the first update to systemd-udev will trigger this bug - even if it's updating to a systemd-udev which takes the restart out of %postun. To ensure F25 Beta users don't encounter this bug, we have to include a systemd-udev build with the systemd-udev-trigger restart taken out of %postun in the frozen images. http://koji.fedoraproject.org/koji/buildinfo?buildID=807101 is the build that should fix this; I'll submit an update once it's complete.
Nah, "udevadm trigger" is an operation that should always be safe. Software (be it apps or drivers) that cannot deal with such a replug is broken, and needs to be fixed. We have been running udevadm trigger, either fully or only for specific subsystems, more or less forever. If X11 now breaks on that, it really needs to be fixed. I don't see anything to change in systemd here. Sorry.
Lennart: the thing zbyszek thinks may be wrong is not the udevadm trigger operation itself, but the fact that it results in this 'hardware replug' happening. He says he's not sure that's actually intended or wanted.
Created attachment 1207377: journalctl -f logs from udevadm trigger --type=devices --action=add
The issue is not caused by udevadm trigger --type=devices --action=add directly, but goes through systemd-logind. If systemd-logind is SIGSTOPed, nothing happens. If systemd-logind is running normally, Xorg logs a bunch of remove/add events (see attachment). Unfortunately, systemd-logind doesn't log anything, even at debug level. I think it should at least log when it adds/removes devices. I also don't think it should remove the devices from clients, even temporarily. I would expect this to cause glitches at least.
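The comparison described above (trigger with systemd-logind running normally versus frozen) could be scripted roughly as follows. This is only a sketch, not the commands used to produce the attachment: the names `run`, `replug_test` and the `DRY_RUN` switch are made up for illustration, and it assumes `pkill` and `udevadm` are available and that it is run as root. It defaults to a dry run because freezing systemd-logind on a live desktop is risky.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1
# (DRY_RUN defaults to 1 so nothing is touched by accident).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical reproduction helper.
#   replug_test normal  -> just re-trigger "add" events for all devices
#   replug_test frozen  -> SIGSTOP systemd-logind first, SIGCONT it afterwards
replug_test() {
    mode=${1:-normal}
    if [ "$mode" = "frozen" ]; then
        run pkill -STOP -x systemd-logind
    fi
    run udevadm trigger --type=devices --action=add
    if [ "$mode" = "frozen" ]; then
        run pkill -CONT -x systemd-logind
    fi
}
```

Watching `journalctl -f` in a second terminal while running `DRY_RUN=0 replug_test normal` and then `DRY_RUN=0 replug_test frozen` should show whether the Xorg remove/add messages only appear in the first case.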
+1 FE
For the record, I checked release day F23 and F24 lives in a VM, and restarting systemd-udev-trigger.service appears to trigger the hardware 'replug' in both; I see:
Oct 04 19:13:40 localhost /usr/libexec/gdm-x-session[1542]: (II) config/udev: removing GPU device /sys/devices/pci0000:00/0000:00:02.0/drm/card0 /dev/dri/card0
Oct 04 19:13:40 localhost /usr/libexec/gdm-x-session[1542]: xf86: remove device 0 /sys/devices/pci0000:00/0000:00:02.0/drm/card0
Oct 04 19:13:40 localhost /usr/libexec/gdm-x-session[1542]: failed to find screen to remove
even on F23. However, it seems like there was a change between F23 and F24: the introduction of the systemd-udev subpackage, which did not exist in F23. This commit created it:
http://pkgs.fedoraproject.org/cgit/rpms/systemd.git/commit/?id=c16b573717a4fc657d8bac8e12f734f574b8ec42
and added the postun scriptlet:
+%postun udev
+%systemd_postun_with_restart systemd-udev-{settle,trigger}.service systemd-udevd-{control,kernel}.socket systemd-udevd.service
At least just looking at that commit diff, this wasn't simply moved from somewhere else - we actually weren't doing that before, though the systemd-udev-trigger service did exist. So I think that's why this showed up in F24.
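To see whether an installed systemd-udev still carries that restart, the scriptlets can be dumped with `rpm -q --scripts` and searched. The filter below is a minimal sketch: the function name `restarts_udev_trigger` is made up, and it assumes the macro expands to a `systemctl try-restart ...` line (as a later comment in this bug suggests), with the unit named either literally or in the brace form from the spec file.

```shell
# Hypothetical helper: read scriptlet text on stdin and succeed if it contains
# a try-restart that names systemd-udev-trigger.service, either literally or
# inside a {a,b} brace group as written in the spec file.
restarts_udev_trigger() {
    grep -Eq 'try-restart.*systemd-udev-(trigger|\{[^}]*trigger[^}]*\})\.service'
}

# Usage on a real system (not executed here):
#   rpm -q --scripts systemd-udev | restarts_udev_trigger && echo "restart still present"
```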
That's +3 FE (counting myself too), setting accepted.
I think this bug also strikes F25: yesterday I updated my Thinkpad T430 (Intel + Nvidia) running F25 during a Gnome session, and X crashed.
systemd-229-16.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-faf2598d0c
systemd-231-8.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-d458ee281a
systemd-229-16.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
I did have a thought about how we could further mitigate this to prevent people updating a fresh Fedora 24 install from encountering it. Basically, have systemd-udev do something like this (pseudocode):
    %pre
    %if (current systemd package is older than systemd-229-16.fc24)
    systemctl mask systemd-udev-trigger.service
    %endif
    %posttrans
    systemctl unmask systemd-udev-trigger.service
The %pre will run before the old systemd-udev package's %postun and effectively negate its restart of the service, I think; then the %posttrans would restore it to normal. We might need a few more hedges - perhaps only do this on update(?), and definitely check whether systemd-udev-trigger.service was *already* masked and don't unmask it in %posttrans in that case (systemctl is-enabled can tell us if it's already masked) - but what do people think of the general idea? Too hacky?
That's way too hacky and error-prone. Instead, we could add a drop-in with '[Unit] RefuseManualStop=true' to the service and do 'systemctl daemon-reload' in %post. This is enough to prevent the subsequent 'systemctl try-restart systemd-udev-trigger.service' from doing anything.
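A minimal sketch of such a drop-in, written as a shell function so it can be tried against a scratch directory first. The function name, the file name `10-refuse-manual-stop.conf` and the use of /etc/systemd/system in the usage note are assumptions for illustration; a package would more likely ship the file itself under its own unit directory rather than generate it.

```shell
# Hypothetical helper: create a drop-in for systemd-udev-trigger.service that
# sets RefuseManualStop=true, so explicit stop/restart requests are refused.
#   $1 = unit directory root to write into (e.g. a scratch directory)
write_refuse_stop_dropin() {
    dir="$1/systemd-udev-trigger.service.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nRefuseManualStop=true\n' > "$dir/10-refuse-manual-stop.conf"
}

# Usage as root on a real system (not executed here):
#   write_refuse_stop_dropin /etc/systemd/system && systemctl daemon-reload
```

The daemon-reload matters: systemd only sees the new setting after reloading, which is why the comment above puts it in %post, ahead of the old package's try-restart.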
systemd-231-8.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 1382749 has been marked as a duplicate of this bug. ***
*** Bug 1383410 has been marked as a duplicate of this bug. ***
systemd-231-10.fc25 now adds RefuseManualStop=true to the unit. This should fix the issues during upgrade.
Are you also going to push the RefuseManualStop workaround to F24?
It's not necessary. The scriptlets which caused the issue were added in F25, and the F24→F25 upgrade should now be fixed. F24 itself is not affected.
Huh? No, that's wrong. The scriptlets are already in F24. You can reproduce this bug simply by installing a clean F24 and updating systemd-udev from the update repositories.
Hm, OK. I'll check F24 then too.
systemd-231-10.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-005ad5dcb1
systemd-231-10.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
So the Fedora 24 solution to this is to upgrade to 25. :=[[[
No, it's already fixed in F24; only the extra fix that prevents it from happening on the first update isn't there yet, but Zbigniew is still planning to do that, I think. Bugzilla / Bodhi integration has limitations in dealing with bugs that affect multiple releases.
OK, thanks. It did happen the last time I did an update. When I get a systemd update without a failure, I will know. I now separate systemd out from "all" and will continue until it does not fail. :=]
Ray: it happened when you did the update because of the details of how the bug is triggered. The bug is triggered by a command in systemd-udev's `%postun` script. When you do a package update from, say, foo-1.0 to foo-2.0, the `%postun` script from foo-1.0 is run as part of that transaction.
So here's how it went down: the *existing* systemd-udev package on your system had a `%postun` script that would trigger the bug. Up until systemd-229-16.fc24, all F24 systemd packages had that script. We released systemd-229-16.fc24 as an update which *removed* that script. However, because it's the `%postun` of the *old* package that is run on update - not the `%postun` from the *new* package - when you install systemd-udev-229-16.fc24, the bug will happen one last time, because the old systemd-udev package still has the bad `%postun`. What the update ensures is that any time you update the package *after* the update to 229-16, you won't hit the bug.
We've since come up with a trick which allows the new package to suppress the old package's `%postun`, so that the bug will no longer happen when you first update to the 'fixed' package. That trick hasn't been built for F24 yet; I hope Zbigniew will build it, though. Still, now that you've got 229-16 installed, you should be safe from this bug in future in any case.
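To make the ordering concrete, here is a small illustration that just prints the order in which RPM runs the main scriptlets when one package version replaces another (triggers and %pretrans are left out). The function name is made up; it does not touch the RPM database.

```shell
# Illustration only: print the scriptlet order for upgrading OLD to NEW.
#   $1 = old package (e.g. systemd-udev-229-13.fc24)
#   $2 = new package (e.g. systemd-udev-229-16.fc24)
upgrade_scriptlet_order() {
    old=$1
    new=$2
    printf '%s\n' \
        "%pre of $new" \
        "(files of $new installed)" \
        "%post of $new" \
        "%preun of $old" \
        "(files only in $old removed)" \
        "%postun of $old" \
        "%posttrans of $new"
}
```

The old package's %postun comes near the end, after the new package's %pre and %post have already run. That is why removing the restart from the new package cannot help on the first update, and also why something the new package does in %pre or %post can still influence what the old %postun is able to do.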
Many thanks for your thorough explanation. I knew it was some command in the "clean" phase. I will expect the next one to succeed and remove the need for the separate "dnf update systemd" --- :=]]]
Just for the record: I had the same problem upgrading from F23->F24 - X was terminated. Now it happened again with the upgrade from F24->F25, even though the systemd-udev package had been updated on Oct 04 2016, long before the upgrade from F24->F25:
/var/log/dnf.rpm.log:Nov 03 14:26:34 INFO Upgraded: systemd-udev-231-10.fc25.x86_64
/var/log/dnf.rpm.log-20161009:Oct 04 18:20:24 INFO Upgraded: systemd-udev-229-15.fc24.x86_64
/var/log/dnf.rpm.log-20161009:Oct 04 18:20:30 INFO Cleanup: systemd-udev-229-13.fc24.x86_64
/var/log/dnf.rpm.log-20161009:Oct 07 10:51:11 INFO Upgraded: systemd-udev-229-16.fc24.x86_64
/var/log/dnf.rpm.log-20161009:Oct 07 10:51:20 INFO Cleanup: systemd-udev-229-15.fc24.x86_64
It's not a big deal, but with dnf it is a bit difficult to clean up the mess after the crash. I use this:
dnf remove $(dnf repoquery --duplicated --latest-limit -1 -q)
which complains about removing the systemd* and dnf packages, so I need to do that manually with rpm. This time the new problem was the missing /usr/lib/locale/locale-archive, which leads to:
-bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
-bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
-bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
-bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
-bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory
Executing build-locale-archive fixed it. I have one more system on which to test the upgrade, if there is a fix for F24's systemd-udev package. I did not have a problem upgrading systemd* in F23 or F24, only when upgrading to the next FNN.
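One way to avoid the complaint in the cleanup step above is to filter the duplicate list before handing it to dnf, and deal with the remaining packages by hand. The filter below is only a sketch: the function name is made up, and matching on names that begin with "systemd" or "dnf" is an assumption based on the description above, not a list of what dnf actually refuses to remove.

```shell
# Hypothetical filter: read package names/NEVRAs on stdin (one per line) and
# print only those whose name does not start with "systemd" or "dnf".
# "|| true" keeps the exit status zero when everything is filtered out.
removable_duplicates() {
    grep -Ev '^(systemd|dnf)(-|$)' || true
}

# Usage on a real system (not executed here; review the list before removing):
#   dnf repoquery --duplicated --latest-limit -1 -q | removable_duplicates
```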
I think this is fixed now, but FTR I do updates in two steps till I am sure:
dnf update systemd
dnf update all
If and when the first fails, I do this before the next one:
dnf clean all
that's not a great defence. the best defences are the ones we documented: use offline updates, update from a VT, or update from a tmux/screen session. this specific bug should now be basically fixed, yes.
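As a rough guard for the last two of those defences, a wrapper could refuse to start an update from inside a graphical terminal. This is a sketch under assumptions: the function names are made up, it relies on tmux setting `TMUX` and GNU screen setting `STY`, and it treats a `tty` of the form /dev/ttyN as a virtual terminal. It does not cover offline updates.

```shell
# Succeed if the shell appears to be running inside tmux or GNU screen.
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

# Succeed if the given tty path (default: output of `tty`) looks like a
# Linux virtual terminal, i.e. /dev/tty followed by a digit.
on_virtual_terminal() {
    case "${1:-$(tty)}" in
        /dev/tty[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical combined check: safe if either condition holds.
safe_to_update() {
    in_multiplexer || on_virtual_terminal "$@"
}

# Usage (not executed here):
#   safe_to_update && sudo dnf upgrade || echo "run this from tmux/screen or a VT"
```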