Bug 1647172

Summary: dbus-daemon not enabled on live images due to package install ordering problems caused by huge dependency loops
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: fedora-releaseAssignee: Mohan Boddu <mboddu>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rawhideCC: amigadave, daherrma, dennis, dh.herrmann, dmach, igor.raits, jdisnard, jkeating, kellin, kevin, mblaha, mboddu, mhatina, mls, packaging-team-maint, pbrobinson, pmatilai, pmoravco, rpm-software-management, satellitgo, sgallagh, vmukhame, zbyszek
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-11 21:28:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Dependency diff between 20181102 and 20181105 package set
none
the output we get with RPMTRANS_FLAG_DEPLOOPS when reproducing an affected image compose
none
the output we get with RPMTRANS_FLAG_DEPLOOPS for a 20181102.n.0 compose
none
the output we get with RPMTRANS_FLAG_DEPLOOPS for a 20181105.n.1 KDE build none

Description Adam Williamson 2018-11-06 19:28:37 UTC
In the last Rawhide compose (20181105.n.1), live images are broken because dbus-daemon.service does not start on boot. systemctl status shows it as disabled, even though its preset is enabled.

Looking at the compose logs, I think the problem is install ordering. e.g. from the Workstation live compose:

https://koji.fedoraproject.org/koji/taskinfo?taskID=30699164
https://kojipkgs.fedoraproject.org//work/tasks/9164/30699164/livemedia-out.log

2018-11-06 02:35:46,150 INFO pylorax: Installing dbus-daemon.x86_64 (240/1608)
2018-11-06 02:35:46,150 INFO pylorax: Installing cryptsetup-libs.x86_64 (241/1608)
2018-11-06 02:35:46,150 INFO pylorax: Installing trousers-lib.x86_64 (242/1608)
2018-11-06 02:35:46,150 INFO pylorax: Installing openssl-pkcs11.x86_64 (243/1608)
2018-11-06 02:35:46,150 INFO pylorax: Installing openssl-libs.x86_64 (244/1608)
2018-11-06 02:35:46,150 INFO pylorax: Installing systemd.x86_64 (245/1608)

Note that dbus-daemon got installed *before* systemd, even though it Requires(post) systemd, as you can see in the logs of the current dbus package build:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1160465
https://kojipkgs.fedoraproject.org//packages/dbus/1.12.10/6.fc30/data/logs/x86_64/build.log

I think because systemd wasn't installed when dbus-daemon was installed, its %post script that tries to activate the preset would have failed:

%post daemon
%systemd_post dbus-daemon.service

...and that's why the service isn't enabled.

What I can't figure out is *why dnf chose this ordering*. In earlier composes, it didn't. For instance, here's the 20181031.n.0 Workstation live compose:

https://koji.fedoraproject.org/koji/taskinfo?taskID=30570088
https://kojipkgs.fedoraproject.org//work/tasks/88/30570088/livemedia-out.log

2018-10-31 10:20:04,486 INFO pylorax: Installing dbus.x86_64 (607/1607)
2018-10-31 10:20:04,487 INFO pylorax: Installing dbus-tools.x86_64 (608/1607)
2018-10-31 10:20:04,487 INFO pylorax: Installing dbus-libs.x86_64 (609/1607)
...
2018-10-31 10:20:34,076 INFO pylorax: Installing systemd.x86_64 (632/1607)
2018-10-31 10:20:34,076 INFO pylorax: Installing dbus-common.noarch (633/1607)
2018-10-31 10:20:34,076 INFO pylorax: Installing dbus-daemon.x86_64 (634/1607)
2018-10-31 10:20:34,076 INFO pylorax: Installing systemd-bootchart.x86_64 (635/1607)
2018-10-31 10:20:34,076 INFO pylorax: Installing systemd-udev.x86_64 (636/1607)

Note it installed systemd before dbus-daemon, there.

The obvious change is the one I made recently to have systemd Requires(post): openssl-libs:

https://www.happyassassin.net/2018/11/05/adamws-debugging-adventures-has-anyone-seen-my-kernel/
https://src.fedoraproject.org/rpms/systemd/c/71e781a09623e04dc1006dcba12838578595d70f?branch=master

however, it doesn't seem like it's that simple, because even before that change, dnf was actually installing openssl-libs (and openssl-pkcs11, which it Recommends) before both systemd and dbus-daemon. e.g. from the 20181031.n.0 log again:

2018-10-31 10:20:04,486 INFO pylorax: Installing openssl-pkcs11.x86_64 (604/1607)
...
2018-10-31 10:20:34,075 INFO pylorax: Installing openssl-libs.x86_64 (626/1607)
...
2018-10-31 10:20:34,075 INFO pylorax: Installing openssl-libs.x86_64 (626/1607)
...
2018-10-31 10:20:34,076 INFO pylorax: Installing systemd.x86_64 (632/1607)
2018-10-31 10:20:34,076 INFO pylorax: Installing dbus-common.noarch (633/1607)
2018-10-31 10:20:34,076 INFO pylorax: Installing dbus-daemon.x86_64 (634/1607)

So I don't think *just* that could be the cause, though maybe it's involved, somehow.

It does seem like *something* changed that moved all the relevant packages way earlier in the order (from the 600s to the 200s), but I can't figure out what that was yet. There was a change in dbus that *adds* explicit dependencies on systemd / systemctl for the correct subpackages, but I don't see how that could've made things worse.

Still trying to figure this out...

Comment 1 Adam Williamson 2018-11-06 20:37:02 UTC
Note that in F29 and earlier this worked differently: I believe the way it worked there is we had dbus.socket and dbus.service , and dbus.service was automatically started on access to the dbus socket.

dbus.socket still exists, but it seems that even though dbus-daemon.service has a dbus.service alias, systemd does not start dbus-daemon.service via dbus.socket - instead 'systemctl status dbus.socket' shows this:

[root@localhost ~]# systemctl status dbus.socket
● dbus.socket - D-Bus System Message Bus Socket
   Loaded: loaded (/usr/lib/systemd/system/dbus.socket; disabled; vendor preset: enabled)
   Active: inactive (dead)
   Listen: /run/dbus/system_bus_socket (Stream)

Nov 06 20:09:50 localhost systemd[1]: dbus.socket: Socket service dbus.service not loaded, refusing.
Nov 06 20:09:50 localhost systemd[1]: Failed to listen on D-Bus System Message Bus Socket.

The Rawhide KDE live worked back on 2018-10-09, but both that image and the logs from its creation have been garbage-collected everywhere, so I cannot test it to see how dbus was actually getting started on that date, or check the logs to see the package install order...

Comment 3 Adam Williamson 2018-11-06 20:43:29 UTC
From the systemd notes:

"In addition, unit files may specify aliases through the Alias= directive in the [Install] section; those aliases are only effective when the unit is enabled."

so the dbus-daemon.service -> dbus.service alias is not much use for socket activation, I don't think.

Comment 4 Adam Williamson 2018-11-06 23:26:26 UTC
I think transaction ordering is actually in libsolv...

Comment 5 Panu Matilainen 2018-11-07 07:57:04 UTC
libsolv does support transaction ordering but it's not used by dnf, rpm is what does the ordering. 

dbus-daemon getting installed before systemd despite requiring systemd simply means there is an unmanageable dependency loop in the package set, which will need to be analyzed and resolved one way or the other.

Comment 6 Panu Matilainen 2018-11-07 10:22:19 UTC
I haven't been able to reproduce this locally. My rawhide mirror is lagging a few days behind, but updating to latest dbus and systemd from koji didn't change the situation. So it's probably something else then.

If only the composes were performed with RPMTRANS_FLAG_DEPLOOPS enabled, diffing that output between successful and failed compose would make it quite easy to pinpoint the problematic change :-/ 

It's also a bit crazy that the package versions included in a compose don't seem to be visible in the compose logs. Like, who would ever need to know?

AFAICT this is the last compose where the order was right (from Nov 2nd):
https://kojipkgs.fedoraproject.org//work/tasks/6109/30606109/livemedia-out.log
and this is the first completed one with wrong order (from Nov 4th):
https://kojipkgs.fedoraproject.org//work/tasks/3309/30663309/livemedia-out.log

There are exactly 1608 packages in both, so if we could find the version differences between the two, that'd narrow the haystack considerably.

Comment 7 Panu Matilainen 2018-11-07 10:46:57 UTC
Lacking anything better to go with, I diffed the rpm -qa list from the actual live images. There's no iso from Nov 4th so I had to use the one from 5th - so this is between 20181102 and 20181105. Anyway, the problematic change is from something on this list:

-coreutils-8.30-5.fc30.x86_64
-coreutils-common-8.30-5.fc30.x86_64
+coreutils-8.30-7.fc30.x86_64
+coreutils-common-8.30-7.fc30.x86_64
-dbus-1.12.10-5.fc30.x86_64
-dbus-common-1.12.10-5.fc30.noarch
-dbus-daemon-1.12.10-5.fc30.x86_64
+dbus-1.12.10-6.fc30.x86_64
+dbus-common-1.12.10-6.fc30.noarch
+dbus-daemon-1.12.10-6.fc30.x86_64
-dbus-libs-1.12.10-5.fc30.x86_64
+dbus-libs-1.12.10-6.fc30.x86_64
-dbus-tools-1.12.10-5.fc30.x86_64
-dbus-x11-1.12.10-5.fc30.x86_64
+dbus-tools-1.12.10-6.fc30.x86_64
+dbus-x11-1.12.10-6.fc30.x86_64
-elfutils-0.174-2.fc30.x86_64
-elfutils-default-yama-scope-0.174-2.fc30.noarch
-elfutils-libelf-0.174-2.fc30.x86_64
-elfutils-libs-0.174-2.fc30.x86_64
+elfutils-0.174-3.fc30.x86_64
+elfutils-default-yama-scope-0.174-3.fc30.noarch
+elfutils-libelf-0.174-3.fc30.x86_64
+elfutils-libs-0.174-3.fc30.x86_64
-fedora-release-30-0.11.noarch
-fedora-release-workstation-30-0.11.noarch
+fedora-release-common-30-0.12.noarch
+fedora-release-workstation-30-0.12.noarch
-findutils-4.6.0-20.fc29.x86_64
+findutils-4.6.0-21.fc30.x86_64
-firefox-63.0.1-1.fc30.x86_64
+firefox-63.0.1-3.fc30.x86_64
-gnome-calendar-3.30.0-1.fc30.x86_64
+gnome-calendar-3.30.1-1.fc30.x86_64
-harfbuzz-2.0.2-1.fc30.x86_64
-harfbuzz-icu-2.0.2-1.fc30.x86_64
+harfbuzz-2.1.0-1.fc30.x86_64
+harfbuzz-icu-2.1.0-1.fc30.x86_64
-ibus-libzhuyin-1.8.92-3.fc29.x86_64
+ibus-libzhuyin-1.8.93-1.fc30.x86_64
-java-openjdk-headless-10.0.2.13-8.fc30.x86_64
+java-openjdk-headless-11.0.1.13-4.fc30.x86_64
-kernel-headers-4.20.0-0.rc0.git8.1.fc30.x86_64
+kernel-headers-4.20.0-0.rc1.git0.1.fc30.x86_64
-kexec-tools-2.0.17-12.fc30.x86_64
+kexec-tools-2.0.18-1.fc30.x86_64
-krb5-libs-1.17-0.beta1.1.fc30.x86_64
+krb5-libs-1.16.1-25.fc30.x86_64
-libnfsidmap-2.3.3-1.rc1.fc30.x86_64
+libnfsidmap-2.3.3-2.rc1.fc30.x86_64
-libwacom-0.31-1.fc29.x86_64
-libwacom-data-0.31-1.fc29.noarch
+libwacom-0.32-1.fc30.x86_64
+libwacom-data-0.32-1.fc30.noarch
-lmdb-libs-0.9.22-3.fc29.x86_64
+lmdb-libs-0.9.22-4.fc30.x86_64
-nfs-utils-2.3.3-1.rc1.fc30.x86_64
-nftables-0.9.0-2.fc29.x86_64
+nfs-utils-2.3.3-2.rc1.fc30.x86_64
+nftables-0.9.0-3.fc30.x86_64
-opencc-1.0.5-3.fc30.x86_64
-perl-Errno-1.29-424.fc30.x86_64
+perl-Errno-1.29-425.fc30.x86_64
-perl-interpreter-5.28.0-424.fc30.x86_64
-perl-IO-1.39-424.fc30.x86_64
+perl-interpreter-5.28.0-425.fc30.x86_64
+perl-IO-1.39-425.fc30.x86_64
-perl-libs-5.28.0-424.fc30.x86_64
-perl-macros-5.28.0-424.fc30.x86_64
+perl-libs-5.28.0-425.fc30.x86_64
+perl-macros-5.28.0-425.fc30.x86_64
-ppp-2.4.7-26.fc29.x86_64
+ppp-2.4.7-27.fc30.x86_64
-python3-3.7.1-1.fc30.x86_64
+python3-3.7.1-2.fc30.x86_64
-python3-libs-3.7.1-1.fc30.x86_64
+python3-libs-3.7.1-2.fc30.x86_64
-rpm-ostree-libs-2018.9-2.fc30.x86_64
+rpm-ostree-libs-2018.9-3.fc30.x86_64
-SDL2-2.0.8-6.fc29.x86_64
+SDL2-2.0.9-1.fc30.x86_64
-selinux-policy-3.14.3-10.fc30.noarch
-selinux-policy-targeted-3.14.3-10.fc30.noarch
+selinux-policy-3.14.3-12.fc30.noarch
+selinux-policy-targeted-3.14.3-12.fc30.noarch
-systemd-239-6.git9f3aed1.fc30.x86_64
+systemd-239-8.git9f3aed1.fc30.x86_64
-systemd-container-239-6.git9f3aed1.fc30.x86_64
-systemd-libs-239-6.git9f3aed1.fc30.x86_64
-systemd-pam-239-6.git9f3aed1.fc30.x86_64
-systemd-udev-239-6.git9f3aed1.fc30.x86_64
+systemd-container-239-8.git9f3aed1.fc30.x86_64
+systemd-libs-239-8.git9f3aed1.fc30.x86_64
+systemd-pam-239-8.git9f3aed1.fc30.x86_64
+systemd-rpm-macros-239-8.git9f3aed1.fc30.noarch
+systemd-udev-239-8.git9f3aed1.fc30.x86_64
-vim-minimal-8.1.497-1.fc30.x86_64
+vim-minimal-8.1.511-1.fc30.x86_64

Comment 8 Panu Matilainen 2018-11-07 11:36:00 UTC
Created attachment 1502933 [details]
Dependency diff between 20181102 and 20181105 package set

Here's a diff of all dependencies between 20181102 and 20181105 package set extracted from the live images.

Most of that is just version difference noise which unfortunately makes it harder to stop actual changes, but the most obvious suspects are indeed systemd and dbus with a lot of churn, but the new openssl-libs and fedora-release-related changes perhaps too. There's a loop of over 60 packages surrounding systemd, which is such a house of cards its a miracle anything works at all.

Comment 9 Adam Williamson 2018-11-07 17:19:42 UTC
Sorry, I should've added a few more notes on this, clearly :)

1) Thanks for the hint that we don't use libsolv ordering, moving to rpm

2) Indeed this isn't trivial to reproduce: it doesn't seem to happen if you e.g. just install systemd and dbus-daemon into a mock chroot, or something. I've tried various package sets in a mock root and not got it to reproduce yet. I didn't yet try a Workstation netinstall from the compose repo, maybe I'll try that.

3) When trying to figure this out with diffs, bear in mind that until dbus-1.12.10-6 , the dbus-daemon subpackage did *NOT* have the systemd dependencies, so ordering it before systemd in the transaction would have been legitimate (even though in fact it seems like it got ordered after systemd). We can still try to figure out what change actually caused the live compose ordering to change, though.

Indeed I figured the cause here is most likely a dependency loop, but I had no idea what the magic debugging option would be that would let us figure out that loop! Thanks for the hint on RPMTRANS_FLAG_DEPLOOPS : I should be able to run an image compose with that set, and see what happens.

Comment 10 Adam Williamson 2018-11-07 18:28:49 UTC
So, uh, yeah, RPMTRANS_FLAG_DEPLOOPS sure tells us some stuff. There are 47 "SCC"s (which I think basically means "dep loops", the last of which has 147 members! Will attach that output, probably easier to read as an attachment than in-line.

Note that mclasen suggested changing the ordering might not help (as even if we got systemd installed first, systemctl may not work without dbus running), but I don't think that's actually true. I have the 20181025.n.0 Workstation live image here, with dbus-1.12.10-5.fc30 - and in that one, dbus-daemon.service *is* enabled and *does* start on boot. From the compose logs, systemd was installed before dbus-daemon when composing that image. So I'm guessing 'systemctl preset' does actually work without dbus running.

Comment 11 Adam Williamson 2018-11-07 18:31:47 UTC
Created attachment 1503086 [details]
the output we get with RPMTRANS_FLAG_DEPLOOPS when reproducing an affected image compose

note that the bug was indeed reproduced in this run: dbus-daemon got installed before systemd. I got this by doing a dnf scratch build that sets RPMTRANS_FLAG_DEPLOOPS in the Base._ts property, installing the python3-dnf from that scratch build into a local mock chroot, and running a copy of the 20181105.n.1 Workstation live compose in that mock chroot with that python3-dnf.

Comment 12 Adam Williamson 2018-11-07 20:18:51 UTC
So I did a test build of systemd without the openssl-libs dep and re-ran the live compose with a side repo containing that; it did change the representation of the loops a bit in the debugging output, but the bug still happened, dbus-daemon was installed before systemd. So I don't think the openssl-libs dep alone is the problem.

Next I'm trying a rebuild of systemd -6 with a higher release, so we can see if any of the changes between systemd -6 and -8 are the problem...

Comment 13 Adam Williamson 2018-11-08 07:09:27 UTC
Created attachment 1503215 [details]
the output we get with RPMTRANS_FLAG_DEPLOOPS for a 20181102.n.0 compose

I also reproduced the 20181102.n.0 compose and got the loop debug output for that. Just as with the 'official' compose, with this one, the relevant packages got installed *much* later, and dbus-daemon was installed right after systemd (so the service enablement worked, and the built image boots successfully).

The loop output is clearly different from the 1105 case; there are more loops, but they're not as complex (there are significantly fewer total lines in the output).

Hope this helps...

BTW, it seems that this loop calculation is just considering all dependencies. Shouldn't a Requires(pre) or a Requires(post) sort of "trump" a regular Requires? If so, wouldn't that mean we'd only have a problem here if there's a loop *consisting of pre or post requires only*? If so, can we debug only those dependencies?

Comment 14 David Rheinsberg 2018-11-08 07:23:19 UTC
(In reply to Adam Williamson from comment #10)
> Note that mclasen suggested changing the ordering might not help (as even if
> we got systemd installed first, systemctl may not work without dbus
> running), but I don't think that's actually true. I have the 20181025.n.0
> Workstation live image here, with dbus-1.12.10-5.fc30 - and in that one,
> dbus-daemon.service *is* enabled and *does* start on boot. From the compose
> logs, systemd was installed before dbus-daemon when composing that image. So
> I'm guessing 'systemctl preset' does actually work without dbus running.

The `systemctl enable|disable|preset` calls do not require a running daemon. This is intentional, otherwise bootstrapping through `--installroot=` (and alike) would not work either. In general, tools we use in rpm scriptlets should not query the host services.

> I also reproduced the 20181102.n.0 compose and got the loop debug output for
> that. Just as with the 'official' compose, with this one, the relevant
> packages got installed *much* later, and dbus-daemon was installed right after
> systemd (so the service enablement worked, and the built image boots
> successfully).
>
> The loop output is clearly different from the 1105 case; there are more loops,
> but they're not as complex (there are significantly fewer total lines in the
> output).

There have been several fixes in dbus.spec in the last few days. In particular, the systemd-dependencies have been fixed on November 5th. As of now we are not aware of any outstanding issues, and your build seems to be from November 7th. Maybe the recent fixes were related to this issue?

Since I am not on the dbus-maintainers list I wasn't CC'ed on many of the bug-reports, but I think I now caught up. If I interpret your last message correctly, the compose images work again?

Comment 15 Adam Williamson 2018-11-08 07:36:26 UTC
No, you're not interpreting correctly. This is still broken. The added dependencies in 1.12.10-6 did not help anything; for some reason, dnf still orders dbus-daemon before systemd, even though dbus-daemon Requires(post): systemd . That is the problem we are trying to figure out at present.

Comment 16 Adam Williamson 2018-11-08 07:40:53 UTC
To briefly recap: just considering the Workstation images, on 20181102.n.0 and earlier, *even though dbus-daemon did not Requires(post): systemd*, dnf/rpm ordered systemd install before dbus-daemon install when creating the Workstation live. I believe in 20181103.n.0, but certainly by 20181105.n.1, dnf/rpm now orders systemd install *after* dbus-daemon install, even when using dbus 1.12.10-6, with the dependency which should make sure systemd is installed *before* dbus-daemon.

The working assumption is that some kind of loop is the problem here, and something changed probably in the 20181103.n.0 compose that made dnf/rpm want to resolve the loop by placing dbus-daemon before systemd. We're not yet sure what that is.

The reason I bring up 20181103.n.0 is that I tried a live build from that tree, with the dep cycle debugging flag: the compose doesn't actually complete for some reason, but it does get as far as the loop debug output, and that output looks much more like 20181105.n.1 than it does like 20181102.n.0. So I think the critical change occurred between 20181102.n.0 and 20181103.n.0, but I can't 100% prove it unless I can get a 20181103.n.0 build to complete.

Comment 17 Panu Matilainen 2018-11-08 08:07:11 UTC
Right, it's the gigantic 147 member SCC (strongly connected component == loop, yes) where things blow up. Scriptlet dependencies (Requires(pre) etc) are tracked with => in the output whereas regular dependencies are tracked with ->, and yes they are considered when considering the loop break points, but loops this big just wont work, been there... The practical limit of rpm's loop breaking ability seems to be somewhere around 60-80 member SCC's, anything over 100 is almost certainly going to go wrong somewhere. 

The dbus changes are harmless but also largely redundant (dbus-daemon already required dbus-common which had all the necessary dependencies), the smoking gun turns out to be fedora-release: with fedora-release*30-0.9 the loop is not there, with fedora-release*30-0.12 it is. FWIW, if you have rpm's locally available, you can see the loops with something like this, much much faster than running full composes etc:

# rpm -Uvh --deploops --test --noverify --root /srv/test/ *.rpm

Anyway, as per above, reassigning to fedora-release. The problem here is that system-release is required by "setup" package which is one of those "early bootstrap" packages that simply cannot have dependencies, but 30-0.12 moves the system-release provide to fedora-release-workstation which also now requires glib2 which is probably what drags in half the world into that bootstrap loop and breaks it.

Comment 18 Adam Williamson 2018-11-08 08:09:54 UTC
Aha! I believe I have identified the package which changed things for Workstation, and it's...fedora-release.

fedora-release went from 30-0.11 (20181102.n.0) to 30-0.12 (20181103.n.0). Now, if I do a build that's identical to 30.0-11 but with a higher NEVR and add that into a 20181105.n.1 compose, I get the old ordering - where all the relevant packages are installed in the 600 range, and systemd before dbus-daemon:

2018-11-07 23:59:47,152: Installing systemd.x86_64 (638/1608)
2018-11-07 23:59:47,153: Installing dbus-common.noarch (639/1608)
2018-11-07 23:59:47,153: Installing dbus-daemon.x86_64 (640/1608)

so basically what happened here is, in 20181102.n.0 and earlier, dnf/rpm were resolving these hairy balls of dependencies such that systemd and dbus-daemon get installed somewhere up in the 600 range, with systemd first. In 20181103.n.0, the changes in fedora-release-30-0.12 somehow caused dnf/rpm to completely change how it decided to order things; it now installed them much earlier (in the 200 range), and with dbus-daemon before systemd. In 20181105.n.1, dbus-1.12.10-6 appeared, with dbus-daemon having a specific "Requires(post): systemd", which *SHOULD* cause dnf/rpm to order systemd before dbus-daemon...but somehow it does not, they stick with the same ordering as in 20181103.n.0, with dbus-daemon before systemd. "Reverting" the fedora-release changes causes dnf/rpm to go back to the old ordering, with the packages installed later and in the opposite order.

So...whatever it is about fedora-release -12 vs. -11 that makes dnf/rpm want to order dbus-daemon before systemd, somehow that trumps even the explicit "Requires(post): systemd" in dbus-daemon. I don't know what that change *is*, though.

Note also that the story is slightly different for other images...in the KDE live, for instance, here's 20181102.n.0:

Installing dbus-common.noarch (317/1648)
Installing dbus-daemon.x86_64 (318/1648)
Installing elfutils-default-yama-scope.noarch (319/1648)
Installing systemd-bootchart.x86_64 (320/1648)
Installing cryptsetup-libs.x86_64 (321/1648)
Installing systemd.x86_64 (322/1648)

and here's 20181105.n.1:

Installing dbus-daemon.x86_64 (322/1652)
Installing systemd-bootchart.x86_64 (323/1652)
Installing cryptsetup-libs.x86_64 (324/1652)
Installing openssl-pkcs11.x86_64 (325/1652)
Installing openssl-libs.x86_64 (326/1652)
Installing qt5-qtbase.x86_64 (327/1652)
Installing systemd.x86_64 (328/1652)

So basically for the KDE live, dnf/rpm ordered dbus-daemon before systemd *even before the fedora-release change*, and the added explicit dep in 20181105.n.1 just didn't help, again, whatever is making dnf/rpm decide to order dbus-daemon before systemd is somehow trumping even the explicit dependency.

Anyway, hope all this info helps to figure it out somehow :/

Comment 19 Adam Williamson 2018-11-08 08:10:44 UTC
Panu: note the KDE case, though, it orders wrongly regardless of fedora-release :/ I'll try and get the loop debug output for a KDE compose tomorrow, it's too late for tonight.

Comment 20 Adam Williamson 2018-11-08 17:03:23 UTC
Created attachment 1503380 [details]
the output we get with RPMTRANS_FLAG_DEPLOOPS for a 20181105.n.1 KDE build

Here's the loop debug output for the 20181105.n.1 KDE compose. It's pretty big! 73 loops, one of them has 243 members.

Comment 21 Adam Williamson 2018-11-08 20:18:01 UTC
BTW, here's an interesting thing from the past I remembered:

https://bugzilla.redhat.com/show_bug.cgi?id=1363858

back in 2016, systemd made a change to just call 'systemctl preset-all' in its %post on initial installation, rather than having a list of specific presets that systemd itself ships. Only that change broke the world, because it seems preset-all wasn't working correctly at the time. So I reverted it.

It looks like, shortly after that, preset-all got fixed...but the systemd spec was never changed to use 'preset-all' again. It still has the static list of presets in it.

If that change was in systemd, it would actually avoid this problem, because when systemd got installed, the call to 'preset-all' would enable dbus-daemon even though dbus-daemon's own attempt to do it earlier would've failed.

It also occurs to me to wonder why systemd doesn't just have a *%posttrans* which does 'systemctl preset-all' on initial system install...

Comment 22 Adam Williamson 2018-11-08 20:21:59 UTC
sorry, to finish that thought: because there are actually other potential package install ordering issues here. The distribution presets are shipped in, it looks like, fedora-release-common and fedora-release-(edition) (so there are presets in fedora-release-workstation and fedora-release-server), but the packages the presets affect don't seem to have 'Requires(pre): fedora-release-common' or anything in them, and the fedora-release-* packages themselves don't do anything to *apply* the presets they install.

So we are apparently also just relying on those fedora-release-* packages *happening* to get ordered early enough that presets are correctly applied during initial system installation...

Comment 23 Zbigniew Jędrzejewski-Szmek 2018-11-08 21:27:30 UTC
systemd preset-all should work fine nowadays. AFAIK, we fixed all the outstanding bugs. So I wouldn't mind revisiting this, and running preset-all when systemd is initially installed.

Comment 24 Adam Williamson 2018-11-08 22:19:56 UTC
How about running it in %posttrans as well as or instead of in %post ? (Again, only on initial installs). Can you think of any issues with that? I can't *immediately* think of any, I think any service customizations anaconda does would be after the RPM transaction was complete, so it shouldn't break those...

I'll do some test runs with changed systemd packages.

Oh, I also checked, and this issue affects Rawhide Workstation network installs currently as well - if you do a network install of Rawhide Workstation, dbus-daemon gets installed before systemd, and is not enabled on the installed system. That's actually why the openQA non-English tests are failing at present, as they use the Workstation package set.

Comment 25 Adam Williamson 2018-11-08 22:26:35 UTC
oh, bah, you *can't* do something in %posttrans only on initial install, that's just not an option:

https://fedoraproject.org/wiki/Packaging:Scriptlets#Syntax

We could, I guess, have anaconda do it, or something. That feels slightly icky though.

Comment 26 Stephen Gallagher 2018-11-08 22:28:30 UTC
Well, we could have the posttrans script drop a stamp file somewhere on disk so it only runs once. That would simulate “install-only”.

Comment 27 Adam Williamson 2018-11-08 22:48:57 UTC
eh, also feels a bit icky...maybe we could just have the fedora-release packages do it too, that should cover most possibilities hopefully...

Comment 28 Adam Williamson 2018-11-08 23:36:10 UTC
So, I tested a systemd that does 'systemctl preset-all' in %post . And FOR SCIENCE, I made it log the output! Here it is:

Unit syslog.service is an alias to a unit that is not present, ignoring.
Unit messagebus.service is an alias to a unit that is not present, ignoring.
Removed /etc/systemd/system/dbus-org.freedesktop.resolve1.service.
Removed /etc/systemd/system/dbus-org.freedesktop.network1.service.
Created symlink /etc/systemd/system/sockets.target.wants/dbus.socket → /usr/lib/systemd/system/dbus.socket.
Created symlink /etc/systemd/system/dbus.service → /usr/lib/systemd/system/dbus-daemon.service.
Created symlink /etc/systemd/system/multi-user.target.wants/dbus-daemon.service → /usr/lib/systemd/system/dbus-daemon.service.
Created symlink /etc/systemd/system/ctrl-alt-del.target → /usr/lib/systemd/system/reboot.target.
Created symlink /etc/systemd/system/getty.target.wants/getty → /usr/lib/systemd/system/getty@.service.
Created symlink /etc/systemd/system/multi-user.target.wants/remote-fs.target → /usr/lib/systemd/system/remote-fs.target.

so, we can see it fixed our problem. (And indeed it did: the image boots to the desktop, and dbus-daemon is running). But it also does some other stuff, and I'm not sure if it's all desired / intended. Particularly the removals of the two dbus-org.freedesktop services...I'm not sure what's going on there.

Zbyszek, any thoughts, before we do this officially?

Comment 29 Adam Williamson 2018-11-09 00:44:16 UTC
Welp, since Zbyszek wasn't around any more, I just went ahead and made the change:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1162053

it's easy enough to revert again if it turns out to be problematic, and it should at least get Rawhide *kinda* working again. I hope.

Comment 30 Adam Williamson 2018-11-09 07:21:15 UTC
I grabbed the Workstation and KDE lives from the still-running 20181109.n.0 compose; both boot OK. Let's say this bug is fixed, if we still want to do something about the dependency loops, we can tackle those separately...

Comment 31 Panu Matilainen 2018-11-09 08:40:03 UTC
> oh, bah, you *can't* do something in %posttrans only on initial install, that's > just not an option:
> 
> https://fedoraproject.org/wiki/Packaging:Scriptlets#Syntax

FWIW, that piece of information is outdated, rpm >= 4.12 supplies the $1 argument to %pretrans, %posttrans and %verifyscript too.

Comment 32 Adam Williamson 2018-11-10 01:15:07 UTC
hmm, that's handy. what are the rules for the values?

Comment 33 Adam Williamson 2018-11-10 01:15:45 UTC
oh, i see - it supplies the same value, so the 'is it initial install or not?' is per package scriptlet, not per the whole transaction...

Comment 34 Adam Williamson 2018-11-11 17:29:27 UTC
BTW, I think the fedora-release loop here is probably causing numerous other problems, including the scriptlet fails that are breaking multiple install tests (currently it's crypto-policies that's failing, but before that something else was; none of the packages whose scriptlets are failing have changed, so I think it's the ordering that's changed). It's probably still a good idea to reconsider this fedora-release change.

Comment 35 Adam Williamson 2018-11-11 17:35:41 UTC
sgallagh, any thoughts, since you came up with the fedora-release change? It seems the change to the way system-release is a problem here, as Panu explained in comment #17. Now you've made the *variant* packages the ones that provide system-release , that means all the deps of the relevant variant package become early install deps - so on a Server install, RPM is trying to order cockpit very early, and on a Workstation install, it's trying to order glib very early...

Comment 36 Adam Williamson 2018-11-11 17:41:43 UTC
Hrm, here's one odd thing: fedora-release-server Requires(post): systemd , but does not do anything in %post at all that I can see. fedora-release-workstation Requires(post): /usr/bin/glib-compile-schemas , but it actually does that in *posttrans*, not post.

Maybe if we dropped the Requires(post): systemd and changed Requires(post): /usr/bin/glib-compile-schemas to just a plain Requires:, that might help?

Comment 37 Stephen Gallagher 2018-11-11 17:50:00 UTC
(In reply to Adam Williamson from comment #35)
> sgallagh, any thoughts, since you came up with the fedora-release change? It
> seems the change to the way system-release is a problem here, as Panu
> explained in comment #17. Now you've made the *variant* packages the ones
> that provide system-release , that means all the deps of the relevant
> variant package become early install deps - so on a Server install, RPM is
> trying to order cockpit very early, and on a Workstation install, it's
> trying to order glib very early...

Are you sure about the Server install ordering cockpit? It shouldn't be doing that, because it's just a Requires:, not a Requires(pre|post). The idea should be that it ensures those packages end up present at the end of the transaction but it *shouldn't* impact ordering. I do notice however that there's a Requires(post): systemd on fedora-release-server that is definitely no longer needed.

As for Workstation, there *is* a bug there. We have a Requires(post[un]): /usr/bin/glib-compile-schemas in there, which we should really replace with a filetrigger so that it happens at the end of the transaction once. And looking at the glib2 spec, it seems that trigger already exists, so I'm going to send a PR that drops these order-specific Requires and see if that sorts everything out.

PR available at https://src.fedoraproject.org/rpms/fedora-release/pull-request/44

Scratch-build available at https://koji.fedoraproject.org/koji/taskinfo?taskID=30806555

Comment 38 Adam Williamson 2018-11-11 21:28:30 UTC
I'm gonna close this again as we've now (hopefully) resolved all the most pressing issues by tweaking deps and scripts, but I'm going to open a new bug where we can discuss RPM's behaviour. Both Stephen and I have questions about the explanations Panu provided above.

Comment 39 Adam Williamson 2018-11-11 21:40:18 UTC
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1648721 for further discussion of RPM's behaviour here...

Comment 40 satellitgo 2018-11-11 22:08:12 UTC
Fedora-Workstation-Live-x86_64-Rawhide-20181111.n.0.iso

installs and seems to be fixed

Comment 41 satellitgo 2018-11-11 23:16:28 UTC
Fedora-KDE-Live-x86_64-Rawhide-20181110.n.0

"you need to load the kernel"

VMM install

Comment 42 Adam Williamson 2018-11-12 00:37:40 UTC
You need 1111. 1110 still had a known bug. (it's not actually this bug, though, it's something somewhat different that is also to do with package install ordering in a way, but anyway, it's fixed in 1111). 1111 worked fine in openQA testing.

Comment 43 Panu Matilainen 2018-11-12 07:52:00 UTC
(In reply to Stephen Gallagher from comment #37)
> Are you sure about the Server install ordering cockpit? It shouldn't be
> doing that, because it's just a Requires:, not a Requires(pre|post). The
> idea should be that it ensures those packages end up present at the end of
> the transaction but it *shouldn't* impact ordering. I do notice however that
> there's a Requires(post): systemd on fedora-release-server that is
> definitely no longer needed.

There's a major misunderstanding here. Dependencies need to be always installed first, otherwise just about nothing would work. Requires(pre/post) etc only differ from "plain" Requires when there are dependency loops, in which case they're used as hints for loop breaking at the least harmful spots.

Comment 44 Adam Williamson 2018-11-12 17:22:59 UTC
Can we discuss it in the other bug? Seems more appropriate there. Thanks!