Richard Jones reported some errors in a Rawhide package build:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/J2R7RFV2BDLRGKJMGRAMH2VX6Z457DKJ/
One of the problems we saw is that systemctl and udevadm commands run early during buildroot population give "error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory".
I looked into possible causes of this and figured out that it was caused by this change to libpcap:
https://src.fedoraproject.org/rpms/libpcap/c/99163ae31cb27dedbbc425ebe051449713ac8ca1?branch=master
which causes libpcap.so.1 to be linked against libibverbs.so.1. This is a problem because we now have a dependency loop:
* systemd requires libpcap (this is not expressly specified in systemd package deps, but systemctl *is* linked against libpcap.so.1 and if you try to remove libpcap, dnf refuses because systemd would have to be removed too, so the dependency is encoded indirectly somehow)
* libpcap requires libibverbs
* libibverbs requires rdma-core
* rdma-core requires systemd
That means dnf cannot order installation of the packages such that all those dependencies are respected. What it will do in such a situation is just come up with some way to cut the knot - it'll just pick an order. Most likely it decides to install systemd and various other things first, then install libibverbs later...but anything that gets installed between systemd and libibverbs which uses `systemctl` or `udevadm` or any other systemd command linked against libpcap fails due to libibverbs.so.1 not being there.
We untagged libpcap-1.9.1-4.fc33 from Rawhide for now to resolve this problem, and I confirmed it did solve it (note that the segfaults turned out to be a different bug, https://bugzilla.redhat.com/show_bug.cgi?id=1837809 ).
Please don't do another build of libpcap with the rdma support enabled without coming up with some solution for this problem first. Thanks!
oh, libibverbs also requires systemd directly, actually. so the loop is slightly tighter.
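
For illustration, the loop described above can be reproduced with the standard `tsort` utility, which refuses to produce a clean ordering when its input contains a cycle. This is only a sketch: the package-level edges are a simplification of what RPM really tracks (provides such as sonames), `has_loop` is a hypothetical helper name, and the check assumes a `tsort` that reports loops on stderr, as the GNU coreutils one does.

```shell
#!/bin/sh
# Each line reads "A B" for "A requires B". Package-level edges are a
# simplification; RPM really tracks provides such as sonames.
deps='systemd libpcap
libpcap libibverbs
libibverbs rdma-core
rdma-core systemd'

# has_loop (hypothetical helper): succeeds when the edges on stdin
# admit no total order.
has_loop() {
    # tsort prints the ordering on stdout and complains on stderr when
    # it finds a loop; capture only stderr.
    err=$(tsort 2>&1 >/dev/null)
    [ -n "$err" ]
}

if printf '%s\n' "$deps" | has_loop; then
    echo "loop: no valid install order"
else
    echo "no loop"
fi
```

Dropping any single edge (for example `rdma-core systemd`) makes the input orderable again, which is the idea behind the fixes discussed later in this thread.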
Hi Adam, thanks for catching this. I am glad I only did this change in rawhide. Not sure about the solution here. I will make sure not to build this until I find one. Michal
systemd requires libip4tc.so.2()(64bit) provided by iptables-libs, and iptables-libs requires libpcap.so.1()(64bit) provided by libpcap. It is pulled in to provide firewalling functionality in nspawn and networkd (?). It seems possible that the dep could be moved from libsystemd-shared.so to only some binaries. This wouldn't remove the dependency from the rpm, because both of those binaries are in the main systemd rpm, but it would avoid the crash since systemctl and friends would no longer be linked against it.
Reproduced in mock, this output is not visible in the root.log in koji, I guess only stderr is printed. I will put the output here, it shows nicely the order of actions that leads to this:
  Installing       : systemd-udev-245.5-2.fc33.x86_64    287/392
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64    287/392
/usr/bin/systemctl: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
  Installing       : dracut-050-26.git20200316.fc33.x86_64    288/392
  Installing       : rdma-core-29.0-1.fc33.x86_64    289/392
  Running scriptlet: rdma-core-29.0-1.fc33.x86_64    289/392
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
  Installing       : libibverbs-29.0-1.fc33.x86_64    290/392
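
A rough way to spot this pattern in transaction output like the above is to track which packages have been installed so far and flag loader errors for libraries whose package has not appeared yet. The sketch below is not an existing tool: `find_early_errors` is a hypothetical helper, and it relies on the assumption that `libfoo.so.N` is shipped by a package whose name starts with `libfoo-`, which holds for libibverbs here but not in general.

```shell
#!/bin/sh
# find_early_errors: hypothetical helper, reads dnf transaction output
# on stdin. Heuristic (an assumption): libfoo.so.N is shipped by a
# package whose name-version string starts with "libfoo-".
find_early_errors() {
    awk '
        $1 == "Installing" && $2 == ":" {
            seen[$3] = 1    # package files are on disk from here on
            next
        }
        /error while loading shared libraries:/ {
            cmd = $1; sub(/:$/, "", cmd)
            lib = $7; sub(/:$/, "", lib)
            stem = lib; sub(/\.so.*$/, "", stem)
            found = 0
            for (pkg in seen)
                if (index(pkg, stem "-") == 1) found = 1
            key = cmd " " lib
            if (!found && !(key in reported)) {
                reported[key] = 1    # report each command/library pair once
                print cmd " needs " lib ", not installed yet"
            }
        }
    '
}

find_early_errors <<'EOF'
  Installing       : systemd-udev-245.5-2.fc33.x86_64    287/392
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64    287/392
/usr/bin/systemctl: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
  Installing       : rdma-core-29.0-1.fc33.x86_64    289/392
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
  Installing       : libibverbs-29.0-1.fc33.x86_64    290/392
EOF
```

On this sample it reports both `/usr/bin/systemctl` and `/sbin/udevadm` as running before libibverbs is installed.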
I have a confirmation from the dnf team that if there is a cycle, dnf usually tries to break it in a sort of random place. I tried to break the chain in libpcap directly by declaring libibverbs as Requires(pre): in libpcap, but the dependencies rearranged in a sort of weird manner:
  Installing       : rdma-core-29.0-1.fc33.x86_64    287/392
  Running scriptlet: rdma-core-29.0-1.fc33.x86_64    287/392
/var/tmp/rpm-tmp.s5iQbn: line 1: /sbin/udevadm: No such file or directory
/var/tmp/rpm-tmp.s5iQbn: line 2: /sbin/udevadm: No such file or directory
/var/tmp/rpm-tmp.s5iQbn: line 3: /sbin/udevadm: No such file or directory
  Installing       : libibverbs-29.0-1.fc33.x86_64    288/392
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64    289/392
  Installing       : systemd-udev-245.5-2.fc33.x86_64    289/392
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64    289/392
/usr/bin/systemctl: error while loading shared libraries: libpcap.so.1: cannot open shared object file: No such file or directory
  Installing       : libpcap-14:1.9.1-7.fc33.x86_64    290/392
Maybe if systemd-udev had a Requires(pre) on libibverbs, this would work?
Oh man, we have a whole bug somewhere where we got very deep into the weeds about these kinds of dependency loops:
https://bugzilla.redhat.com/show_bug.cgi?id=1647172 (original bug)
https://bugzilla.redhat.com/show_bug.cgi?id=1648721 (loop behaviour spin-off)
it seems this got merged upstream:
https://github.com/rpm-software-management/rpm/pull/1028
so we might be able to use that...
This might also break composes (it actually breaks the Fedora ELN compose). If this happens, the `systemd-machine-id-setup` command, which is executed in the %post phase of the `systemd` package, fails with the mentioned `libibverbs.so.1` error message and *no* /etc/machine-id is generated. `/etc/machine-id` is needed for `kernel-install` to install the kernel into the /boot directory. Without the kernel in /boot, `lorax` fails to build the bootable ISO and the compose build fails.
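
A quick check for the first link in that failure chain could look like the sketch below. `has_machine_id` is a hypothetical helper, not something the compose tooling provides, and it assumes that a missing or empty `etc/machine-id` under the install root is always a problem. Some image builds deliberately ship an empty file, so treat the result as a hint only.

```shell
#!/bin/sh
# has_machine_id: hypothetical helper for inspecting an install root.
# Succeeds when <root>/etc/machine-id exists and is non-empty.
has_machine_id() {
    root=${1:-/}
    # -s: file exists and has a size greater than zero
    [ -s "$root/etc/machine-id" ]
}

# Example: inspect the running system (pass a chroot path to check
# an install root instead).
if has_machine_id /; then
    echo "machine-id present"
else
    echo "machine-id missing or empty"
fi
```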
(In reply to Adam Williamson from comment #6)
> Oh man, we have a whole bug somewhere where we got very deep into the weeds
> about these kinds of dependency loops:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1647172 (original bug)
> https://bugzilla.redhat.com/show_bug.cgi?id=1648721 (loop behaviour spin-off)
>
> it seems this got merged upstream:
> https://github.com/rpm-software-management/rpm/pull/1028
>
> so we might be able to use that...
Please be aware that https://github.com/rpm-software-management/rpmlint/issues/429 is currently blocking the use of `Requires(meta)` in Fedora.
Jan: yeah, I recall another bug in the past where the lack of a machine-id caused trouble, it definitely can be a problem if that's happening.
Adam: And your blog post about that bug helped me a lot to find out what is wrong with Fedora ELN compose, thanks for it :).
As best I can tell, `rdma-core` doesn't actually require the `systemd` package, just the `systemd-libs` package (because `/usr/sbin/rdma-ndd` is linked against it). So one way we might be able to resolve this bug is to drop the explicit `Requires: systemd` and replace it with `%{?systemd_requires}` (which moves the requirement on systemctl to the RPM scriptlets). I don't have a good way to test this, so I'm not sure if it will work, but it seems plausible. I sent it out as a PR: https://src.fedoraproject.org/rpms/rdma-core/pull-request/3
https://src.fedoraproject.org/rpms/rdma-core/pull-request/4 I merged this PR and built rdma-core-30.0-2.fc33 .
It seems that the reproducer I was trying in comment #4 is no longer showing these errors. But I am not really sure if that is enough proof that this will work.
Well, I can confirm that rdma-core indeed no longer requires systemd. We can try re-tagging libpcap-1.9.1-4.fc33 and seeing if the errors start showing up in builds again, or not.
Let's do it!
Yup, I asked Mohan to do it already, it should be tagged now and in the buildroot now or soon. I'll check builds later today and see if they look OK. Thanks everyone.
Hmm, well, here's a recent build that's probably in the zone:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1535255
https://kojipkgs.fedoraproject.org//packages/dropbear/2020.80/1.fc33/data/logs/x86_64/root.log
we don't have any "cannot open shared object file" errors, but we do have some "No such file or directory" errors:
===
DEBUG util.py:623: Running transaction
DEBUG util.py:623: /var/tmp/rpm-tmp.Jk1xFq: line 1: /sbin/udevadm: No such file or directory
DEBUG util.py:623: /var/tmp/rpm-tmp.Jk1xFq: line 2: /sbin/udevadm: No such file or directory
DEBUG util.py:623: /var/tmp/rpm-tmp.Jk1xFq: line 3: /sbin/udevadm: No such file or directory
DEBUG util.py:623: Created symlink /etc/systemd/system/sockets.target.wants/dbus.socket → /usr/lib/systemd/system/dbus.socket.
DEBUG util.py:623: Created symlink /etc/systemd/user/sockets.target.wants/dbus.socket → /usr/lib/systemd/user/dbus.socket.
DEBUG util.py:623: Created symlink /etc/systemd/system/dbus.service → /usr/lib/systemd/system/dbus-broker.service.
DEBUG util.py:623: Created symlink /etc/systemd/user/dbus.service → /usr/lib/systemd/user/dbus-broker.service.
DEBUG util.py:623: Installed:
===
so we do still have scriptlets trying to run udevadm before it's installed. Comparing to a build from a couple of weeks ago (which would have been done with the older libpcap, without rdma-core support):
https://koji.fedoraproject.org/koji/buildinfo?buildID=1524066
https://kojipkgs.fedoraproject.org//packages/dropbear/2020.79/1.fc33/data/logs/x86_64/root.log
and we don't have any of those errors:
===
DEBUG util.py:623: Running transaction
DEBUG util.py:623: Created symlink /etc/systemd/system/sockets.target.wants/dbus.socket → /usr/lib/systemd/system/dbus.socket.
DEBUG util.py:623: Created symlink /etc/systemd/user/sockets.target.wants/dbus.socket → /usr/lib/systemd/user/dbus.socket.
DEBUG util.py:623: Created symlink /etc/systemd/system/dbus.service → /usr/lib/systemd/system/dbus-broker.service.
DEBUG util.py:623: Created symlink /etc/systemd/user/dbus.service → /usr/lib/systemd/user/dbus-broker.service.
DEBUG util.py:623: Installed:
===
I'm not sure if this will cause any practical problems, but we do still clearly have an issue...
From the commit message by zbyszek: "If systemd-udev is installed later in the transaction, the call to udevadm trigger is not necessary. If systemd-udev is installed earlier in the transaction, the call to udevadm trigger will succeed. And if systemd-udev is not installed at all, it means that there's no hardware support and udevadm doesn't need to be executed at all."
So, I think what we actually need to do here is just wrap the %post script in a
```
if [ -x /sbin/udevadm ]; then
    udevadm trigger <...>
fi
```
That way it will not throw errors if it's ordered before systemd-udev. I will send a PR.
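
The same guard can be written as a small function to show how it behaves in both orderings. This is a sketch only: `run_if_present` is a hypothetical helper name and is not what the rdma-core scriptlet actually uses.

```shell
#!/bin/sh
# run_if_present: hypothetical helper, not part of rdma-core or systemd.
# Runs "$1" with the remaining arguments only when it exists and is
# executable.
run_if_present() {
    tool=$1
    shift
    if [ -x "$tool" ]; then
        "$tool" "$@"
    fi
    # When the tool is absent the if-statement itself returns 0, so the
    # scriptlet prints nothing and does not fail. When the tool runs,
    # its own exit status is returned.
}

# Tool missing (ordered before systemd-udev): silent no-op.
run_if_present /nonexistent/udevadm trigger

# Tool present: the command runs normally.
run_if_present /bin/sh -c 'echo ran'
```

The design choice is to stay silent rather than warn when the tool is missing, on the reasoning quoted above: either systemd-udev arrives later in the transaction and the trigger is unnecessary, or it is not installed at all and there is nothing to trigger.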
PR submitted: https://src.fedoraproject.org/rpms/rdma-core/pull-request/5
ah, right. that's a nice refinement but also means we don't need to worry about the errors for now. thanks.
(In reply to Stephen Gallagher from comment #19)
> PR submitted: https://src.fedoraproject.org/rpms/rdma-core/pull-request/5
Merged and built rdma-core-30.0-4.fc33 . Thanks
Is this what is supposed to happen?
Last metadata expiration check: 0:09:54 ago on Sun 05 Jul 2020 10:43:27 AM CEST.
Dependencies resolved.
================================================================================
 Package           Architecture   Version            Repository      Size
================================================================================
Upgrading:
 libpcap           x86_64         14:1.9.1-4.fc33    rawhide        174 k
Installing dependencies:
 libibverbs        x86_64         30.0-4.fc33        rawhide        319 k
 pciutils          x86_64         3.6.4-1.fc33       rawhide         91 k
 pciutils-libs     x86_64         3.6.4-1.fc33       rawhide         41 k
 rdma-core         x86_64         30.0-4.fc33        rawhide         60 k
Transaction Summary
================================================================================
Install  4 Packages
Upgrade  1 Package
Total download size: 685 k
Is this ok [y/N]:
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
The issue has been fixed in Fedora Rawhide (fc34). It will not be backported to F33, so I will close it as WONTFIX.
Thanks, I think we can close this as RAWHIDE.