Bug 1837812 - Building libpcap with rdma support causes problematic circular dependency systemd -> libpcap -> libibverbs -> rdma-core -> systemd
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: libpcap
Version: 33
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Michal Ruprich
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2020-05-20 03:41 UTC by Adam Williamson
Modified: 2020-12-22 13:04 UTC (History)

Last Closed: 2020-12-22 13:04:58 UTC
Type: Bug
Embargoed:



Description Adam Williamson 2020-05-20 03:41:47 UTC
Richard Jones reported some errors in a Rawhide package build:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/J2R7RFV2BDLRGKJMGRAMH2VX6Z457DKJ/

one of the problems we saw is that systemctl and udevadm commands run early during buildroot population give "error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory". I looked into possible causes of this and figured out that it was caused by this change to libpcap:

https://src.fedoraproject.org/rpms/libpcap/c/99163ae31cb27dedbbc425ebe051449713ac8ca1?branch=master

which causes libpcap.so.1 to be linked against libibverbs.so.1. This is a problem because we now have a dependency loop:

* systemd requires libpcap (this is not expressly specified in systemd package deps, but systemctl *is* linked against libpcap.so.1 and if you try to remove libpcap, dnf refuses because systemd would have to be removed too, so the dependency is encoded indirectly somehow)
* libpcap requires libibverbs
* libibverbs requires rdma-core
* rdma-core requires systemd

that means dnf cannot order installation of the packages such that all those dependencies are respected. What it will do in such a situation is just come up with some way to cut the knot - it'll just pick an order. Most likely it decides to install systemd and various other things first, then install libibverbs later...but anything that gets installed between systemd and libibverbs which uses `systemctl` or `udevadm` or any other systemd command linked against libpcap fails due to libibverbs.so.1 not being there.
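The loop described above can be checked mechanically. The following is a minimal sketch, assuming the four package-level requirements exactly as listed in this description; the `REQUIRES` table and the `find_cycle` helper are illustrative names, not part of dnf or rpm.

```python
from typing import Dict, List, Optional

# Package-level requirements as listed in the description (illustrative data).
REQUIRES: Dict[str, List[str]] = {
    "systemd": ["libpcap"],
    "libpcap": ["libibverbs"],
    "libibverbs": ["rdma-core"],
    "rdma-core": ["systemd"],
}


def find_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return one dependency cycle reachable from `start`, or None."""
    path: List[str] = []   # packages on the current depth-first path
    on_path = set()        # same packages, for fast membership tests
    done = set()           # packages fully explored without finding a cycle

    def visit(pkg: str) -> Optional[List[str]]:
        if pkg in on_path:
            # Back-edge: the cycle starts where pkg first appeared on the path.
            return path[path.index(pkg):] + [pkg]
        if pkg in done:
            return None
        path.append(pkg)
        on_path.add(pkg)
        for dep in graph.get(pkg, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        on_path.discard(pkg)
        path.pop()
        done.add(pkg)
        return None

    return visit(start)


if __name__ == "__main__":
    print(" -> ".join(find_cycle(REQUIRES, "systemd")))
```

With the table above, the walk from `systemd` comes back to `systemd` after four hops, which is the loop named in the bug title.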

We untagged libpcap-1.9.1-4.fc33 from Rawhide for now to resolve this problem, and I confirmed it did solve it (note that the segfaults turned out to be a different bug, https://bugzilla.redhat.com/show_bug.cgi?id=1837809 ). Please don't do another build of libpcap with the rdma support enabled without coming up with some solution for this problem first. Thanks!

Comment 1 Adam Williamson 2020-05-20 03:44:39 UTC
oh, libibverbs also requires systemd directly, actually. so the loop is slightly tighter.

Comment 2 Michal Ruprich 2020-05-20 06:32:01 UTC
Hi Adam,

thanks for catching this. I am glad I only did this change in rawhide. Not sure about the solution here. I will make sure not to build this until I find one.

Michal

Comment 3 Zbigniew Jędrzejewski-Szmek 2020-05-20 08:06:00 UTC
systemd requires libip4tc.so.2()(64bit) provided by iptables-libs,
and iptables-libs requires libpcap.so.1()(64bit) provided by libpcap.

It is pulled in to provide firewalling functionality in nspawn and networkd (?).
It seems possible that the dep could be moved from libsystemd-shared.so to only
some binaries. This wouldn't remove the dependency from the rpm, because both of
those binaries are in the main systemd rpm, but it would avoid the crash since
systemctl and friends wouldn't be linked.
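To make the indirect dependency concrete, here is a small sketch that follows soname requirements to the packages that provide them. It assumes only the two capabilities quoted in this comment; the `PROVIDES` and `REQUIRES` tables and the `requirement_chain` helper are hypothetical and do not query a real rpm database.

```python
from typing import Dict, List

# Which package provides each soname capability (taken from this comment).
PROVIDES: Dict[str, str] = {
    "libip4tc.so.2()(64bit)": "iptables-libs",
    "libpcap.so.1()(64bit)": "libpcap",
}

# Which soname capabilities each package requires (illustrative subset).
REQUIRES: Dict[str, List[str]] = {
    "systemd": ["libip4tc.so.2()(64bit)"],
    "iptables-libs": ["libpcap.so.1()(64bit)"],
    "libpcap": [],
}


def requirement_chain(package: str, target: str) -> List[str]:
    """Return the packages linking `package` to `target`, or [] if none."""
    seen = set()

    def walk(pkg: str) -> List[str]:
        if pkg == target:
            return [pkg]
        seen.add(pkg)
        for capability in REQUIRES.get(pkg, []):
            provider = PROVIDES.get(capability)
            if provider is None or provider in seen:
                continue
            rest = walk(provider)
            if rest:
                return [pkg] + rest
        return []

    return walk(package)


if __name__ == "__main__":
    print(" -> ".join(requirement_chain("systemd", "libpcap")))
```

This is why removing libpcap pulls systemd out as well even though the systemd package never names libpcap directly.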

Comment 4 Michal Ruprich 2020-05-20 12:01:20 UTC
Reproduced in mock. This output is not visible in the root.log in koji; I guess only stderr is printed. I will put the output here, as it nicely shows the order of actions that leads to this:

  Installing       : systemd-udev-245.5-2.fc33.x86_64                                                          287/392 
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64                                                          287/392 
/usr/bin/systemctl: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory

  Installing       : dracut-050-26.git20200316.fc33.x86_64                                                     288/392 
  Installing       : rdma-core-29.0-1.fc33.x86_64                                                              289/392 
  Running scriptlet: rdma-core-29.0-1.fc33.x86_64                                                              289/392 
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory

  Installing       : libibverbs-29.0-1.fc33.x86_64                                                             290/392
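When the transaction is several hundred packages long, it helps to pull these failures out automatically. The sketch below is one possible way to do that, assuming the output format shown above and assuming that a loader error belongs to the most recent `Installing` or `Running scriptlet` line; the function name and the sample text are illustrative.

```python
import re
from typing import List, Optional, Tuple

# A transaction step line, e.g. "  Running scriptlet: rdma-core-29.0-1.fc33.x86_64  289/392"
STEP = re.compile(r"^\s*(Installing|Running scriptlet)\s*:\s*(\S+)")
# A dynamic loader failure printed by a command run from a scriptlet.
LOAD_ERROR = re.compile(
    r"^(\S+): error while loading shared libraries: (\S+): "
    r"cannot open shared object file"
)

SAMPLE = """\
  Installing       : systemd-udev-245.5-2.fc33.x86_64    287/392
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64    287/392
/usr/bin/systemctl: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory

  Installing       : dracut-050-26.git20200316.fc33.x86_64    288/392
  Installing       : rdma-core-29.0-1.fc33.x86_64    289/392
  Running scriptlet: rdma-core-29.0-1.fc33.x86_64    289/392
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
/sbin/udevadm: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory

  Installing       : libibverbs-29.0-1.fc33.x86_64    290/392
"""


def scriptlet_load_errors(log: str) -> List[Tuple[str, str, str]]:
    """Return unique (package, command, missing library) triples, in log order."""
    current: Optional[str] = None
    found: List[Tuple[str, str, str]] = []
    for line in log.splitlines():
        step = STEP.match(line)
        if step:
            current = step.group(2)
            continue
        error = LOAD_ERROR.match(line)
        if error and current is not None:
            entry = (current, error.group(1), error.group(2))
            if entry not in found:  # collapse repeated identical errors
                found.append(entry)
    return found


if __name__ == "__main__":
    for package, command, library in scriptlet_load_errors(SAMPLE):
        print(f"{package}: {command} needs {library}")
```

On the excerpt above this reports one failure for systemd-udev (systemctl) and one for rdma-core (udevadm), both before libibverbs is installed at step 290.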

Comment 5 Michal Ruprich 2020-05-20 13:38:50 UTC
I have confirmation from the dnf team that if there is a cycle, dnf usually breaks it in a more or less random place. I tried to break the chain in libpcap directly by declaring libibverbs as Requires(pre): in libpcap, but the install order rearranged in a rather odd way:

  Installing       : rdma-core-29.0-1.fc33.x86_64                                                              287/392 
  Running scriptlet: rdma-core-29.0-1.fc33.x86_64                                                              287/392 
/var/tmp/rpm-tmp.s5iQbn: line 1: /sbin/udevadm: No such file or directory
/var/tmp/rpm-tmp.s5iQbn: line 2: /sbin/udevadm: No such file or directory
/var/tmp/rpm-tmp.s5iQbn: line 3: /sbin/udevadm: No such file or directory

  Installing       : libibverbs-29.0-1.fc33.x86_64                                                             288/392 
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64                                                          289/392 
  Installing       : systemd-udev-245.5-2.fc33.x86_64                                                          289/392 
  Running scriptlet: systemd-udev-245.5-2.fc33.x86_64                                                          289/392 
/usr/bin/systemctl: error while loading shared libraries: libpcap.so.1: cannot open shared object file: No such file or directory

  Installing       : libpcap-14:1.9.1-7.fc33.x86_64                                                            290/392 

Maybe if systemd-udev had a Requires(pre): on libibverbs, this would work?
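Moving the break point around cannot make the problem disappear while the loop exists. The brute-force sketch below illustrates that, assuming the simplified four-package loop from the description; it does not model rpm's actual ordering algorithm, and the helper names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Simplified package-level loop from the description (illustrative data).
REQUIRES: Dict[str, List[str]] = {
    "systemd": ["libpcap"],
    "libpcap": ["libibverbs"],
    "libibverbs": ["rdma-core"],
    "rdma-core": ["systemd"],
}


def violated(order: Sequence[str],
             requires: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (package, requirement) pairs where the requirement comes later."""
    position = {pkg: index for index, pkg in enumerate(order)}
    return [
        (pkg, dep)
        for pkg, deps in requires.items()
        for dep in deps
        if position[dep] > position[pkg]
    ]


def fewest_violations(requires: Dict[str, List[str]]) -> int:
    """Try every install order and return the smallest number of violations."""
    return min(len(violated(order, requires)) for order in permutations(requires))


if __name__ == "__main__":
    print(fewest_violations(REQUIRES))
```

Every order of the four packages leaves at least one requirement installed too late, so some scriptlet can always run before a library it needs is present. Removing any single edge of the loop makes an order with no violations possible.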

Comment 6 Adam Williamson 2020-05-20 16:59:33 UTC
Oh man, we have a whole bug somewhere where we got very deep into the weeds about these kinds of dependency loops:

https://bugzilla.redhat.com/show_bug.cgi?id=1647172 (original bug)
https://bugzilla.redhat.com/show_bug.cgi?id=1648721 (loop behaviour spin-off)

it seems this got merged upstream:
https://github.com/rpm-software-management/rpm/pull/1028

so we might be able to use that...

Comment 7 Jan Kaluža 2020-06-18 10:48:27 UTC
This might also break composes (it actually breaks the Fedora ELN compose). If this happens, the `systemd-machine-id-setup` command, which is executed in the %post phase of the `systemd` package, fails with the mentioned `libibverbs.so.1` error message and *no* /etc/machine-id is generated.

The `/etc/machine-id` file is needed for `kernel-install` to install the kernel into the /boot directory. Without the kernel in the /boot directory, `lorax` fails to build the bootable ISO and the compose build fails.
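A quick way to catch this early is to check the installed tree for a usable machine ID before the later compose steps run. This is only a sketch: the `has_machine_id` helper and the idea of pointing it at an install root are assumptions, not something the compose tooling is known to do. It relies on the documented format of the file, a single line holding a 32-character lowercase hexadecimal ID.

```python
import re
from pathlib import Path

# machine-id(5): a 32-character, lowercase, hexadecimal ID on a single line.
MACHINE_ID = re.compile(r"[0-9a-f]{32}")


def has_machine_id(root: str) -> bool:
    """True if <root>/etc/machine-id exists and holds a well-formed ID."""
    path = Path(root) / "etc" / "machine-id"
    try:
        content = path.read_text().strip()
    except OSError:
        # Missing or unreadable file: the %post scriptlet did not create it.
        return False
    return MACHINE_ID.fullmatch(content) is not None


if __name__ == "__main__":
    print(has_machine_id("/"))
```

A missing or empty file in the install root would point at the failure mode described in this comment.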

Comment 8 Stephen Gallagher 2020-06-18 12:54:04 UTC
(In reply to Adam Williamson from comment #6)
> Oh man, we have a whole bug somewhere where we got very deep into the weeds
> about these kinds of dependency loops:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1647172 (original bug)
> https://bugzilla.redhat.com/show_bug.cgi?id=1648721 (loop behaviour spin-off)
> 
> it seems this got merged upstream:
> https://github.com/rpm-software-management/rpm/pull/1028
> 
> so we might be able to use that...

Please be aware that https://github.com/rpm-software-management/rpmlint/issues/429 is currently blocking the use of `Requires(meta)` in Fedora.

Comment 9 Adam Williamson 2020-06-18 14:41:27 UTC
Jan: yeah, I recall another bug in the past where the lack of a machine-id caused trouble, it definitely can be a problem if that's happening.

Comment 10 Jan Kaluža 2020-06-19 04:55:28 UTC
Adam: And your blog post about that bug helped me a lot to find out what is wrong with Fedora ELN compose, thanks for it :).

Comment 11 Stephen Gallagher 2020-06-24 17:47:49 UTC
As best I can tell, `rdma-core` doesn't actually require the `systemd` package, just the `systemd-libs` package (because `/usr/sbin/rdma-ndd` is linked against it). So one way to resolve this bug might be to drop the explicit `Requires: systemd` and replace it with `%{?systemd_requires}` (which moves the requirement on systemctl to the RPM scriptlets).

I don't have a good way to test this, so I'm not sure if it will work, but it seems plausible.

I sent it out as a PR: https://src.fedoraproject.org/rpms/rdma-core/pull-request/3

Comment 12 Honggang LI 2020-06-30 02:20:05 UTC
https://src.fedoraproject.org/rpms/rdma-core/pull-request/4

I merged this PR and built rdma-core-30.0-2.fc33 .

Comment 13 Michal Ruprich 2020-06-30 08:19:40 UTC
It seems that the reproducer I was trying in comment #4 is no longer showing these errors. But I am not really sure if that is enough proof that this will work.

Comment 14 Adam Williamson 2020-06-30 15:07:23 UTC
Well, I can confirm that rdma-core indeed no longer requires systemd. We can try re-tagging libpcap-1.9.1-4.fc33 and seeing if the errors start showing up in builds again, or not.

Comment 15 Zbigniew Jędrzejewski-Szmek 2020-06-30 15:44:32 UTC
Let's do it!

Comment 16 Adam Williamson 2020-06-30 16:12:58 UTC
Yup, I asked Mohan to do it already, it should be tagged now and in the buildroot now or soon. I'll check builds later today and see if they look OK. Thanks everyone.

Comment 17 Adam Williamson 2020-06-30 18:58:13 UTC
Hmm, well, here's a recent build that's probably in the zone:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1535255
https://kojipkgs.fedoraproject.org//packages/dropbear/2020.80/1.fc33/data/logs/x86_64/root.log

we don't have any "cannot open shared object file" errors, but we do have some "No such file or directory" errors:

===

DEBUG util.py:623:  Running transaction
DEBUG util.py:623:  /var/tmp/rpm-tmp.Jk1xFq: line 1: /sbin/udevadm: No such file or directory
DEBUG util.py:623:  /var/tmp/rpm-tmp.Jk1xFq: line 2: /sbin/udevadm: No such file or directory
DEBUG util.py:623:  /var/tmp/rpm-tmp.Jk1xFq: line 3: /sbin/udevadm: No such file or directory
DEBUG util.py:623:  Created symlink /etc/systemd/system/sockets.target.wants/dbus.socket → /usr/lib/systemd/system/dbus.socket.
DEBUG util.py:623:  Created symlink /etc/systemd/user/sockets.target.wants/dbus.socket → /usr/lib/systemd/user/dbus.socket.
DEBUG util.py:623:  Created symlink /etc/systemd/system/dbus.service → /usr/lib/systemd/system/dbus-broker.service.
DEBUG util.py:623:  Created symlink /etc/systemd/user/dbus.service → /usr/lib/systemd/user/dbus-broker.service.
DEBUG util.py:623:  Installed:

===

so we do still have scriptlets trying to run udevadm before it's installed. Comparing to a build from a couple of weeks ago (which would have been done with the older libpcap, without rdma-core support):

https://koji.fedoraproject.org/koji/buildinfo?buildID=1524066
https://kojipkgs.fedoraproject.org//packages/dropbear/2020.79/1.fc33/data/logs/x86_64/root.log

and we don't have any of those errors:

===

DEBUG util.py:623:  Running transaction
DEBUG util.py:623:  Created symlink /etc/systemd/system/sockets.target.wants/dbus.socket → /usr/lib/systemd/system/dbus.socket.
DEBUG util.py:623:  Created symlink /etc/systemd/user/sockets.target.wants/dbus.socket → /usr/lib/systemd/user/dbus.socket.
DEBUG util.py:623:  Created symlink /etc/systemd/system/dbus.service → /usr/lib/systemd/system/dbus-broker.service.
DEBUG util.py:623:  Created symlink /etc/systemd/user/dbus.service → /usr/lib/systemd/user/dbus-broker.service.
DEBUG util.py:623:  Installed:

===

I'm not sure if this will cause any practical problems, but we do still clearly have an issue...

Comment 18 Stephen Gallagher 2020-06-30 20:18:45 UTC
From the commit message by zbyszek:

"If systemd-udev is installed later in the transaction,
the call to udevadm trigger is not necessary. If systemd-udev is
installed earlier in the transaction, the call to udevadm trigger will
succeed. And if systemd-udev is not installed at all, it means that
there's no hardware support and udevadm doesn't need to be executed at
all."


So, I think what we actually need to do here is just wrap the %post script in a guard:
```
if [ -x /sbin/udevadm ]; then
    # Only runs when systemd-udev is already installed.
    udevadm trigger <...>
fi
```

That way it will not throw errors if it's ordered before systemd-udev. I will send a PR.

Comment 19 Stephen Gallagher 2020-06-30 20:25:46 UTC
PR submitted: https://src.fedoraproject.org/rpms/rdma-core/pull-request/5

Comment 20 Adam Williamson 2020-06-30 20:35:13 UTC
ah, right. that's a nice refinement but also means we don't need to worry about the errors for now. thanks.

Comment 21 Honggang LI 2020-07-02 13:46:58 UTC
(In reply to Stephen Gallagher from comment #19)
> PR submitted: https://src.fedoraproject.org/rpms/rdma-core/pull-request/5

Merged and built rdma-core-30.0-4.fc33 .

Thanks

Comment 22 dac.override 2020-07-05 08:54:03 UTC
Is this what is supposed to happen?

Last metadata expiration check: 0:09:54 ago on Sun 05 Jul 2020 10:43:27 AM CEST.
Dependencies resolved.
================================================================================
 Package              Architecture  Version                Repository      Size
================================================================================
Upgrading:
 libpcap              x86_64        14:1.9.1-4.fc33        rawhide        174 k
Installing dependencies:
 libibverbs           x86_64        30.0-4.fc33            rawhide        319 k
 pciutils             x86_64        3.6.4-1.fc33           rawhide         91 k
 pciutils-libs        x86_64        3.6.4-1.fc33           rawhide         41 k
 rdma-core            x86_64        30.0-4.fc33            rawhide         60 k

Transaction Summary
================================================================================
Install  4 Packages
Upgrade  1 Package

Total download size: 685 k
Is this ok [y/N]:

Comment 23 Ben Cotton 2020-08-11 15:21:19 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 26 Honggang LI 2020-12-21 08:41:57 UTC
The issue has been fixed in Fedora Rawhide (fc34). It will not be backported to F33. I will close it as WONTFIX.

Comment 27 Michal Ruprich 2020-12-22 13:04:58 UTC
Thanks, I think we can close this as RAWHIDE.

