Bug 2038037 - libceph-common.so.2 has undefined symbol: _ZN3fmt2v86detail13error_handler8on_errorEPKc breaking libvirt virtstoraged
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: ceph
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2038020
Blocks:
Reported: 2022-01-07 07:06 UTC by Martin Pitt
Modified: 2022-01-13 12:50 UTC (History)

Fixed In Version: ceph-16.2.7-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-13 12:50:01 UTC
Type: Bug
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 53816 0 None None None 2022-01-10 13:19:52 UTC

Description Martin Pitt 2022-01-07 07:06:57 UTC
Description of problem: virtstoraged.service broke in rawhide a few days ago, due to a linker error. Apparently there was some C++ library ABI change.


Version-Release number of selected component (if applicable):

libvirt-daemon-driver-storage-core-7.10.0-1.fc36.x86_64


How reproducible: Always


Steps to Reproduce:
1. systemctl start virtstoraged.socket
2. virsh pool-define-as dir-pool --type dir --target /tmp/pool

Actual results:

error: Failed to define pool dir-pool
error: Failed to connect socket to '/var/run/libvirt/virtstoraged-sock': Connection refused

The socket unit fails:


× virtstoraged.socket - Libvirt storage local socket
     Loaded: loaded (/usr/lib/systemd/system/virtstoraged.socket; enabled; vendor preset: enabled)
     Active: failed (Result: service-start-limit-hit) since Fri 2022-01-07 07:02:41 UTC; 3min 35s ago
   Triggers: ● virtstoraged.service
     Listen: /run/libvirt/virtstoraged-sock (Stream)

Jan 07 07:02:35 testcloud systemd[1]: Listening on virtstoraged.socket - Libvirt storage local socket.
Jan 07 07:02:41 testcloud systemd[1]: virtstoraged.socket: Failed with result 'service-start-limit-hit'.

Because the service fails:

systemd[1]: Starting virtstoraged.service - Virtualization storage daemon...
virtstoraged[14077]: libvirt version: 7.10.0, package: 1.fc36 (Fedora Project, 2021-12-01-11:24:27, )
virtstoraged[14077]: hostname: testcloud
virtstoraged[14077]: internal error: Failed to load module '/usr/lib64/libvirt/storage-backend/libvirt_storage_backend_rbd.so': /usr/lib64/ceph/libceph-common.so.2: undefined symbol: _ZN3fmt2v86detail13error_handler8on_errorEPKc
systemd[1]: virtstoraged.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
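
The undefined symbol in the log above is an Itanium C++ ABI mangled name. As a rough illustration of how it decodes, here is a minimal Python sketch that handles only this shape of name: a nested name (`_ZN...E`) followed by parameter types built from pointer (`P`), const (`K`) and a few builtin codes. `demangle_simple` is a hypothetical helper written for this note, not a general demangler; a tool such as c++filt covers the general case.

```python
def demangle_simple(symbol: str) -> str:
    """Decode a small subset of Itanium C++ mangled names.

    Supported shape: _ZN <len><name>... E <parameter types>, where each
    parameter is built from P (pointer), K (const) and a builtin code.
    Anything else raises ValueError.
    """
    builtins = {"c": "char", "i": "int", "b": "bool", "v": "void"}

    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...E) are supported")

    pos = 3
    parts = []
    # Each name component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"expected a length at offset {start}")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator for the nested name")
    pos += 1  # skip the 'E' that closes the nested name

    def parse_type(i: int):
        """Return (type text, next offset) for the type starting at i."""
        if i >= len(symbol):
            raise ValueError("truncated type encoding")
        code = symbol[i]
        if code == "P":  # pointer to the type that follows
            inner, i = parse_type(i + 1)
            return inner + "*", i
        if code == "K":  # const-qualified version of the type that follows
            inner, i = parse_type(i + 1)
            return inner + " const", i
        if code in builtins:
            return builtins[code], i + 1
        raise ValueError(f"unsupported type code {code!r}")

    params = []
    while pos < len(symbol):
        text, pos = parse_type(pos)
        params.append(text)
    if params == ["void"]:  # a lone 'v' encodes an empty parameter list
        params = []

    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle_simple("_ZN3fmt2v86detail13error_handler8on_errorEPKc"))
# prints: fmt::v8::detail::error_handler::on_error(char const*)
```

The `v8` component is fmt's versioned namespace, so the daemon is looking for a symbol from the fmt 8.x ABI.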

Comment 1 Kaleb KEITHLEY 2022-01-10 19:23:46 UTC
Both the upstream ceph.spec(.in) and the Fedora dist-git ceph.spec specify a BuildRequires (BR) for fmt-devel, and the build links against the system libfmt.so/libfmt.so.8 instead of the bundled fmt bits, which would be libfmt.so.6 if they were being built and used. (Also note that ceph doesn't have a WITH_SYSTEM_FMT option in its cmake config.)

I would also presume, given the "v8" in the undefined symbol name, that ceph is building with the fmt headers in /usr/include/fmt, and not the bundled headers, which would have "v6" in the name.

But since ceph in both f35 and f36/rawhide was built with fmt-8.0.1-x.fc35, and was apparently working up until 7 Jan 2022*, when fmt was updated, I'm going to make a leap and guess that the recent update to fmt-8.1.1-x in f36/rawhide and f35 is the reason for the apparent sudden appearance of this error at runtime.

I will rebuild ceph on both f36/rawhide and f35 against the new fmt-8.1.1. Let's see if that fixes the runtime error you're seeing. (If it doesn't fix it then we can try building with the bundled fmt bits instead.)

*which is also the date this bz was filed.

Comment 2 Kaleb KEITHLEY 2022-01-10 20:46:10 UTC
also see #fedora-devel today (10 Jan):

<defolos> xvitaly: you or tchaikov updated fmt to 8.1 in rawhide, right?
<xvitaly> defolos: Me.
<defolos> did you not rebuild libspdlog?
<defolos> I'm getting linking errors in bear now
<xvitaly> defolos: There are no ABI changes in 8.1.1.
<xvitaly> defolos: Can you show log?
<defolos> xvitaly: `/usr/bin/ld: /usr/lib64/libspdlog.so.1.9.2: undefined reference to `fmt::v8::detail::error_handler::on_error(char const*)' `
<defolos> that's what I'm getting
<defolos> and koschei too: https://kojipkgs.fedoraproject.org/work/tasks/2472/80842472/build.log
<xvitaly> defolos: It should be fixed in 8.1.1.
<xvitaly> defolos: > DEBUG util.py:446:   fmt-devel                 aarch64   8.1.0-1.fc36                 build   120 k
<xvitaly> 8.1.0 is broken.
<xvitaly> ABI regression was fixed in 8.1.1.
<defolos> xvitaly: ah, thanks!
<defolos> do you have an ETA?
<alebastr> defolos: ETA is next successful rawhide compose
<defolos> alebastr: even better, thanks!
<alebastr> 8.1.1 is already available in rawhide buildroot

Comment 3 Kaleb KEITHLEY 2022-01-11 12:09:23 UTC
Fixed (?) in ceph-16.2.7-3.fc36, rebuilt with fmt-8.1.1-1.

Please give it a try and let me know whether it works now.

Comment 4 Kaleb KEITHLEY 2022-01-11 13:00:31 UTC
What I'm seeing is that, while fmt-8.1.1-1 is in the buildroot[1], there hasn't been a compose yet, so a dnf update will still get fmt-8.1.0, which is known to have broken the ABI[2].

[1] https://kojipkgs.fedoraproject.org//packages/ceph/16.2.7/3.fc36/data/logs/x86_64/root.log
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2038020
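
To tell whether a given machine still has the broken fmt, one option is to check whether the installed libfmt.so.8 resolves the symbol from the error message. The following is a minimal sketch, assuming Python with ctypes on a POSIX system and assuming the symbol is expected to come from libfmt.so.8; `exports_symbol` is a hypothetical helper, not part of any existing tool.

```python
import ctypes
from typing import Optional

# The symbol reported as undefined in this bug.
SYMBOL = "_ZN3fmt2v86detail13error_handler8on_errorEPKc"


def exports_symbol(library: Optional[str], symbol: str) -> bool:
    """Return True if `symbol` resolves through `library`.

    The lookup goes through dlsym, which also searches the library's
    dependencies. Raises OSError if the library cannot be loaded.
    Passing None looks the symbol up in the current process instead.
    """
    handle = ctypes.CDLL(library)
    try:
        # ctypes raises AttributeError when dlsym cannot find the name.
        getattr(handle, symbol)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    try:
        present = exports_symbol("libfmt.so.8", SYMBOL)
    except OSError as exc:
        print(f"could not load libfmt.so.8: {exc}")
    else:
        print("symbol present" if present else "symbol missing")
```

A "symbol missing" result would be consistent with the fmt-8.1.0 ABI regression described above, though this check alone does not prove which fmt build is installed.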

