Bug 2170310 - 5.x client with 4.x cluster : RBD IO failed saying rbd: symbol lookup error: rbd: undefined symbol
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 5.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 6.1z1
Assignee: Brad Hubbard
QA Contact: Pawan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-02-16 05:42 UTC by Vasishta
Modified: 2023-06-29 00:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-29 00:44:29 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHCEPH-6152 | 0 | None | None | None | 2023-02-16 11:41:05 UTC

Description Vasishta 2023-02-16 05:42:59 UTC
Description of problem:
While using a 5.x client with a 4.x cluster, an rbd command failed with:
rbd: symbol lookup error: rbd: undefined symbol: _ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_, version LIBRADOS_14.2.0

Version-Release number of selected component (if applicable):
16.2.10-133

How reproducible:
Not yet able to reproduce because of some dependency errors; will update further.

Steps to Reproduce:
1. Configure a 4.x cluster
2. Upgrade the client package to 5.x Ceph
3. Run rbd commands (we tried rbd resize)

Actual results:
rbd: symbol lookup error: rbd: undefined symbol: _ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_, version LIBRADOS_14.2.0

Expected results:
No errors

Additional info:
Found similar issues in hammer to jewel upgrade upstream suite:
http://pastebin.test.redhat.com/1091748

Comment 1 Brad Hubbard 2023-02-16 06:17:06 UTC
$ echo _ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_|c++filt 
librados::v14_2_0::IoCtx::notify2(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list&, unsigned long, ceph::buffer::v14_2_0::list*)

Can you post the output of the following?

# ldd /usr/bin/rbd
# rpm -qf /usr/bin/rbd
# nm -gD /usr/bin/rbd
# nm -gD /usr/lib/librados.so.2
# rpm -qf /usr/lib/librados.so.2

This assumes the librados in the ldd output of the first command is
/usr/lib/librados.so.2 (it should be). If that is not the case then use whatever
path ldd returns for the last two commands please.
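
The checks requested above can be sketched as a small script. This is only a minimal sketch: `resolved_lib` and `symbol_state` are hypothetical helper names, not part of Ceph or binutils, and both simply filter the text output of `ldd` and `nm -gD` read from stdin. The sketch assumes `nm` prints the symbol name as the last field and the type letter just before it.

```shell
#!/bin/sh
# Hypothetical helpers for the diagnostics above (not Ceph tooling).

# Print the path that `ldd` output (on stdin) resolves for library $1,
# e.g. "librados.so.2 => /lib64/librados.so.2 (0x...)".
resolved_lib() {
    awk -v lib="$1" '$1 == lib && $2 == "=>" { print $3; exit }'
}

# Classify symbol $1 in `nm -gD` output (on stdin).
# Prints "defined", "undefined", or "missing".
# Note: the @VERSION / @@VERSION suffix is ignored when matching names.
symbol_state() {
    awk -v sym="$1" '
        {
            name = $NF                  # symbol name is the last field
            sub(/@.*/, "", name)        # strip any version suffix
            if (name != sym) next
            type = $(NF - 1)            # type letter precedes the name
            # U = undefined; lowercase w/v = weak and not defined here
            if (type == "U" || type == "w" || type == "v") undef = 1
            else def = 1
        }
        END {
            if (def) print "defined"
            else if (undef) print "undefined"
            else print "missing"
        }'
}

# Example use on the affected client (the library path depends on ldd):
#   sym='_ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_'
#   lib=$(ldd /usr/bin/rbd | resolved_lib librados.so.2)
#   nm -gD /usr/bin/rbd | symbol_state "$sym"
#   nm -gD "$lib" | symbol_state "$sym"
```

If rbd reports the symbol as "undefined" and the resolved librados reports it as "missing", that would be consistent with the loader error in the description. Because the version suffix is ignored, a symbol exported under a different version tag would still show as "defined", so the raw `nm` output is still needed to make the final call.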

Comment 2 Vasishta 2023-02-16 11:39:47 UTC
(In reply to Vasishta from comment #0)

> Additional info:
> Found similar issues in hammer to jewel upgrade upstream suite:
> http://pastebin.test.redhat.com/1091748

Sorry for the incorrect link; the correct one is https://tracker.ceph.com/issues/17809

Comment 3 Vasishta 2023-02-20 17:46:13 UTC
(In reply to Brad Hubbard from comment #1)

Hi Brad,


Tried a couple of times to reproduce but did not hit the above issue.
The most recent attempt used the same 4.x version (14.2.22-128.el8cp) with a recent 5.x build:

# rpm -qf /usr/lib64/librados.so.2
librados2-16.2.10-137.el8cp.x86_64

I used the same downstream automation suite (cephci). In this suite we upgrade packages and clients in parallel with IOs,
so the iterations may run in a slightly random order.
Test that failed: https://github.com/red-hat-storage/cephci/blob/master/suites/pacific/upgrades/tier-1_upgrade_test-4x-to-5x-rpm.yaml#L101-L158
Implementation of parallelism:
https://github.com/red-hat-storage/cephci/blob/master/tests/parallel/test_parallel.py#L38-L41, which calls https://github.com/red-hat-storage/cephci/blob/master/ceph/parallel.py

Does this seem to be a corner-case issue?
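
Since packages are upgraded while IO is running, one thing that could be checked on the client is whether the installed Ceph packages were at mixed versions when rbd ran. That this happened here is only a hypothesis. The sketch below assumes the relevant packages are `ceph-common`, `librados2` and `librbd1`; `versions_consistent` is a hypothetical helper that reads "name version-release" lines from stdin and succeeds only if every line carries the same version-release.

```shell
#!/bin/sh
# Hypothetical helper (not Ceph tooling): exit 0 if all "name version"
# lines on stdin share one version string, non-zero otherwise
# (including when there is no input at all).
versions_consistent() {
    awk '
        { seen[$2] = 1 }
        END {
            n = 0
            for (v in seen) n++
            exit (n == 1 ? 0 : 1)
        }'
}

# Example use on the client (package names are an assumption):
#   rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n' ceph-common librados2 librbd1 \
#       | versions_consistent || echo "mixed Ceph client package versions"
```

A mismatch at the moment of failure would not prove the cause, but it would narrow down which binaries were involved.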

Comment 4 Brad Hubbard 2023-02-20 22:13:22 UTC
We can either wait until you can gather all the information in comment #1 or we
can close this as having insufficient data. At the moment I don't believe the
combination of binaries that caused this would actually be supported in practice
but I need confirmation of the exact binaries involved and the symbol data to
make that call. Let me know whether you want to leave it open or just reopen it
later when you have the requested data.

