Description of problem:
While using a 5.x client with a 4.x cluster, the rbd command failed with:

rbd: symbol lookup error: rbd: undefined symbol: _ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_, version LIBRADOS_14.2.0

Version-Release number of selected component (if applicable):
16.2.10-133

How reproducible:
Have not been able to reproduce so far due to some dependency errors; will update further.

Steps to Reproduce:
1. Configure a 4.x cluster
2. Upgrade the client package to 5.x ceph
3. Try rbd commands (we tried rbd resize)

Actual results:
rbd: symbol lookup error: rbd: undefined symbol: _ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_, version LIBRADOS_14.2.0

Expected results:
No errors

Additional info:
Found similar issues in the hammer-to-jewel upgrade upstream suite:
http://pastebin.test.redhat.com/1091748
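A quick way to check whether the client host ended up with a mixed install (a 5.x rbd binary against a 4.x librados) is to compare the packages owning the binary and the library. A minimal sketch, assuming the usual downstream packaging where ceph-common provides /usr/bin/rbd and librados2 provides librados.so.2:

# rpm -qf /usr/bin/rbd /usr/lib64/librados.so.2
# rpm -qa | grep -E '^(ceph-common|librados2)-'

If the two report different versions, a symbol lookup error like the one above is the expected outcome.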
$ echo _ZN8librados7v14_2_05IoCtx7notify2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN4ceph6buffer7v14_2_04listEmPSD_ | c++filt
librados::v14_2_0::IoCtx::notify2(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list&, unsigned long, ceph::buffer::v14_2_0::list*)

Can you post the output of the following?

# ldd /usr/bin/rbd
# rpm -qf /usr/bin/rbd
# nm -gD /usr/bin/rbd
# nm -gD /usr/lib/librados.so.2
# rpm -qf /usr/lib/librados.so.2

This assumes the librados in the ldd output of the first command is /usr/lib/librados.so.2 (it should be). If that is not the case, then please use whatever path ldd returns for the last two commands.
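If it is easier, objdump can show the symbol version tags directly (substitute whatever librados path ldd actually reports):

# objdump -T /usr/bin/rbd | grep notify2
# objdump -T /usr/lib/librados.so.2 | grep notify2

The undefined reference in rbd should carry the LIBRADOS_14.2.0 version tag; the question is whether the installed librados.so.2 defines that symbol under the same version node.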
(In reply to Vasishta from comment #0)
> Additional info:
> Found similar issues in hammer to jewel upgrade upstream suite:
> http://pastebin.test.redhat.com/1091748

Sorry for the incorrect link; it is https://tracker.ceph.com/issues/17809
(In reply to Brad Hubbard from comment #1)

Hi Brad,

I tried a couple of times to reproduce this but did not hit the above issue. The most recent attempt involved the same 4.x version (14.2.22-128.el8cp) with a recent 5.x build.

# rpm -qf /usr/lib64/librados.so.2
librados2-16.2.10-137.el8cp.x86_64

I tried using the same automation suite (downstream, cephci). In this suite we upgrade packages and clients in parallel with IOs, so the individual iterations might happen in a slightly random order.

Test which failed:
https://github.com/red-hat-storage/cephci/blob/master/suites/pacific/upgrades/tier-1_upgrade_test-4x-to-5x-rpm.yaml#L101-L158

Implementation of parallelism:
https://github.com/red-hat-storage/cephci/blob/master/tests/parallel/test_parallel.py#L38-L41 ------> https://github.com/red-hat-storage/cephci/blob/master/ceph/parallel.py

Does this seem to be a corner-case issue?
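Since the upgrade and the IOs run in parallel, one hypothesis (illustrative only; the actual ordering inside the suite is not confirmed) is that an rbd command ran in the window after ceph-common had already been replaced with the 16.x binary but before librados2 was updated, so the new rbd resolved symbols against the old library. A minimal sketch for catching such a window on the client during an upgrade run; the polling loop is my own suggestion, not part of the cephci suite:

# while sleep 1; do rpm -q ceph-common librados2; done

If the two versions ever disagree during the run, the mixed-binary window is real and would explain a one-off failure like this.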
We can either wait until you can gather all the information requested in comment #1, or we can close this as having insufficient data. At the moment I don't believe the combination of binaries that caused this would actually be supported in practice, but I need confirmation of the exact binaries involved, and the symbol data, to make that call. Let me know whether you want to leave it open, or just reopen it later once you have the requested data.