Bug 1936210

Summary: rbd example from the docs crashes with ceph_assert(weight == 10)
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: its_rhn
Component: RADOS Assignee: Greg Farnum <gfarnum>
Status: CLOSED ERRATA QA Contact: Manohar Murthy <mmurthy>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2 CC: akupczyk, alfrgarc, astupnik, bhubbard, ceph-eng-bugs, ceph-qe-bugs, dhill, dzafman, gfarnum, jdurgin, johfulto, kchai, kdreyer, kurathod, lhh, manuel.frei, mhackett, mhicks, nojha, patrik.fuerer, pdhiran, rzarzyns, schhabdi, sseshasa, tserlin, vereddy, vumrao
Target Milestone: ---   
Target Release: 4.2z1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-14.2.11-137.el7cp ceph-14.2.11-137.el8cp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-04-28 20:13:21 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1941272, 1942142    

Description its_rhn 2021-03-07 16:51:49 UTC
Description of problem:
After a fresh RHOSP 16.1 installation, I noticed that "openstack image create" fails with 503 Service Unavailable. I have an external Ceph cluster configured for all OpenStack storage services, so I started debugging by following these steps:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/integrating_an_overcloud_with_an_existing_red_hat_ceph_cluster/index#proc_verifying-the-ceph-storage-cluster_

Version-Release number of selected component (if applicable):
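RHOSP 16.1, with whatever image it pulls for registry.redhat.io/rhosp-rhel8/openstack-nova-compute:16.1. The rbd client in that container reports ceph version 14.2.11-95.el8cp (see the output below).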

How reproducible:
Very

Steps to Reproduce:
1. Deploy RHOSP with an external Ceph cluster.
2. Try to create an image.
3. Run the rbd check inside the nova_compute container, as instructed by the docs.

Actual results:
[root@rhosp-compute-0 ~]# podman exec nova_compute /usr/bin/rbd --conf /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.client.openstack.keyring --cluster ceph --id openstack ls images

/builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: In function 'void mon_info_t::decode(ceph::buffer::v14_2_0::list::const_iterator&)' thread 7f6c70b8e700 time 2021-03-07 17:38:07.654388
/builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: 80: FAILED ceph_assert(weight == 10)
 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f6c7930b358]
 2: (()+0x276572) [0x7f6c7930b572]
 3: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 4: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 5: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 6: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 7: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 9: (()+0x82de) [0x7f6c779622de]
 10: (clone()+0x43) [0x7f6c75e5ee83]
*** Caught signal (Aborted) **
 in thread 7f6c70b8e700 thread_name:ms_dispatch
2021-03-07 17:38:07.655 7f6c70b8e700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: In function 'void mon_info_t::decode(ceph::buffer::v14_2_0::list::const_iterator&)' thread 7f6c70b8e700 time 2021-03-07 17:38:07.654388
/builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: 80: FAILED ceph_assert(weight == 10)

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f6c7930b358]
 2: (()+0x276572) [0x7f6c7930b572]
 3: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 4: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 5: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 6: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 7: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 9: (()+0x82de) [0x7f6c779622de]
 10: (clone()+0x43) [0x7f6c75e5ee83]

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (()+0x12dd0) [0x7f6c7796cdd0]
 2: (gsignal()+0x10f) [0x7f6c75d9a70f]
 3: (abort()+0x127) [0x7f6c75d84b25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f6c7930b3a9]
 5: (()+0x276572) [0x7f6c7930b572]
 6: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 7: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 8: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 9: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 10: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 12: (()+0x82de) [0x7f6c779622de]
 13: (clone()+0x43) [0x7f6c75e5ee83]
2021-03-07 17:38:07.656 7f6c70b8e700 -1 *** Caught signal (Aborted) **
 in thread 7f6c70b8e700 thread_name:ms_dispatch

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (()+0x12dd0) [0x7f6c7796cdd0]
 2: (gsignal()+0x10f) [0x7f6c75d9a70f]
 3: (abort()+0x127) [0x7f6c75d84b25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f6c7930b3a9]
 5: (()+0x276572) [0x7f6c7930b572]
 6: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 7: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 8: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 9: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 10: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 12: (()+0x82de) [0x7f6c779622de]
 13: (clone()+0x43) [0x7f6c75e5ee83]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -78> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command assert hook 0x563a74286710
   -77> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command abort hook 0x563a74286710
   -76> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perfcounters_dump hook 0x563a74286710
   -75> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command 1 hook 0x563a74286710
   -74> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf dump hook 0x563a74286710
   -73> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perfcounters_schema hook 0x563a74286710
   -72> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf histogram dump hook 0x563a74286710
   -71> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command 2 hook 0x563a74286710
   -70> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf schema hook 0x563a74286710
   -69> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf histogram schema hook 0x563a74286710
   -68> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf reset hook 0x563a74286710
   -67> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config show hook 0x563a74286710
   -66> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config help hook 0x563a74286710
   -65> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config set hook 0x563a74286710
   -64> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config unset hook 0x563a74286710
   -63> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config get hook 0x563a74286710
   -62> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config diff hook 0x563a74286710
   -61> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config diff get hook 0x563a74286710
   -60> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command log flush hook 0x563a74286710
   -59> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command log dump hook 0x563a74286710
   -58> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command log reopen hook 0x563a74286710
   -57> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command dump_mempools hook 0x563a7428ff38
   -56> 2021-03-07 17:38:07.651 7f6c836e9500 10 monclient: get_monmap_and_config
   -55> 2021-03-07 17:38:07.651 7f6c836e9500 10 monclient: build_initial_monmap
   -54> 2021-03-07 17:38:07.651 7f6c836e9500 10 monclient: monmap:
epoch 0
fsid e1559855-e037-46d7-bede-7d690d30b346
last_changed 2021-03-07 17:38:07.651709
created 2021-03-07 17:38:07.651709
min_mon_release 0 (unknown)
election_strategy: 1
0: v2:10.11.12.88:3300/0 mon.noname-a

   -53> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: cephx
   -52> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: cephx
   -51> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: cephx
   -50> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: none
   -49> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -48> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -47> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -46> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -45> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -44> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -43> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -42> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -41> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -40> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -39> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -38> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -37> 2021-03-07 17:38:07.651 7f6c836e9500  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.openstack.keyring
   -36> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient: init
   -35> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: cephx
   -34> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: cephx
   -33> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: cephx
   -32> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: none
   -31> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -30> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -29> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -28> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -27> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -26> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -25> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -24> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -23> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -22> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -21> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -20> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -19> 2021-03-07 17:38:07.652 7f6c836e9500  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.openstack.keyring
   -18> 2021-03-07 17:38:07.652 7f6c836e9500  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.openstack.keyring
   -17> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient: _reopen_session rank -1
   -16> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): picked mon.noname-a con 0x563a743bbff0 addr v2:10.11.12.88:3300/0
   -15> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): start opening mon connection
   -14> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): _renew_subs
   -13> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): authenticate will time out at 2021-03-07 17:43:07.653033
   -12> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): get_auth_request con 0x563a743bbff0 auth_method 0
   -11> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): get_auth_request method 2 preferred_modes [2,1]
   -10> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): _init_auth method 2
    -9> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_reply_more payload 9
    -8> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_reply_more payload_len 9
    -7> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_reply_more responding with 36 bytes
    -6> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_done global_id 22731321 payload 482
    -5> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient: _finish_hunting 0
    -4> 2021-03-07 17:38:07.653 7f6c7138f700  1 monclient: found mon.noname-a
    -3> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient: _send_mon_message to mon.noname-a at v2:10.11.12.88:3300/0
    -2> 2021-03-07 17:38:07.654 7f6c70b8e700 10 monclient: handle_monmap mon_map magic: 0 v1
    -1> 2021-03-07 17:38:07.655 7f6c70b8e700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: In function 'void mon_info_t::decode(ceph::buffer::v14_2_0::list::const_iterator&)' thread 7f6c70b8e700 time 2021-03-07 17:38:07.654388
/builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: 80: FAILED ceph_assert(weight == 10)

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f6c7930b358]
 2: (()+0x276572) [0x7f6c7930b572]
 3: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 4: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 5: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 6: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 7: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 9: (()+0x82de) [0x7f6c779622de]
 10: (clone()+0x43) [0x7f6c75e5ee83]

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f6c7930b358]
 2: (()+0x276572) [0x7f6c7930b572]
 3: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 4: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 5: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 6: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 7: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 9: (()+0x82de) [0x7f6c779622de]
 10: (clone()+0x43) [0x7f6c75e5ee83]

     0> 2021-03-07 17:38:07.656 7f6c70b8e700 -1 *** Caught signal (Aborted) **
 in thread 7f6c70b8e700 thread_name:ms_dispatch

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (()+0x12dd0) [0x7f6c7796cdd0]
 2: (gsignal()+0x10f) [0x7f6c75d9a70f]
 3: (abort()+0x127) [0x7f6c75d84b25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f6c7930b3a9]
 5: (()+0x276572) [0x7f6c7930b572]
 6: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 7: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 8: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 9: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 10: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 12: (()+0x82de) [0x7f6c779622de]
 13: (clone()+0x43) [0x7f6c75e5ee83]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file 
--- end dump of recent events ---
--- begin dump of recent events ---
   -78> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command assert hook 0x563a74286710
   -77> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command abort hook 0x563a74286710
   -76> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perfcounters_dump hook 0x563a74286710
   -75> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command 1 hook 0x563a74286710
   -74> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf dump hook 0x563a74286710
   -73> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perfcounters_schema hook 0x563a74286710
   -72> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf histogram dump hook 0x563a74286710
   -71> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command 2 hook 0x563a74286710
   -70> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf schema hook 0x563a74286710
   -69> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf histogram schema hook 0x563a74286710
   -68> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command perf reset hook 0x563a74286710
   -67> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config show hook 0x563a74286710
   -66> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config help hook 0x563a74286710
   -65> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config set hook 0x563a74286710
   -64> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config unset hook 0x563a74286710
   -63> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config get hook 0x563a74286710
   -62> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config diff hook 0x563a74286710
   -61> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command config diff get hook 0x563a74286710
   -60> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command log flush hook 0x563a74286710
   -59> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command log dump hook 0x563a74286710
   -58> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command log reopen hook 0x563a74286710
   -57> 2021-03-07 17:38:07.620 7f6c836e9500  5 asok(0x563a74247f40) register_command dump_mempools hook 0x563a7428ff38
   -56> 2021-03-07 17:38:07.651 7f6c836e9500 10 monclient: get_monmap_and_config
   -55> 2021-03-07 17:38:07.651 7f6c836e9500 10 monclient: build_initial_monmap
   -54> 2021-03-07 17:38:07.651 7f6c836e9500 10 monclient: monmap:
epoch 0
fsid e1559855-e037-46d7-bede-7d690d30b346
last_changed 2021-03-07 17:38:07.651709
created 2021-03-07 17:38:07.651709
min_mon_release 0 (unknown)
election_strategy: 1
0: v2:10.11.12.88:3300/0 mon.noname-a

   -53> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: cephx
   -52> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: cephx
   -51> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: cephx
   -50> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding auth protocol: none
   -49> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -48> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -47> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -46> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -45> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -44> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -43> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -42> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -41> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -40> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -39> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: crc
   -38> 2021-03-07 17:38:07.651 7f6c836e9500  5 AuthRegistry(0x563a743285c0) adding con mode: secure
   -37> 2021-03-07 17:38:07.651 7f6c836e9500  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.openstack.keyring
   -36> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient: init
   -35> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: cephx
   -34> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: cephx
   -33> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: cephx
   -32> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding auth protocol: none
   -31> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -30> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -29> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -28> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -27> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -26> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -25> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -24> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -23> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -22> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -21> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: crc
   -20> 2021-03-07 17:38:07.652 7f6c836e9500  5 AuthRegistry(0x7ffc1a0d0f18) adding con mode: secure
   -19> 2021-03-07 17:38:07.652 7f6c836e9500  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.openstack.keyring
   -18> 2021-03-07 17:38:07.652 7f6c836e9500  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.openstack.keyring
   -17> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient: _reopen_session rank -1
   -16> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): picked mon.noname-a con 0x563a743bbff0 addr v2:10.11.12.88:3300/0
   -15> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): start opening mon connection
   -14> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): _renew_subs
   -13> 2021-03-07 17:38:07.652 7f6c836e9500 10 monclient(hunting): authenticate will time out at 2021-03-07 17:43:07.653033
   -12> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): get_auth_request con 0x563a743bbff0 auth_method 0
   -11> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): get_auth_request method 2 preferred_modes [2,1]
   -10> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): _init_auth method 2
    -9> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_reply_more payload 9
    -8> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_reply_more payload_len 9
    -7> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_reply_more responding with 36 bytes
    -6> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient(hunting): handle_auth_done global_id 22731321 payload 482
    -5> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient: _finish_hunting 0
    -4> 2021-03-07 17:38:07.653 7f6c7138f700  1 monclient: found mon.noname-a
    -3> 2021-03-07 17:38:07.653 7f6c7138f700 10 monclient: _send_mon_message to mon.noname-a at v2:10.11.12.88:3300/0
    -2> 2021-03-07 17:38:07.654 7f6c70b8e700 10 monclient: handle_monmap mon_map magic: 0 v1
    -1> 2021-03-07 17:38:07.655 7f6c70b8e700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: In function 'void mon_info_t::decode(ceph::buffer::v14_2_0::list::const_iterator&)' thread 7f6c70b8e700 time 2021-03-07 17:38:07.654388
/builddir/build/BUILD/ceph-14.2.11/src/mon/MonMap.cc: 80: FAILED ceph_assert(weight == 10)

 ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f6c7930b358]
 2: (()+0x276572) [0x7f6c7930b572]
 3: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 4: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 5: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 6: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 7: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 9: (()+0x82de) [0x7f6c779622de]
 10: (clone()+0x43) [0x7f6c75e5ee83]

     0> 2021-03-07 17:38:07.656 7f6c70b8e700 -1 *** Caught signal (Aborted) **
 in thread 7f6c70b8e700 thread_name:ms_dispatch
ceph version 14.2.11-95.el8cp (1d6087ae858e7c8e72fe7390c3522c7e0d951240) nautilus (stable)
 1: (()+0x12dd0) [0x7f6c7796cdd0]
 2: (gsignal()+0x10f) [0x7f6c75d9a70f]
 3: (abort()+0x127) [0x7f6c75d84b25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f6c7930b3a9]
 5: (()+0x276572) [0x7f6c7930b572]
 6: (mon_info_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x922) [0x7f6c796f6f92]
 7: (MonMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x98a) [0x7f6c796f79aa]
 8: (MonClient::handle_monmap(MMonMap*)+0x13d) [0x7f6c796e49fd]
 9: (MonClient::ms_dispatch(Message*)+0x35b) [0x7f6c796ed6db]
 10: (DispatchQueue::entry()+0x12cc) [0x7f6c79560fec]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6c796163f1]
 12: (()+0x82de) [0x7f6c779622de]
 13: (clone()+0x43) [0x7f6c75e5ee83]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file /var/lib/ceph/crash/2021-03-07_16:38:07.657168Z_f3cb6559-7852-417b-9f7d-7f2c5c8f612e/log
--- end dump of recent events ---
Error: non zero exit code: 134: OCI runtime error

Expected results:
List of images (currently empty).
Additionally, when an error does occur, the output above gives an admin no indication of what is wrong or how to attempt to fix it.
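
The few lines that matter in the output above are the failed assertion, its source location, and the client version. A minimal Python sketch for pulling them out of a saved copy of the output follows. The function name summarize_crash and the regular expressions are illustrative assumptions based only on the lines pasted in this report, not an official Ceph tool.

```python
import re


def summarize_crash(output: str) -> dict:
    """Pick the first failed assertion, its location, and the client version out of crash text."""
    summary = {}
    # Matches lines such as "<path>/MonMap.cc: 80: FAILED ceph_assert(weight == 10)".
    failed = re.search(
        r"^(\S+): (\d+): FAILED ceph_assert\((.*)\)\s*$", output, re.MULTILINE
    )
    if failed:
        summary["source"] = f"{failed.group(1)}:{failed.group(2)}"
        summary["assertion"] = failed.group(3)
    # Matches lines such as " ceph version 14.2.11-95.el8cp (...) nautilus (stable)".
    version = re.search(r"ceph version (\S+)", output)
    if version:
        summary["client_version"] = version.group(1)
    return summary


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_crash.py < rbd-output.txt
    for key, value in summarize_crash(sys.stdin.read()).items():
        print(f"{key}: {value}")
```

This only shortens the output to the assertion, location, and version; it does not explain the cause.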


Additional info:
The external Ceph cluster in our case runs the 15.2.1 Octopus release. I didn't find a Ceph version interoperability matrix, but I believe the rbd protocol hasn't changed much since its first release and should just work. Let me know if I'm mistaken.

Comment 2 John Fulton 2021-03-08 16:17:20 UTC
Red Hat OpenStack QE tests v16 with RHCSv4 (based on Nautilus), and we'll test it with external RHCSv5 (based on Pacific) before it GAs. We don't test downstream v16 with upstream Octopus, so this isn't a configuration we support.

However, there might be useful information here that is worth checking to make sure this is not an issue in the RHCS4 client, so the product and component of this bug have been switched from OpenStack to Ceph.

Comment 3 Brad Hubbard 2021-03-08 23:09:49 UTC
This appears to be related to the work done in https://bugzilla.redhat.com/show_bug.cgi?id=1800382. @gfarnum, mind taking a look?

Comment 4 Greg Farnum 2021-03-09 08:49:30 UTC
Yep; looks like one of my safety checks busted the client against newer Ceph server releases. I can pull that out.

Just to be sure, though, does this deployment do something to set monitor weights? The assert is checking that the monitor weight is set to the default, and I’m surprised anything is bothering to change those.
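
To illustrate the failure mode described here, a minimal Python sketch follows. It is not the Ceph source (that is C++, in src/mon/MonMap.cc). The two-byte little-endian layout, the DEFAULT_WEIGHT name, and both function names are assumptions made for illustration; only the value 10 comes from the assert message in this report.

```python
import struct

# Assumption: 10 is the default, taken from the assert message in this report.
DEFAULT_WEIGHT = 10


def decode_weight_strict(payload: bytes) -> int:
    """Decode a weight and refuse anything but the default, like the failing check."""
    (weight,) = struct.unpack_from("<H", payload)  # hypothetical wire layout
    if weight != DEFAULT_WEIGHT:
        # The real client aborts here; raising stands in for ceph_assert().
        raise AssertionError(f"weight == {DEFAULT_WEIGHT} failed, got {weight}")
    return weight


def decode_weight_tolerant(payload: bytes) -> int:
    """Decode a weight and accept whatever the peer sent."""
    (weight,) = struct.unpack_from("<H", payload)
    return weight


if __name__ == "__main__":
    from_peer = struct.pack("<H", 0)  # hypothetical non-default value
    print(decode_weight_tolerant(from_peer))
    try:
        decode_weight_strict(from_peer)
    except AssertionError as err:
        print("strict decode rejected the map:", err)
```

The strict variant turns any unexpected value from a peer into a client crash, while the tolerant variant accepts what the peer sent. That is the trade-off behind pulling the check out.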

Comment 5 its_rhn 2021-03-09 09:00:33 UTC
Not that I'm aware of. Also grepping through yaml files used by tripleo and ceph-ansible for "weight" gives no hits. So I'd say it's very much default.
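
For anyone who wants to repeat that search, a rough Python equivalent of the grep is sketched below. The function name find_weight_settings and the directory in the usage line are hypothetical; point it at wherever the tripleo and ceph-ansible YAML files live in your deployment.

```python
from pathlib import Path


def find_weight_settings(root: str, needle: str = "weight") -> list:
    """Return (file, line number, line) for every YAML line under root that mentions needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in {".yaml", ".yml"} or not path.is_file():
            continue
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            # Case-insensitive match, slightly broader than a plain grep.
            if needle in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical location of the deployment templates.
    for file_name, number, line in find_weight_settings("./templates"):
        print(f"{file_name}:{number}: {line}")
```

An empty result matches what is reported here: nothing in the deployment files sets a monitor weight.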

Comment 6 Greg Farnum 2021-03-16 17:13:56 UTC
This is now fixed in our ceph-4.2-rhel-patches branch so should go out in 4.2z1. Thanks for the report!

Comment 29 errata-xmlrpc 2021-04-28 20:13:21 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage security, bug fix, and enhancement Update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1452