Description of problem (please be as detailed as possible and provide log snippets):

The test linked below uses a single CephBlockPool with replica 2 and compression enabled, shared by two StorageClasses. The test creates PVCs from these SCs, mounts them on app pods, and runs IO on them. While the test runs, the OSD pods start going into CrashLoopBackOff state.

https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/storageclass/test_create_2_sc_with_1_pool_comp_rep2.py

$ oc get pods | grep rook-ceph-osd-
rook-ceph-osd-0-6fb4bf9944-7snwv   2/2   Running            6 (4m ago)     53m
rook-ceph-osd-1-84df69484f-29m5f   1/2   CrashLoopBackOff   5 (115s ago)   52m
rook-ceph-osd-2-6c5d8987d7-hmk6z   1/2   CrashLoopBackOff   5 (48s ago)    52m

This leaves the cluster in an unhealthy state. The OSD states cycle through Running -> Error -> CrashLoopBackOff. So far this behavior has been seen only on clusters on the Azure platform, for both upgraded and new deployments. The same test was run on an AWS cluster with the same builds, but the issue was not observed there.

Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2021-12-23-153012
ODF: 4.9.1-258.ci, 4.9.1-257.ci, 4.9.0-251.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the cluster is left in an unhealthy state.

Is there any workaround available to the best of your knowledge?
Not that I am aware of.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:
Yes, there have been successful runs of this test in previous releases.
The most recent successful run was with the following versions:
ODF: 4.9.0-251.ci
OCP: 4.9.0-0.nightly-2021-12-08-092803

Steps to Reproduce:
------------------
OCS-CI test: https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/storageclass/test_create_2_sc_with_1_pool_comp_rep2.py

The test above consists of the following steps:
1. Create 2 StorageClasses that share a single newly created RBD pool
2. Create PVCs using the new StorageClasses
3. Mount the PVCs on an app pod

Actual results:
---------------
After the PVCs were created and mounted on the app pod, the OSDs started going into CrashLoopBackOff state, which left the cluster unhealthy.

Expected results:
-----------------
The test should pass and the cluster should stay healthy throughout the test.
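The pool/StorageClass setup from the steps above could be created by hand with something like the following. This is a minimal sketch, not the exact resources the ocs-ci test creates: the resource names are hypothetical, and a working RBD StorageClass on ODF also needs the csi.storage.k8s.io provisioner/node secret parameters, omitted here for brevity.

```shell
# One CephBlockPool with replica 2 and compression enabled...
cat <<'EOF' | oc apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: shared-pool              # hypothetical name
  namespace: openshift-storage
spec:
  replicated:
    size: 2                      # replica 2
  parameters:
    compression_mode: aggressive # compression enabled on the pool
EOF

# ...shared by two StorageClasses pointing at the same pool.
for sc in sc-one sc-two; do      # hypothetical SC names
  cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $sc
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: shared-pool
EOF
done
```

Both StorageClasses provision RBD images in the same replica-2, compressed pool, which is the configuration under test.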
The OSD.2 log [1] shows the following bluestore assert. Neha, could you take a look?

2021-12-28T18:56:21.713686028Z debug 2021-12-28T18:56:21.712+0000 7f4e5efec700 1 osd.2 pg_epoch: 475 pg[2.8( v 443'248574 (290'246134,443'248574] local-lis/les=442/443 n=527 ec=12/12 lis/c=462/239 les/c/f=463/240/0 sis=475) [1,2,0] r=1 lpr=475 pi=[239,475)/4 crt=443'248574 lcod 0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
2021-12-28T18:56:22.766209808Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: In function 'virtual int64_t HybridAllocator::allocate(uint64_t, uint64_t, uint64_t, int64_t, PExtentVector*)' thread 7f4e5ffee700 time 2021-12-28T18:56:22.766157+0000
2021-12-28T18:56:22.766209808Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: 31: FAILED ceph_assert(want % unit == 0)
2021-12-28T18:56:22.766415811Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: In function 'virtual int64_t HybridAllocator::allocate(uint64_t, uint64_t, uint64_t, int64_t, PExtentVector*)' thread 7f4e5b7e5700 time 2021-12-28T18:56:22.766379+0000
2021-12-28T18:56:22.766415811Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: 31: FAILED ceph_assert(want % unit == 0)
2021-12-28T18:56:22.767038720Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: In function 'virtual int64_t HybridAllocator::allocate(uint64_t, uint64_t, uint64_t, int64_t, PExtentVector*)' thread 7f4e5dfea700 time 2021-12-28T18:56:22.766990+0000
2021-12-28T18:56:22.767038720Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: 31: FAILED ceph_assert(want % unit == 0)
2021-12-28T18:56:22.767359025Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: In function 'virtual int64_t HybridAllocator::allocate(uint64_t, uint64_t, uint64_t, int64_t, PExtentVector*)' thread 7f4e607ef700 time 2021-12-28T18:56:22.767327+0000
2021-12-28T18:56:22.767359025Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: 31: FAILED ceph_assert(want % unit == 0)
2021-12-28T18:56:22.772643302Z ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
2021-12-28T18:56:22.772643302Z 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x5639c40de27a]
2021-12-28T18:56:22.772643302Z 2: ceph-osd(+0x567494) [0x5639c40de494]
2021-12-28T18:56:22.772643302Z 3: (HybridAllocator::allocate(unsigned long, unsigned long, unsigned long, long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)5, bluestore_pextent_t> >*)+0x97f) [0x5639c481663f]
2021-12-28T18:56:22.772643302Z 4: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x70e) [0x5639c46e8a8e]
2021-12-28T18:56:22.772643302Z 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x694) [0x5639c4721b04]
2021-12-28T18:56:22.772643302Z 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xd1) [0x5639c4722c71]
2021-12-28T18:56:22.772643302Z 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x2077) [0x5639c4726cb7]
2021-12-28T18:56:22.772643302Z 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x316) [0x5639c4727ff6]
2021-12-28T18:56:22.772643302Z 9: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x5639c4384058]
2021-12-28T18:56:22.772643302Z 10: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xc9d) [0x5639c456dbdd]
2021-12-28T18:56:22.772643302Z 11: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xce0) [0x5639c42ef570]
2021-12-28T18:56:22.772643302Z 12: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x115d) [0x5639c434b45d]
2021-12-28T18:56:22.772643302Z 13: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2de2) [0x5639c4353fd2]
2021-12-28T18:56:22.772643302Z 14: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd1c) [0x5639c435b31c]
2021-12-28T18:56:22.772643302Z 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x5639c41e4789]
2021-12-28T18:56:22.772643302Z 16: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x5639c4441968]
2021-12-28T18:56:22.772643302Z 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa58) [0x5639c4204738]
2021-12-28T18:56:22.772643302Z 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x5639c486bc74]
2021-12-28T18:56:22.772643302Z 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5639c486e914]
2021-12-28T18:56:22.772643302Z 20: /lib64/libpthread.so.0(+0x817a) [0x7f4e838e317a]
2021-12-28T18:56:22.772643302Z 21: clone()
2021-12-28T18:56:22.772676103Z debug 2021-12-28T18:56:22.771+0000 7f4e5ffee700 -1 /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: In function 'virtual int64_t HybridAllocator::allocate(uint64_t, uint64_t, uint64_t, int64_t, PExtentVector*)' thread 7f4e607ef700 time 2021-12-28T18:56:22.767327+0000
2021-12-28T18:56:22.772676103Z /builddir/build/BUILD/ceph-16.2.0/src/os/bluestore/HybridAllocator.cc: 31: FAILED ceph_assert(want % unit == 0)
2021-12-28T18:56:22.772676103Z
2021-12-28T18:56:22.772676103Z ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
2021-12-28T18:56:22.772676103Z 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x5639c40de27a]
2021-12-28T18:56:22.772676103Z 2: ceph-osd(+0x567494) [0x5639c40de494]
2021-12-28T18:56:22.772676103Z 3: (HybridAllocator::allocate(unsigned long, unsigned long, unsigned long, long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)5, bluestore_pextent_t> >*)+0x97f) [0x5639c481663f]
2021-12-28T18:56:22.772676103Z 4: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x70e) [0x5639c46e8a8e]
2021-12-28T18:56:22.772676103Z 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x694) [0x5639c4721b04]
2021-12-28T18:56:22.772676103Z 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xd1) [0x5639c4722c71]
2021-12-28T18:56:22.772676103Z 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x2077) [0x5639c4726cb7]
2021-12-28T18:56:22.772676103Z 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x316) [0x5639c4727ff6]
2021-12-28T18:56:22.772676103Z 9: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x5639c4384058]
2021-12-28T18:56:22.772676103Z 10: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xc9d) [0x5639c456dbdd]
2021-12-28T18:56:22.772676103Z 11: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xce0) [0x5639c42ef570]
2021-12-28T18:56:22.772676103Z 12: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x115d) [0x5639c434b45d]
2021-12-28T18:56:22.772676103Z 13: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2de2) [0x5639c4353fd2]
2021-12-28T18:56:22.772676103Z 14: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd1c) [0x5639c435b31c]
2021-12-28T18:56:22.772676103Z 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x5639c41e4789]
2021-12-28T18:56:22.772676103Z 16: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x5639c4441968]
2021-12-28T18:56:22.772676103Z 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa58) [0x5639c4204738]
2021-12-28T18:56:22.772676103Z 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x5639c486bc74]
2021-12-28T18:56:22.772676103Z 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5639c486e914]
2021-12-28T18:56:22.772676103Z 20: /lib64/libpthread.so.0(+0x817a) [0x7f4e838e317a]
2021-12-28T18:56:22.772676103Z 21: clone()
2021-12-28T18:56:22.772676103Z
2021-12-28T18:56:22.772692403Z *** Caught signal (Aborted) **
2021-12-28T18:56:22.772692403Z in thread 7f4e5ffee700 thread_name:tp_osd_tp
2021-12-28T18:56:22.779175698Z ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
2021-12-28T18:56:22.779175698Z 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x5639c40de27a]
2021-12-28T18:56:22.779175698Z 2: ceph-osd(+0x567494) [0x5639c40de494]
2021-12-28T18:56:22.779175698Z 3: (HybridAllocator::allocate(unsigned long, unsigned long, unsigned long, long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)5, bluestore_pextent_t> >*)+0x97f) [0x5639c481663f]
2021-12-28T18:56:22.779175698Z 4: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x70e) [0x5639c46e8a8e]
2021-12-28T18:56:22.779175698Z 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x694) [0x5639c4721b04]
2021-12-28T18:56:22.779175698Z 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xd1) [0x5639c4722c71]
2021-12-28T18:56:22.779175698Z 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x2077) [0x5639c4726cb7]
2021-12-28T18:56:22.779175698Z 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x316) [0x5639c4727ff6]
2021-12-28T18:56:22.779175698Z 9: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x5639c4384058]
2021-12-28T18:56:22.779175698Z 10: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xc9d) [0x5639c456dbdd]
2021-12-28T18:56:22.779175698Z 11: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xce0) [0x5639c42ef570]
2021-12-28T18:56:22.779175698Z 12: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x115d) [0x5639c434b45d]
2021-12-28T18:56:22.779175698Z 13: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2de2) [0x5639c4353fd2]
2021-12-28T18:56:22.779175698Z 14: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd1c) [0x5639c435b31c]
2021-12-28T18:56:22.779175698Z 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x5639c41e4789]
2021-12-28T18:56:22.779175698Z 16: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x5639c4441968]
2021-12-28T18:56:22.779175698Z 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa58) [0x5639c4204738]
2021-12-28T18:56:22.779175698Z 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x5639c486bc74]
2021-12-28T18:56:22.779175698Z 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5639c486e914]
2021-12-28T18:56:22.779175698Z 20: /lib64/libpthread.so.0(+0x817a) [0x7f4e838e317a]
2021-12-28T18:56:22.779175698Z 21: clone()
2021-12-28T18:56:22.779214398Z debug

[1] http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-067zi3c33-uo/j-067zi3c33-uo_20211228T093525/logs/failed_testcase_ocs_logs_1640694899/test_multiple_sc_one_pool_rep2_comp_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-71a98052bdd86ccc10967af82739b04d6b2305eb6721cb0744470de2c284fd15/namespaces/openshift-storage/pods/rook-ceph-osd-2-c95644c47-sz89g/osd/osd/logs/current.log
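The failed assertion, `ceph_assert(want % unit == 0)`, requires every allocation request (`want`) to be an exact multiple of the allocator's allocation unit. A quick arithmetic sketch of the invariant, with illustrative numbers only (not the actual values from this cluster):

```shell
# The assert at HybridAllocator.cc:31 requires: want % unit == 0.
# check_alloc mimics the check with shell arithmetic.
check_alloc() {
  want=$1; unit=$2
  if [ $(( want % unit )) -eq 0 ]; then
    echo "want=$want unit=$unit: ok"
  else
    echo "want=$want unit=$unit: FAILED ceph_assert(want % unit == 0)"
  fi
}

check_alloc 65536 4096   # a multiple of the unit: passes
check_alloc 65000 4096   # not a multiple: would trip the assert
```

Any write path that computes an allocation size not rounded to the allocation unit (for example, from a misconfigured bluestore size setting) would abort the OSD exactly as seen in the trace above.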
(In reply to Travis Nielsen from comment #4)
> The OSD.2 log [1] shows the following bluestore assert. Neha, who could take
> a look?

This assert does not ring a bell for me, but @akupczyk is the best person to take a look at it. How reproducible is this? Can we capture logs with higher debug levels, like debug_bluestore=20?
Rachael, please go into the toolbox and run:

    ceph config set osd debug_bluestore 20

Thanks
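On an ODF cluster this can be run from the rook-ceph tools pod, for example as below. This assumes the default openshift-storage namespace and the `app=rook-ceph-tools` label used by the ODF toolbox deployment; adjust if your cluster differs.

```shell
# Find the toolbox pod and run the ceph command inside it.
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n 1)
oc -n openshift-storage rsh "$TOOLS_POD" ceph config set osd debug_bluestore 20

# Verify the setting took effect:
oc -n openshift-storage rsh "$TOOLS_POD" ceph config get osd debug_bluestore
```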
Rachael, please try again and reopen if you still see the issue.
Yeah, looks like we need a fix in rook. Travis, please take a look.

Also, I can see that the same value is present in 4.6/4.7/4.8 as well, so this is not a regression. Probably something we never tested?

https://github.com/red-hat-storage/rook/blob/d1788e1616734109b104bfae687833a7ee3ca747/pkg/operator/ceph/cluster/osd/spec.go#L158
(In reply to Mudit Agarwal from comment #15)
> Yeah, looks like we need a fix in rook.
> Travis, please take a look.
>
> Also, I can see that the same is present in 4.6/4.7/4.8 as well so this is
> not a regression. Probably something which we never tested?
> https://github.com/red-hat-storage/rook/blob/
> d1788e1616734109b104bfae687833a7ee3ca747/pkg/operator/ceph/cluster/osd/spec.
> go#L158

I think it was me who made the typo while doing the changes in rook a long time back. I'll modify it to 8192. Assigning it to myself.
Already in the latest sync of 4.10.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372