* Description of problem (please be detailed as possible and provide log snippets):

+ 2 OSDs are in CLBO state, possibly due to rocksdb corruption
+ This is the assert msg:
----------------------
"assert_msg": "/builddir/build/BUILD/ceph-16.2.7/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f07ca3c4700 time 2023-04-02T19:06:58.679984+0000\n/builddir/build/BUILD/ceph-16.2.7/src/os/bluestore/BlueStore.cc: 13534: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)\n",
----------------------
+ There is an open tracker upstream which looks similar: https://tracker.ceph.com/issues/51900

* Version of all relevant components (if applicable):

ODF 4.10
ceph version 16.2.7-126.el8cp

* Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

2 out of 3 OSDs are down, PGs inactive, production impacted

* Is there any workaround available to the best of your knowledge?

No
osd.0 and osd.2 are failing because they are unable to read the OSD superblock:

debug 2023-04-03T09:19:45.201+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.202+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.202+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.203+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.203+0000 7fdca2ce5080 -1 osd.2 0 OSD::init() : unable to read osd superblock

The same read fails its crc32c check four times in a row with identical "got" and "expected" values, after which OSD::init() gives up. It looks like the OSD superblock got corrupted, most likely because multiple OSD daemons tried to access the same device.
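
For anyone triaging similar logs, the _verify_csum lines above can be unpacked field by field. The following is only a rough Python sketch, not a Ceph tool: the regular expression, the function name parse_verify_csum, and the field labels are all made up here for illustration, and it assumes the extent notation is hexadecimal offset~length (so 0x2000~1000 would be 0x1000 bytes starting at byte 0x2000 on the device).

```python
import re

# Hypothetical pattern written against the log line quoted in this report.
# The group names are illustrative labels, not Ceph identifiers.
CSUM_RE = re.compile(
    r"bluestore\((?P<path>[^)]+)\) _verify_csum bad "
    r"(?P<algo>\w+)/0x(?P<csum_block>[0-9a-f]+) "
    r"checksum at blob offset 0x(?P<blob_off>[0-9a-f]+), "
    r"got 0x(?P<got>[0-9a-f]+), expected 0x(?P<expected>[0-9a-f]+), "
    r"device location \[0x(?P<dev_off>[0-9a-f]+)~(?P<dev_len>[0-9a-f]+)\], "
    r"logical extent 0x(?P<log_off>[0-9a-f]+)~(?P<log_len>[0-9a-f]+), "
    r"object (?P<object>\S+)"
)

# Fields that hold hexadecimal numbers (assumption: lengths after '~' are hex too).
HEX_FIELDS = (
    "csum_block", "blob_off", "got", "expected",
    "dev_off", "dev_len", "log_off", "log_len",
)


def parse_verify_csum(line):
    """Return the fields of a _verify_csum log line as a dict, or None if no match."""
    match = CSUM_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for name in HEX_FIELDS:
        fields[name] = int(fields[name], 16)
    return fields


# One of the log lines from this report, copied verbatim.
SAMPLE = (
    "debug 2023-04-03T09:19:45.201+0000 7fdca2ce5080 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, "
    "device location [0x2000~1000], logical extent 0x0~1000, "
    "object #-1:7b3f43c4:::osd_superblock:0#"
)

if __name__ == "__main__":
    info = parse_verify_csum(SAMPLE)
    print(info["path"], info["object"])
    print("device offset:", hex(info["dev_off"]), "length:", hex(info["dev_len"]))
    print("checksum mismatch:", info["got"] != info["expected"])
```

Read this way, the failing read covers a single 0x1000-byte checksum block at device offset 0x2000, belonging to the osd_superblock object. That is consistent with the diagnosis above but does not by itself show what overwrote the block.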