Bug 2183996
| Summary: | [GSS][RADOS] OSDs in CLBO state with error "FAILED ceph_assert(r >= 0 && r <= (int)tail_read)" | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Karun Josy <kjosy> |
| Component: | ceph | Assignee: | Neha Ojha <nojha> |
| ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | unspecified | CC: | akupczyk, bhubbard, bkunal, bniver, hnallurv, kelwhite, muagarwa, nojha, ocs-bugs, odf-bz-bot, pdhange, sostapov |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-04-11 05:50:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
osd.0 and osd.2 are failing because they are unable to read the OSD superblock:

debug 2023-04-03T09:19:45.201+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.202+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.202+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.203+0000 7fdca2ce5080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x659b92dc, expected 0x48b54be2, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
debug 2023-04-03T09:19:45.203+0000 7fdca2ce5080 -1 osd.2 0 OSD::init() : unable to read osd superblock

It looks like the OSD superblock got corrupted, most likely because multiple OSD daemons tried to access the same device.
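For context on what the "bad crc32c" lines mean: on every read, BlueStore recomputes the CRC-32C of each checksummed chunk (0x1000 bytes here) and compares it against the value stored in the blob metadata; a mismatch means the data on disk is no longer what was written, so the read is rejected. The sketch below is only a minimal, dependency-free illustration of that comparison, not Ceph's implementation; the expected value 0x48b54be2 is taken from the log, while the block contents are placeholders, so the mismatch branch is the one that runs.

```cpp
#include <cstdint>
#include <cstddef>
#include <cstdio>
#include <vector>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Slow but self-contained; shown only to illustrate the check.
static uint32_t crc32c(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

int main() {
    // Placeholder 4 KiB (0x1000) block standing in for the data read back
    // from device location [0x2000~1000]; the real contents are unknown here.
    std::vector<uint8_t> block(0x1000, 0);
    const uint32_t expected = 0x48b54be2;  // checksum recorded in metadata (value from the log)
    const uint32_t got = crc32c(block.data(), block.size());
    if (got != expected) {
        // Corresponds to the "_verify_csum bad crc32c/0x1000" lines: the data
        // on disk no longer matches what was written, so the read fails and
        // OSD::init() cannot load the superblock.
        std::fprintf(stderr, "bad crc32c/0x1000 checksum: got 0x%08x, expected 0x%08x\n",
                     got, expected);
        return 1;
    }
    return 0;
}
```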
* Description of problem (please be detailed as possible and provide log snippets):

+ 2 OSDs are in CLBO state, possibly due to RocksDB corruption.

+ This is the assert msg:
----------------------
"assert_msg": "/builddir/build/BUILD/ceph-16.2.7/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f07ca3c4700 time 2023-04-02T19:06:58.679984+0000\n/builddir/build/BUILD/ceph-16.2.7/src/os/bluestore/BlueStore.cc: 13534: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)\n",
----------------------

+ There is an open upstream tracker that looks similar: https://tracker.ceph.com/issues/51900

* Version of all relevant components (if applicable):

ODF 4.10
ceph version 16.2.7-126.el8cp

* Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

2 out of 3 OSDs are down, PGs are inactive, production is impacted.

* Is there any workaround available to the best of your knowledge?

No
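For context on what the failed assertion guards: in a small, unaligned overwrite, BlueStore first reads back the tail of the existing block so the new data can be merged with the old, and r is the number of bytes that read actually returned. The sketch below is a hypothetical illustration of that invariant, not Ceph code; read_existing_tail and its -EIO return are assumptions used to simulate a failed read (for example, one rejected by a checksum error like the one above), which is exactly the case where ceph_assert(r >= 0 && r <= (int)tail_read) fires and the OSD aborts.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for reading back the existing blob tail; returning
// -EIO simulates a read that failed (e.g. because of a bad checksum).
static int read_existing_tail(std::vector<uint8_t>& buf, uint64_t tail_read) {
    (void)buf;
    (void)tail_read;
    return -EIO;
}

int main() {
    // A small, unaligned overwrite must merge new data with the existing tail
    // of the block, so that tail has to be read back first.
    const uint64_t tail_read = 0x1000;
    std::vector<uint8_t> buf(tail_read);
    int r = read_existing_tail(buf, tail_read);
    if (!(r >= 0 && r <= (int)tail_read)) {
        // This is the condition behind "FAILED ceph_assert(r >= 0 && r <=
        // (int)tail_read)": a negative r means previously written data could
        // not be read back, so there is nothing safe to merge and the OSD
        // aborts rather than write inconsistent data.
        std::fprintf(stderr, "tail read failed: r=%d tail_read=%llu\n",
                     r, (unsigned long long)tail_read);
        return 1;
    }
    return 0;
}
```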