Bug 2342817

Summary: After performing the osd resize test the osd pods failed to recover and the cluster ceph health was not OK
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Parth Arora <paarora>
Component: RADOS Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED DUPLICATE QA Contact: Pawan <pdhiran>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.2 CC: akupczyk, bhubbard, ceph-eng-bugs, cephqe-warriors, hakumar, lithomas, muagarwa, nojha, pdhiran, vumrao
Target Milestone: --- Keywords: Regression
Target Release: 8.0z3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2342752 Environment:
Last Closed: 2025-02-06 09:33:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2342752    
Bug Blocks:    

Description Parth Arora 2025-01-29 15:13:03 UTC
+++ This bug was initially created as a clone of Bug #2342752 +++

Description of problem:

The OSD resize fails intermittently on the 4.18 branches when running:
`ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-x`
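For reference, the invocation above could be wrapped in a small script when reproducing the failure by hand. This is a minimal sketch, not the Rook implementation: the helper names `expand_argv` and `run_expand` and the `osd_id` parameter are hypothetical, and only the `ceph-bluestore-tool bluefs-bdev-expand --path` command itself is taken from this report.

```python
import subprocess
from typing import List


def expand_argv(osd_id: int) -> List[str]:
    """Build the argv for the expand command quoted in this report.

    The helper name and the osd_id parameter are illustrative assumptions.
    """
    return [
        "ceph-bluestore-tool",
        "bluefs-bdev-expand",
        "--path",
        f"/var/lib/ceph/osd/ceph-{osd_id}",
    ]


def run_expand(osd_id: int) -> subprocess.CompletedProcess:
    """Run the expand command and capture stdout/stderr for inspection."""
    # check=False: the caller decides how to handle a non-zero exit status.
    return subprocess.run(
        expand_argv(osd_id), capture_output=True, text=True, check=False
    )
```

Keeping the argv construction separate from the call makes it possible to log or inspect the exact command without executing it.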

Errors from different clusters:

IBM Cloud:
```
2025-01-10T14:52:34.944+0000 7fb4b70d2940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
```

```
2025-01-27T15:45:05.164888150Z 2025-01-27T15:45:05.164+0000 7f60b29c5940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 62: Decoder at 'void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)' v=2 cannot decode v=41 minimal_decoder=66: Malformed input [buffer:3]
```

AWS:
```
2025-01-13T15:13:38.683+0000 7fa5eb2d4940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 62: Decoder at 'void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)' v=2 cannot decode v=1 minimal_decoder=104: Malformed input [buffer:3]
```
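
To compare failures across clusters, the `_read_bdev_label` errors can be reduced to the few numbers they carry. The following is a rough sketch that assumes the message layout seen in the lines quoted in this report. The function name `parse_label_error` and the keys `decoder_v`, `encoded_v` and `minimal_decoder` are labels chosen here for illustration; the values are recorded as printed, without interpreting them.

```python
import re
from typing import Optional

# Matches the "_read_bdev_label unable to decode label" lines quoted in this report.
LABEL_ERROR = re.compile(
    r"_read_bdev_label unable to decode label (?P<path>\S+) "
    r"at offset (?P<offset>\d+): (?P<detail>.*)$"
)
# Version fields; present only in the "cannot decode" variant of the message.
VERSIONS = re.compile(
    r"v=(?P<decoder_v>\d+) cannot decode v=(?P<encoded_v>\d+) "
    r"minimal_decoder=(?P<minimal_decoder>\d+)"
)


def parse_label_error(line: str) -> Optional[dict]:
    """Extract path, offset and (if present) version fields from one log line.

    Returns None when the line is not a label decode error.
    """
    m = LABEL_ERROR.search(line)
    if m is None:
        return None
    result = {"path": m["path"], "offset": int(m["offset"])}
    v = VERSIONS.search(m["detail"])
    if v is not None:
        # Key names are assumptions; values are copied as printed in the log.
        result.update({key: int(val) for key, val in v.groupdict().items()})
    return result
```

Feeding the log lines from each cluster through `parse_label_error` gives one small record per failure, which makes it easier to see that the reported offsets and version values differ from one run to the next.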
Probably coming from here: https://github.com/ceph/ceph/blame/squid-release/src/os/bluestore/BlueStore.cc#L6612

For more info: https://github.com/rook/rook/pull/15251#issuecomment-2618453748

Interestingly, in the failed cases, resizing worked after we updated the OSD image to upstream `quay.io/ceph/ceph:v18.2.4`.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info: