Bug 2342752

Summary: After performing the osd resize test the osd pods failed to recover and the cluster ceph health was not OK
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Parth Arora <paarora>
Component: RADOS Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED ERRATA QA Contact: Harsh Kumar <hakumar>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 9.0 CC: akupczyk, bhubbard, ceph-eng-bugs, cephqe-warriors, ngangadh, nojha, rzarzyns, tserlin, vumrao
Target Milestone: ---   
Target Release: 8.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-19.2.1-3.el9cp Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2342817 (view as bug list) Environment:
Last Closed: 2025-06-26 12:24:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2342817    

Description Parth Arora 2025-01-29 08:19:31 UTC
Description of problem:

The OSD resize fails intermittently on the 4.18 branches when running:
`ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-x`

Errors from different clusters:
```

IBM Cloud:
2025-01-10T14:52:34.944+0000 7fb4b70d2940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
2025-01-27T15:45:05.164888150Z 2025-01-27T15:45:05.164+0000 7f60b29c5940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 62: Decoder at 'void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)' v=2 cannot decode v=41 minimal_decoder=66: Malformed input [buffer:3]

AWS:
2025-01-13T15:13:38.683+0000 7fa5eb2d4940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 62: Decoder at 'void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)' v=2 cannot decode v=1 minimal_decoder=104: Malformed input [buffer:3]
```
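To compare these failures across clusters, the fields in each message can be pulled out mechanically. The following is a minimal sketch, assuming the message layout matches the three lines quoted in this report; the regular expressions and the helper name `parse_label_error` are hypothetical and are not part of Ceph.

```python
import re
from typing import Optional

# Matches the "_read_bdev_label unable to decode label" lines quoted above.
# The pattern is an assumption derived only from the messages in this report.
LABEL_ERR = re.compile(
    r"_read_bdev_label unable to decode label (?P<path>\S+) "
    r"at offset (?P<offset>\d+): (?P<detail>.*)"
)

# The version fields appear only in the "cannot decode" variant.
VERSIONS = re.compile(
    r"v=(?P<decoder>\d+) cannot decode v=(?P<found>\d+) "
    r"minimal_decoder=(?P<minimal>\d+)"
)


def parse_label_error(line: str) -> Optional[dict]:
    """Return the fields of a label decode error, or None if the line is unrelated."""
    match = LABEL_ERR.search(line)
    if match is None:
        return None
    detail = match.group("detail")
    result = {
        "path": match.group("path"),
        "offset": int(match.group("offset")),
        "past_end": "decode past end of struct encoding" in detail,
        "decoder": None,
        "found": None,
        "minimal": None,
    }
    versions = VERSIONS.search(detail)
    if versions is not None:
        # Convert the three version numbers reported by the decoder to integers.
        for key in ("decoder", "found", "minimal"):
            result[key] = int(versions.group(key))
    return result
```

Running this over the lines above shows that the decode stops at different offsets (62 or 102) and that the reported version values differ from run to run (41 and 66 in one case, 1 and 104 in another), while the decoder itself always reports `v=2`.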
The error probably originates here: https://github.com/ceph/ceph/blame/squid-release/src/os/bluestore/BlueStore.cc#L6612

For more info: https://github.com/rook/rook/pull/15251#issuecomment-2618453748

Interestingly, in the failed cases, updating the OSD image to upstream `quay.io/ceph/ceph:v18.2.4` made the resize work.



Comment 8 errata-xmlrpc 2025-06-26 12:24:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775