Bug 2342817 - After performing the osd resize test the osd pods failed to recover and the cluster ceph health was not OK
Summary: After performing the osd resize test the osd pods failed to recover and the c...
Keywords:
Status: CLOSED DUPLICATE of bug 2316351
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 8.0z3
Assignee: Adam Kupczyk
QA Contact: Pawan
URL:
Whiteboard:
Depends On: 2342752
Blocks:
Reported: 2025-01-29 15:13 UTC by Parth Arora
Modified: 2025-02-06 14:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2342752
Environment:
Last Closed: 2025-02-06 09:33:29 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-10533 0 None None None 2025-01-29 15:13:56 UTC

Description Parth Arora 2025-01-29 15:13:03 UTC
+++ This bug was initially created as a clone of Bug #2342752 +++

Description of problem:

The OSD resize fails intermittently on the 4.18 branches when running:
`ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-x`

Errors from different clusters:

IBM Cloud:
```
2025-01-10T14:52:34.944+0000 7fb4b70d2940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
```

```
2025-01-27T15:45:05.164888150Z 2025-01-27T15:45:05.164+0000 7f60b29c5940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 62: Decoder at 'void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)' v=2 cannot decode v=41 minimal_decoder=66: Malformed input [buffer:3]
```

AWS:
```
2025-01-13T15:13:38.683+0000 7fa5eb2d4940 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-0/block at offset 62: Decoder at 'void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)' v=2 cannot decode v=1 minimal_decoder=104: Malformed input [buffer:3]
```
Probably coming from here: https://github.com/ceph/ceph/blame/squid-release/src/os/bluestore/BlueStore.cc#L6612

For more info: https://github.com/rook/rook/pull/15251#issuecomment-2618453748
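
To compare the failures across clusters, the `_read_bdev_label` lines quoted above can be parsed into fields. The following is only a rough sketch and not part of Ceph or Rook: the function name `parse_label_error` and the field names are made up for illustration, and the regexes are written against the log lines in this report only. Reading the first `v=` as the decoder's version and the second as the version found in the on-disk label is an assumption.

```python
import re

# Matches the "_read_bdev_label unable to decode label" lines from this report.
LABEL_ERR = re.compile(
    r"bluestore\((?P<dev>[^)]+)\) _read_bdev_label unable to decode label "
    r"\S+ at offset (?P<offset>\d+): (?P<detail>.*)$"
)

# Matches the version-mismatch variant of the detail text.
VERSION_ERR = re.compile(
    r"v=(?P<decoder_v>\d+) cannot decode v=(?P<encoded_v>\d+) "
    r"minimal_decoder=(?P<minimal_decoder>\d+)"
)


def parse_label_error(line):
    """Return a dict describing a label decode failure, or None if no match."""
    m = LABEL_ERR.search(line)
    if m is None:
        return None
    info = {"dev": m["dev"], "offset": int(m["offset"])}
    detail = m["detail"]
    v = VERSION_ERR.search(detail)
    if v is not None:
        # Assumed meaning: decoder version, version read from the label,
        # and the minimal decoder version the label claims to require.
        info["kind"] = "version_mismatch"
        info["decoder_v"] = int(v["decoder_v"])
        info["encoded_v"] = int(v["encoded_v"])
        info["minimal_decoder"] = int(v["minimal_decoder"])
    elif "decode past end of struct encoding" in detail:
        info["kind"] = "past_end"
    else:
        info["kind"] = "other"
    return info
```

Run over the three lines above, this gives offset 102 with kind `past_end` for the 2025-01-10 failure, and offset 62 with kind `version_mismatch` for the other two, with differing label versions (41 and 1) in each case.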

Interestingly, in the failed cases, updating the OSD image to the upstream `quay.io/ceph/ceph:v18.2.4` made the resize work.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

