Bug 2273934 - ceph-volume raw list and activate fail
Summary: ceph-volume raw list and activate fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 7.0z2
Assignee: Guillaume Abrioux
QA Contact: Aditya Ramteke
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-04-08 08:50 UTC by Guillaume Abrioux
Modified: 2024-05-07 12:11 UTC
CC List: 8 users

Fixed In Version: ceph-18.2.0-191.el9cp
Doc Type: Bug Fix
Doc Text:
Cause: Since the introduction of bluestore-rdr, `ceph-volume raw list` tries to read the objectstore type from the BlueStore labels. This objectstore type label was not set prior to the introduction of bluestore-rdr.
Consequence: `ceph-volume raw activate` fails for any OSD created before the Ceph version that introduced bluestore-rdr, because it calls `ceph-volume raw list`.
Fix: ceph-volume tries to read the label and defaults to "bluestore" when it cannot be retrieved (see the hedged sketch after the header fields below).
Result: `ceph-volume raw list` and `ceph-volume raw activate` no longer fail.
Clone Of:
Environment:
Last Closed: 2024-05-07 12:11:18 UTC
Embargoed:
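
For illustration, a minimal Python sketch of the fallback described in the Doc Text above (ceph-volume itself is written in Python). This is not the actual ceph-volume source; the `osd_objectstore` key name and the JSON layout of the `ceph-bluestore-tool show-label` output are assumptions:

~~~
# Hedged sketch, not the real ceph-volume code: derive the objectstore
# type from the BlueStore label, defaulting to "bluestore" for OSDs
# created before bluestore-rdr (which never wrote the label).
import json
import subprocess

def objectstore_type(device: str) -> str:
    # `ceph-bluestore-tool show-label` prints JSON keyed by device path
    out = subprocess.check_output(
        ["ceph-bluestore-tool", "show-label", "--dev", device]
    )
    label = json.loads(out).get(device, {})
    # A strict lookup (label["osd_objectstore"]) raises KeyError on old
    # OSDs and aborts `raw list`; the fix falls back to "bluestore".
    return label.get("osd_objectstore", "bluestore")
~~~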


Attachments


Links
System ID Last Updated
Red Hat Issue Tracker RHCEPH-8771 2024-04-08 08:52:18 UTC
Red Hat Knowledge Base (Solution) 7063703 2024-04-23 16:31:39 UTC
Red Hat Product Errata RHBA-2024:2743 2024-05-07 12:11:20 UTC

Description Guillaume Abrioux 2024-04-08 08:50:28 UTC
This bug was initially created as a copy of Bug #2273724

I am copying this bug because: 



Description of problem:
After an upgrade from ODF 4.12 to ODF 4.14, the Ceph OSD containers are not running because ceph-volume seems unable to activate OSD devices based on LVM from old OCS deployments (4.2/4.3 up to 4.4). More information can be found in BZ https://bugzilla.redhat.com/show_bug.cgi?id=2273398.

Sadly, we don't have any usable ceph-volume logs, but this seems like a very strong contender. Workaround to bring the OSD back up:

~~~
- Create a backup of the OSD deployment (we are going to remove the liveness probe)
- Scale down the rook-ceph and ocs-operators
- oc edit the OSD deployment, search for the expand-bluefs section, and remove that container
- oc get pods to see if the OSD came up (still 1/2), then oc rsh into the container:
   - ceph-volume lvm list
   - ceph-volume lvm activate --no-systemd <osd id> <osd fsid>   # osd fsid from ceph-volume lvm list
   - The OSD was activated, and when we viewed the OSD data dir, the block device was listed:
      - ls -l /var/lib/ceph/osd/ceph-<id>
~~~
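
The osd id / osd fsid pair used above can also be extracted programmatically. A hedged Python sketch: the `--format json` flag of `ceph-volume lvm list` exists, but the exact JSON layout, including the `ceph.osd_fsid` tag, is assumed from current ceph-volume output:

~~~
# Hedged sketch: list the osd id / fsid pairs that
# `ceph-volume lvm activate --no-systemd <id> <fsid>` expects.
import json
import subprocess

out = subprocess.check_output(["ceph-volume", "lvm", "list", "--format", "json"])
for osd_id, devices in json.loads(out).items():
    for dev in devices:
        # each device entry carries the LVM tags, including the osd fsid
        fsid = dev.get("tags", {}).get("ceph.osd_fsid", "<unknown>")
        print(osd_id, fsid)
~~~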

Ask:
- What changed in ceph-volume from 4.13 to 4.14 that would cause issues with LVM-based OSDs from earlier versions of OCS?

Comment 4 Manisha Saini 2024-04-12 05:01:01 UTC
Hi Guillaume Abrioux,

Could you please provide the steps to verify this BZ?

Thanks

Comment 12 errata-xmlrpc 2024-05-07 12:11:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:2743

