Bug 2273724 - ceph-volume raw list and activate fail
Summary: ceph-volume raw list and activate fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 6.1
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 6.1z6
Assignee: Guillaume Abrioux
QA Contact: Aditya Ramteke
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-04-05 21:16 UTC by kelwhite
Modified: 2024-05-15 04:48 UTC
CC List: 11 users

Fixed In Version: ceph-17.2.6-216.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-05-01 01:10:45 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2273398 0 urgent CLOSED [GSS][ODF 4.16 backport] Legacy LVM-based OSDs are in crashloop state 2024-11-15 04:25:28 UTC
Red Hat Issue Tracker RHCEPH-8763 0 None None None 2024-04-05 21:28:02 UTC
Red Hat Knowledge Base (Solution) 7063703 0 None None None 2024-04-23 16:31:24 UTC
Red Hat Product Errata RHSA-2024:2631 0 None None None 2024-05-01 01:10:48 UTC

Internal Links: 2273398

Description kelwhite 2024-04-05 21:16:29 UTC
Description of problem:
After an upgrade from ODF 4.12 to ODF 4.14, the Ceph OSD containers are not running because ceph-volume appears unable to activate the LVM-based OSD devices left over from old OCS deployments (OCS 4.2/4.3 up to 4.4). More information is in BZ https://bugzilla.redhat.com/show_bug.cgi?id=2273398.
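For context, a quick way to confirm whether an OSD in this state is LVM-based (and therefore affected) is to inspect it from inside the OSD container. This is only a sketch: the pod name, namespace, and hash below are hypothetical examples for a typical ODF deployment, not taken from this case.

~~~
# Hypothetical pod/namespace names for illustration; adjust to the actual deployment.
oc -n openshift-storage rsh rook-ceph-osd-0-<hash>

# Inside the container: LVM-based OSDs (old OCS deployments) show up here ...
ceph-volume lvm list

# ... while raw-mode OSDs show up here. In this bug, raw list/activate fail.
ceph-volume raw list

# The backing logical volumes are also visible with lsblk.
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT
~~~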

Sadly, we don't have any usable ceph-volume logs, but this seems like a very strong contender. Workaround to bring the OSD back up (a rough command sketch follows the list):

~~~
- Created a backup of the OSD deployment (we're going to remove the liveness probe)
- Scaled down the rook-ceph and ocs operators
- oc edited the OSD deployment, searched for the expand-bluefs section, and removed that container
- oc get pods to see if the OSD came up (still 1/2), then rsh'd into the container
   - ceph-volume lvm list
   - ceph-volume lvm activate --no-systemd <osd.id> <osd fsid>   // osd fsid from ceph-volume lvm list
   - The OSD was activated, and when we viewed the OSD data dir, the block device was listed:
      - ls -l /var/lib/ceph/osd/ceph-<id>
~~~
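For reference, here is the workaround above written out as concrete commands. The deployment/pod names and the openshift-storage namespace are assumptions for a typical ODF cluster, not values taken verbatim from this case.

~~~
# Back up the OSD deployment before editing it (hypothetical names).
oc -n openshift-storage get deployment rook-ceph-osd-0 -o yaml > rook-ceph-osd-0.backup.yaml

# Scale down the operators so they don't revert the manual edit.
oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0
oc -n openshift-storage scale deployment ocs-operator --replicas=0

# Edit the OSD deployment and remove the expand-bluefs container.
oc -n openshift-storage edit deployment rook-ceph-osd-0

# Once the pod restarts, shell into the OSD container and activate the OSD manually.
oc -n openshift-storage rsh deploy/rook-ceph-osd-0
ceph-volume lvm list                                        # note the osd id and osd fsid
ceph-volume lvm activate --no-systemd <osd.id> <osd fsid>
ls -l /var/lib/ceph/osd/ceph-<id>                           # block device should now be listed
~~~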

Ask:
- What changed in ceph-volume from ODF 4.13 to 4.14 that would cause issues with LVM-based OSDs from earlier versions of OCS?

Comment 16 errata-xmlrpc 2024-05-01 01:10:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2631

