Bug 2089397 - [GSS]OSD pods CLBO after upgrade to 4.10 from 4.9.
Summary: [GSS]OSD pods CLBO after upgrade to 4.10 from 4.9.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Sébastien Han
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-23 14:59 UTC by khover
Modified: 2023-08-09 17:03 UTC
CC List: 11 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:53:39 UTC
Embargoed:


Attachments
osd describe (20.86 KB, text/plain), 2022-05-23 14:59 UTC, khover


Links
GitHub rook/rook pull 10298 (open): osd: remove broken argument, last updated 2022-05-23 15:00:11 UTC
Red Hat Product Errata RHSA-2022:6156, last updated 2022-08-24 13:53:49 UTC

Description khover 2022-05-23 14:59:42 UTC
Created attachment 1882431 [details]
osd describe

Description of problem:

OSD pods CLBO after upgrade to 4.10 from 4.9.

rook-ceph-osd-0-7c5b8797dc-jpk4w                                  1/2     CrashLoopBackOff    29 (3m18s ago)   95m
rook-ceph-osd-1-676cbfb684-fcccr                                  1/2     CrashLoopBackOff    28 (5s ago)      84m
rook-ceph-osd-2-89bb9dbd9-p56b2                                   1/2     CrashLoopBackOff    11 (4m25s ago)   36m

I edited each of these 3 deployments (exactly the 3 deployments that I see in the CrashLoopBackOff state) and removed "/rook/rook" from the args. After the edit the deployments are healthy:
rook-ceph-osd-0                                      1/1     1            1           46h
rook-ceph-osd-1                                      1/1     1            1           46h
rook-ceph-osd-2                                      1/1     1            1           7h7m



      containers:
      - args:
        - /rook/rook   # <-- I removed this line
        - ceph
        - osd
        - start
        - --
        - --foreground
        - --id
        - "1"
        - --fsid
        - 42e1ae07-9402-4cc9-b1a4-a1fe127e6ebc
        - --cluster
        - ceph
        - --setuser
        - ceph
        - --setgroup
        - ceph
        - --crush-location=root=default host=xxxocpocsxxxs02 rack=rack2
        - --log-to-stderr=true
        - --err-to-stderr=true
        - --mon-cluster-log-to-stderr=true
        - '--log-stderr-prefix=debug '
        - --default-log-to-file=false
        - --default-mon-cluster-log-to-file=false
        - --ms-learn-addr-from-peer=false
        command:
        - /rook/rook


After that, the OSD runs fine and Ceph is available.
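
For reference, a minimal non-interactive sketch of the same workaround using a JSON patch. It assumes the default openshift-storage namespace and that the osd container is the first entry in the deployment's containers list, with "/rook/rook" as the first args element as in the spec above; this is not part of the original report.

# Stop the rook-ceph operator first so its reconciliation does not
# immediately revert the edit (see the reproduction note below)
oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0

# Drop the first args element ("/rook/rook") from each affected OSD deployment
for d in rook-ceph-osd-0 rook-ceph-osd-1 rook-ceph-osd-2; do
  oc -n openshift-storage patch deployment "$d" --type=json \
    -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/args/0"}]'
done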


PS: The broken state is easy to reproduce. If I delete one of the edited deployments (oc delete deployment rook-ceph-osd-1, for example), the operator starts the reconciliation process and breaks my cluster again, as sketched below.
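
A hypothetical reproduction sequence, with the rook-ceph operator running and again assuming the openshift-storage namespace (not taken verbatim from the report):

# Delete one of the manually edited OSD deployments; the operator recreates it
# with the broken "/rook/rook" argument and the OSD pod returns to CrashLoopBackOff
oc -n openshift-storage delete deployment rook-ceph-osd-1
oc -n openshift-storage get pods | grep rook-ceph-osd-1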


Version-Release number of selected component (if applicable):
NAME                              DISPLAY                            VERSION    REPLACES                           PHASE

odf-operator.v4.10.2              OpenShift Data Foundation          4.10.2     odf-operator.v4.9.6                Succeeded

How reproducible:

The customer deletes the OSD deployment and the issue is reproduced.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Travis Nielsen 2022-05-23 15:18:35 UTC
*** Bug 2089398 has been marked as a duplicate of this bug. ***

Comment 3 Mudit Agarwal 2022-05-30 11:00:06 UTC
Fix is present in the latest build.

Comment 4 Elad 2022-05-30 12:39:45 UTC
For verification, please check https://bugzilla.redhat.com/show_bug.cgi?id=2089398#c17

Comment 10 Elad 2022-07-19 14:41:26 UTC
Moving to VERIFIED based on regression testing of ODF upgrade using 4.11.0-113

ocs-ci results for OCS4-11-Downstream-OCP4-11-AWS-UPI-Proxy-3AZ-RHCOS-3M-3W-upgrade-ocs-auto (BUILD ID: 4.11.0-113 RUN ID: 1658223369)

Comment 12 errata-xmlrpc 2022-08-24 13:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

