Bug 2089397

Summary: [GSS] OSD pods CLBO after upgrade to 4.10 from 4.9.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: khover
Component: rook    Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.10    CC: ebenahar, hnallurv, madam, muagarwa, nberry, ocs-bugs, odf-bz-bot, pbalogh, petr.bena, shan, tdesala
Target Milestone: ---   
Target Release: ODF 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: No Doc Update
Last Closed: 2022-08-24 13:53:39 UTC Type: Bug
Attachments:
  osd describe (Flags: none)

Description khover 2022-05-23 14:59:42 UTC
Created attachment 1882431: osd describe

Description of problem:

OSD pods go into CrashLoopBackOff (CLBO) after upgrading to ODF 4.10 from 4.9.

rook-ceph-osd-0-7c5b8797dc-jpk4w                                  1/2     CrashLoopBackOff    29 (3m18s ago)   95m
rook-ceph-osd-1-676cbfb684-fcccr                                  1/2     CrashLoopBackOff    28 (5s ago)      84m
rook-ceph-osd-2-89bb9dbd9-p56b2                                   1/2     CrashLoopBackOff    11 (4m25s ago)   36m

I edited each of these 3 deployments (exactly the 3 deployments that I see in CrashLoopBackOff state) and removed the "/rook/rook" entry from the args:
rook-ceph-osd-0                                      1/1     1            1           46h
rook-ceph-osd-1                                      1/1     1            1           46h
rook-ceph-osd-2                                      1/1     1            1           7h7m



      containers:
      - args:
        - /rook/rook <-- I Removed this line
        - ceph
        - osd
        - start
        - --
        - --foreground
        - --id
        - "1"
        - --fsid
        - 42e1ae07-9402-4cc9-b1a4-a1fe127e6ebc
        - --cluster
        - ceph
        - --setuser
        - ceph
        - --setgroup
        - ceph
        - --crush-location=root=default host=xxxocpocsxxxs02 rack=rack2
        - --log-to-stderr=true
        - --err-to-stderr=true
        - --mon-cluster-log-to-stderr=true
        - '--log-stderr-prefix=debug '
        - --default-log-to-file=false
        - --default-mon-cluster-log-to-file=false
        - --ms-learn-addr-from-peer=false
        command:
        - /rook/rook


After that, the OSDs run fine and Ceph is available.
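
For reference, the same workaround could presumably be applied non-interactively with oc patch instead of oc edit. A minimal sketch, assuming the default openshift-storage namespace and that the osd container is the first container in each deployment (both assumptions; adjust to your cluster). Since the operator may put the broken args back when it next reconciles the OSD deployments (as seen below after deleting one), scaling it down first is a common precaution:

  # optional: keep the rook-ceph operator from reverting the manual edit
  oc scale deployment rook-ceph-operator -n openshift-storage --replicas=0

  # drop the leading "/rook/rook" entry from the osd container args
  for d in rook-ceph-osd-0 rook-ceph-osd-1 rook-ceph-osd-2; do
    oc patch deployment "$d" -n openshift-storage --type=json \
      -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/args/0"}]'
  done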


PS: The broken state is easy to reproduce. If I delete one of the edited deployments (oc delete deployment rook-ceph-osd-1, for example), the operator starts the reconciliation process and breaks my cluster again.
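
A quick way to watch this happen, assuming the default openshift-storage namespace and the standard app=rook-ceph-osd label:

  oc delete deployment rook-ceph-osd-1 -n openshift-storage
  # the operator recreates the deployment with /rook/rook back in the args,
  # and the corresponding osd pod returns to CrashLoopBackOff
  oc get pods -n openshift-storage -l app=rook-ceph-osd -w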


Version-Release number of selected component (if applicable):
NAME                              DISPLAY                            VERSION    REPLACES                           PHASE
odf-operator.v4.10.2              OpenShift Data Foundation          4.10.2     odf-operator.v4.9.6                Succeeded
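
(The listing above is the operator ClusterServiceVersion list, presumably gathered with something like the following; the namespace is assumed:

  oc get csv -n openshift-storage
)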

How reproducible:

Consistently; after applying the manual workaround, deleting one of the edited OSD deployments (so the operator recreates it) reproduces the issue.

Steps to Reproduce:
1. Upgrade ODF from 4.9 to 4.10.
2. Check the rook-ceph-osd pods (or delete an already-edited OSD deployment so the operator reconciles it back).
3. The OSD pods go into CrashLoopBackOff.

Actual results:

The rook-ceph-osd pods go into CrashLoopBackOff after the upgrade (and again whenever the operator reconciles an edited OSD deployment).

Expected results:

The rook-ceph-osd pods run normally after the upgrade and Ceph stays available.

Additional info:

Comment 2 Travis Nielsen 2022-05-23 15:18:35 UTC
*** Bug 2089398 has been marked as a duplicate of this bug. ***

Comment 3 Mudit Agarwal 2022-05-30 11:00:06 UTC
Fix is present in the latest build.

Comment 4 Elad 2022-05-30 12:39:45 UTC
For verification, please check https://bugzilla.redhat.com/show_bug.cgi?id=2089398#c17

Comment 10 Elad 2022-07-19 14:41:26 UTC
Moving to VERIFIED based on regression testing of ODF upgrade using 4.11.0-113

ocs-ci results for OCS4-11-Downstream-OCP4-11-AWS-UPI-Proxy-3AZ-RHCOS-3M-3W-upgrade-ocs-auto (BUILD ID: 4.11.0-113 RUN ID: 1658223369)

Comment 12 errata-xmlrpc 2022-08-24 13:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156