Bug 2089397 - [GSS]OSD pods CLBO after upgrade to 4.10 from 4.9.
Summary: [GSS]OSD pods CLBO after upgrade to 4.10 from 4.9.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Sébastien Han
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-23 14:59 UTC by khover
Modified: 2023-08-09 17:03 UTC
CC List: 11 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:53:39 UTC
Embargoed:


Attachments
osd describe (20.86 KB, text/plain), 2022-05-23 14:59 UTC, khover


Links
GitHub rook/rook pull 10298 (open): osd: remove broken argument, last updated 2022-05-23 15:00:11 UTC
Red Hat Product Errata RHSA-2022:6156, last updated 2022-08-24 13:53:49 UTC

Description khover 2022-05-23 14:59:42 UTC
Created attachment 1882431 [details]
osd describe

Description of problem:

OSD pods CLBO after upgrade to 4.10 from 4.9.

rook-ceph-osd-0-7c5b8797dc-jpk4w                                  1/2     CrashLoopBackOff    29 (3m18s ago)   95m
rook-ceph-osd-1-676cbfb684-fcccr                                  1/2     CrashLoopBackOff    28 (5s ago)      84m
rook-ceph-osd-2-89bb9dbd9-p56b2                                   1/2     CrashLoopBackOff    11 (4m25s ago)   36m

I edited each of these 3 deployments (exactly the 3 deployments that I see in the CrashLoopBackOff state) and removed "/rook/rook" from the args. After the edit the deployments are healthy:
rook-ceph-osd-0                                      1/1     1            1           46h
rook-ceph-osd-1                                      1/1     1            1           46h
rook-ceph-osd-2                                      1/1     1            1           7h7m



      containers:
      - args:
        - /rook/rook   # <-- I removed this line
        - ceph
        - osd
        - start
        - --
        - --foreground
        - --id
        - "1"
        - --fsid
        - 42e1ae07-9402-4cc9-b1a4-a1fe127e6ebc
        - --cluster
        - ceph
        - --setuser
        - ceph
        - --setgroup
        - ceph
        - --crush-location=root=default host=xxxocpocsxxxs02 rack=rack2
        - --log-to-stderr=true
        - --err-to-stderr=true
        - --mon-cluster-log-to-stderr=true
        - '--log-stderr-prefix=debug '
        - --default-log-to-file=false
        - --default-mon-cluster-log-to-file=false
        - --ms-learn-addr-from-peer=false
        command:
        - /rook/rook


After that, the OSD runs fine and Ceph is available.
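
For reference, a minimal non-interactive sketch of the same workaround using a JSON patch. It assumes the default openshift-storage namespace and that the osd container is the first entry in the deployment's containers list, with "/rook/rook" as the first args element as in the spec above; this is not part of the original report.

# Stop the rook-ceph operator first so its reconciliation does not
# immediately revert the edit (see the reproduction note below)
oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0

# Drop the first args element ("/rook/rook") from each affected OSD deployment
for d in rook-ceph-osd-0 rook-ceph-osd-1 rook-ceph-osd-2; do
  oc -n openshift-storage patch deployment "$d" --type=json \
    -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/args/0"}]'
done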


PS: The broken state is easy to reproduce. If I delete one of the edited deployments (oc delete deployment rook-ceph-osd-1, for example), the operator starts the reconciliation process and breaks my cluster again, as sketched below.
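
A hypothetical reproduction sequence, with the rook-ceph operator running and again assuming the openshift-storage namespace (not taken verbatim from the report):

# Delete one of the manually edited OSD deployments; the operator recreates it
# with the broken "/rook/rook" argument and the OSD pod returns to CrashLoopBackOff
oc -n openshift-storage delete deployment rook-ceph-osd-1
oc -n openshift-storage get pods | grep rook-ceph-osd-1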


Version-Release number of selected component (if applicable):
NAME                              DISPLAY                            VERSION    REPLACES                           PHASE

odf-operator.v4.10.2              OpenShift Data Foundation          4.10.2     odf-operator.v4.9.6                Succeeded

How reproducible:

The customer deletes the OSD deployment and the issue is reproduced.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Travis Nielsen 2022-05-23 15:18:35 UTC
*** Bug 2089398 has been marked as a duplicate of this bug. ***

Comment 3 Mudit Agarwal 2022-05-30 11:00:06 UTC
Fix is present in the latest build.

Comment 4 Elad 2022-05-30 12:39:45 UTC
For verification, please check https://bugzilla.redhat.com/show_bug.cgi?id=2089398#c17

Comment 10 Elad 2022-07-19 14:41:26 UTC
Moving to VERIFIED based on regression testing of ODF upgrade using 4.11.0-113

ocs-ci results for OCS4-11-Downstream-OCP4-11-AWS-UPI-Proxy-3AZ-RHCOS-3M-3W-upgrade-ocs-auto (BUILD ID: 4.11.0-113 RUN ID: 1658223369)

Comment 12 errata-xmlrpc 2022-08-24 13:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

