Created attachment 1882431 [details]
osd describe
Description of problem:
OSD pods are in CrashLoopBackOff (CLBO) after upgrading from 4.9 to 4.10.
rook-ceph-osd-0-7c5b8797dc-jpk4w 1/2 CrashLoopBackOff 29 (3m18s ago) 95m
rook-ceph-osd-1-676cbfb684-fcccr 1/2 CrashLoopBackOff 28 (5s ago) 84m
rook-ceph-osd-2-89bb9dbd9-p56b2 1/2 CrashLoopBackOff 11 (4m25s ago) 36m
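To collect the same data as the attached describe output, something like the following can be used (a sketch only; the openshift-storage namespace and the "osd" container name are assumptions based on a default ODF install):

oc -n openshift-storage describe pod rook-ceph-osd-0-7c5b8797dc-jpk4w
oc -n openshift-storage logs rook-ceph-osd-0-7c5b8797dc-jpk4w -c osd --previous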
I edited each of these 3 deployments (exactly the 3 deployments that I see in CrashLoopBackOff state) and removed the "/rook/rook" entry from the args.
rook-ceph-osd-0 1/1 1 1 46h
rook-ceph-osd-1 1/1 1 1 46h
rook-ceph-osd-2 1/1 1 1 7h7m
containers:
- args:
- /rook/rook   # <-- I removed this line
- ceph
- osd
- start
- --
- --foreground
- --id
- "1"
- --fsid
- 42e1ae07-9402-4cc9-b1a4-a1fe127e6ebc
- --cluster
- ceph
- --setuser
- ceph
- --setgroup
- ceph
- --crush-location=root=default host=xxxocpocsxxxs02 rack=rack2
- --log-to-stderr=true
- --err-to-stderr=true
- --mon-cluster-log-to-stderr=true
- '--log-stderr-prefix=debug '
- --default-log-to-file=false
- --default-mon-cluster-log-to-file=false
- --ms-learn-addr-from-peer=false
command:
- /rook/rook
After that, the OSD runs fine and Ceph is available (the command field already invokes /rook/rook, so the duplicated path at the top of args appears to be what breaks OSD startup).
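For reference, the same workaround can be applied without opening an editor by dropping the first args entry with a JSON patch (a sketch only; the openshift-storage namespace and container index 0 are assumptions, and the operator will revert this change on the next reconciliation):

oc -n openshift-storage patch deployment rook-ceph-osd-1 --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/args/0"}]'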
PS: The broken state is easy to reproduce. If I delete one of the edited deployments (oc delete deployment rook-ceph-osd-1, for example), the operator starts the reconciliation process and breaks my cluster again.
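If the manual edit needs to survive until a fixed build is available, one option is to pause reconciliation by scaling the operator down (a sketch, assuming the rook-ceph-operator deployment lives in openshift-storage; scale it back up once the fix is installed):

oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0
# later, to resume reconciliation:
oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1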
Version-Release number of selected component (if applicable):
NAME DISPLAY VERSION REPLACES PHASE
odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.9.6 Succeeded
How reproducible:
Consistently: the customer deletes one of the edited deployments and the issue is reproduced.
Steps to Reproduce:
1. Upgrade ODF from 4.9 to 4.10.2.
2. Wait for the rook-ceph operator to reconcile the rook-ceph-osd deployments.
3. Check the rook-ceph-osd pods (or delete one of the manually edited deployments to trigger reconciliation again).
Actual results:
The rook-ceph-osd pods go into CrashLoopBackOff after the upgrade and Ceph becomes unavailable.
Expected results:
The rook-ceph-osd pods run normally after the upgrade.
Additional info:
Moving to VERIFIED based on regression testing of ODF upgrade using 4.11.0-113
ocs-ci results for OCS4-11-Downstream-OCP4-11-AWS-UPI-Proxy-3AZ-RHCOS-3M-3W-upgrade-ocs-auto (BUILD ID: 4.11.0-113 RUN ID: 1658223369)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:6156