Created attachment 1882431 [details]
osd describe

Description of problem:

OSD pods go into CrashLoopBackOff after upgrading from ODF 4.9 to 4.10:

rook-ceph-osd-0-7c5b8797dc-jpk4w   1/2   CrashLoopBackOff   29 (3m18s ago)   95m
rook-ceph-osd-1-676cbfb684-fcccr   1/2   CrashLoopBackOff   28 (5s ago)      84m
rook-ceph-osd-2-89bb9dbd9-p56b2    1/2   CrashLoopBackOff   11 (4m25s ago)   36m

I edited each of these three deployments (exactly the three whose pods are in CrashLoopBackOff) and removed the "/rook/rook" entry from the args:

rook-ceph-osd-0   1/1   1   1   46h
rook-ceph-osd-1   1/1   1   1   46h
rook-ceph-osd-2   1/1   1   1   7h7m

containers:
- args:
  - /rook/rook   <-- I removed this line
  - ceph
  - osd
  - start
  - --
  - --foreground
  - --id
  - "1"
  - --fsid
  - 42e1ae07-9402-4cc9-b1a4-a1fe127e6ebc
  - --cluster
  - ceph
  - --setuser
  - ceph
  - --setgroup
  - ceph
  - --crush-location=root=default host=xxxocpocsxxxs02 rack=rack2
  - --log-to-stderr=true
  - --err-to-stderr=true
  - --mon-cluster-log-to-stderr=true
  - '--log-stderr-prefix=debug '
  - --default-log-to-file=false
  - --default-mon-cluster-log-to-file=false
  - --ms-learn-addr-from-peer=false
  command:
  - /rook/rook

After that edit, the OSDs run fine and Ceph is available. Note that the removed args entry duplicates the command ("/rook/rook").

PS: The broken state is easy to reproduce. If I delete one of the edited deployments (e.g. `oc delete deployment rook-ceph-osd-1`), the operator starts its reconciliation process and breaks the cluster again.

Version-Release number of selected component (if applicable):

NAME                   DISPLAY                     VERSION   REPLACES              PHASE
odf-operator.v4.10.2   OpenShift Data Foundation   4.10.2    odf-operator.v4.9.6   Succeeded

How reproducible:
Always after the upgrade; deleting one of the edited deployments reproduces the issue.

Steps to Reproduce:
1. Upgrade ODF from 4.9 to 4.10.
2. Observe the rook-ceph-osd pods entering CrashLoopBackOff.
3. Apply the manual workaround above, then delete one of the edited deployments; the operator's reconciliation breaks the OSD again.

Actual results:
OSD pods crash-loop after the upgrade; Ceph is only available after the deployments are edited by hand.

Expected results:
OSD pods run normally after the upgrade, with no manual edits required.

Additional info:
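The manual workaround above amounts to deleting the stray "/rook/rook" line from each OSD deployment's args. A minimal sketch of that edit as a text transformation is below; it is illustrative only (the actual edit was done interactively with `oc edit`, and the operator reverts it on reconcile, as noted). The sample YAML fragment here is shortened from the report, not taken from a live cluster.

```shell
# Illustration of the args edit: drop the "/rook/rook" line from a
# shortened copy of the container args, mirroring the manual change
# described above. This is not a supported fix; the rook operator
# owns these Deployments and will restore the original args.
printf '%s\n' \
  '- args:' \
  '  - /rook/rook' \
  '  - ceph' \
  '  - osd' \
  '  - start' \
  | sed '/\/rook\/rook/d'
```

Because the operator reconciles these Deployments back to their generated spec, any such edit is temporary, which is why the fix had to land in the operator build itself.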
*** Bug 2089398 has been marked as a duplicate of this bug. ***
Fix is present in the latest build.
For verification, please check https://bugzilla.redhat.com/show_bug.cgi?id=2089398#c17
Moving to VERIFIED based on regression testing of the ODF upgrade using 4.11.0-113. ocs-ci results: OCS4-11-Downstream-OCP4-11-AWS-UPI-Proxy-3AZ-RHCOS-3M-3W-upgrade-ocs-auto (BUILD ID: 4.11.0-113, RUN ID: 1658223369)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156