* Description of problem (please be as detailed as possible and provide log snippets): ODF 4.14 installation does not complete because the rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a pod is stuck in the CrashLoopBackOff state.
* Version of all relevant components (if applicable): ODF 4.14
@karun A few questions:

- Does restarting the rook operator fix the issue completely, and is the cluster usable after that? (Trying to understand whether there is a race condition like you mentioned above.)
- In the logs you have shared, the ceph status seems to be `HEALTH_OK` and no OSDs are down. Was the must-gather taken after the operator was restarted and the StorageCluster reconcile was successful?
(In reply to Santosh Pillai from comment #8)
> @karun A few questions:
>
> - Does restarting the rook operator fix the issue completely, and is the
> cluster usable after that? (Trying to understand whether there is a race
> condition like you mentioned above.)
> - In the logs you have shared, the ceph status seems to be `HEALTH_OK`
> and no OSDs are down. Was the must-gather taken after the operator was
> restarted and the StorageCluster reconcile was successful?

- Is it possible to get the output of `radosgw-admin zonegroup list` on a cluster with this error vs. on a cluster that is working fine?
Thanks, Karun, for the details. This should be fixed by Parth's PR https://github.com/rook/rook/pull/12817, which correctly handles the timeout errors and increases the retry count.
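The general approach behind that fix (treating a timeout as a transient failure and retrying a bounded number of times) can be sketched as a small shell helper. This is an illustration only, not Rook's actual Go code; the `retry` and `flaky` names are hypothetical:

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, up to MAX attempts,
# sleeping DELAY seconds between attempts. Any non-zero exit is treated
# as a transient (e.g. timeout) failure.
retry() {
  max=$1; delay=$2; shift 2
  i=1
  while true; do
    "$@" && return 0
    [ "$i" -ge "$max" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: a command that fails twice, then succeeds on the 3rd attempt.
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
flaky() {
  n=$(($(cat "$attempts_file") + 1))
  echo "$n" > "$attempts_file"
  [ "$n" -ge 3 ]
}
retry 5 0 flaky && echo "succeeded after $(cat "$attempts_file") attempts"
```

Raising the retry count (the `max` argument here) is what gives a slow RGW endpoint more chances to respond before the reconcile is marked failed.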
Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383