Bug 2187952

Summary: When cluster controller is cancelled frequently, multiple simultaneous controllers cause issues since need to wait for shutdown before continuing new controller
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Mahesh Shetty <mashetty>
Component: rook Assignee: Subham Rai <srai>
Status: CLOSED UPSTREAM QA Contact: Mahesh Shetty <mashetty>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.13 CC: athakkar, dahorak, hnallurv, muagarwa, ocs-bugs, odf-bz-bot, owasserm, pbalogh, srai, tnielsen
Target Milestone: --- Keywords: Regression, TestBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.13.0-181 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-06-06 15:26:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2193220    
Bug Blocks:    

Comment 12 Subham Rai 2023-04-20 10:23:16 UTC
It seems we are not looking at the right error...

```
2023-04-20 10:17:06.055843 E | ceph-file-controller: failed to reconcile failed to detect running and desired ceph version: failed to detect ceph image version: failed to complete ceph version job: failed to run CmdReporter ceph-file-controller-detect-version successfully. failed to delete existing results ConfigMap ceph-file-controller-detect-version. failed to delete ConfigMap ceph-file-controller-detect-version. client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055874 E | ceph-file-controller: failed to reconcile CephFilesystem "openshift-storage/ocs-storagecluster-cephfilesystem". failed to detect running and desired ceph version: failed to detect ceph image version: failed to complete ceph version job: failed to run CmdReporter ceph-file-controller-detect-version successfully. failed to delete existing results ConfigMap ceph-file-controller-detect-version. failed to delete ConfigMap ceph-file-controller-detect-version. client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055903 E | ceph-csi: failed to reconcile failed to get csi ceph.conf configmap: failed to get csi ceph.conf configmap "csi-ceph-conf-override" (in "openshift-storage"): client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055921 E | ceph-nodedaemon-controller: node reconcile failed: failed to create ceph-exporter metrics service: failed to update service rook-ceph-exporter. client rate limiter Wait returned an error: context canceled
```

@athakkar is looking at this.

Comment 13 Mudit Agarwal 2023-04-20 13:33:25 UTC
*** Bug 2187951 has been marked as a duplicate of this bug. ***

Comment 24 Subham Rai 2023-05-02 14:57:33 UTC
*** Bug 2190413 has been marked as a duplicate of this bug. ***

Comment 40 Subham Rai 2023-06-06 10:41:39 UTC
Upstream tracker: https://github.com/rook/rook/issues/12331

Comment 41 Travis Nielsen 2023-06-06 15:26:37 UTC
Let's track this with the upstream issue. There is no longer a downstream repro, since the OCS operator fixed the frequent CephCluster CR update issue.