Bug 2187952 - When cluster controller is cancelled frequently, multiple simultaneous controllers cause issues since need to wait for shutdown before continuing new controller
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Subham Rai
QA Contact: Mahesh Shetty
URL:
Whiteboard:
Duplicates: 2187951 2190413
Depends On: 2193220
Blocks:
 
Reported: 2023-04-19 08:17 UTC by Mahesh Shetty
Modified: 2023-08-09 17:03 UTC
CC: 10 users

Fixed In Version: 4.13.0-181
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-06 15:26:37 UTC
Embargoed:




Links:
- Github red-hat-storage/rook pull 474 (status: Draft): Bug 2187952: Disable exporter service for downstream 4.13 (last updated 2023-04-24 12:12:43 UTC)

Comment 12 Subham Rai 2023-04-20 10:23:16 UTC
Seems like we are not looking at the right error. The relevant errors are:

```
2023-04-20 10:17:06.055843 E | ceph-file-controller: failed to reconcile failed to detect running and desired ceph version: failed to detect ceph image version: failed to complete ceph version job: failed to run CmdReporter ceph-file-controller-detect-version successfully. failed to delete existing results ConfigMap ceph-file-controller-detect-version. failed to delete ConfigMap ceph-file-controller-detect-version. client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055874 E | ceph-file-controller: failed to reconcile CephFilesystem "openshift-storage/ocs-storagecluster-cephfilesystem". failed to detect running and desired ceph version: failed to detect ceph image version: failed to complete ceph version job: failed to run CmdReporter ceph-file-controller-detect-version successfully. failed to delete existing results ConfigMap ceph-file-controller-detect-version. failed to delete ConfigMap ceph-file-controller-detect-version. client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055903 E | ceph-csi: failed to reconcile failed to get csi ceph.conf configmap: failed to get csi ceph.conf configmap "csi-ceph-conf-override" (in "openshift-storage"): client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055921 E | ceph-nodedaemon-controller: node reconcile failed: failed to create ceph-exporter metrics service: failed to update service rook-ceph-exporter. client rate limiter Wait returned an error: context canceled
```

@athakkar is looking at this.
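The fix the bug title describes is to serialize controller restarts: cancel the old controller and wait for its shutdown to complete before starting the new one, so two reconcile loops never run simultaneously. A hedged sketch of that pattern (the `controllerRunner`, `Start`, and `Stop` names are illustrative, not Rook's actual implementation):

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// controllerRunner serializes controller restarts: before starting a new
// controller it cancels the previous one and blocks until that controller
// has fully shut down.
type controllerRunner struct {
	mu     sync.Mutex
	cancel context.CancelFunc
	done   chan struct{}
}

// Start cancels any running controller, waits for its shutdown to
// complete, and only then launches run in a new goroutine.
func (r *controllerRunner) Start(run func(ctx context.Context)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cancel != nil {
		r.cancel() // signal the old controller to stop
		<-r.done   // wait for shutdown before continuing the new controller
	}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	r.cancel, r.done = cancel, done
	go func() {
		defer close(done) // mark shutdown as complete
		run(ctx)
	}()
}

// Stop cancels the current controller, if any, and waits for it to exit.
func (r *controllerRunner) Stop() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cancel != nil {
		r.cancel()
		<-r.done
		r.cancel, r.done = nil, nil
	}
}

// demo restarts the controller three times and records shutdown order;
// the wait in Start makes the order deterministic.
func demo() []string {
	var (
		r     controllerRunner
		mu    sync.Mutex
		order []string
	)
	for i := 0; i < 3; i++ {
		i := i
		r.Start(func(ctx context.Context) {
			<-ctx.Done() // reconcile loop blocks until cancelled
			mu.Lock()
			order = append(order, fmt.Sprintf("controller %d stopped", i))
			mu.Unlock()
		})
	}
	r.Stop()
	return order
}

func main() {
	fmt.Println(demo())
}
```

Without the `<-r.done` wait, the old reconcile loop could still be issuing API calls (with its now-cancelled context) while the new controller starts, which matches the symptoms logged above.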

Comment 13 Mudit Agarwal 2023-04-20 13:33:25 UTC
*** Bug 2187951 has been marked as a duplicate of this bug. ***

Comment 24 Subham Rai 2023-05-02 14:57:33 UTC
*** Bug 2190413 has been marked as a duplicate of this bug. ***

Comment 40 Subham Rai 2023-06-06 10:41:39 UTC
upstream tracker https://github.com/rook/rook/issues/12331

Comment 41 Travis Nielsen 2023-06-06 15:26:37 UTC
Let's track this with the upstream issue as there is no longer a downstream repro for this issue since OCS operator fixed the frequent cephcluster CR update issue.

