Bug 2187952 - When cluster controller is cancelled frequently, multiple simultaneous controllers cause issues since need to wait for shutdown before continuing new controller
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Subham Rai
QA Contact: Mahesh Shetty
URL:
Whiteboard:
Duplicates: 2187951 2190413
Depends On: 2193220
Blocks:
 
Reported: 2023-04-19 08:17 UTC by Mahesh Shetty
Modified: 2023-08-09 17:03 UTC
CC: 10 users

Fixed In Version: 4.13.0-181
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-06 15:26:37 UTC
Embargoed:




Links:
- Github red-hat-storage/rook pull 474 (status: Draft): Bug 2187952: Disable exporter service for downstream 4.13 (last updated 2023-04-24 12:12:43 UTC)

Comment 12 Subham Rai 2023-04-20 10:23:16 UTC
Seems like we are not looking at the right error. The relevant errors are:

```
2023-04-20 10:17:06.055843 E | ceph-file-controller: failed to reconcile failed to detect running and desired ceph version: failed to detect ceph image version: failed to complete ceph version job: failed to run CmdReporter ceph-file-controller-detect-version successfully. failed to delete existing results ConfigMap ceph-file-controller-detect-version. failed to delete ConfigMap ceph-file-controller-detect-version. client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055874 E | ceph-file-controller: failed to reconcile CephFilesystem "openshift-storage/ocs-storagecluster-cephfilesystem". failed to detect running and desired ceph version: failed to detect ceph image version: failed to complete ceph version job: failed to run CmdReporter ceph-file-controller-detect-version successfully. failed to delete existing results ConfigMap ceph-file-controller-detect-version. failed to delete ConfigMap ceph-file-controller-detect-version. client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055903 E | ceph-csi: failed to reconcile failed to get csi ceph.conf configmap: failed to get csi ceph.conf configmap "csi-ceph-conf-override" (in "openshift-storage"): client rate limiter Wait returned an error: context canceled
2023-04-20 10:17:06.055921 E | ceph-nodedaemon-controller: node reconcile failed: failed to create ceph-exporter metrics service: failed to update service rook-ceph-exporter. client rate limiter Wait returned an error: context canceled
```

@athakkar is looking at this.
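The fix the bug title describes is to serialize controller restarts: cancel the old controller and wait for its shutdown to complete before starting the new one, so two reconcile loops never run simultaneously. A hedged sketch of that pattern (the `controllerRunner`, `Start`, and `Stop` names are illustrative, not Rook's actual implementation):

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// controllerRunner serializes controller restarts: before starting a new
// controller it cancels the previous one and blocks until that controller
// has fully shut down.
type controllerRunner struct {
	mu     sync.Mutex
	cancel context.CancelFunc
	done   chan struct{}
}

// Start cancels any running controller, waits for its shutdown to
// complete, and only then launches run in a new goroutine.
func (r *controllerRunner) Start(run func(ctx context.Context)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cancel != nil {
		r.cancel() // signal the old controller to stop
		<-r.done   // wait for shutdown before continuing the new controller
	}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	r.cancel, r.done = cancel, done
	go func() {
		defer close(done) // mark shutdown as complete
		run(ctx)
	}()
}

// Stop cancels the current controller, if any, and waits for it to exit.
func (r *controllerRunner) Stop() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cancel != nil {
		r.cancel()
		<-r.done
		r.cancel, r.done = nil, nil
	}
}

// demo restarts the controller three times and records shutdown order;
// the wait in Start makes the order deterministic.
func demo() []string {
	var (
		r     controllerRunner
		mu    sync.Mutex
		order []string
	)
	for i := 0; i < 3; i++ {
		i := i
		r.Start(func(ctx context.Context) {
			<-ctx.Done() // reconcile loop blocks until cancelled
			mu.Lock()
			order = append(order, fmt.Sprintf("controller %d stopped", i))
			mu.Unlock()
		})
	}
	r.Stop()
	return order
}

func main() {
	fmt.Println(demo())
}
```

Without the `<-r.done` wait, the old reconcile loop could still be issuing API calls (with its now-cancelled context) while the new controller starts, which matches the symptoms logged above.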

Comment 13 Mudit Agarwal 2023-04-20 13:33:25 UTC
*** Bug 2187951 has been marked as a duplicate of this bug. ***

Comment 24 Subham Rai 2023-05-02 14:57:33 UTC
*** Bug 2190413 has been marked as a duplicate of this bug. ***

Comment 40 Subham Rai 2023-06-06 10:41:39 UTC
upstream tracker https://github.com/rook/rook/issues/12331

Comment 41 Travis Nielsen 2023-06-06 15:26:37 UTC
Let's track this with the upstream issue as there is no longer a downstream repro for this issue since OCS operator fixed the frequent cephcluster CR update issue.

