Bug 2132604 - [RDR] ramen-dr-cluster-operator pod goes into CrashLoopBackOff state
Summary: [RDR] ramen-dr-cluster-operator pod goes into CrashLoopBackOff state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Vineet
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-10-06 08:36 UTC by Sidhant Agrawal
Modified: 2023-08-09 17:00 UTC
CC List: 12 users

Fixed In Version: 4.12.0-74
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:




Links
System ID: Github red-hat-storage odf-multicluster-orchestrator pull 136
Status: Merged
Summary: Bug 2132604: Update Ramen deps to the latest commit
Last Updated: 2022-10-13 12:06:00 UTC

Description Sidhant Agrawal 2022-10-06 08:36:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In an RDR setup, the ramen-dr-cluster-operator pod on the managed clusters goes into CrashLoopBackOff state and restarts continuously. The pod logs are also filled with this error:
```
ERROR	controller-runtime.source	source/source.go:139	if kind is a CRD, it should be installed before calling Start	{"kind": "Backup.velero.io", "error": "no matches for kind \"Backup\" in version \"velero.io/v1\""}
```
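
For context on the error above: controller-runtime logs "if kind is a CRD, it should be installed before calling Start" whenever a controller watches a kind whose CRD is absent, so the informer for that kind can never list/watch and its cache never syncs. Below is a minimal Go sketch of one way such a watch could be guarded, by asking the manager's RESTMapper whether the kind is known before wiring it up. This is only an illustration (the function name veleroBackupInstalled is made up here), not Ramen's actual code or the shipped fix:

```go
package crdcheck

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// veleroBackupInstalled reports whether the API server knows about
// Backup in velero.io/v1, i.e. whether the Velero Backup CRD is installed.
func veleroBackupInstalled(mgr manager.Manager) (bool, error) {
	gvk := schema.GroupVersionKind{Group: "velero.io", Version: "v1", Kind: "Backup"}
	_, err := mgr.GetRESTMapper().RESTMapping(gvk.GroupKind(), gvk.Version)
	if meta.IsNoMatchError(err) {
		// Same condition as the log's `no matches for kind "Backup"
		// in version "velero.io/v1"`: the CRD is absent, so the
		// caller should skip setting up the watch.
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```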

Output from one of the managed cluster pods' logs:
```
...
2022-10-06T08:26:11.829Z	ERROR	controller-runtime.source	source/source.go:139	if kind is a CRD, it should be installed before calling Start	{"kind": "Restore.velero.io", "error": "no matches for kind \"Restore\" in version \"velero.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:139
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:132
2022-10-06T08:26:16.879Z	ERROR	controller-runtime.source	source/source.go:139	if kind is a CRD, it should be installed before calling Start	{"kind": "Backup.velero.io", "error": "no matches for kind \"Backup\" in version \"velero.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:139
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:132
I1006 08:26:18.524896       1 request.go:601] Waited for 1.250961075s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/ramendr.openshift.io/v1alpha1?timeout=32s
2022-10-06T08:26:22.021Z	ERROR	controller-runtime.source	source/source.go:139	if kind is a CRD, it should be installed before calling Start	{"kind": "Restore.velero.io", "error": "no matches for kind \"Restore\" in version \"velero.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:139
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:132
2022-10-06T08:26:22.426Z	ERROR	controller/controller.go:210	Could not wait for Cache to sync	{"controller": "volumereplicationgroup", "controllerGroup": "ramendr.openshift.io", "controllerKind": "VolumeReplicationGroup", "error": "failed to wait for volumereplicationgroup caches to sync: timed out waiting for cache to be synced"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:210
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:215
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:241
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/manager/runnable_group.go:219
2022-10-06T08:26:22.426Z	INFO	manager/internal.go:567	Stopping and waiting for non leader election runnables
2022-10-06T08:26:22.426Z	INFO	manager/internal.go:571	Stopping and waiting for leader election runnables
2022-10-06T08:26:22.426Z	INFO	controller/controller.go:247	Shutdown signal received, waiting for all workers to finish	{"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList"}
2022-10-06T08:26:22.426Z	INFO	controller/controller.go:249	All workers finished	{"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList"}
2022-10-06T08:26:22.426Z	INFO	manager/internal.go:577	Stopping and waiting for caches
2022-10-06T08:26:22.426Z	INFO	manager/internal.go:581	Stopping and waiting for webhooks
2022-10-06T08:26:22.427Z	INFO	manager/internal.go:585	Wait completed, proceeding to shutdown the manager
2022-10-06T08:26:22.427Z	ERROR	setup	app/main.go:210	problem running manager	{"error": "failed to wait for volumereplicationgroup caches to sync: timed out waiting for cache to be synced"}
main.main
	/remote-source/app/main.go:210
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
```
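
The shutdown sequence above is standard controller-runtime behaviour: once the VolumeReplicationGroup controller's cache cannot sync, mgr.Start returns an error, main logs "problem running manager" and exits non-zero, and kubelet's repeated restarts are what show up as CrashLoopBackOff. A minimal sketch of that flow, assuming a bare manager setup rather than Ramen's actual main.go:

```go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	setupLog := ctrl.Log.WithName("setup")

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		setupLog.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Controllers (VolumeReplicationGroup, etc.) would be registered here.
	// If any watched kind's CRD is missing, that controller's cache never syncs.

	// Start blocks; a cache-sync timeout surfaces here as
	// "failed to wait for volumereplicationgroup caches to sync".
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		// The non-zero exit makes kubelet restart the container,
		// which is what appears as CrashLoopBackOff.
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}
```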
	
Version of all relevant components (if applicable):
OCP: 4.12.0-0.nightly-2022-09-28-204419
ODF: 4.12.0-70
ACM: 2.6.1
Submariner: 0.13.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Configure an RDR setup with 1 ACM hub and 2 managed clusters
2. Check the status of the ramen-dr-cluster-operator pod on the managed clusters (one way to check this programmatically is sketched below)
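
A minimal client-go sketch for step 2, reporting whether any ramen-dr-cluster-operator container is in CrashLoopBackOff; the namespace "openshift-dr-system" is an assumption for illustration and may differ on a given setup:

```go
package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	clientset, err := kubernetes.NewForConfig(config.GetConfigOrDie())
	if err != nil {
		panic(err)
	}

	// Namespace is assumed for illustration.
	pods, err := clientset.CoreV1().Pods("openshift-dr-system").
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, pod := range pods.Items {
		if !strings.HasPrefix(pod.Name, "ramen-dr-cluster-operator") {
			continue
		}
		for _, cs := range pod.Status.ContainerStatuses {
			// CrashLoopBackOff is reported as a container waiting reason.
			if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff" {
				fmt.Printf("%s: container %s is in CrashLoopBackOff (restarts: %d)\n",
					pod.Name, cs.Name, cs.RestartCount)
			}
		}
	}
}
```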


Actual results:
ramen-dr-cluster-operator pod goes into CrashLoopBackOff state

Expected results:
ramen-dr-cluster-operator pod should not go into CrashLoopBackOff state

Additional info:

