Description
Sidhant Agrawal, 2022-10-06 08:36:24 UTC
Description of problem (please be as detailed as possible and provide log snippets):
In an RDR setup, the ramen-dr-cluster-operator pod on the managed clusters goes into CrashLoopBackOff state and restarts continuously. The logs are also filled with this error:
```
ERROR controller-runtime.source source/source.go:139 if kind is a CRD, it should be installed before calling Start {"kind": "Backup.velero.io", "error": "no matches for kind \"Backup\" in version \"velero.io/v1\""}
```
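This error comes from controller-runtime's source.Kind start path: a watch was registered for Backup.velero.io (and Restore.velero.io), but the managed cluster's API server does not serve the velero.io/v1 group, so the RESTMapper reports "no matches for kind". Below is a minimal, illustrative Go sketch (not the actual Ramen code) of how a client can confirm whether velero.io/v1 is served at all, which an operator could use to decide whether to set up those watches; the kubeconfig loading path is an assumption for running it from a workstation.
```go
// Minimal sketch, assuming a kubeconfig at the default location: check
// whether the velero.io/v1 group reported in the error is served by the
// API server before an operator sets up watches on Backup/Restore.
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the same way kubectl/oc does (path is an assumption;
	// an in-cluster operator would use rest.InClusterConfig instead).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// If velero.io/v1 is not served, this returns an error, which
	// corresponds to the RESTMapper's "no matches for kind" failures
	// seen in the pod log.
	resources, err := dc.ServerResourcesForGroupVersion("velero.io/v1")
	if err != nil {
		fmt.Println("velero.io/v1 not served:", err)
		return
	}
	for _, r := range resources.APIResources {
		fmt.Println("served:", r.Kind)
	}
}
```
On a managed cluster in this state, such a check would fail, matching the repeated errors in the log below.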
Output from the pod logs on one of the managed clusters:
```
...
2022-10-06T08:26:11.829Z ERROR controller-runtime.source source/source.go:139 if kind is a CRD, it should be installed before calling Start {"kind": "Restore.velero.io", "error": "no matches for kind \"Restore\" in version \"velero.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:139
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:132
2022-10-06T08:26:16.879Z ERROR controller-runtime.source source/source.go:139 if kind is a CRD, it should be installed before calling Start {"kind": "Backup.velero.io", "error": "no matches for kind \"Backup\" in version \"velero.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:139
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:132
I1006 08:26:18.524896 1 request.go:601] Waited for 1.250961075s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/ramendr.openshift.io/v1alpha1?timeout=32s
2022-10-06T08:26:22.021Z ERROR controller-runtime.source source/source.go:139 if kind is a CRD, it should be installed before calling Start {"kind": "Restore.velero.io", "error": "no matches for kind \"Restore\" in version \"velero.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:139
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/source/source.go:132
2022-10-06T08:26:22.426Z ERROR controller/controller.go:210 Could not wait for Cache to sync {"controller": "volumereplicationgroup", "controllerGroup": "ramendr.openshift.io", "controllerKind": "VolumeReplicationGroup", "error": "failed to wait for volumereplicationgroup caches to sync: timed out waiting for cache to be synced"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:210
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:215
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:241
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/manager/runnable_group.go:219
2022-10-06T08:26:22.426Z INFO manager/internal.go:567 Stopping and waiting for non leader election runnables
2022-10-06T08:26:22.426Z INFO manager/internal.go:571 Stopping and waiting for leader election runnables
2022-10-06T08:26:22.426Z INFO controller/controller.go:247 Shutdown signal received, waiting for all workers to finish {"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList"}
2022-10-06T08:26:22.426Z INFO controller/controller.go:249 All workers finished {"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList"}
2022-10-06T08:26:22.426Z INFO manager/internal.go:577 Stopping and waiting for caches
2022-10-06T08:26:22.426Z INFO manager/internal.go:581 Stopping and waiting for webhooks
2022-10-06T08:26:22.427Z INFO manager/internal.go:585 Wait completed, proceeding to shutdown the manager
2022-10-06T08:26:22.427Z ERROR setup app/main.go:210 problem running manager {"error": "failed to wait for volumereplicationgroup caches to sync: timed out waiting for cache to be synced"}
main.main
/remote-source/app/main.go:210
runtime.main
/usr/lib/golang/src/runtime/proc.go:250
```
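The tail of the log shows how this turns into a CrashLoopBackOff: the volumereplicationgroup controller cannot finish syncing its caches because the Velero watches never start, mgr.Start returns an error, and the process exits non-zero, so the kubelet keeps restarting the container. As a minimal sketch, assuming the standard kubebuilder/controller-runtime scaffolding rather than Ramen's actual main.go, the flow looks like this:
```go
// Sketch of a typical controller-runtime main loop (standard kubebuilder
// scaffolding, not the literal Ramen main.go): when a controller's cache
// cannot sync, mgr.Start returns an error, main exits non-zero, and the
// kubelet restarts the container, which surfaces as CrashLoopBackOff.
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	setupLog := ctrl.Log.WithName("setup")

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

	// Controllers and their watches would be registered here.

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		// Corresponds to the "problem running manager" error at the
		// end of the captured log, followed by a non-zero exit.
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}
```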
Version of all relevant components (if applicable):
OCP: 4.12.0-0.nightly-2022-09-28-204419
ODF: 4.12.0-70
ACM: 2.6.1
Submariner: 0.13.0
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Configure an RDR setup with 1 ACM hub and 2 managed clusters
2. Check the status of the ramen-dr-cluster-operator pod on the managed clusters (a client-go sketch of this check follows these steps)
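For step 2, the status can be checked with oc/kubectl or programmatically; below is a hedged client-go sketch that lists pods and their restart counts. The namespace name openshift-dr-system is an assumption about where the DR cluster operator runs; adjust it for the actual deployment.
```go
// Hedged sketch for step 2: list pods in the namespace where the DR cluster
// operator is assumed to run and print phase and restart counts.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// "openshift-dr-system" is an assumed namespace for this sketch.
	pods, err := clientset.CoreV1().Pods("openshift-dr-system").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		restarts := int32(0)
		for _, st := range p.Status.ContainerStatuses {
			restarts += st.RestartCount
		}
		fmt.Printf("%s phase=%s restarts=%d\n", p.Name, p.Status.Phase, restarts)
	}
}
```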
Actual results:
ramen-dr-cluster-operator pod goes into CrashLoopBackOff state
Expected results:
ramen-dr-cluster-operator pod should not go into CrashLoopBackOff state
Additional info: