Description of problem (please be as detailed as possible and provide log snippets):

In an RDR setup, after creating a DRPolicy on the hub cluster, the ramen-dr-cluster-operator pod on the managed clusters goes into CrashLoopBackOff because the Recipe and Velero CRDs are missing:

```
$ oc get pod -n openshift-dr-system
NAME                                         READY   STATUS             RESTARTS      AGE
ramen-dr-cluster-operator-59d8dd9fd4-qnv84   1/2     CrashLoopBackOff   9 (12s ago)   44m
```

Error messages from the pod logs:

```
2024-04-18T16:25:10.824Z ERROR controller-runtime.source.EventHandler source/kind.go:68 failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: velero.io/v1: the server could not find the requested resource"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:56
2024-04-18T16:25:10.826Z ERROR controller-runtime.source.EventHandler source/kind.go:63 if kind is a CRD, it should be installed before calling Start {"kind": "Recipe.ramendr.openshift.io", "error": "no matches for kind \"Recipe\" in version \"ramendr.openshift.io/v1alpha1\""}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:63
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:56
2024-04-18T16:25:10.827Z ERROR controller-runtime.source.EventHandler source/kind.go:68 failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: velero.io/v1: the server could not find the requested resource"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:56
2024-04-18T16:25:10.936Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:13.021Z INFO configmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:144 Update in ramen-dr-cluster-operator-config configuration map
2024-04-18T16:25:13.037Z INFO controller/controller.go:220 Starting workers {"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList", "worker count": 1}
2024-04-18T16:25:13.038Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:13.038Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:13.038Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:13.038Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:13.039Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:13.039Z INFO pvcmap.VolumeReplicationGroup controllers/volumereplicationgroup_controller.go:178 Create event for PersistentVolumeClaim
2024-04-18T16:25:20.821Z ERROR controller-runtime.source.EventHandler source/kind.go:63 if kind is a CRD, it should be installed before calling Start {"kind": "Recipe.ramendr.openshift.io", "error": "no matches for kind \"Recipe\" in version \"ramendr.openshift.io/v1alpha1\""}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:63
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:87
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:88
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:56
```
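Both failures point at API types that are not registered on the managed cluster. A quick way to confirm this directly on the managed cluster (a hedged sketch; the Recipe CRD name is assumed from the group/kind shown in the log):

```
# Check whether the Recipe CRD and the velero.io API group are present
# on the managed cluster; both are reported as missing in the log above.
oc get crd recipes.ramendr.openshift.io
oc api-resources --api-group=velero.io
```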
Version of all relevant components (if applicable):
OCP: 4.16.0-0.nightly-2024-04-16-195622
ODF: 4.16.0-79.stable

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Configure an RDR setup with 1 ACM hub and 2 managed clusters.
2. Install MCO on the hub cluster and then create a DRPolicy (a sketch of such a DRPolicy is shown after the Expected results below).
3. Observe the ramen-dr-cluster-operator pod status on the managed clusters.

Actual results:
The ramen-dr-cluster-operator pod goes into CrashLoopBackOff state.

Expected results:
The ramen-dr-cluster-operator pod should not go into CrashLoopBackOff state.
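For reference, the DRPolicy created in step 2 looks roughly like the following. This is only a minimal sketch: the policy name, cluster names, and scheduling interval are placeholders, and the exact spec fields should be verified against the DRPolicy CRD installed by MCO on the hub.

```
# Hypothetical example only; names and values are placeholders.
cat <<EOF | oc apply -f -
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: odr-policy-5m
spec:
  drClusters:
    - managed-cluster-1
    - managed-cluster-2
  schedulingInterval: 5m
EOF
```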
Additional info:

Facing a similar issue with MDR as well.
OCP: 4.16
ODF: 4.16.0-85

```
➜  clust1 oc get pod -n openshift-dr-system
NAME                                         READY   STATUS             RESTARTS        AGE
ramen-dr-cluster-operator-759cc88f66-4l6n2   1/2     CrashLoopBackOff   117 (65s ago)   14h
```

Error messages from the pod logs:

```
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:56
2024-04-25T08:08:48.407Z ERROR controller-runtime.source.EventHandler source/kind.go:68 failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: velero.io/v1: the server could not find the requested resource"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:87
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/loop.go:88
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/source/kind.go:56
2024-04-25T08:08:52.581Z ERROR controller/controller.go:203 Could not wait for Cache to sync {"controller": "volumereplicationgroup", "controllerGroup": "ramendr.openshift.io", "controllerKind": "VolumeReplicationGroup", "error": "failed to wait for volumereplicationgroup caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.Recipe"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:203
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:208
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:234
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/manager/runnable_group.go:223
2024-04-25T08:08:52.581Z INFO manager/internal.go:516 Stopping and waiting for non leader election runnables
2024-04-25T08:08:52.581Z INFO manager/internal.go:520 Stopping and waiting for leader election runnables
2024-04-25T08:08:52.581Z INFO controller/controller.go:240 Shutdown signal received, waiting for all workers to finish {"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList"}
2024-04-25T08:08:52.581Z INFO controller/controller.go:242 All workers finished {"controller": "protectedvolumereplicationgrouplist", "controllerGroup": "ramendr.openshift.io", "controllerKind": "ProtectedVolumeReplicationGroupList"}
2024-04-25T08:08:52.581Z INFO manager/internal.go:526 Stopping and waiting for caches
2024-04-25T08:08:52.581Z INFO manager/internal.go:530 Stopping and waiting for webhooks
2024-04-25T08:08:52.581Z INFO manager/internal.go:533 Stopping and waiting for HTTP servers
2024-04-25T08:08:52.581Z INFO controller-runtime.metrics server/server.go:231 Shutting down metrics server with timeout of 1 minute
2024-04-25T08:08:52.581Z INFO manager/server.go:43 shutting down server {"kind": "health probe", "addr": "[::]:8081"}
2024-04-25T08:08:52.582Z INFO manager/internal.go:537 Wait completed, proceeding to shutdown the manager
2024-04-25T08:08:52.582Z ERROR setup app/main.go:247 problem running manager {"error": "failed to wait for volumereplicationgroup caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.Recipe"}
main.main
    /remote-source/app/main.go:247
runtime.main
    /usr/lib/golang/src/runtime/proc.go:267
```
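Since these CRDs are normally shipped with the DR cluster operator bundle, it can also help to compare what the operator's CSV claims to own with what is actually installed on the managed cluster. A hedged sketch (CSV names vary by build):

```
# List the CSVs in the DR namespace and the CRDs each one owns,
# then check which ramendr/velero CRDs actually exist on the cluster.
oc get csv -n openshift-dr-system \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.customresourcedefinitions.owned[*].name}{"\n"}{end}'
oc get crd | grep -e ramendr.openshift.io -e velero.io
```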
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591