Description of problem (please be as detailed as possible and provide log snippets):

While configuring an RDR setup on OCP 4.14, after the DRPolicy is created on the hub cluster, the ramen-dr-cluster-operator pod on the managed clusters goes into CrashLoopBackOff state.

C1:
NAME                                        READY   STATUS             RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-6zxz9   1/2     CrashLoopBackOff   7 (18s ago)   11m   10.135.0.150   compute-0   <none>           <none>

C2:
NAME                                        READY   STATUS             RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-wxw8b   1/2     CrashLoopBackOff   7 (28s ago)   11m   10.131.0.128   compute-2   <none>           <none>

The CSV is in Failed phase:
$ oc get csv -n openshift-dr-system
NAME                                     DISPLAY                         VERSION            REPLACES                 PHASE
odr-cluster-operator.v4.14.0-93.stable   Openshift DR Cluster Operator   4.14.0-93.stable                            Failed
volsync-product.v0.7.3                   VolSync                         0.7.3              volsync-product.v0.7.2   Failed

The pod logs show the following error:
```
2023-08-02T03:38:35.414Z  INFO  setup  controllers/ramenconfig.go:62  loading Ramen configuration from  {"file": "/config/ramen_manager_config.yaml"}
2023-08-02T03:38:35.508Z  INFO  setup  controllers/ramenconfig.go:70  s3 profile  {"key": 0, "value": {"s3ProfileName":"s3profile-sagrawal-nc1-ocs-storagecluster","s3Bucket":"odrbucket-c76ef3ef878d","s3CompatibleEndpoint":"https://s3-openshift-storage.apps.sagrawal-nc1.qe.rh-ocs.com","s3Region":"noobaa","s3SecretRef":{"name":"afad760beb02470bea0590703f25f2dd341064e"}}}
2023-08-02T03:38:35.509Z  INFO  setup  controllers/ramenconfig.go:70  s3 profile  {"key": 1, "value": {"s3ProfileName":"s3profile-sagrawal-nc2-ocs-storagecluster","s3Bucket":"odrbucket-c76ef3ef878d","s3CompatibleEndpoint":"https://s3-openshift-storage.apps.sagrawal-nc2.qe.rh-ocs.com","s3Region":"noobaa","s3SecretRef":{"name":"1a3b49b412612a151fb4a657f706623d2aa805f"}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x225ce50]

goroutine 1 [running]:
k8s.io/client-go/discovery.convertAPIResource(...)
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:88
k8s.io/client-go/discovery.convertAPIGroup({{{0x0, 0x0}, {0x0, 0x0}}, {{0xc00089d650, 0x15}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:69 +0x570
k8s.io/client-go/discovery.SplitGroupsAndResources({{{0xc0005b6330, 0x15}, {0xc00027af40, 0x1b}}, {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:35 +0x118
k8s.io/client-go/discovery.(*DiscoveryClient).downloadAPIs(0x1fb1c59?)
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:310 +0x47c
k8s.io/client-go/discovery.(*DiscoveryClient).GroupsAndMaybeResources(0x22610bf?)
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:198 +0x5c
k8s.io/client-go/discovery.ServerGroupsAndResources({0x3da1e38, 0xc0005ea3f0})
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:392 +0x59
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources.func1()
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:356 +0x25
k8s.io/client-go/discovery.withRetries(0x2, 0xc000b3d008)
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:621 +0x71
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources(0x0?)
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:355 +0x3a
k8s.io/client-go/restmapper.GetAPIGroupResources({0x3da1e38?, 0xc0005ea3f0?})
    /remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/restmapper/discovery.go:148 +0x42
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper.func1()
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:94 +0x25
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).setStaticMapper(...)
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:130
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper(0xc000b3d4e8?, {0x0, 0x0, 0x24a208e?})
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:110 +0x182
sigs.k8s.io/controller-runtime/pkg/cluster.setOptionsDefaults.func1(0x3747aa0?)
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/cluster/cluster.go:217 +0x25
sigs.k8s.io/controller-runtime/pkg/cluster.New(0xc0005ad440, {0xc000b3d9a8, 0x1, 0x0?})
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/cluster/cluster.go:159 +0x18d
sigs.k8s.io/controller-runtime/pkg/manager.New(_, {0xc000376150, 0x0, 0x0, {{0x3d9b190, 0xc0003247c0}, 0x0}, 0x1, {0x0, 0x0}, ...})
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/manager/manager.go:351 +0xf9
main.newManager()
    /remote-source/app/main.go:105 +0x5d6
main.main()
    /remote-source/app/main.go:210 +0x1d
```

Version of all relevant components (if applicable):
OCP: 4.14.0-0.nightly-2023-07-31-181848
ODF: 4.14.0-93
ACM: 2.9.0-62 (quay.io:443/acm-d/acm-custom-registry:2.9.0-DOWNSTREAM-2023-07-31-16-30-30)
Submariner: 0.16.0 (brew.registry.redhat.io/rh-osbs/iib:543072)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy three OCP 4.14 clusters.
2. Install RHACM on the hub, import the other two clusters, and connect them using the Submariner add-ons with globalnet enabled.
3. Deploy ODF 4.14 on both managed clusters.
4. Install MCO on the hub cluster and then create a DRPolicy.
5. Observe the ramen-dr-cluster-operator pod status on the managed clusters (see the command sketch under Additional info below).

Actual results:
The ramen-dr-cluster-operator pod is in CrashLoopBackOff state.

Expected results:
The ramen-dr-cluster-operator pod is in Running state.

Additional info:
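For anyone hitting the same state, the output above can be gathered on a managed cluster with commands along these lines. The `manager` container name is an assumption based on the usual operator pod layout, so check the container list first if it differs:

```
NS=openshift-dr-system

# Pod and CSV state in the DR namespace
oc get pods -n "$NS" -o wide
oc get csv -n "$NS"

# Container names in the operator deployment (the name "manager" below is an assumption)
oc get deploy ramen-dr-cluster-operator -n "$NS" \
  -o jsonpath='{.spec.template.spec.containers[*].name}{"\n"}'

# Logs from the previously crashed instance of the operator container
oc logs deploy/ramen-dr-cluster-operator -n "$NS" -c manager --previous
```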
Facing a similar issue on an MDR setup as well, using OCP 4.14 and ODF 4.14.

C1:
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-07-31-181848   True        False         13h     Cluster version is 4.14.0-0.nightly-2023-07-31-181848

oc get pod -n openshift-dr-system -o wide
NAME                                        READY   STATUS             RESTARTS        AGE   IP            NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-w246x   1/2     CrashLoopBackOff   151 (99s ago)   12h   10.132.2.35   compute-0   <none>           <none>

oc get csv -n openshift-dr-system
NAME                                     DISPLAY                         VERSION            REPLACES                 PHASE
odr-cluster-operator.v4.14.0-93.stable   Openshift DR Cluster Operator   4.14.0-93.stable                            Failed
volsync-product.v0.7.3                   VolSync                         0.7.3              volsync-product.v0.7.2   Failed

C2:
NAME                                        READY   STATUS             RESTARTS          AGE   IP            NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-69nm7   1/2     CrashLoopBackOff   150 (4m54s ago)   12h   10.131.0.35   compute-2   <none>           <none>
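The same spot check can be run against both managed clusters in one loop over kubeconfig contexts; the context names below are placeholders for whatever contexts point at the two managed clusters:

```
# "c1" and "c2" are placeholder context names for the two managed clusters
for ctx in c1 c2; do
  echo "== ${ctx} =="
  oc --context "${ctx}" get clusterversion
  oc --context "${ctx}" get pods -n openshift-dr-system -o wide
  oc --context "${ctx}" get csv -n openshift-dr-system
done
```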
Fix backported here: https://github.com/red-hat-storage/ramen/pull/118
The fix is available from build 4.14.0-99.
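Once a build carrying the backport is installed (4.14.0-99 or later, per the comment above), a rough way to verify on each managed cluster is to confirm the CSV has left the Failed phase and the operator pod stays Running; the expected values noted below are assumptions based on the output earlier in this report:

```
# Expect the odr-cluster-operator CSV to report phase Succeeded
oc get csv -n openshift-dr-system

# Expect the operator pod to be 2/2 Running with the restart count no longer climbing
oc get pods -n openshift-dr-system -o wide -w
```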
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832