Bug 2228319 - [RDR] [MDR] ramen-dr-cluster-operator pod in CrashLoopBackOff state
Summary: [RDR] [MDR] ramen-dr-cluster-operator pod in CrashLoopBackOff state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Benamar Mekhissi
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-02 03:50 UTC by Sidhant Agrawal
Modified: 2023-11-08 18:54 UTC (History)
CC: 8 users

Fixed In Version: 4.14.0-99
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:53:30 UTC
Embargoed:




Links:
Red Hat Product Errata RHSA-2023:6832 (last updated 2023-11-08 18:54:29 UTC)

Description Sidhant Agrawal 2023-08-02 03:50:14 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
While configuring an RDR setup on OCP 4.14, after creating the DRPolicy on the hub cluster, the ramen-dr-cluster-operator pod on the managed clusters goes into CrashLoopBackOff state.

C1:
NAME                                        READY   STATUS             RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-6zxz9   1/2     CrashLoopBackOff   7 (18s ago)   11m   10.135.0.150   compute-0   <none>           <none>

C2:
NAME                                        READY   STATUS             RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-wxw8b   1/2     CrashLoopBackOff   7 (28s ago)   11m   10.131.0.128   compute-2   <none>           <none>


CSVs in Failed phase:

$ oc get csv -n openshift-dr-system
NAME                                     DISPLAY                         VERSION            REPLACES                 PHASE
odr-cluster-operator.v4.14.0-93.stable   Openshift DR Cluster Operator   4.14.0-93.stable                            Failed
volsync-product.v0.7.3                   VolSync                         0.7.3              volsync-product.v0.7.2   Failed


Pod logs show the following error:
```
2023-08-02T03:38:35.414Z	INFO	setup	controllers/ramenconfig.go:62	loading Ramen configuration from 	{"file": "/config/ramen_manager_config.yaml"}
2023-08-02T03:38:35.508Z	INFO	setup	controllers/ramenconfig.go:70	s3 profile	{"key": 0, "value": {"s3ProfileName":"s3profile-sagrawal-nc1-ocs-storagecluster","s3Bucket":"odrbucket-c76ef3ef878d","s3CompatibleEndpoint":"https://s3-openshift-storage.apps.sagrawal-nc1.qe.rh-ocs.com","s3Region":"noobaa","s3SecretRef":{"name":"afad760beb02470bea0590703f25f2dd341064e"}}}
2023-08-02T03:38:35.509Z	INFO	setup	controllers/ramenconfig.go:70	s3 profile	{"key": 1, "value": {"s3ProfileName":"s3profile-sagrawal-nc2-ocs-storagecluster","s3Bucket":"odrbucket-c76ef3ef878d","s3CompatibleEndpoint":"https://s3-openshift-storage.apps.sagrawal-nc2.qe.rh-ocs.com","s3Region":"noobaa","s3SecretRef":{"name":"1a3b49b412612a151fb4a657f706623d2aa805f"}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x225ce50]

goroutine 1 [running]:
k8s.io/client-go/discovery.convertAPIResource(...)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:88
k8s.io/client-go/discovery.convertAPIGroup({{{0x0, 0x0}, {0x0, 0x0}}, {{0xc00089d650, 0x15}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:69 +0x570
k8s.io/client-go/discovery.SplitGroupsAndResources({{{0xc0005b6330, 0x15}, {0xc00027af40, 0x1b}}, {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:35 +0x118
k8s.io/client-go/discovery.(*DiscoveryClient).downloadAPIs(0x1fb1c59?)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:310 +0x47c
k8s.io/client-go/discovery.(*DiscoveryClient).GroupsAndMaybeResources(0x22610bf?)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:198 +0x5c
k8s.io/client-go/discovery.ServerGroupsAndResources({0x3da1e38, 0xc0005ea3f0})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:392 +0x59
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources.func1()
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:356 +0x25
k8s.io/client-go/discovery.withRetries(0x2, 0xc000b3d008)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:621 +0x71
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources(0x0?)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:355 +0x3a
k8s.io/client-go/restmapper.GetAPIGroupResources({0x3da1e38?, 0xc0005ea3f0?})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/restmapper/discovery.go:148 +0x42
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper.func1()
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:94 +0x25
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).setStaticMapper(...)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:130
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper(0xc000b3d4e8?, {0x0, 0x0, 0x24a208e?})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:110 +0x182
sigs.k8s.io/controller-runtime/pkg/cluster.setOptionsDefaults.func1(0x3747aa0?)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/cluster/cluster.go:217 +0x25
sigs.k8s.io/controller-runtime/pkg/cluster.New(0xc0005ad440, {0xc000b3d9a8, 0x1, 0x0?})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/cluster/cluster.go:159 +0x18d
sigs.k8s.io/controller-runtime/pkg/manager.New(_, {0xc000376150, 0x0, 0x0, {{0x3d9b190, 0xc0003247c0}, 0x0}, 0x1, {0x0, 0x0}, ...})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/manager/manager.go:351 +0xf9
main.newManager()
	/remote-source/app/main.go:105 +0x5d6
main.main()
	/remote-source/app/main.go:210 +0x1d
```
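
The backtrace points at convertAPIResource/convertAPIGroup in client-go's aggregated discovery path, which dereferences a response-kind pointer that can be nil in the aggregated discovery document returned by the API server. The following is an illustrative, self-contained sketch of that failure mode and the guard that avoids it; it uses hypothetical local types rather than the real k8s.io/client-go ones and is not the actual fix.
```
// Illustrative sketch only (not the actual client-go or ramen code): it
// reproduces the failure mode from the trace above with hypothetical local
// types. Aggregated discovery entries may carry a nil responseKind, and
// dereferencing it without a guard is the nil pointer panic reported here.
package main

import "fmt"

type groupVersionKind struct{ Group, Version, Kind string }

// resourceDiscovery stands in for one resource entry in the aggregated
// discovery document returned by the API server.
type resourceDiscovery struct {
	Resource     string
	ResponseKind *groupVersionKind // may be nil for some entries
}

// kindOf returns the Kind for an entry, guarding against a nil ResponseKind;
// this guard is what the crashing code path lacked.
func kindOf(r resourceDiscovery) (string, bool) {
	if r.ResponseKind == nil {
		return "", false
	}
	return r.ResponseKind.Kind, true
}

func main() {
	entries := []resourceDiscovery{
		{Resource: "volumereplications", ResponseKind: &groupVersionKind{Kind: "VolumeReplication"}},
		{Resource: "some-aggregated-resource"}, // nil ResponseKind
	}
	for _, e := range entries {
		if kind, ok := kindOf(e); ok {
			fmt.Printf("%s -> %s\n", e.Resource, kind)
		} else {
			fmt.Printf("%s -> skipped (nil responseKind)\n", e.Resource)
		}
	}
}
```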

Version of all relevant components (if applicable):
OCP: 4.14.0-0.nightly-2023-07-31-181848
ODF: 4.14.0-93
ACM: 2.9.0-62 (quay.io:443/acm-d/acm-custom-registry:2.9.0-DOWNSTREAM-2023-07-31-16-30-30)
Submariner: 0.16.0 (brew.registry.redhat.io/rh-osbs/iib:543072)

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy 3 OCP 4.14 clusters
2. Install RHACM on the hub, import the other two clusters, and connect them using the Submariner add-ons with globalnet enabled
3. Deploy ODF 4.14 on both managed clusters
4. Install MCO on the hub cluster and then create a DRPolicy
5. Observe the ramen-dr-cluster-operator pod status on the managed clusters


Actual results:
ramen-dr-cluster-operator pod in CrashLoopBackOff state

Expected results:
ramen-dr-cluster-operator pod in Running state

Additional info:
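Additional context from the trace above: the panic happens while the manager is being constructed (main.newManager -> manager.New -> NewDynamicRESTMapper), i.e. before any reconciler starts, which is why the pod never reaches Running and restarts in a loop. Below is a minimal, assumed controller-runtime startup sketch (not the actual ramen main.go) showing that an initial discovery failure at this point is fatal to the operator.
```
// Minimal sketch, assuming a standard controller-runtime setup (this is not
// the actual ramen main.go). manager.New builds the REST mapper, which issues
// a discovery request against the API server; if that request panics or
// errors, the operator exits before it can start reconciling, so the pod
// crash-loops right after startup.
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Corresponds to the manager.New frame in the stack trace: the dynamic
	// REST mapper is initialized here, so discovery problems surface at startup.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}
	// Reconcilers would be registered with mgr here before starting it.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```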

Comment 3 avdhoot 2023-08-02 06:55:29 UTC
Facing a similar issue on an MDR setup, also using OCP 4.14 and ODF 4.14.

C1:
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-07-31-181848   True        False         13h     Cluster version is 4.14.0-0.nightly-2023-07-31-181848

oc get pod -n openshift-dr-system -o wide
NAME                                        READY   STATUS             RESTARTS        AGE   IP            NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-w246x   1/2     CrashLoopBackOff   151 (99s ago)   12h   10.132.2.35   compute-0   <none>           <none>

oc get csv -n openshift-dr-system
NAME                                     DISPLAY                         VERSION            REPLACES                 PHASE
odr-cluster-operator.v4.14.0-93.stable   Openshift DR Cluster Operator   4.14.0-93.stable                            Failed
volsync-product.v0.7.3                   VolSync                         0.7.3              volsync-product.v0.7.2   Failed


C2:
NAME                                        READY   STATUS             RESTARTS          AGE   IP            NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-69nm7   1/2     CrashLoopBackOff   150 (4m54s ago)   12h   10.131.0.35   compute-2   <none>           <none>

Comment 7 Shyamsundar 2023-08-03 14:20:59 UTC
Fix backported here: https://github.com/red-hat-storage/ramen/pull/118

Build in which the fix is available: 4.14.0-99

Comment 22 errata-xmlrpc 2023-11-08 18:53:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

