2232414 – [4.13 clone][RDR] [MDR] ramen operator pods in CrashLoopBackOff state due to client-go bug


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	odf-dr
Sub Component:
Version:	4.13
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	ODF 4.13.3
Assignee:	Raghavendra Talur
QA Contact:	Parikshith
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2023-08-16 18:11 UTC by Raghavendra Talur
Modified:	2023-09-27 14:24 UTC
CC List:	4 users
Fixed In Version:	4.13.3-2
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-09-27 14:22:42 UTC
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	red-hat-storage ramen pull 125	0	None	open	Bug 2232414: Bump to k8s.io 1.26.4 dependencies	2023-08-25 14:50:20 UTC
Red Hat Product Errata	RHSA-2023:5376	0	None	None	None	2023-09-27 14:24:13 UTC

Description Raghavendra Talur 2023-08-16 18:11:29 UTC

This bug was initially created as a copy of Bug #2228319

I am copying this bug because:
We need to fix this issue in the 4.13 release too, as ODF 4.13 with OCP 4.14 is a valid combination. Without this fix, the ramen operators will crash when used in an OCP 4.14 environment.


Description of problem (please be as detailed as possible and provide log
snippets):
When using OCP 4.14 to configure an RDR setup, the ramen-dr-cluster-operator pod on the managed clusters goes into CrashLoopBackOff state after a DRPolicy is created on the hub cluster.

C1:
NAME                                        READY   STATUS             RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-6zxz9   1/2     CrashLoopBackOff   7 (18s ago)   11m   10.135.0.150   compute-0   <none>           <none>

C2:
NAME                                        READY   STATUS             RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
ramen-dr-cluster-operator-88b987576-wxw8b   1/2     CrashLoopBackOff   7 (28s ago)   11m   10.131.0.128   compute-2   <none>           <none>


CSV in failed phase:

$ oc get csv -n openshift-dr-system
NAME                                     DISPLAY                         VERSION            REPLACES                 PHASE
odr-cluster-operator.v4.14.0-93.stable   Openshift DR Cluster Operator   4.14.0-93.stable                            Failed
volsync-product.v0.7.3                   VolSync                         0.7.3              volsync-product.v0.7.2   Failed


Pod logs show the following error:
```
2023-08-02T03:38:35.414Z	INFO	setup	controllers/ramenconfig.go:62	loading Ramen configuration from 	{"file": "/config/ramen_manager_config.yaml"}
2023-08-02T03:38:35.508Z	INFO	setup	controllers/ramenconfig.go:70	s3 profile	{"key": 0, "value": {"s3ProfileName":"s3profile-sagrawal-nc1-ocs-storagecluster","s3Bucket":"odrbucket-c76ef3ef878d","s3CompatibleEndpoint":"https://s3-openshift-storage.apps.sagrawal-nc1.qe.rh-ocs.com","s3Region":"noobaa","s3SecretRef":{"name":"afad760beb02470bea0590703f25f2dd341064e"}}}
2023-08-02T03:38:35.509Z	INFO	setup	controllers/ramenconfig.go:70	s3 profile	{"key": 1, "value": {"s3ProfileName":"s3profile-sagrawal-nc2-ocs-storagecluster","s3Bucket":"odrbucket-c76ef3ef878d","s3CompatibleEndpoint":"https://s3-openshift-storage.apps.sagrawal-nc2.qe.rh-ocs.com","s3Region":"noobaa","s3SecretRef":{"name":"1a3b49b412612a151fb4a657f706623d2aa805f"}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x225ce50]

goroutine 1 [running]:
k8s.io/client-go/discovery.convertAPIResource(...)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:88
k8s.io/client-go/discovery.convertAPIGroup({{{0x0, 0x0}, {0x0, 0x0}}, {{0xc00089d650, 0x15}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:69 +0x570
k8s.io/client-go/discovery.SplitGroupsAndResources({{{0xc0005b6330, 0x15}, {0xc00027af40, 0x1b}}, {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/aggregated_discovery.go:35 +0x118
k8s.io/client-go/discovery.(*DiscoveryClient).downloadAPIs(0x1fb1c59?)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:310 +0x47c
k8s.io/client-go/discovery.(*DiscoveryClient).GroupsAndMaybeResources(0x22610bf?)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:198 +0x5c
k8s.io/client-go/discovery.ServerGroupsAndResources({0x3da1e38, 0xc0005ea3f0})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:392 +0x59
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources.func1()
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:356 +0x25
k8s.io/client-go/discovery.withRetries(0x2, 0xc000b3d008)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:621 +0x71
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources(0x0?)
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/discovery/discovery_client.go:355 +0x3a
k8s.io/client-go/restmapper.GetAPIGroupResources({0x3da1e38?, 0xc0005ea3f0?})
	/remote-source/deps/gomod/pkg/mod/k8s.io/client-go.2/restmapper/discovery.go:148 +0x42
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper.func1()
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:94 +0x25
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).setStaticMapper(...)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:130
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper(0xc000b3d4e8?, {0x0, 0x0, 0x24a208e?})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/client/apiutil/dynamicrestmapper.go:110 +0x182
sigs.k8s.io/controller-runtime/pkg/cluster.setOptionsDefaults.func1(0x3747aa0?)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/cluster/cluster.go:217 +0x25
sigs.k8s.io/controller-runtime/pkg/cluster.New(0xc0005ad440, {0xc000b3d9a8, 0x1, 0x0?})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/cluster/cluster.go:159 +0x18d
sigs.k8s.io/controller-runtime/pkg/manager.New(_, {0xc000376150, 0x0, 0x0, {{0x3d9b190, 0xc0003247c0}, 0x0}, 0x1, {0x0, 0x0}, ...})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.6/pkg/manager/manager.go:351 +0xf9
main.newManager()
	/remote-source/app/main.go:105 +0x5d6
main.main()
	/remote-source/app/main.go:210 +0x1d
```

Version of all relevant components (if applicable):
OCP: 4.14.0-0.nightly-2023-07-31-181848
ODF: 4.14.0-93
ACM: 2.9.0-62 (quay.io:443/acm-d/acm-custom-registry:2.9.0-DOWNSTREAM-2023-07-31-16-30-30)
Submariner: 0.16.0 (brew.registry.redhat.io/rh-osbs/iib:543072)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy 3 OCP 4.14 clusters
2. Install RHACM on hub, import other two clusters and connect using Submariner add-ons with globalnet enabled
3. Deploy ODF 4.14 on both managed clusters
4. Install MCO on hub cluster and then create DRPolicy
5. Observe the ramen-dr-cluster-operator pod status on managed clusters


Actual results:
ramen-dr-cluster-operator pod in CrashLoopBackOff state

Expected results:
ramen-dr-cluster-operator pod in Running state

Additional info:

Comment 15 errata-xmlrpc 2023-09-27 14:22:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.13.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:5376
