Bug 2094179 - MCO fails to create DRClusters when replication mode is synchronous
Summary: MCO fails to create DRClusters when replication mode is synchronous
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Vineet
QA Contact: akarsha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-07 04:02 UTC by Raghavendra Talur
Modified: 2023-08-09 17:00 UTC
CC List: 10 users

Fixed In Version: 4.11.0-96
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:54:14 UTC
Embargoed:




Links:
GitHub: red-hat-storage/odf-multicluster-orchestrator pull 121 (open): Bug 2094179: [release-4.11] Sync mode fixes and status updates (last updated 2022-06-13 07:17:00 UTC)
Red Hat Product Errata: RHSA-2022:6156 (last updated 2022-08-24 13:55:05 UTC)

Description Raghavendra Talur 2022-06-07 04:02:06 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

We created the DRPolicy using the MCO console UI. The DRPolicy was created but could not be validated. On inspecting the MCO logs, we found the following message.

1.654572189675748e+09	ERROR	controller.mirrorpeer	failed to fetch rook secret	{"reconciler group": "multicluster.odf.openshift.io", "reconciler kind": "MirrorPeer", "name": "mirror-peer-w8kkr", "namespace": "", "Secret": "bf7f60d03c97aa2c4e21186904c873a58658177", "error": "Secret \"bf7f60d03c97aa2c4e21186904c873a58658177\" not found"}
github.com/red-hat-storage/odf-multicluster-orchestrator/controllers.(*MirrorPeerReconciler).Reconcile
	/remote-source/app/controllers/mirrorpeer_controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:227
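For context, here is a minimal, hypothetical sketch of the failing path. It is not the actual odf-multicluster-orchestrator source; the type and function names are illustrative assumptions. It shows the pattern the stack trace points at: a controller-runtime reconciler that unconditionally fetches the exchanged rook secret. For a MirrorPeer in synchronous mode that secret is never created, so the Get returns NotFound, the error above is logged, and the MirrorPeer is requeued with backoff and never reconciles successfully.

package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// secretFetcher is an illustrative stand-in for the MirrorPeer
// reconciler, not the real MCO type.
type secretFetcher struct {
	client.Client
}

func (r *secretFetcher) fetchRookSecret(ctx context.Context, key types.NamespacedName) (ctrl.Result, error) {
	var secret corev1.Secret
	if err := r.Get(ctx, key, &secret); err != nil {
		// For sync-mode peers the secret does not exist, so this
		// Get returns NotFound: the "failed to fetch rook secret"
		// error in the log above, requeued on every reconcile.
		return ctrl.Result{}, err
	}
	// ... use the secret to wire up mirroring (async mode only) ...
	return ctrl.Result{}, nil
}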


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

Comment 10 akarsha 2022-08-01 09:17:58 UTC
Tested with 3 OCP clusters: hub, c1, and c2.

Version:
OCP: 4.11.0-0.nightly-2022-07-29-173905
ODF: 4.11.0-129
CEPH: 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)
ACM: 2.5.1

Steps performed

Following the documentation [1], we were able to create a DRPolicy via the MCO console. Based on this observation, moving the bug to a verified state.

Snippet output:

$ oc get pods -n openshift-operators
NAME                                        READY   STATUS    RESTARTS   AGE
odf-multicluster-console-768ff5d67-hwf7l    1/1     Running   0          3m14s
odfmo-controller-manager-7b9ffcd97f-2mb9r   1/1     Running   0          51s
ramen-hub-operator-78895779f6-grgp8         2/2     Running   0          59s


$ oc get drcluster
NAME           AGE
akrai-j31-c1   25h
akrai-j31-c2   25h

$ oc get drcluster akrai-j31-c1 -o jsonpath='{.status.conditions[2].reason}{"\n"}'
Succeeded

$ oc get drcluster akrai-j31-c2 -o jsonpath='{.status.conditions[2].reason}{"\n"}'
Succeeded

$ oc get drpolicy 
NAME           AGE
odr-policy-1   25h

$ oc get drpolicy odr-policy-1 -o jsonpath='{.status.conditions[].reason}{"\n"}'
Succeeded
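As an aside, the positional lookup above (.status.conditions[2]) is fragile if the controller ever reorders or adds conditions. A small, hypothetical Go sketch of the more robust lookup by condition type follows; the condition type name "Validated" is an assumption about Ramen's naming, not something confirmed by this report.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// validated reports whether the DRCluster validation condition has
// reason "Succeeded", found by type rather than by array index.
func validated(conditions []metav1.Condition) bool {
	cond := meta.FindStatusCondition(conditions, "Validated")
	return cond != nil && cond.Reason == "Succeeded"
}

func main() {
	// Example input mirroring the output shown above.
	conds := []metav1.Condition{{Type: "Validated", Reason: "Succeeded"}}
	fmt.Println(validated(conds)) // true
}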

$ date; date --utc; oc get drpc -n busybox-cephfs
Monday 01 August 2022 10:20:36 AM IST
Monday 01 August 2022 04:50:36 AM UTC
NAME                              AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-cephfs-placement-1-drpc   20h   akrai-j31-c1 

$ date; date --utc; oc get drpc -n busybox-rbd
Monday 01 August 2022 10:20:16 AM IST
Monday 01 August 2022 04:50:16 AM UTC
NAME                           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-rbd-placement-1-drpc   20h   akrai-j31-c1


Logs collected here http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz2102506/

Comment 12 errata-xmlrpc 2022-08-24 13:54:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

