Bug 2094179

Summary: MCO fails to create DRClusters when replication mode is synchronous
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Raghavendra Talur <rtalur>
Component: odf-dr
Sub component: multicluster-orchestrator
Assignee: Vineet <vbadrina>
QA Contact: akarsha <akrai>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
CC: akrai, hnallurv, madam, mbukatov, mmuench, muagarwa, ocs-bugs, odf-bz-bot, rtalur, vbadrina
Version: 4.11
Keywords: TestBlocker
Target Release: ODF 4.11.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.11.0-96
Doc Type: No Doc Update
Last Closed: 2022-08-24 13:54:14 UTC
Type: Bug

Description Raghavendra Talur 2022-06-07 04:02:06 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

We created the DRPolicy using the UI provided by the MCO console. The DRPolicy was created but could not be validated. On inspecting the MCO logs, we found the following message.

1.654572189675748e+09	ERROR	controller.mirrorpeer	failed to fetch rook secret	{"reconciler group": "multicluster.odf.openshift.io", "reconciler kind": "MirrorPeer", "name": "mirror-peer-w8kkr", "namespace": "", "Secret": "bf7f60d03c97aa2c4e21186904c873a58658177", "error": "Secret \"bf7f60d03c97aa2c4e21186904c873a58658177\" not found"}
github.com/red-hat-storage/odf-multicluster-orchestrator/controllers.(*MirrorPeerReconciler).Reconcile
	/remote-source/app/controllers/mirrorpeer_controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:227
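The error above indicates the MirrorPeer reconciler could not find the exchanged rook secret on the hub. A quick way to confirm whether the secret was ever created is to check the managed-cluster namespace on the hub; in the sketch below, the namespace and secret name are placeholders (the real secret name is a hash, as in the log above):

    # On the hub cluster: list MirrorPeers, then look for the exchanged
    # secret in the managed cluster's namespace (placeholders, not real names).
    oc get mirrorpeer
    oc get secret <hashed-secret-name> -n <managed-cluster-namespace>

If the secret is missing, the reconcile will keep failing with the "not found" error shown in the trace.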


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

Comment 10 akarsha 2022-08-01 09:17:58 UTC
Tested with 3 OCP clusters, say hub, c1, and c2

Version:
OCP: 4.11.0-0.nightly-2022-07-29-173905
ODF: 4.11.0-129
CEPH: 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)
ACM: 2.5.1

Steps performed

Following doc [1], we were able to create a DRPolicy via the MCO console. Based on this observation, moving the bug to Verified.

Snippet output:

$ oc get pods -n openshift-operators
NAME                                        READY   STATUS    RESTARTS   AGE
odf-multicluster-console-768ff5d67-hwf7l    1/1     Running   0          3m14s
odfmo-controller-manager-7b9ffcd97f-2mb9r   1/1     Running   0          51s
ramen-hub-operator-78895779f6-grgp8         2/2     Running   0          59s


$ oc get drcluster
NAME           AGE
akrai-j31-c1   25h
akrai-j31-c2   25h

$ oc get drcluster akrai-j31-c1 -o jsonpath='{.status.conditions[2].reason}{"\n"}'
Succeeded

$ oc get drcluster akrai-j31-c2 -o jsonpath='{.status.conditions[2].reason}{"\n"}'
Succeeded

$ oc get drpolicy 
NAME           AGE
odr-policy-1   25h

$ oc get drpolicy odr-policy-1 -o jsonpath='{.status.conditions[].reason}{"\n"}'
Succeeded
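Note that indexing conditions by position (`conditions[2]` above) is fragile, since condition order in `.status.conditions` is not guaranteed. A more robust variant uses a JSONPath filter on the condition type; the type name `Validated` here is an assumption about the DRCluster status conditions:

    # Select the condition by type instead of by array index
    # ("Validated" is assumed to be the relevant condition type).
    oc get drcluster akrai-j31-c1 \
      -o jsonpath='{.status.conditions[?(@.type=="Validated")].reason}{"\n"}'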

$ date; date --utc; oc get drpc -n busybox-cephfs
Monday 01 August 2022 10:20:36 AM IST
Monday 01 August 2022 04:50:36 AM UTC
NAME                              AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-cephfs-placement-1-drpc   20h   akrai-j31-c1 

$ date; date --utc; oc get drpc -n busybox-rbd
Monday 01 August 2022 10:20:16 AM IST
Monday 01 August 2022 04:50:16 AM UTC
NAME                           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-rbd-placement-1-drpc   20h   akrai-j31-c1


Logs collected here http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz2102506/

Comment 12 errata-xmlrpc 2022-08-24 13:54:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156