Bug 2135371

Summary: [RDR] Volume Replication resources not reaching Primary state due to "failed to enable replication" or "Failed to get volume replication info"
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Sidhant Agrawal <sagrawal>
Component: csi-driver
Assignee: Madhu Rajanna <mrajanna>
Status: CLOSED NOTABUG
QA Contact: krishnaram Karthick <kramdoss>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.12
CC: amagrawa, bniver, ebenahar, idryomov, kseeger, mrajanna, muagarwa, ocs-bugs, odf-bz-bot, srangana, vashastr, ypadia
Target Milestone: ---
Keywords: Automation
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-29 11:47:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sidhant Agrawal 2022-10-17 12:13:48 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In an RDR setup, after deploying an application, the VolumeReplication resources do not reach the Primary state.
The csi-addons-controller-manager pod logs contain two types of error messages, shown below.

1st type of error:
```
2022-10-17T11:52:31.446Z	ERROR	Failed to get volume replication info	{"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication", "VolumeReplication": {"name":"busybox-pvc-10","namespace":"busybox-workloads-1"}, "namespace": "busybox-workloads-1", "name": "busybox-pvc-10", "reconcileID": "89b21642-217b-47c2-baf9-a42f4275b04b", "Request.Name": "busybox-pvc-10", "Request.Namespace": "busybox-workloads-1", "error": "rpc error: code = Unknown desc = failed to get remote status: rbd: ret=-2, No such file or directory"}
github.com/csi-addons/kubernetes-csi-addons/controllers/replication%2estorage.(*VolumeReplicationReconciler).Reconcile
	/remote-source/app/controllers/replication.storage/volumereplication_controller.go:373
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
```


2nd type of error:
```
2022-10-17T11:52:07.738Z	ERROR	failed to enable replication	{"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication", "VolumeReplication": {"name":"busybox-pvc-4","namespace":"busybox-workloads-1"}, "namespace": "busybox-workloads-1", "name": "busybox-pvc-4", "reconcileID": "a1f87cca-f9e1-48c5-b68b-765887909fac", "Request.Name": "busybox-pvc-4", "Request.Namespace": "busybox-workloads-1", "error": "rpc error: code = NotFound desc = volume 0001-0011-openshift-storage-0000000000000001-9f8e8081-cc03-4ca5-a787-ef439a6cafbb not found"}
github.com/csi-addons/kubernetes-csi-addons/controllers/replication%2estorage.(*VolumeReplicationReconciler).Reconcile
	/remote-source/app/controllers/replication.storage/volumereplication_controller.go:259
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
```
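
To see which VolumeReplication resources hit which of the two errors, the controller log can be grouped by error message. The following is a minimal sketch, not part of the original report. It assumes each log entry is a single tab-separated line (timestamp, level, message, JSON payload), as in the snippets above, and that the payload carries "namespace", "name" and "error" keys. The function names are hypothetical.

```python
import json
from collections import defaultdict


def parse_error_line(line):
    """Split one tab-separated controller log line into its parts.

    Returns None for anything that is not an ERROR entry, for example
    the stack-trace lines that follow each entry.
    """
    parts = line.rstrip("\n").split("\t", 3)
    if len(parts) != 4 or parts[1] != "ERROR":
        return None
    timestamp, _level, message, payload = parts
    try:
        fields = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return {
        "timestamp": timestamp,
        "message": message,
        "namespace": fields.get("namespace"),
        "name": fields.get("name"),
        "error": fields.get("error"),
    }


def group_by_message(lines):
    """Map each distinct error message to the sorted VR names that logged it."""
    groups = defaultdict(set)
    for line in lines:
        entry = parse_error_line(line)
        if entry is not None:
            groups[entry["message"]].add(f'{entry["namespace"]}/{entry["name"]}')
    return {message: sorted(names) for message, names in groups.items()}
```

Feeding the saved pod log to group_by_message(open(path)) would give one list of namespace/name pairs per error type, which makes it easier to tell whether a given VR only ever sees one of the two failures.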

Version of all relevant components (if applicable):
OCP: 4.12.0-0.nightly-2022-10-05-053337
ODF: 4.12.0-74
ACM: 2.6.1
Submariner: 0.13.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, application deployment is failing.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Configure RDR setup
2. Deploy an application containing multiple PVCs/Pods
3. Check VRG and VR status


Actual results:
VR and VRG do not reach Primary state. CURRENTSTATE is either empty or Unknown.

Expected results:
VR and VRG should reach Primary state.

Additional info:

$ oc get drpc -A -o wide
NAMESPACE             NAME           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION        PEER READY
busybox-workloads-1   busybox-drpc   18m   sagrawal-c1                                         Deployed       Completed     2022-10-17T11:45:48Z   30.043175448s   True
busybox-workloads-3   busybox-drpc   60m   sagrawal-c1                                         Deployed       Completed     2022-10-17T11:04:08Z   30.042983128s   True

$ oc get vrg -A -o wide
NAMESPACE             NAME           DESIREDSTATE   CURRENTSTATE
busybox-workloads-1   busybox-drpc   primary
busybox-workloads-3   busybox-drpc   primary

---
$ oc get pvc,vr,vrg -n busybox-workloads-1
NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/busybox-pvc-1    Bound    pvc-c03a1655-aaa8-47e0-bf5b-79d16a0b22c9   94Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-10   Bound    pvc-b27a27e5-c333-418e-9bd4-1ac836eda480   87Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-11   Bound    pvc-b592bc25-748a-4351-ab85-09149589a4ea   33Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-12   Bound    pvc-2ad50547-1e7a-4e2b-80c2-9390de107a93   147Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-13   Bound    pvc-015b454e-edc2-4693-a4fd-ca913a9c1276   77Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-14   Bound    pvc-78414bce-07a7-446b-8cea-04750a43db42   70Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-15   Bound    pvc-88f9e064-c9e7-45fe-ab3a-5c09529be888   131Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-16   Bound    pvc-bbdc03f0-8410-4696-a706-02696b1b1c77   127Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-17   Bound    pvc-48d50f38-8a3a-41d8-9633-8e942b0fa5bd   58Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-18   Bound    pvc-3b00b276-52e2-4a55-8b68-c3ff42a5aa70   123Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-19   Bound    pvc-cc4c0710-6399-454a-ab26-a7a68f15e162   61Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-2    Bound    pvc-1ee2f9c4-e363-4920-b594-606aaf1965a3   44Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-20   Bound    pvc-948d371b-3c24-41c8-be5c-267c1a26945d   33Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-3    Bound    pvc-87508427-bf1c-4596-8388-c20b17ea9996   76Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-4    Bound    pvc-fa5d03b4-2f26-47f5-9431-b98376e1be56   144Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-5    Bound    pvc-67df1d16-e5b9-451b-a554-3b68458cc83c   107Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-6    Bound    pvc-23cd6c53-cef1-4811-ae6b-570662eac783   123Gi      RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-7    Bound    pvc-64e361a4-cf37-4e4c-882c-e687b0314f23   90Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-8    Bound    pvc-18fd3418-0581-438f-bcd1-9aac4a657980   91Gi       RWO            ocs-storagecluster-ceph-rbd   24m
persistentvolumeclaim/busybox-pvc-9    Bound    pvc-d70b0522-b776-4b5e-84cc-269567f194bb   111Gi      RWO            ocs-storagecluster-ceph-rbd   24m

NAME                                                                AGE   VOLUMEREPLICATIONCLASS                  PVCNAME          DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc-1    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-1    primary
volumereplication.replication.storage.openshift.io/busybox-pvc-10   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-10   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-11   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-11   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-12   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-12   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-13   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-13   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-14   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-14   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-15   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-15   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-16   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-16   primary        Unknown
volumereplication.replication.storage.openshift.io/busybox-pvc-17   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-17   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-18   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-18   primary        Unknown
volumereplication.replication.storage.openshift.io/busybox-pvc-19   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-19   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-2    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-2    primary        Unknown
volumereplication.replication.storage.openshift.io/busybox-pvc-20   24m   rbd-volumereplicationclass-1625360775   busybox-pvc-20   primary        Unknown
volumereplication.replication.storage.openshift.io/busybox-pvc-3    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-3    primary        Unknown
volumereplication.replication.storage.openshift.io/busybox-pvc-4    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-4    primary        Unknown
volumereplication.replication.storage.openshift.io/busybox-pvc-5    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-5    primary
volumereplication.replication.storage.openshift.io/busybox-pvc-6    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-6    primary
volumereplication.replication.storage.openshift.io/busybox-pvc-7    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-7    primary
volumereplication.replication.storage.openshift.io/busybox-pvc-8    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-8    primary
volumereplication.replication.storage.openshift.io/busybox-pvc-9    24m   rbd-volumereplicationclass-1625360775   busybox-pvc-9    primary        Unknown

NAME                                                       DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/busybox-drpc   primary
---
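
With 20 VRs per namespace, picking out the ones that have not reached Primary by eye is error-prone. The following is a minimal sketch, not part of the original report, that scans saved `oc get pvc,vr,vrg` output like the listing above. It assumes the column order shown there (NAME, AGE, VOLUMEREPLICATIONCLASS, PVCNAME, DESIREDSTATE, CURRENTSTATE) and that an empty CURRENTSTATE simply leaves the last column off the row. The function name is hypothetical.

```python
def vrs_not_primary(table_text):
    """Return (pvc_name, current_state) for every VR row not in Primary state."""
    pending = []
    for line in table_text.splitlines():
        fields = line.split()
        # Keep only VolumeReplication rows. The trailing dot in the prefix
        # excludes volumereplicationgroup rows, header rows and PVC rows.
        if not fields or not fields[0].startswith("volumereplication."):
            continue
        # An empty CURRENTSTATE yields five fields instead of six.
        current_state = fields[5] if len(fields) > 5 else ""
        if current_state != "Primary":
            pending.append((fields[3], current_state or "<empty>"))
    return pending
```

Run against the output above, every VR in busybox-workloads-1 would be reported, either as <empty> or as Unknown, since none of them shows Primary.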

---
$ oc get pvc,vr,vrg -n busybox-workloads-3
NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/busybox-pvc-41   Bound    pvc-c0e63408-4400-4455-8c36-0bfdb5e5b2dc   42Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-42   Bound    pvc-1a776220-1e30-4c36-af18-b18ee9ee5897   81Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-43   Bound    pvc-3884dc82-d834-4d07-b08f-f71bc43611ba   28Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-44   Bound    pvc-e05b256a-c936-457e-a013-8bbfdb71819c   118Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-45   Bound    pvc-a06f5549-6f8a-4a48-ae98-faf17600b7d2   19Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-46   Bound    pvc-7cfa84d4-4a10-4f24-99c2-205b37212898   129Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-47   Bound    pvc-eb7be1ee-4df3-4969-aeca-222bebe40eb3   43Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-48   Bound    pvc-87da4b2e-aeb9-4c9b-aad9-3de0f8f67035   57Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-49   Bound    pvc-d970e792-edbb-46aa-a461-2a0cfda0122b   89Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-50   Bound    pvc-768d577f-8578-4aea-ac3f-d1f688ae797c   124Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-51   Bound    pvc-413699c6-a946-4f37-8d8f-54ee1198d316   95Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-52   Bound    pvc-d83206e6-1c6b-43d5-9782-b98ba9256233   129Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-53   Bound    pvc-bddebe87-192c-4c9c-8fbf-60a82800f8d2   51Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-54   Bound    pvc-66fe12b3-2276-4a87-a10b-f0bb80c36901   30Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-55   Bound    pvc-c4fa2ef8-4b37-4a9f-b83c-b97ad47eb479   102Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-56   Bound    pvc-73473181-ee13-447e-9257-3c9448d74234   40Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-57   Bound    pvc-706c900f-bfa7-4fac-a1fe-54ef6f070a9e   146Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-58   Bound    pvc-136d3d61-1c91-4a70-90fc-54713b21fcd4   63Gi       RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-59   Bound    pvc-01647d45-8785-43c6-8678-05c327615a5e   118Gi      RWO            ocs-storagecluster-ceph-rbd   66m
persistentvolumeclaim/busybox-pvc-60   Bound    pvc-a260c42a-dc17-4b80-a1f5-2d1630c37b2e   25Gi       RWO            ocs-storagecluster-ceph-rbd   66m

NAME                                                                AGE   VOLUMEREPLICATIONCLASS                  PVCNAME          DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc-41   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-41   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-42   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-42   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-43   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-43   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-44   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-44   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-45   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-45   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-46   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-46   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-47   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-47   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-48   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-48   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-49   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-49   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-50   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-50   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-51   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-51   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-52   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-52   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-53   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-53   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-54   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-54   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-55   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-55   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-56   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-56   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-57   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-57   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-58   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-58   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-59   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-59   primary
volumereplication.replication.storage.openshift.io/busybox-pvc-60   66m   rbd-volumereplicationclass-1625360775   busybox-pvc-60   primary

NAME                                                       DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/busybox-drpc   primary
---

Comment 12 Mudit Agarwal 2022-11-30 04:37:46 UTC
Sidhant, did you get a chance to repro this with verbose logging?

Comment 13 Sidhant Agrawal 2022-11-30 05:42:38 UTC
(In reply to Mudit Agarwal from comment #12)
> Sidhant, did you get a chance to repro this with verbose logging?

So far the issue has not been reproduced. I will update this bug if it is seen again.

Comment 44 Mudit Agarwal 2023-03-29 11:47:32 UTC
Please create a new BZ if this is reproducible.