Bug 2058310

Summary: [DR] Relocate operation takes ~15 minutes when 7 DR-enabled PVCs are relocated to the preferred cluster
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Aman Agrawal <amagrawa>
Component: odf-dr Assignee: Shyamsundar <srangana>
odf-dr sub component: ramen QA Contact: Sidhant Agrawal <sagrawal>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: urgent    
Priority: unspecified CC: bmekhiss, kramdoss, madam, muagarwa, ocs-bugs, odf-bz-bot, prsurve, rperiyas, sagrawal
Version: 4.10 Keywords: AutomationBackLog, Regression
Target Milestone: ---   
Target Release: ODF 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.10.0-171 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-04-21 09:12:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 4 krishnaram Karthick 2022-02-25 04:20:24 UTC
This is a regression: the relocate operation did not take ~15 minutes during 4.9 testing.

Comment 5 Benamar Mekhissi 2022-02-25 18:35:23 UTC
The relocation took around 15 minutes to complete because ramen on C1 was panicking during the relocation process, so it did not process the request from the DRPC (Hub) for that long. The restart count of the ramen pod reflects this:
```
NAME                                         READY   STATUS    RESTARTS       AGE
ramen-dr-cluster-operator-78c6b8c575-66x5k   2/2     Running   31 (43m ago)   8d
```
The log shows the panic (these log statements are from the previous log, not the current log file):
```
2022-02-24T16:32:14.968928375Z 2022-02-24T16:32:14.968Z DPANIC  controllers.VolumeReplicationGroup.vrginstance  controllers/volumereplicationgroup_controller.go:1671   odd number of arguments passed as key-value pairs for logging   {"VolumeReplicationGroup": "busybox-workloads/busybox-drpc", "State": "secondary", "pvc": "busybox-workloads/busybox-pvc-1", "ignored key": "secondary"}
2022-02-24T16:32:14.968928375Z github.com/ramendr/ramen/controllers.(*VRGInstance).updateVR
2022-02-24T16:32:14.968928375Z  /remote-source/app/controllers/volumereplicationgroup_controller.go:1671
2022-02-24T16:32:14.968928375Z github.com/ramendr/ramen/controllers.(*VRGInstance).createOrUpdateVR
2022-02-24T16:32:14.968928375Z  /remote-source/app/controllers/volumereplicationgroup_controller.go:1640
2022-02-24T16:32:14.968928375Z github.com/ramendr/ramen/controllers.(*VRGInstance).processVRAsSecondary
2022-02-24T16:32:14.968928375Z  /remote-source/app/controllers/volumereplicationgroup_controller.go:1586
2022-02-24T16:32:14.968928375Z github.com/ramendr/ramen/controllers.(*VRGInstance).reconcileVRAsSecondary
2022-02-24T16:32:14.968928375Z  /remote-source/app/controllers/volumereplicationgroup_controller.go:1162
2022-02-24T16:32:14.968928375Z github.com/ramendr/ramen/controllers.(*VRGInstance).reconcileVRsAsSecondary
2022-02-24T16:32:14.968928375Z  /remote-source/app/controllers/volumereplicationgroup_controller.go:1128
2022-02-24T16:32:14.968928375Z github.com/ramendr/ramen/controllers.(*VRGInstance).handleVRGMode
...
...
...
```
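
For illustration only, the failure mode named in the DPANIC message can be sketched in a few lines of Go. This is not ramen's code and not the logging library's implementation: `pairKeyValues`, `logInfo`, the `development` flag, and the "updating VR" message are all hypothetical names chosen for the sketch. It assumes a structured logger that pairs alternating key/value arguments and, in development mode, panics when one argument is left without a partner, which matches the "ignored key": "secondary" field in the log above.

```go
package main

import "fmt"

// pairKeyValues is a hypothetical helper (not the real logger's code).
// It pairs alternating key/value arguments and reports a dangling
// argument that has no partner.
func pairKeyValues(kv ...interface{}) (fields map[string]interface{}, ignored interface{}, ok bool) {
	fields = make(map[string]interface{}, len(kv)/2)
	for i := 0; i < len(kv); i += 2 {
		if i == len(kv)-1 {
			// Odd argument count: the last item has no partner.
			return fields, kv[i], false
		}
		fields[fmt.Sprint(kv[i])] = kv[i+1]
	}
	return fields, nil, true
}

// logInfo mimics a logger whose development mode turns malformed
// calls into panics (assumption made for this sketch).
func logInfo(development bool, msg string, kv ...interface{}) {
	fields, ignored, ok := pairKeyValues(kv...)
	if !ok {
		report := fmt.Sprintf("odd number of arguments passed as key-value pairs for logging (ignored key: %v)", ignored)
		if development {
			panic(report)
		}
		fmt.Println("DPANIC", report)
	}
	fmt.Println(msg, fields)
}

func main() {
	state := "secondary"
	pvc := "busybox-workloads/busybox-pvc-1"

	// Well-formed call: every key has a value.
	logInfo(true, "updating VR", "State", state, "pvc", pvc)

	// Recover so the demo prints the panic instead of crashing.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()

	// Malformed call: a trailing value without a key.
	logInfo(true, "updating VR", "State", state, "pvc", pvc, state)
}
```

In a controller there is no such `recover` around the log call, so each reconcile that reaches the malformed statement would bring the process down, which is consistent with the restart count shown earlier.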

The issue is fixed in this PR: https://github.com/RamenDR/ramen/pull/396

Comment 8 Shyamsundar 2022-02-28 13:32:27 UTC
(In reply to Benamar Mekhissi from comment #5)
> The reason the relocation took around 15 minutes to complete was because
> ramen on C1 was panicking during the relocation process and it didn't
> process the request from the DRPC (Hub) for that long.

> The issue is fixed in this PR: https://github.com/RamenDR/ramen/pull/396

The fix for the above issue has been backported and has been available in downstream builds since last week, as part of bz #2055359.