Bug 2115507

Summary: [RDR] After Failover when primary cluster is down, cleanup stuck on primary once nodes are powered on
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Sidhant Agrawal <sagrawal>
Component: csi-driver Assignee: Madhu Rajanna <mrajanna>
Status: CLOSED CURRENTRELEASE QA Contact: Sidhant Agrawal <sagrawal>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.11 CC: amagrawa, bmekhiss, jespy, kseeger, mrajanna, muagarwa, ocs-bugs, odf-bz-bot, srangana
Target Milestone: --- Keywords: Automation, Regression
Target Release: ODF 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.11.0-133 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-02-08 14:06:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2116605, 2139103, 2140550    
Bug Blocks:    

Comment 5 Benamar Mekhissi 2022-08-05 10:02:24 UTC
Yes, we have a bug in Ramen (VRG).
After the Replication State moved to Secondary, we started processing the VRG deletion.
As part of the deletion process, we try to make sure that the VR is not in a degraded state. This can be seen in the following log statements:
```
VolumeReplication and VolumeReplicationGroup state and autoresync match. Proceeding to status check    {"VolumeReplicationGroup": "workload-2/drpc-2", "State": "secondary", "Finalize": true, "pvc": "workload-2/busybox-pvc-1"}
2022-08-04T18:09:17.417005253Z 1.6596365574169962e+09   INFO    controllers.VolumeReplicationGroup.vrginstance  controllers/vrg_volrep.go:1097  VolumeReplication resource for the pvc is syncing as Secondary (busybox-pvc-1/workload-2)      {"VolumeReplicationGroup": "workload-2/drpc-2", "State": "secondary", "Finalize": true}
```

However, we fail to propagate the fact that the VR is degraded to the higher layers of the call stack. As a result, the caller proceeds as if the check had passed and ends up deleting the VR.
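
To illustrate the failure mode described above, here is a minimal sketch in Go. It is not the actual Ramen code: `vrStatus`, `checkStatusBuggy`, `checkStatusFixed`, and `finalize` are hypothetical names made up for this example, and the real logic in `controllers/vrg_volrep.go` is more involved. The sketch only shows how a status check that logs a degraded VR but still reports success lets the caller go ahead with deletion.

```go
package main

import "fmt"

// vrStatus is a hypothetical, simplified view of a VolumeReplication resource.
type vrStatus struct {
	State    string // requested replication state, e.g. "secondary"
	Degraded bool   // true while the VR is still resyncing
}

// checkStatusBuggy notices the degraded condition and logs it, but still
// reports success, so the caller cannot tell that the VR is not ready.
func checkStatusBuggy(vr vrStatus) bool {
	if vr.Degraded {
		fmt.Println("VR is syncing as", vr.State)
	}
	return true
}

// checkStatusFixed propagates the degraded condition to the caller.
func checkStatusFixed(vr vrStatus) bool {
	if vr.Degraded {
		fmt.Println("VR is syncing as", vr.State)
		return false
	}
	return true
}

// finalize stands in for the deletion path. It returns true when the VR
// would be deleted and false when deletion is deferred for a later retry.
func finalize(vr vrStatus, ready func(vrStatus) bool) bool {
	if !ready(vr) {
		return false // not ready yet: keep the VR and retry later
	}
	return true // check passed: the VR is deleted
}

func main() {
	vr := vrStatus{State: "secondary", Degraded: true}
	fmt.Println("buggy check deletes VR:", finalize(vr, checkStatusBuggy))
	fmt.Println("fixed check deletes VR:", finalize(vr, checkStatusFixed))
}
```

With the buggy check, `finalize` deletes the VR even though it is degraded. With the fixed check, deletion is deferred until the VR is no longer degraded.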

I am not sure if this is a regression or a day-one issue. I will try to figure that out next. After that, we will create a patch sometime today.

Comment 15 Mudit Agarwal 2022-08-23 08:34:58 UTC
Moving it to MODIFIED. This can be verified once https://bugzilla.redhat.com/show_bug.cgi?id=2116605 is ON_QA.

Comment 33 Red Hat Bugzilla 2023-12-08 04:29:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days