Bug 2115507 - [RDR] After Failover when primary cluster is down, cleanup stuck on primary once nodes are powered on
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Madhu Rajanna
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On: 2116605 2139103 2140550
Blocks:
Reported: 2022-08-04 19:47 UTC by Sidhant Agrawal
Modified: 2023-12-08 04:29 UTC
CC List: 9 users

Fixed In Version: 4.11.0-133
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:


Links

| System | ID | Private | Priority | Status | Summary | Last Updated |
| --- | --- | --- | --- | --- | --- | --- |
| Github | red-hat-storage ceph-csi pull 119 | 0 | None | open | BUG 2115507: rbd: consider mirror deamon state for ResyncVolume | 2022-08-08 13:29:20 UTC |

Comment 5 Benamar Mekhissi 2022-08-05 10:02:24 UTC
Yes, we have a bug in Ramen (VRG).
After the Replication State has moved to Secondary, we started processing VRG deletion.
As part of the deletion process, we try to make sure that the VR is not in a degraded state. This can be seen in the following log statements:
```
VolumeReplication and VolumeReplicationGroup state and autoresync match. Proceeding to status check    {"VolumeReplicationGroup": "workload-2/drpc-2", "State": "secondary", "Finalize": true, "pvc": "workload-2/busybox-pvc-1"}
2022-08-04T18:09:17.417005253Z 1.6596365574169962e+09   INFO    controllers.VolumeReplicationGroup.vrginstance  controllers/vrg_volrep.go:1097  VolumeReplication resource for the pvc is syncing as Secondary (busybox-pvc-1/workload-2)      {"VolumeReplicationGroup": "workload-2/drpc-2", "State": "secondary", "Finalize": true}
```

However, we fail to communicate to the higher layers of the call stack that the VR is degraded. As a result, the VR ends up being deleted.

I am not sure whether this is a regression or a day-one issue. I will try to figure that out next. After that, we will create a patch sometime today.

Comment 15 Mudit Agarwal 2022-08-23 08:34:58 UTC
Moving it to MODIFIED; this can be verified once https://bugzilla.redhat.com/show_bug.cgi?id=2116605 is ON_QA.

Comment 33 Red Hat Bugzilla 2023-12-08 04:29:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

