Bug 2251022 - [RDR] [Hub recovery] Failover of rbd workloads didn't proceed after drpc reporting WaitForStorageMaintenanceActivation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: umanga
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On:
Blocks: 2252116
Reported: 2023-11-22 11:31 UTC by Aman Agrawal
Modified: 2024-07-17 13:10 UTC (History)

Fixed In Version: 4.15.0-103
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2252116
Environment:
Last Closed: 2024-07-17 13:10:20 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage odf-multicluster-orchestrator pull 185 0 None Merged validate if cluster FSID is empty 2023-11-30 06:42:06 UTC
Red Hat Product Errata RHSA-2024:4591 0 None None None 2024-07-17 13:10:23 UTC

Comment 20 Aman Agrawal 2024-05-30 17:19:43 UTC
Tested with the following versions:

ceph version 18.2.1-188.el9cp (b1ae9c989e2f41dcfec0e680c11d1d9465b1db0e) reef (stable)
OCP 4.16.0-0.nightly-2024-05-23-173505
ACM 2.11.0-DOWNSTREAM-2024-05-23-15-16-26
MCE 2.6.0-104 
ODF 4.16.0-108.stable
Gitops v1.12.3 

Platform: VMware

When the steps to reproduce were repeated, failover was successful for all RBD and CephFS workloads, and the VolumeReplicationClass (which is needed for RBD) was successfully restored on the surviving managed cluster.

oc get volumereplicationclass -A
NAME                                    PROVISIONER
rbd-volumereplicationclass-1625360775   openshift-storage.rbd.csi.ceph.com
rbd-volumereplicationclass-473128587    openshift-storage.rbd.csi.ceph.com
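
The VolumeReplicationClass check can also be scripted. Below is a minimal sketch, assuming the output of `oc get volumereplicationclass -A` has been captured as text in the two-column NAME/PROVISIONER layout shown here. The function name `rbd_vrc_names` is hypothetical and is not part of any ODF tooling.

```python
# Provisioner string as it appears in the output captured in this comment.
RBD_PROVISIONER = "openshift-storage.rbd.csi.ceph.com"


def rbd_vrc_names(output: str) -> list[str]:
    """Return the names of VolumeReplicationClasses backed by the RBD CSI provisioner.

    Assumes whitespace-separated NAME and PROVISIONER columns (hypothetical helper).
    """
    names = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        if fields[1] == RBD_PROVISIONER:
            names.append(fields[0])
    return names


sample = """\
NAME                                    PROVISIONER
rbd-volumereplicationclass-1625360775   openshift-storage.rbd.csi.ceph.com
rbd-volumereplicationclass-473128587    openshift-storage.rbd.csi.ceph.com
"""

# An empty result would mean no RBD VolumeReplicationClass was found in the captured output.
print(rbd_vrc_names(sample))
```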


DRPC from the new hub:

busybox-workloads-101   rbd-sub-busybox101-placement-1-drpc       4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:26:02Z                        False
busybox-workloads-13    cephfs-sub-busybox13-placement-1-drpc     4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:27:33Z                        False
busybox-workloads-16    cephfs-sub-busybox16-placement-1-drpc     4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:27:26Z                        False
busybox-workloads-18    cnv-sub-busybox18-placement-1-drpc        4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T16:52:14Z                        False
busybox-workloads-5     rbd-sub-busybox5-placement-1-drpc         4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:25:50Z                        False
busybox-workloads-6     rbd-sub-busybox6-placement-1-drpc         4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:25:56Z                        False
busybox-workloads-7     rbd-sub-busybox7-placement-1-drpc         4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:25:34Z                        False
openshift-gitops        cephfs-appset-busybox12-placement-drpc    4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:28:14Z                        False
openshift-gitops        cephfs-appset-busybox9-placement-drpc     4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:28:19Z                        False
openshift-gitops        cnv-appset-busybox17-placement-drpc       4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T16:52:23Z                        False
openshift-gitops        rbd-appset-busybox1-placement-drpc        4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:26:08Z                        False
openshift-gitops        rbd-appset-busybox100-placement-drpc      4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:26:14Z                        False
openshift-gitops        rbd-appset-busybox2-placement-drpc        4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:26:20Z                        False
openshift-gitops        rbd-appset-busybox3-placement-drpc        4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:26:49Z                        False

Since the primary managed cluster is still down, PROGRESSION is reporting Cleaning Up, which is expected.
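
For a listing this long, the state check can be automated. The sketch below is illustrative only: it assumes the DRPC rows above were captured as text and that the seventh whitespace-separated field is the current state (the listing carries no header row, so the column positions are an assumption inferred from the rows themselves). The function name `drpcs_not_failed_over` is hypothetical.

```python
def drpcs_not_failed_over(output: str) -> list[str]:
    """Return "namespace/name" for every DRPC row whose current state is not FailedOver.

    Assumed field order (no header in the captured listing):
    namespace, name, age, preferred cluster, failover cluster,
    desired state, current state, then progression and timestamps.
    If any of the first six fields were empty, positions would shift
    and this simple split would misread the row.
    """
    pending = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, short rows, and a header row if one is present.
        if len(fields) < 7 or fields[0] == "NAMESPACE":
            continue
        namespace, name, current_state = fields[0], fields[1], fields[6]
        if current_state != "FailedOver":
            pending.append(f"{namespace}/{name}")
    return pending


# Two rows copied from the listing above.
sample = """\
busybox-workloads-101   rbd-sub-busybox101-placement-1-drpc       4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T15:26:02Z                        False
openshift-gitops        cnv-appset-busybox17-placement-drpc       4h51m   amagrawa-c1-28my   amagrawa-c2-my28   Failover       FailedOver     Cleaning Up   2024-05-30T16:52:23Z                        False
"""

# An empty list means every listed DRPC reports FailedOver.
print(drpcs_not_failed_over(sample))
```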

Failover was also successful on the 2 CNV (RBD) workloads, cnv-sub-busybox18-placement-1-drpc and cnv-appset-busybox17-placement-drpc, which are of the subscription and appset (pull model) types respectively, and the data written into the VM was successfully restored after failover completed.


We will track BZ2264767 and BZ2264765, mentioned in Comment 13, separately, as the issue tracked by this BZ is now fixed. Therefore, I am marking this bug as verified.

Comment 23 errata-xmlrpc 2024-07-17 13:10:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

