Description of problem (please be as detailed as possible and provide log snippets):

Version of all relevant components (if applicable):
ODF 4.14.0-150.stable
ACM 2.9.0-DOWNSTREAM-2023-10-12-14-53-11
advanced-cluster-management.v2.9.0-187

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy appset-based workloads on an RDR setup with the DR monitoring dashboard configured.
2. Run IOs and check that sync is working fine.
3. Bring the primary cluster down and fail over the workload.
4. Check the Volume replication health status for the failed-over workload. The VolumeSynchronizationDelay alert keeps firing for this workload because the older primary cluster is still down and data sync is interrupted. However, Volume replication health shows healthy after failover for all VRs associated with this failed-over application (the older primary cluster remains down).

Actual results: Volume replication health shows healthy post failover while the older primary cluster is still down, even though the VolumeSynchronizationDelay alert is raised for the FailedOver apps.

VRG yaml for the failed-over app from the new primary cluster post failover (which was secondary earlier); the older primary cluster remains down-

amagrawa:~$ oc get vrg -o yaml
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: VolumeReplicationGroup
  metadata:
    creationTimestamp: "2023-10-18T11:57:12Z"
    finalizers:
    - volumereplicationgroups.ramendr.openshift.io/vrg-protection
    generation: 1
    name: busybox-workloads-2-placement-drpc
    namespace: busybox-workloads-2
    ownerReferences:
    - apiVersion: work.open-cluster-management.io/v1
      kind: AppliedManifestWork
      name: 1391f6e15d7df48686076b110a7dea069a2484056ed94fbce4b2b1bf0562e8a6-busybox-workloads-2-placement-drpc-busybox-workloads-2-vrg-mw
      uid: 5e3b7c81-9396-4927-8d7f-bc4ebb124430
    resourceVersion: "3433435"
    uid: 255eeafd-0511-46d6-95ad-e00023fa56b5
  spec:
    action: Failover
    async:
      replicationClassSelector: {}
      schedulingInterval: 15m
      volumeSnapshotClassSelector: {}
    pvcSelector:
      matchLabels:
        appname: busybox
    replicationState: primary
    s3Profiles:
    - s3profile-amagrawa-c1-14oct-ocs-storagecluster
    - s3profile-amagrawa-c2-14oct-ocs-storagecluster
    volSync: {}
  status:
    conditions:
    - lastTransitionTime: "2023-10-18T11:58:12Z"
      message: PVCs in the VolumeReplicationGroup are ready for use
      observedGeneration: 1
      reason: Ready
      status: "True"
      type: DataReady
    - lastTransitionTime: "2023-10-18T11:57:17Z"
      message: VolumeReplicationGroup is replicating
      observedGeneration: 1
      reason: Replicating
      status: "False"
      type: DataProtected
    - lastTransitionTime: "2023-10-18T11:57:13Z"
      message: Restored cluster data
      observedGeneration: 1
      reason: Restored
      status: "True"
      type: ClusterDataReady
    - lastTransitionTime: "2023-10-18T11:57:30Z"
      message: Cluster data of all PVs are protected
      observedGeneration: 1
      reason: Uploaded
      status: "True"
      type: ClusterDataProtected
    kubeObjectProtection: {}
    lastUpdateTime: "2023-10-18T12:57:21Z"
    observedGeneration: 1
    protectedPVCs:
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:57:20Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:17Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:20Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-1
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 117Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:57:20Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:17Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:20Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-2
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 143Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:58:10Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:30Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:29Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-3
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 134Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:57:20Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:18Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:20Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-4
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 106Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:58:12Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:29Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:29Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-5
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 115Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:57:25Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:18Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:25Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-6
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 129Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    - accessModes:
      - ReadWriteOnce
      conditions:
      - lastTransitionTime: "2023-10-18T11:58:10Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-10-18T11:57:29Z"
        message: 'Done uploading PV/PVC cluster data to 2 of 2 S3 profile(s): [s3profile-amagrawa-c1-14oct-ocs-storagecluster s3profile-amagrawa-c2-14oct-ocs-storagecluster]'
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2023-10-18T11:57:28Z"
        message: PVC in the VolumeReplicationGroup is ready for use
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      labels:
        app.kubernetes.io/instance: busybox-workloads-2-amagrawa-c2-14oct
        appname: busybox
      name: dd-io-pvc-7
      replicationID:
        id: 433f6b6b47ccde08274e4a6ae1af38e44f2a435
        modes:
        - Failover
      resources:
        requests:
          storage: 149Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      storageID:
        id: 4b746a22-e8d8-4a9c-8f3e-cc1d50e6c64f
    state: Primary
kind: List
metadata:
  resourceVersion: ""

amagrawa:~$ oc get vrg -o yaml | grep name
name: busybox-workloads-2-placement-drpc
namespace: busybox-workloads-2
name: 1391f6e15d7df48686076b110a7dea069a2484056ed94fbce4b2b1bf0562e8a6-busybox-workloads-2-placement-drpc-busybox-workloads-2-vrg-mw
appname: busybox
appname: busybox
name: dd-io-pvc-1
appname: busybox
name: dd-io-pvc-2
appname: busybox
name: dd-io-pvc-3
appname: busybox
name: dd-io-pvc-4
appname: busybox
name: dd-io-pvc-5
appname: busybox
name: dd-io-pvc-6
appname: busybox
name: dd-io-pvc-7

amagrawa:~$ oc get vrg -o yaml | grep sync
async:

amagrawa:~$ oc get vrg -o yaml | grep Sync
volSync: {}

Expected results: Volume replication health should show Critical post failover while the older primary cluster is still down and the VolumeSynchronizationDelay alert is raised for the FailedOver apps.

Additional info:
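A minimal sketch, assuming the VRG list has been exported as JSON (for example with `oc get vrg -o json`), of how the per-PVC DataProtected conditions in the output above could be summarized. The function name `unprotected_pvcs` and the trimmed `sample` dictionary are hypothetical and only mirror the fields shown in this report; this is not how the dashboard computes Volume replication health.

```python
def unprotected_pvcs(vrg_list: dict) -> dict:
    """Map each VRG name to the PVCs whose DataProtected condition is not "True".

    Hypothetical helper: it relies only on the fields visible in the VRG
    output pasted in this report (status.protectedPVCs[].conditions[]).
    """
    result = {}
    for vrg in vrg_list.get("items", []):
        names = []
        for pvc in vrg.get("status", {}).get("protectedPVCs", []):
            # Index this PVC's conditions by type for a direct lookup.
            conds = {c["type"]: c["status"] for c in pvc.get("conditions", [])}
            if conds.get("DataProtected") != "True":
                names.append(pvc["name"])
        result[vrg["metadata"]["name"]] = names
    return result


# Trimmed sample mirroring two of the seven PVCs from the report.
sample = {
    "items": [
        {
            "metadata": {"name": "busybox-workloads-2-placement-drpc"},
            "status": {
                "protectedPVCs": [
                    {
                        "name": "dd-io-pvc-1",
                        "conditions": [
                            {"type": "DataReady", "status": "True"},
                            {"type": "DataProtected", "status": "False"},
                        ],
                    },
                    {
                        "name": "dd-io-pvc-2",
                        "conditions": [
                            {"type": "DataReady", "status": "True"},
                            {"type": "DataProtected", "status": "False"},
                        ],
                    },
                ]
            },
        }
    ]
}

if __name__ == "__main__":
    print(unprotected_pvcs(sample))
```

With the full output from this report, every one of the seven PVCs would be listed, since each has DataProtected set to "False" with reason Replicating.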
Moving the bug to 4.14.4, as we are doing a quick 4.14.3 to include a critical RGW fix (2254303) before the shutdown.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.14.4 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:0315
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days