2283994 – [RDR] Message in WorkloadUnprotected warning alert is misleading when lastGroupSyncTime is reset after failover

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	odf-dr
Sub Component:
Version:	4.16
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	ODF 4.17.0
Assignee:	rakesh-gm
QA Contact:	Aman Agrawal
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2024-05-30 15:00 UTC by Aman Agrawal
Modified:	2025-02-28 04:25 UTC
CC List:	3 users
Fixed In Version:	4.17.0-94
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-10-30 14:28:03 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	red-hat-storage ramen pull 346	0	None	open	Bug 2283994: rephrase alerts message	2024-09-02 12:22:41 UTC
Red Hat Product Errata	RHSA-2024:8676	0	None	None	None	2024-10-30 14:28:04 UTC

Description Aman Agrawal 2024-05-30 15:00:04 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

Version of all relevant components (if applicable):

OCP 4.16.0-0.nightly-2024-05-23-173505

ACM 2.11.0-DOWNSTREAM-2024-05-23-15-16-26

MCE 2.6.0-104 

ODF 4.16.0-108.stable

Gitops v1.12.3 

Platform- VMware


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Perform a failover on a DR-protected workload.
2. Once lastGroupSyncTime for the failed-over workload is reset, check the DRPC YAML for that workload.
3. Check the ACM console (alert menu) or the DR monitoring dashboard for the WorkloadUnprotected warning alert and its message.


Actual results:
When the warning alert below fires, the DRPC YAML for the failed-over workload looks like this:

alert message-

WorkloadUnprotected
 Warning
Workload is not protected for disaster recovery (DRPC: cephfs-sub-busybox16-placement-1-drpc, Namespace: busybox-workloads-16).


drpc yaml-

- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-12
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: amagrawa-c1-28my
    creationTimestamp: "2024-05-30T12:13:39Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 2
    labels:
      cluster.open-cluster-management.io/backup: ramen
      velero.io/backup-name: acm-resources-generic-schedule-20240530120055
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20240530120055
    name: cephfs-appset-busybox12-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: cephfs-appset-busybox12-placement
      uid: 731b6f12-f1f7-471d-a81d-36451148625d
    resourceVersion: "2866901"
    uid: c531eb86-b9a4-428a-b58e-fc3d00281cc7
  spec:
    action: Failover
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: my-drpolicy-5
    failoverCluster: amagrawa-c1-28my
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: cephfs-appset-busybox12-placement
      namespace: openshift-gitops
    preferredCluster: amagrawa-c2-my28
    pvcSelector:
      matchLabels:
        appname: busybox_app3_cephfs
  status:
    actionDuration: 2m52.635660435s
    actionStartTime: "2024-05-30T14:34:57Z"
    conditions:
    - lastTransitionTime: "2024-05-30T14:35:15Z"
      message: Completed
      observedGeneration: 2
      reason: FailedOver
      status: "True"
      type: Available
    - lastTransitionTime: "2024-05-30T14:37:49Z"
      message: Ready
      observedGeneration: 2
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-05-30T14:37:15Z"
      message: VolumeReplicationGroup (busybox-workloads-12/cephfs-appset-busybox12-placement-drpc)
        on cluster amagrawa-c1-28my is progressing on protecting workload data (Not
        all VolSync PVCs are protected), retrying till DataProtected condition is
        met
      observedGeneration: 2
      reason: Progressing
      status: "False"
      type: Protected
    lastUpdateTime: "2024-05-30T14:47:49Z"
    observedGeneration: 2
    phase: FailedOver
    preferredDecision:
      clusterName: amagrawa-c2-my28
      clusterNamespace: amagrawa-c2-my28
    progression: Completed
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-05-30T14:37:16Z"
        message: All VolSync PVCs are ready
        observedGeneration: 4
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2024-05-30T14:37:16Z"
        message: Not all VolSync PVCs are protected
        observedGeneration: 4
        reason: Progressing
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-05-30T14:37:16Z"
        message: Not all VolSync PVCs are protected
        observedGeneration: 4
        reason: Progressing
        status: "False"
        type: ClusterDataProtected
      - lastTransitionTime: "2024-05-30T14:37:15Z"
        message: Nothing to restore
        observedGeneration: 4
        reason: Restored
        status: "True"
        type: ClusterDataReady
      resourceMeta:
        generation: 4
        kind: VolumeReplicationGroup
        name: cephfs-appset-busybox12-placement-drpc
        namespace: busybox-workloads-12
        protectedpvcs:
        - busybox-pvc-1

However, the alert text is misleading: the workload is already DR protected (it is assigned to a DR policy), and a DR operation (failover or relocate) can still be performed on it.
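To illustrate the distinction, here is a minimal Python sketch that reads a DRPC object (as a plain dict, trimmed from the YAML above) and separates "not enrolled in a DR policy" from "enrolled, but data sync has not completed yet". The helper names and the returned strings are hypothetical and are not taken from Ramen or ODF; the actual alert logic and the reworded message may differ.

```python
# Hypothetical helpers for illustration only; not part of Ramen or ODF.

def find_condition(conditions, cond_type):
    """Return the first condition whose 'type' matches, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def describe_protection(drpc):
    """Summarize the DR state of a DRPlacementControl given as a dict."""
    spec = drpc.get("spec", {})
    status = drpc.get("status", {})

    # Assumption: a workload counts as enrolled when it references a DRPolicy.
    if not spec.get("drPolicyRef", {}).get("name"):
        return "not enrolled in a DRPolicy"

    protected = find_condition(status.get("conditions", []), "Protected")
    if protected is not None and protected.get("status") == "True":
        return "enrolled and data protected"

    # Enrolled, but the Protected condition is missing or not True.
    if status.get("phase") == "FailedOver":
        return "enrolled; data sync not yet completed after failover"
    return "enrolled; data sync not yet completed"


# Fields copied (and trimmed) from the DRPC YAML in this report.
drpc = {
    "spec": {
        "action": "Failover",
        "drPolicyRef": {"kind": "DRPolicy", "name": "my-drpolicy-5"},
    },
    "status": {
        "phase": "FailedOver",
        "conditions": [
            {"type": "Available", "status": "True", "reason": "FailedOver"},
            {"type": "PeerReady", "status": "True", "reason": "Success"},
            {"type": "Protected", "status": "False", "reason": "Progressing"},
        ],
    },
}

print(describe_protection(drpc))
```

For the DRPC in this report the sketch reports the workload as enrolled with sync still pending, which is the state the alert wording would ideally convey instead of "not protected".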


Expected results: The WorkloadUnprotected alert message should be rephrased so that it remains meaningful after a failover operation.


Additional info:

Comment 7 Sunil Kumar Acharya 2024-09-18 12:06:54 UTC

Please update the RDT flag/text appropriately.

Comment 13 errata-xmlrpc 2024-10-30 14:28:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

Comment 14 Red Hat Bugzilla 2025-02-28 04:25:18 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
