Bug 2283994

Summary: [RDR] Message in WorkloadUnprotected warning alert is misleading when lastGroupSyncTime is reset after failover
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Aman Agrawal <amagrawa>
Component: odf-dr Assignee: rakesh-gm <rgowdege>
odf-dr sub component: ramen QA Contact: Aman Agrawal <amagrawa>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: unspecified CC: ebenahar, muagarwa, rgowdege
Version: 4.16   
Target Milestone: ---   
Target Release: ODF 4.17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.17.0-94 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-10-30 14:28:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Aman Agrawal 2024-05-30 15:00:04 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Version of all relevant components (if applicable):

OCP 4.16.0-0.nightly-2024-05-23-173505

ACM 2.11.0-DOWNSTREAM-2024-05-23-15-16-26

MCE 2.6.0-104 

ODF 4.16.0-108.stable

Gitops v1.12.3 

Platform- VMware


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Perform a failover on a DR-protected workload. Once lastGroupSyncTime for the failed-over workload is reset, check the DRPC YAML for that workload, and check the ACM console (alert menu) or the DR monitoring dashboard for the WorkloadUnprotected warning alert and its message.
2.
3.


Actual results:
When the warning alert below is fired, the DRPC YAML for the failed-over workload looks like this:

alert message-

WorkloadUnprotected
 Warning
Workload is not protected for disaster recovery (DRPC: cephfs-sub-busybox16-placement-1-drpc, Namespace: busybox-workloads-16).


drpc yaml-

- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-12
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: amagrawa-c1-28my
    creationTimestamp: "2024-05-30T12:13:39Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 2
    labels:
      cluster.open-cluster-management.io/backup: ramen
      velero.io/backup-name: acm-resources-generic-schedule-20240530120055
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20240530120055
    name: cephfs-appset-busybox12-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: cephfs-appset-busybox12-placement
      uid: 731b6f12-f1f7-471d-a81d-36451148625d
    resourceVersion: "2866901"
    uid: c531eb86-b9a4-428a-b58e-fc3d00281cc7
  spec:
    action: Failover
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: my-drpolicy-5
    failoverCluster: amagrawa-c1-28my
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: cephfs-appset-busybox12-placement
      namespace: openshift-gitops
    preferredCluster: amagrawa-c2-my28
    pvcSelector:
      matchLabels:
        appname: busybox_app3_cephfs
  status:
    actionDuration: 2m52.635660435s
    actionStartTime: "2024-05-30T14:34:57Z"
    conditions:
    - lastTransitionTime: "2024-05-30T14:35:15Z"
      message: Completed
      observedGeneration: 2
      reason: FailedOver
      status: "True"
      type: Available
    - lastTransitionTime: "2024-05-30T14:37:49Z"
      message: Ready
      observedGeneration: 2
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-05-30T14:37:15Z"
      message: VolumeReplicationGroup (busybox-workloads-12/cephfs-appset-busybox12-placement-drpc)
        on cluster amagrawa-c1-28my is progressing on protecting workload data (Not
        all VolSync PVCs are protected), retrying till DataProtected condition is
        met
      observedGeneration: 2
      reason: Progressing
      status: "False"
      type: Protected
    lastUpdateTime: "2024-05-30T14:47:49Z"
    observedGeneration: 2
    phase: FailedOver
    preferredDecision:
      clusterName: amagrawa-c2-my28
      clusterNamespace: amagrawa-c2-my28
    progression: Completed
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-05-30T14:37:16Z"
        message: All VolSync PVCs are ready
        observedGeneration: 4
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2024-05-30T14:37:16Z"
        message: Not all VolSync PVCs are protected
        observedGeneration: 4
        reason: Progressing
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-05-30T14:37:16Z"
        message: Not all VolSync PVCs are protected
        observedGeneration: 4
        reason: Progressing
        status: "False"
        type: ClusterDataProtected
      - lastTransitionTime: "2024-05-30T14:37:15Z"
        message: Nothing to restore
        observedGeneration: 4
        reason: Restored
        status: "True"
        type: ClusterDataReady
      resourceMeta:
        generation: 4
        kind: VolumeReplicationGroup
        name: cephfs-appset-busybox12-placement-drpc
        namespace: busybox-workloads-12
        protectedpvcs:
        - busybox-pvc-1

However, the alert text is misleading: the workload is already DR protected (a DR policy is applied to it), and a DR operation (failover or relocate) can still be performed on it.
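The distinction can be illustrated with a minimal Python sketch that reads the DRPC status shown above and produces a more specific description. This is not Ramen's or ODF's alerting logic; the helper names (find_condition, describe_protection) and the message wording are hypothetical, and the input dict only mirrors the phase and conditions fields from the YAML in this report.

```python
def find_condition(conditions, cond_type):
    """Return the first condition dict whose 'type' matches, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def describe_protection(drpc):
    """Summarize DR protection state from a DRPC dict (hypothetical helper)."""
    name = drpc["metadata"]["name"]
    status = drpc.get("status", {})
    protected = find_condition(status.get("conditions", []), "Protected")
    if protected is None:
        return f"{name}: no Protected condition reported"
    if protected.get("status") == "True":
        return f"{name}: protected"
    phase = status.get("phase", "Unknown")
    reason = protected.get("reason", "Unknown")
    # A failed-over workload whose protection is still progressing is a
    # different situation from a workload that has no DR protection at all.
    if phase == "FailedOver" and reason == "Progressing":
        return f"{name}: failed over, data protection still progressing"
    return f"{name}: not protected (phase={phase}, reason={reason})"


# Values copied from the DRPC status in this report (fields trimmed).
drpc = {
    "metadata": {"name": "cephfs-appset-busybox12-placement-drpc"},
    "status": {
        "phase": "FailedOver",
        "conditions": [
            {"type": "Available", "status": "True", "reason": "FailedOver"},
            {"type": "PeerReady", "status": "True", "reason": "Success"},
            {"type": "Protected", "status": "False", "reason": "Progressing"},
        ],
    },
}
print(describe_protection(drpc))
```

For the DRPC above, the sketch reports the failed-over, still-progressing case rather than a blanket "not protected" statement, which is the kind of distinction the alert message lacks.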


Expected results: The WorkloadUnprotected alert message should be rephrased so that it is more meaningful after a failover operation has been performed.


Additional info:

Comment 7 Sunil Kumar Acharya 2024-09-18 12:06:54 UTC
Please update the RDT flag/text appropriately.

Comment 13 errata-xmlrpc 2024-10-30 14:28:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

Comment 14 Red Hat Bugzilla 2025-02-28 04:25:18 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days