Bug 2311790 - [RDR] CephFS Consistency Group: Applications not getting DR protected
Summary: [RDR] CephFS Consistency Group: Applications not getting DR protected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Benamar Mekhissi
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-09-11 19:25 UTC by Sidhant Agrawal
Modified: 2025-02-28 04:25 UTC (History)

Fixed In Version: 4.17.0-105
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:35:15 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | RamenDR ramen pull 1554 | 0 | None | open | Upgrade ramen to latest snapshotter crd and enforce VGS name length limit | 2024-09-12 18:46:58 UTC
Github | red-hat-storage ramen pull 355 | 0 | None | open | Bug 2311790: Upgrade ramen to latest snapshotter crd and enforce VGS name length limit | 2024-09-17 13:39:17 UTC
Red Hat Issue Tracker | OCSBZM-9224 | 0 | None | None | None | 2024-09-11 19:26:00 UTC
Red Hat Product Errata | RHSA-2024:8676 | 0 | None | None | None | 2024-10-30 14:35:17 UTC

Description Sidhant Agrawal 2024-09-11 19:25:07 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In an OCP/ODF 4.17 RDR setup with the OCP feature gate enabled, workloads that use consistency groups are not getting DR protected.

Version of all relevant components (if applicable):
OCP: 4.17.0-0.nightly-2024-09-09-120947
ODF: 4.17.0-97
ACM: 2.12.0-69 (2.12.0-DOWNSTREAM-2024-09-04-21-14-10)
Submariner: 0.18.0 (Globalnet enabled)
VolSync: 0.10.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, DR protection for workloads that use consistency groups is not functioning.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy RDR setup
2. Enable required feature gate in OCP
3. Deploy a CephFS-based workload that uses consistency groups (ApplicationSet-Pull model)
   The workload has four PVCs, grouped into two Consistency Groups: cg1 and cg2. The Consistency Group membership is set through the label `ramendr.openshift.io/consistency-group` on the PVCs

   NAME               LABELS
   busybox-cg1-pvc1   map[app.kubernetes.io/instance:busybox-cg-sagrawal-mc1 appname:busybox ramendr.openshift.io/consistency-group:cg1]
   busybox-cg1-pvc2   map[app.kubernetes.io/instance:busybox-cg-sagrawal-mc1 appname:busybox ramendr.openshift.io/consistency-group:cg1]
   busybox-cg2-pvc1   map[app.kubernetes.io/instance:busybox-cg-sagrawal-mc1 appname:busybox ramendr.openshift.io/consistency-group:cg2]
   busybox-cg2-pvc2   map[app.kubernetes.io/instance:busybox-cg-sagrawal-mc1 appname:busybox ramendr.openshift.io/consistency-group:cg2]

4. Enable DR protection via UI using current documented steps
5. Edit the DRPC resource and add the annotation drplacementcontrol.ramendr.openshift.io/is-cg-enabled: "true"
6. Delete the existing ReplicationSource/Destination resources on managed clusters, so that they can be recreated for CG
7. Observe that the DRPC Protected condition remains False and WorkloadUnprotected alert is displayed in the UI. ReplicationSource resources are not getting created.
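
For reference, the grouping described in step 3 can be sketched in Python. This is only an illustration of how the `ramendr.openshift.io/consistency-group` label partitions the four PVCs listed above. It is not Ramen's implementation, and the helper name `group_by_consistency_group` is hypothetical.

```python
from collections import defaultdict

# Label key taken from step 3 of this report.
CG_LABEL = "ramendr.openshift.io/consistency-group"


def group_by_consistency_group(pvcs):
    """Map each consistency-group label value to the sorted names of its PVCs.

    `pvcs` is a dict of PVC name -> labels dict. PVCs without the label
    are skipped. (Hypothetical helper, for illustration only.)
    """
    groups = defaultdict(list)
    for name, labels in pvcs.items():
        cg = labels.get(CG_LABEL)
        if cg is not None:
            groups[cg].append(name)
    return {cg: sorted(names) for cg, names in sorted(groups.items())}


# PVC names and labels copied from the listing in step 3.
pvcs = {
    "busybox-cg1-pvc1": {"appname": "busybox", CG_LABEL: "cg1"},
    "busybox-cg1-pvc2": {"appname": "busybox", CG_LABEL: "cg1"},
    "busybox-cg2-pvc1": {"appname": "busybox", CG_LABEL: "cg2"},
    "busybox-cg2-pvc2": {"appname": "busybox", CG_LABEL: "cg2"},
}

groups = group_by_consistency_group(pvcs)
print(groups)
```

With the labels from this report, the sketch yields two groups, cg1 and cg2, with two PVCs each.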

Actual results:
Workloads that use a consistency group are not getting DR protected.

Expected results:
Workloads that use a consistency group should get DR protected.

Additional info:

DRPC:
---
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-cg
      drplacementcontrol.ramendr.openshift.io/is-cg-enabled: "true"
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: sagrawal-mc1
    creationTimestamp: "2024-09-11T11:20:38Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: busybox-cg-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: busybox-cg-placement
      uid: b7454e6e-ae92-4eeb-bef2-4f7daf29b86f
    resourceVersion: "1576451"
    uid: 72389007-f691-448a-afd2-e50fbb180a5b
  spec:
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: odr-policy-5m
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: busybox-cg-placement
      namespace: openshift-gitops
    preferredCluster: sagrawal-mc1
    pvcSelector:
      matchExpressions:
      - key: appname
        operator: In
        values:
        - busybox
  status:
    actionDuration: 51.134894505s
    actionStartTime: "2024-09-11T11:20:47Z"
    conditions:
    - lastTransitionTime: "2024-09-11T11:20:38Z"
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: "True"
      type: Available
    - lastTransitionTime: "2024-09-11T11:20:38Z"
      message: Ready
      observedGeneration: 1
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-09-11T13:05:12Z"
      message: VolumeReplicationGroup (busybox-cg/busybox-cg-placement-drpc) on cluster
        sagrawal-mc1 is progressing on protecting workload data (Not all VolSync PVCs
        are protected), retrying till DataProtected condition is met
      observedGeneration: 1
      reason: Progressing
      status: "False"
      type: Protected
    lastGroupSyncDuration: 12.672310659s
    lastGroupSyncTime: "2024-09-11T15:05:12Z"
    lastUpdateTime: "2024-09-11T15:05:17Z"
    observedGeneration: 1
    phase: Deployed
    preferredDecision:
      clusterName: sagrawal-mc1
      clusterNamespace: sagrawal-mc1
    progression: Completed
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-09-11T11:37:36Z"
        message: All VolSync PVCs are ready
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2024-09-11T13:05:01Z"
        message: Not all VolSync PVCs are protected
        observedGeneration: 1
        reason: Progressing
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-09-11T11:20:38Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2024-09-11T13:05:01Z"
        message: Not all VolSync PVCs are protected
        observedGeneration: 1
        reason: Progressing
        status: "False"
        type: ClusterDataProtected
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: busybox-cg-placement-drpc
        namespace: busybox-cg
        protectedpvcs:
        - busybox-cg2-pvc2
        - busybox-cg1-pvc2
        - busybox-cg2-pvc1
        - busybox-cg1-pvc1
        resourceVersion: "2035952"
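
To make the failing conditions in the status above easier to spot, here is a minimal Python sketch that lists every condition whose status is not "True". The `drpc_status` dict is a trimmed, hand-copied subset of the DRPC status shown above, and the helper name `unmet_conditions` is hypothetical. It is a reading aid, not part of Ramen or any ODF tooling.

```python
def unmet_conditions(status):
    """Return the 'type' of every condition whose status is not "True".

    Checks both the top-level DRPC conditions and the nested
    resourceConditions reported for the VolumeReplicationGroup.
    """
    top_level = status.get("conditions", [])
    nested = status.get("resourceConditions", {}).get("conditions", [])
    return [c["type"] for c in top_level + nested if c.get("status") != "True"]


# Trimmed copy of the status block from this report (type, status, reason only).
drpc_status = {
    "conditions": [
        {"type": "Available", "status": "True", "reason": "Deployed"},
        {"type": "PeerReady", "status": "True", "reason": "Success"},
        {"type": "Protected", "status": "False", "reason": "Progressing"},
    ],
    "resourceConditions": {
        "conditions": [
            {"type": "DataReady", "status": "True", "reason": "Ready"},
            {"type": "DataProtected", "status": "False", "reason": "Progressing"},
            {"type": "ClusterDataReady", "status": "True", "reason": "Restored"},
            {"type": "ClusterDataProtected", "status": "False", "reason": "Progressing"},
        ]
    },
}

unmet = unmet_conditions(drpc_status)
print(unmet)
```

For the status in this report, the unmet conditions are Protected, DataProtected, and ClusterDataProtected, which matches the "Not all VolSync PVCs are protected" messages.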

Comment 4 Sunil Kumar Acharya 2024-09-18 12:06:54 UTC
Please update the RDT flag/text appropriately.

Comment 7 errata-xmlrpc 2024-10-30 14:35:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

Comment 8 Red Hat Bugzilla 2025-02-28 04:25:36 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
