Bug 2311893 - [RDR] Disable DR stuck when using a DRPolicy without flattening for a cloned/snapshot PVC-based workload
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-addons
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Rakshith
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On:
Blocks: 2307909
Reported: 2024-09-12 10:37 UTC by Rakshith
Modified: 2024-10-30 14:35 UTC
CC List: 4 users

Fixed In Version: 4.17.0-101
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:35:26 UTC
Embargoed:




Links
- Github csi-addons/kubernetes-csi-addons pull 664 (Merged): "replication: add new Validated condition" (last updated 2024-09-13 10:50:28 UTC)
- Github red-hat-storage/kubernetes-csi-addons pull 196 (Merged): "BUG 2311893: replication: add new Validated condition" (last updated 2024-09-13 10:50:32 UTC)
- Red Hat Issue Tracker OCSBZM-9238 (last updated 2024-09-12 10:38:27 UTC)
- Red Hat Product Errata RHSA-2024:8676 (last updated 2024-10-30 14:35:31 UTC)

Description Rakshith 2024-09-12 10:37:24 UTC
This bug was initially created as a copy of Bug #2307909.

I am copying this bug because:

A fix is needed in csi-addons so that Ramen can process deletion of a DRPC
when the wrong DRPolicy has been selected.
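The linked pull requests add a `Validated` condition to VolumeReplication. The sketch below is a hypothetical illustration of how such a condition could unblock deletion, not the csi-addons or Ramen code: it assumes that `Validated=False` means replication was never successfully enabled on the volume, so the owner (the VRG behind the DRPC) has nothing to disable and can let deletion proceed. `Condition`, `find_condition` and `can_finalize_without_disable` are invented names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Condition:
    """Minimal stand-in for a Kubernetes status condition."""
    type: str
    status: str  # "True", "False" or "Unknown"
    reason: str = ""


def find_condition(conditions: Sequence[Condition], cond_type: str) -> Optional[Condition]:
    """Return the first condition of the given type, or None if absent."""
    for cond in conditions:
        if cond.type == cond_type:
            return cond
    return None


def can_finalize_without_disable(conditions: Sequence[Condition]) -> bool:
    """Hypothetical rule: if validation explicitly failed, replication was
    never enabled, so there is nothing to disable before deletion."""
    validated = find_condition(conditions, "Validated")
    return validated is not None and validated.status == "False"


# Conditions taken from the stuck VolumeReplication in this report, plus an
# ASSUMED Validated=False entry of the kind the fix would add.
stuck_vr = [
    Condition("Completed", "False", "FailedToPromote"),
    Condition("Degraded", "True", "Error"),
    Condition("Resyncing", "False", "NotResyncing"),
    Condition("Validated", "False"),
]
print(can_finalize_without_disable(stuck_vr))      # True
print(can_finalize_without_disable(stuck_vr[:3]))  # False: no Validated condition
```

Without such a signal, a controller cannot tell "replication was never enabled" apart from "disable has not finished yet", which is one plausible reading of why deletion waits indefinitely.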


Description of problem (please be as detailed as possible and provide log
snippets):


Version of all relevant components (if applicable):
ODF 4.17.0-77
ACM 2.12.0-DOWNSTREAM-2024-08-17-01-53-31


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Can this issue be reproduced?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an RDR workload (ApplicationSet, pull model).
2. Create a clone of the PVC.
3. Delete the parent workload.
4. Deploy another pod that consumes the cloned PVC in the same namespace (ApplicationSet, pull model).
5. Create a DRPolicy without flattening enabled.
6. Apply this DRPolicy to the new workload that uses the cloned PVC and wait for sync to start.
7. If sync does not start, delete the DRPC and wait for deletion to complete. Deletion remains stuck.


DRPC YAML while deletion is stuck (phase: Deleting):

apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-3
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: prsurve-ci
    creationTimestamp: "2024-08-26T13:22:26Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2024-08-26T13:47:01Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 2
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: rbd-appset-busybox1-cloned-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      uid: 72c8f3da-c7f3-4cd8-a6e3-598a9746bdd2
    resourceVersion: "9151956"
    uid: ff5863eb-67dc-45a8-9fbf-aa054a3c6e85
  spec:
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: my-drpolicy-5-normal
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      namespace: openshift-gitops
    preferredCluster: prsurve-ci
    pvcSelector:
      matchLabels:
        app: test
  status:
    actionDuration: 21.03528906s
    actionStartTime: "2024-08-26T13:22:35Z"
    conditions:
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: "True"
      type: Available
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Ready
      observedGeneration: 1
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-08-26T13:22:29Z"
      message: VolumeReplicationGroup (busybox-workloads-3/rbd-appset-busybox1-cloned-placement-drpc)
        on cluster prsurve-ci is reporting errors (All PVCs of the VolumeReplicationGroup
        are not ready) readying workload data, retrying till DataReady condition is
        met
      observedGeneration: 1
      reason: Error
      status: "False"
      type: Protected
    lastUpdateTime: "2024-08-26T13:22:56Z"
    observedGeneration: 2
    phase: Deleting
    preferredDecision:
      clusterName: prsurve-ci
      clusterNamespace: prsurve-ci
    progression: Deleting
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-08-26T13:22:27Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: Cluster data of all PVs are protected. Kube objects protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: rbd-appset-busybox1-cloned-placement-drpc
        namespace: busybox-workloads-3
        protectedpvcs:
        - busybox-pvc-41-clone
        resourceVersion: "11450267"
kind: List
metadata:
  resourceVersion: ""
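The DRPC above carries the usual signature of a blocked deletion: `metadata.deletionTimestamp` is set, the `drpc.ramendr.openshift.io/finalizer` finalizer is still present, and `status.phase` is `Deleting`. A minimal Python sketch that flags such items in a `kind: List` document; it assumes the YAML has already been loaded into dictionaries (for example with PyYAML's `yaml.safe_load`), and `stuck_deleting` and `stuck_names` are hypothetical helper names. It only checks metadata; whether an object is truly "stuck" also depends on how long this state has lasted, which the sketch does not measure.

```python
from typing import Any, Dict, List


def stuck_deleting(obj: Dict[str, Any]) -> bool:
    """True when the object is marked for deletion but finalizers still hold it."""
    meta = obj.get("metadata", {})
    return bool(meta.get("deletionTimestamp")) and bool(meta.get("finalizers"))


def stuck_names(resource_list: Dict[str, Any]) -> List[str]:
    """Names of items in a `kind: List` document that are pending deletion."""
    return [
        item["metadata"]["name"]
        for item in resource_list.get("items", [])
        if stuck_deleting(item)
    ]


# Trimmed copy of the DRPC list above (only the fields the check reads).
drpc_list = {
    "kind": "List",
    "items": [
        {
            "kind": "DRPlacementControl",
            "metadata": {
                "name": "rbd-appset-busybox1-cloned-placement-drpc",
                "deletionTimestamp": "2024-08-26T13:47:01Z",
                "finalizers": ["drpc.ramendr.openshift.io/finalizer"],
            },
            "status": {"phase": "Deleting", "progression": "Deleting"},
        }
    ],
}
print(stuck_names(drpc_list))  # ['rbd-appset-busybox1-cloned-placement-drpc']
```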


Actual results: Sync does not start for a cloned/snapshot PVC-based workload when a DRPolicy without flattening is assigned to it, and deleting the DRPC afterwards remains stuck.


DRPC YAML before deletion (phase: Deployed, progression: Completed):

apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-3
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: prsurve-ci
    creationTimestamp: "2024-08-26T13:22:26Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: rbd-appset-busybox1-cloned-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      uid: 72c8f3da-c7f3-4cd8-a6e3-598a9746bdd2
    resourceVersion: "9130883"
    uid: ff5863eb-67dc-45a8-9fbf-aa054a3c6e85
  spec:
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: my-drpolicy-5-normal
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      namespace: openshift-gitops
    preferredCluster: prsurve-ci
    pvcSelector:
      matchLabels:
        app: test
  status:
    actionDuration: 21.03528906s
    actionStartTime: "2024-08-26T13:22:35Z"
    conditions:
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: "True"
      type: Available
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Ready
      observedGeneration: 1
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-08-26T13:22:29Z"
      message: VolumeReplicationGroup (busybox-workloads-3/rbd-appset-busybox1-cloned-placement-drpc)
        on cluster prsurve-ci is reporting errors (All PVCs of the VolumeReplicationGroup
        are not ready) readying workload data, retrying till DataReady condition is
        met
      observedGeneration: 1
      reason: Error
      status: "False"
      type: Protected
    lastUpdateTime: "2024-08-26T13:22:56Z"
    observedGeneration: 1
    phase: Deployed
    preferredDecision:
      clusterName: prsurve-ci
      clusterNamespace: prsurve-ci
    progression: Completed
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-08-26T13:22:27Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: Cluster data of all PVs are protected. Kube objects protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: rbd-appset-busybox1-cloned-placement-drpc
        namespace: busybox-workloads-3
        protectedpvcs:
        - busybox-pvc-41-clone
        resourceVersion: "11450267"
kind: List
metadata:
  resourceVersion: ""


DRPC status:

openshift-gitops      rbd-appset-busybox1-cloned-placement-drpc   16m     prsurve-ci                                          Deployed       Completed     2024-08-26T13:22:35Z   21.03528906s    


C1 (primary cluster):

busybox-3
Already on project "busybox-workloads-3" on server "https://api.prsurve-ci.qe.rh-ocs.com:6443".
NAME                                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE   VOLUMEMODE
persistentvolumeclaim/busybox-pvc-41-clone   Bound    pvc-47e5b565-51eb-432c-b9fb-054d741627ae   42Gi       RWO            ocs-storagecluster-ceph-rbd   <unset>                 26m   Filesystem

NAME                                                                      AGE   VOLUMEREPLICATIONCLASS                  PVCNAME                DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc-41-clone   16m   rbd-volumereplicationclass-1625360775   busybox-pvc-41-clone   primary        Unknown

NAME                                                                                    DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/rbd-appset-busybox1-cloned-placement-drpc   primary        Unknown

NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
pod/busybox-41-5d7bd7c476-wnls4   1/1     Running   0          20m   10.129.2.85   compute-0   <none>           <none>



oc describe vr
Name:         busybox-pvc-41-clone
Namespace:    busybox-workloads-3
Labels:       ramendr.openshift.io/owner-name=rbd-appset-busybox1-cloned-placement-drpc
              ramendr.openshift.io/owner-namespace-name=busybox-workloads-3
Annotations:  <none>
API Version:  replication.storage.openshift.io/v1alpha1
Kind:         VolumeReplication
Metadata:
  Creation Timestamp:  2024-08-26T13:22:27Z
  Finalizers:
    replication.storage.openshift.io
  Generation:  1
  Owner References:
    API Version:           ramendr.openshift.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VolumeReplicationGroup
    Name:                  rbd-appset-busybox1-cloned-placement-drpc
    UID:                   dc251368-36c9-4f4f-a829-f565907a7557
  Resource Version:        11450228
  UID:                     46cbda2d-e27f-4c2b-934a-e6d9c19f11ae
Spec:
  Auto Resync:  false
  Data Source:
    API Group:
    Kind:                    PersistentVolumeClaim
    Name:                    busybox-pvc-41-clone
  Replication Handle:
  Replication State:         primary
  Volume Replication Class:  rbd-volumereplicationclass-1625360775
Status:
  Conditions:
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:
    Observed Generation:   1
    Reason:                FailedToPromote
    Status:                False
    Type:                  Completed
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:
    Observed Generation:   1
    Reason:                Error
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:
    Observed Generation:   1
    Reason:                NotResyncing
    Status:                False
    Type:                  Resyncing
  Message:                 system is not in a state required for the operation's execution: failed to enable mirroring on image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5": parent image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5-temp" is not enabled for mirroring
  Observed Generation:     1
  State:                   Unknown
Events:                    <none>




oc get vr -oyaml
apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplication
  metadata:
    creationTimestamp: "2024-08-26T13:22:27Z"
    finalizers:
    - replication.storage.openshift.io
    generation: 1
    labels:
      ramendr.openshift.io/owner-name: rbd-appset-busybox1-cloned-placement-drpc
      ramendr.openshift.io/owner-namespace-name: busybox-workloads-3
    name: busybox-pvc-41-clone
    namespace: busybox-workloads-3
    ownerReferences:
    - apiVersion: ramendr.openshift.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: VolumeReplicationGroup
      name: rbd-appset-busybox1-cloned-placement-drpc
      uid: dc251368-36c9-4f4f-a829-f565907a7557
    resourceVersion: "11450228"
    uid: 46cbda2d-e27f-4c2b-934a-e6d9c19f11ae
  spec:
    autoResync: false
    dataSource:
      apiGroup: ""
      kind: PersistentVolumeClaim
      name: busybox-pvc-41-clone
    replicationHandle: ""
    replicationState: primary
    volumeReplicationClass: rbd-volumereplicationclass-1625360775
  status:
    conditions:
    - lastTransitionTime: "2024-08-26T13:22:27Z"
      message: ""
      observedGeneration: 1
      reason: FailedToPromote
      status: "False"
      type: Completed
    - lastTransitionTime: "2024-08-26T13:22:27Z"
      message: ""
      observedGeneration: 1
      reason: Error
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-08-26T13:22:27Z"
      message: ""
      observedGeneration: 1
      reason: NotResyncing
      status: "False"
      type: Resyncing
    message: 'system is not in a state required for the operation''s execution: failed
      to enable mirroring on image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5":
      parent image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5-temp"
      is not enabled for mirroring'
    observedGeneration: 1
    state: Unknown
kind: List
metadata:
  resourceVersion: ""
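The `message` above identifies the root cause: the cloned image still depends on a parent image (the `-temp` image) that is not enabled for mirroring, so enabling mirroring on the clone fails and the VolumeReplication never reaches primary. Flattening copies the parent's data into the clone and drops the parent link, which is presumably why a DRPolicy with flattening enabled avoids this error. The toy model below only illustrates that precondition; `RbdImage`, `MirroringError`, `enable_mirroring` and `flatten` are hypothetical names and do not call Ceph or the RBD CSI driver.

```python
from dataclasses import dataclass
from typing import Optional


class MirroringError(Exception):
    """Raised when mirroring cannot be enabled on an image (toy model)."""


@dataclass
class RbdImage:
    name: str
    parent: Optional["RbdImage"] = None
    mirroring_enabled: bool = False


def enable_mirroring(image: RbdImage) -> None:
    """Toy rule: a cloned image can be mirrored only if its parent is mirrored."""
    if image.parent is not None and not image.parent.mirroring_enabled:
        raise MirroringError(
            f'failed to enable mirroring on image "{image.name}": parent image '
            f'"{image.parent.name}" is not enabled for mirroring'
        )
    image.mirroring_enabled = True


def flatten(image: RbdImage) -> None:
    """Toy flatten: the data now lives in the clone, so the parent link is dropped."""
    image.parent = None


pool = "ocs-storagecluster-cephblockpool"
temp = RbdImage(f"{pool}/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5-temp")
clone = RbdImage(f"{pool}/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5", parent=temp)

try:
    enable_mirroring(clone)  # fails, mirroring the VolumeReplication status message
except MirroringError as err:
    print(err)

flatten(clone)
enable_mirroring(clone)  # succeeds once the parent link is gone
print(clone.mirroring_enabled)  # True
```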





oc describe vrg
Name:         rbd-appset-busybox1-cloned-placement-drpc
Namespace:    busybox-workloads-3
Labels:       <none>
Annotations:  drplacementcontrol.ramendr.openshift.io/destination-cluster: prsurve-ci
              drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc:
              drplacementcontrol.ramendr.openshift.io/drpc-uid: ff5863eb-67dc-45a8-9fbf-aa054a3c6e85
              drplacementcontrol.ramendr.openshift.io/is-cg-enabled:
API Version:  ramendr.openshift.io/v1alpha1
Kind:         VolumeReplicationGroup
Metadata:
  Creation Timestamp:  2024-08-26T13:22:27Z
  Finalizers:
    volumereplicationgroups.ramendr.openshift.io/vrg-protection
  Generation:  1
  Owner References:
    API Version:     work.open-cluster-management.io/v1
    Kind:            AppliedManifestWork
    Name:            a886abc37b147c9bcb446cc55d8427d165d0a651db5133c8bebc9104e5ec8b1b-rbd-appset-busybox1-cloned-placement-drpc-busybox-workloads-3-vrg-mw
    UID:             7cb63ab5-bbbc-408c-85d7-4c1388653f46
  Resource Version:  11450267
  UID:               dc251368-36c9-4f4f-a829-f565907a7557
Spec:
  Async:
    Replication Class Selector:
    Scheduling Interval:  5m
    Volume Group Snapshot Class Selector:
    Volume Snapshot Class Selector:
  Pvc Selector:
    Match Labels:
      App:            test
  Replication State:  primary
  s3Profiles:
    s3profile-prsurve-ci-ocs-storagecluster
    s3profile-prsurve-vm-d-ocs-storagecluster
  Vol Sync:
Status:
  Conditions:
    Last Transition Time:  2024-08-26T13:22:28Z
    Message:               All PVCs of the VolumeReplicationGroup are not ready
    Observed Generation:   1
    Reason:                Error
    Status:                False
    Type:                  DataReady
    Last Transition Time:  2024-08-26T13:22:28Z
    Message:               All PVCs of the VolumeReplicationGroup are not ready
    Observed Generation:   1
    Reason:                Error
    Status:                False
    Type:                  DataProtected
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:               Nothing to restore
    Observed Generation:   1
    Reason:                Restored
    Status:                True
    Type:                  ClusterDataReady
    Last Transition Time:  2024-08-26T13:22:28Z
    Message:               Cluster data of all PVs are protected. Kube objects protected
    Observed Generation:   1
    Reason:                Uploaded
    Status:                True
    Type:                  ClusterDataProtected
  Kube Object Protection:
  Last Update Time:     2024-08-26T13:22:29Z
  Observed Generation:  1
  Protected PV Cs:
    Access Modes:
      ReadWriteOnce
    Conditions:
      Last Transition Time:  2024-08-26T13:22:27Z
      Message:               VolumeReplication resource for pvc not promoted to primary
      Observed Generation:   1
      Reason:                Error
      Status:                False
      Type:                  DataReady
      Last Transition Time:  2024-08-26T13:22:28Z
      Message:               PV cluster data already protected for PVC busybox-pvc-41-clone
      Observed Generation:   1
      Reason:                Uploaded
      Status:                True
      Type:                  ClusterDataProtected
      Last Transition Time:  2024-08-26T13:22:28Z
      Message:               VolumeReplication resource for pvc not promoted to primary
      Observed Generation:   1
      Reason:                Error
      Status:                False
      Type:                  DataProtected
    Csi Provisioner:         openshift-storage.rbd.csi.ceph.com
    Labels:
      App:                                        test
      ramendr.openshift.io/owner-name:            rbd-appset-busybox1-cloned-placement-drpc
      ramendr.openshift.io/owner-namespace-name:  busybox-workloads-3
    Name:                                         busybox-pvc-41-clone
    Namespace:                                    busybox-workloads-3
    Replication ID:
      Id:  95c39e71747014538e8cdbd5ae4b92886f088b5
      Modes:
        Failover
    Resources:
      Requests:
        Storage:         42Gi
    Storage Class Name:  ocs-storagecluster-ceph-rbd
    Storage ID:
      Id:  2e44281a-cc8a-4cdb-bd8b-2a9b81edacca
  State:   Unknown
Events:
  Type    Reason                    Age   From                               Message
  ----    ------                    ----  ----                               -------
  Normal  PrimaryVRGProcessSuccess  17m   controller_VolumeReplicationGroup  Primary Success
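In the VRG output, the top-level `DataReady` and `DataProtected` failures trace back to the per-PVC conditions reporting "VolumeReplication resource for pvc not promoted to primary". A hypothetical helper that lists every `False` per-PVC condition; it assumes the VRG status has been loaded as a dictionary and that the per-PVC list lives under the key `protectedPVCs` (rendered as "Protected PV Cs" by `oc describe`), which is an assumption about the API field name.

```python
from typing import Any, Dict, List, Tuple


def failing_pvc_conditions(vrg_status: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (pvc name, condition type, message) for every False per-PVC condition."""
    failures = []
    for pvc in vrg_status.get("protectedPVCs", []):  # ASSUMED key name
        for cond in pvc.get("conditions", []):
            if cond.get("status") == "False":
                failures.append((pvc["name"], cond["type"], cond.get("message", "")))
    return failures


# Per-PVC conditions copied from the VRG above.
vrg_status = {
    "protectedPVCs": [
        {
            "name": "busybox-pvc-41-clone",
            "conditions": [
                {"type": "DataReady", "status": "False",
                 "message": "VolumeReplication resource for pvc not promoted to primary"},
                {"type": "ClusterDataProtected", "status": "True",
                 "message": "PV cluster data already protected for PVC busybox-pvc-41-clone"},
                {"type": "DataProtected", "status": "False",
                 "message": "VolumeReplication resource for pvc not promoted to primary"},
            ],
        }
    ]
}
for name, cond_type, message in failing_pvc_conditions(vrg_status):
    print(f"{name}: {cond_type}: {message}")
```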



Expected results: Users should be able to recover from this state by deleting the DRPC, and then select the correct DRPolicy with flattening enabled.


Additional info:

Comment 7 errata-xmlrpc 2024-10-30 14:35:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

