This bug was initially created as a copy of Bug #2307909

I am copying this bug because:

A fix needs to go into csi addons to enable ramen to process deletion of a DRPC when the wrong DRPolicy is selected.

Description of problem (please be as detailed as possible and provide log snippets):

Version of all relevant components (if applicable):
ODF 4.17.0-77
ACM 2.12.0-DOWNSTREAM-2024-08-17-01-53-31

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an RDR workload (appset pull model based)
2. Create a clone of the PVC
3. Delete the parent workload
4. Deploy another pod consuming the cloned PVC in the same NS (appset pull model based)
5. Create a DRPolicy without flattening enabled
6. Apply this DRPolicy to the new workload with the cloned PVC and wait for sync to start
7. If sync doesn't start, delete the DRPC and wait for deletion to complete. Deletion remains stuck for me.

- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-3
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: prsurve-ci
    creationTimestamp: "2024-08-26T13:22:26Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2024-08-26T13:47:01Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 2
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: rbd-appset-busybox1-cloned-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      uid: 72c8f3da-c7f3-4cd8-a6e3-598a9746bdd2
    resourceVersion: "9151956"
    uid: ff5863eb-67dc-45a8-9fbf-aa054a3c6e85
  spec:
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: my-drpolicy-5-normal
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      namespace: openshift-gitops
    preferredCluster: prsurve-ci
    pvcSelector:
      matchLabels:
        app: test
  status:
    actionDuration: 21.03528906s
    actionStartTime: "2024-08-26T13:22:35Z"
    conditions:
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: "True"
      type: Available
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Ready
      observedGeneration: 1
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-08-26T13:22:29Z"
      message: VolumeReplicationGroup (busybox-workloads-3/rbd-appset-busybox1-cloned-placement-drpc) on cluster prsurve-ci is reporting errors (All PVCs of the VolumeReplicationGroup are not ready) readying workload data, retrying till DataReady condition is met
      observedGeneration: 1
      reason: Error
      status: "False"
      type: Protected
    lastUpdateTime: "2024-08-26T13:22:56Z"
    observedGeneration: 2
    phase: Deleting
    preferredDecision:
      clusterName: prsurve-ci
      clusterNamespace: prsurve-ci
    progression: Deleting
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-08-26T13:22:27Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: Cluster data of all PVs are protected. Kube objects protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: rbd-appset-busybox1-cloned-placement-drpc
        namespace: busybox-workloads-3
        protectedpvcs:
        - busybox-pvc-41-clone
        resourceVersion: "11450267"
kind: List
metadata:
  resourceVersion: ""

Actual results:
Sync doesn't start for a cloned/snapshot PVC based workload when a DRPolicy without flattening is assigned to it.

DRPCyaml-

- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-3
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: prsurve-ci
    creationTimestamp: "2024-08-26T13:22:26Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: rbd-appset-busybox1-cloned-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      uid: 72c8f3da-c7f3-4cd8-a6e3-598a9746bdd2
    resourceVersion: "9130883"
    uid: ff5863eb-67dc-45a8-9fbf-aa054a3c6e85
  spec:
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: my-drpolicy-5-normal
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: rbd-appset-busybox1-cloned-placement
      namespace: openshift-gitops
    preferredCluster: prsurve-ci
    pvcSelector:
      matchLabels:
        app: test
  status:
    actionDuration: 21.03528906s
    actionStartTime: "2024-08-26T13:22:35Z"
    conditions:
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: "True"
      type: Available
    - lastTransitionTime: "2024-08-26T13:22:26Z"
      message: Ready
      observedGeneration: 1
      reason: Success
      status: "True"
      type: PeerReady
    - lastTransitionTime: "2024-08-26T13:22:29Z"
      message: VolumeReplicationGroup (busybox-workloads-3/rbd-appset-busybox1-cloned-placement-drpc) on cluster prsurve-ci is reporting errors (All PVCs of the VolumeReplicationGroup are not ready) readying workload data, retrying till DataReady condition is met
      observedGeneration: 1
      reason: Error
      status: "False"
      type: Protected
    lastUpdateTime: "2024-08-26T13:22:56Z"
    observedGeneration: 1
    phase: Deployed
    preferredDecision:
      clusterName: prsurve-ci
      clusterNamespace: prsurve-ci
    progression: Completed
    resourceConditions:
      conditions:
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: All PVCs of the VolumeReplicationGroup are not ready
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2024-08-26T13:22:27Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2024-08-26T13:22:28Z"
        message: Cluster data of all PVs are protected. Kube objects protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: rbd-appset-busybox1-cloned-placement-drpc
        namespace: busybox-workloads-3
        protectedpvcs:
        - busybox-pvc-41-clone
        resourceVersion: "11450267"
kind: List
metadata:
  resourceVersion: ""

DRPC-

openshift-gitops   rbd-appset-busybox1-cloned-placement-drpc   16m   prsurve-ci   Deployed   Completed   2024-08-26T13:22:35Z   21.03528906s

C1 primary-

busybox-3
Already on project "busybox-workloads-3" on server "https://api.prsurve-ci.qe.rh-ocs.com:6443".

NAME                                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE   VOLUMEMODE
persistentvolumeclaim/busybox-pvc-41-clone   Bound    pvc-47e5b565-51eb-432c-b9fb-054d741627ae   42Gi       RWO            ocs-storagecluster-ceph-rbd   <unset>                 26m   Filesystem

NAME                                                                      AGE   VOLUMEREPLICATIONCLASS                  PVCNAME                DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc-41-clone   16m   rbd-volumereplicationclass-1625360775   busybox-pvc-41-clone   primary        Unknown

NAME                                                                                    DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/rbd-appset-busybox1-cloned-placement-drpc   primary        Unknown

NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
pod/busybox-41-5d7bd7c476-wnls4   1/1     Running   0          20m   10.129.2.85   compute-0   <none>           <none>

oc describe vr

Name:         busybox-pvc-41-clone
Namespace:    busybox-workloads-3
Labels:       ramendr.openshift.io/owner-name=rbd-appset-busybox1-cloned-placement-drpc
              ramendr.openshift.io/owner-namespace-name=busybox-workloads-3
Annotations:  <none>
API Version:  replication.storage.openshift.io/v1alpha1
Kind:         VolumeReplication
Metadata:
  Creation Timestamp:  2024-08-26T13:22:27Z
  Finalizers:
    replication.storage.openshift.io
  Generation:  1
  Owner References:
    API Version:           ramendr.openshift.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VolumeReplicationGroup
    Name:                  rbd-appset-busybox1-cloned-placement-drpc
    UID:                   dc251368-36c9-4f4f-a829-f565907a7557
  Resource Version:        11450228
  UID:                     46cbda2d-e27f-4c2b-934a-e6d9c19f11ae
Spec:
  Auto Resync:  false
  Data Source:
    API Group:
    Kind:       PersistentVolumeClaim
    Name:       busybox-pvc-41-clone
  Replication Handle:
  Replication State:         primary
  Volume Replication Class:  rbd-volumereplicationclass-1625360775
Status:
  Conditions:
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:
    Observed Generation:   1
    Reason:                FailedToPromote
    Status:                False
    Type:                  Completed
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:
    Observed Generation:   1
    Reason:                Error
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:
    Observed Generation:   1
    Reason:                NotResyncing
    Status:                False
    Type:                  Resyncing
  Message:                 system is not in a state required for the operation's execution: failed to enable mirroring on image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5": parent image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5-temp" is not enabled for mirroring
  Observed Generation:     1
  State:                   Unknown
Events:                    <none>

oc get vr -oyaml

apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplication
  metadata:
    creationTimestamp: "2024-08-26T13:22:27Z"
    finalizers:
    - replication.storage.openshift.io
    generation: 1
    labels:
      ramendr.openshift.io/owner-name: rbd-appset-busybox1-cloned-placement-drpc
      ramendr.openshift.io/owner-namespace-name: busybox-workloads-3
    name: busybox-pvc-41-clone
    namespace: busybox-workloads-3
    ownerReferences:
    - apiVersion: ramendr.openshift.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: VolumeReplicationGroup
      name: rbd-appset-busybox1-cloned-placement-drpc
      uid: dc251368-36c9-4f4f-a829-f565907a7557
    resourceVersion: "11450228"
    uid: 46cbda2d-e27f-4c2b-934a-e6d9c19f11ae
  spec:
    autoResync: false
    dataSource:
      apiGroup: ""
      kind: PersistentVolumeClaim
      name: busybox-pvc-41-clone
    replicationHandle: ""
    replicationState: primary
    volumeReplicationClass: rbd-volumereplicationclass-1625360775
  status:
    conditions:
    - lastTransitionTime: "2024-08-26T13:22:27Z"
      message: ""
      observedGeneration: 1
      reason: FailedToPromote
      status: "False"
      type: Completed
    - lastTransitionTime: "2024-08-26T13:22:27Z"
      message: ""
      observedGeneration: 1
      reason: Error
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-08-26T13:22:27Z"
      message: ""
      observedGeneration: 1
      reason: NotResyncing
      status: "False"
      type: Resyncing
    message: 'system is not in a state required for the operation''s execution: failed to enable mirroring on image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5": parent image "ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5-temp" is not enabled for mirroring'
    observedGeneration: 1
    state: Unknown
kind: List
metadata:
  resourceVersion: ""

oc describe vrg

Name:         rbd-appset-busybox1-cloned-placement-drpc
Namespace:    busybox-workloads-3
Labels:       <none>
Annotations:  drplacementcontrol.ramendr.openshift.io/destination-cluster: prsurve-ci
              drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc:
              drplacementcontrol.ramendr.openshift.io/drpc-uid: ff5863eb-67dc-45a8-9fbf-aa054a3c6e85
              drplacementcontrol.ramendr.openshift.io/is-cg-enabled:
API Version:  ramendr.openshift.io/v1alpha1
Kind:         VolumeReplicationGroup
Metadata:
  Creation Timestamp:  2024-08-26T13:22:27Z
  Finalizers:
    volumereplicationgroups.ramendr.openshift.io/vrg-protection
  Generation:  1
  Owner References:
    API Version:     work.open-cluster-management.io/v1
    Kind:            AppliedManifestWork
    Name:            a886abc37b147c9bcb446cc55d8427d165d0a651db5133c8bebc9104e5ec8b1b-rbd-appset-busybox1-cloned-placement-drpc-busybox-workloads-3-vrg-mw
    UID:             7cb63ab5-bbbc-408c-85d7-4c1388653f46
  Resource Version:  11450267
  UID:               dc251368-36c9-4f4f-a829-f565907a7557
Spec:
  Async:
    Replication Class Selector:
    Scheduling Interval:  5m
    Volume Group Snapshot Class Selector:
    Volume Snapshot Class Selector:
  Pvc Selector:
    Match Labels:
      App:  test
  Replication State:  primary
  s3Profiles:
    s3profile-prsurve-ci-ocs-storagecluster
    s3profile-prsurve-vm-d-ocs-storagecluster
  Vol Sync:
Status:
  Conditions:
    Last Transition Time:  2024-08-26T13:22:28Z
    Message:               All PVCs of the VolumeReplicationGroup are not ready
    Observed Generation:   1
    Reason:                Error
    Status:                False
    Type:                  DataReady
    Last Transition Time:  2024-08-26T13:22:28Z
    Message:               All PVCs of the VolumeReplicationGroup are not ready
    Observed Generation:   1
    Reason:                Error
    Status:                False
    Type:                  DataProtected
    Last Transition Time:  2024-08-26T13:22:27Z
    Message:               Nothing to restore
    Observed Generation:   1
    Reason:                Restored
    Status:                True
    Type:                  ClusterDataReady
    Last Transition Time:  2024-08-26T13:22:28Z
    Message:               Cluster data of all PVs are protected. Kube objects protected
    Observed Generation:   1
    Reason:                Uploaded
    Status:                True
    Type:                  ClusterDataProtected
  Kube Object Protection:
  Last Update Time:     2024-08-26T13:22:29Z
  Observed Generation:  1
  Protected PV Cs:
    Access Modes:
      ReadWriteOnce
    Conditions:
      Last Transition Time:  2024-08-26T13:22:27Z
      Message:               VolumeReplication resource for pvc not promoted to primary
      Observed Generation:   1
      Reason:                Error
      Status:                False
      Type:                  DataReady
      Last Transition Time:  2024-08-26T13:22:28Z
      Message:               PV cluster data already protected for PVC busybox-pvc-41-clone
      Observed Generation:   1
      Reason:                Uploaded
      Status:                True
      Type:                  ClusterDataProtected
      Last Transition Time:  2024-08-26T13:22:28Z
      Message:               VolumeReplication resource for pvc not promoted to primary
      Observed Generation:   1
      Reason:                Error
      Status:                False
      Type:                  DataProtected
    Csi Provisioner:  openshift-storage.rbd.csi.ceph.com
    Labels:
      App:                                        test
      ramendr.openshift.io/owner-name:            rbd-appset-busybox1-cloned-placement-drpc
      ramendr.openshift.io/owner-namespace-name:  busybox-workloads-3
    Name:       busybox-pvc-41-clone
    Namespace:  busybox-workloads-3
    Replication ID:
      Id:  95c39e71747014538e8cdbd5ae4b92886f088b5
      Modes:
        Failover
    Resources:
      Requests:
        Storage:         42Gi
    Storage Class Name:  ocs-storagecluster-ceph-rbd
    Storage ID:
      Id:  2e44281a-cc8a-4cdb-bd8b-2a9b81edacca
  State:   Unknown
Events:
  Type    Reason                    Age   From                               Message
  ----    ------                    ----  ----                               -------
  Normal  PrimaryVRGProcessSuccess  17m   controller_VolumeReplicationGroup  Primary Success

Expected results:
Users should be able to recover from this state and select the correct DRPolicy with flattening enabled.

Additional info:
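
For anyone triaging a similar report, the two symptoms captured above (a DRPC that has a deletionTimestamp but still holds its finalizer, and a VolumeReplication whose Degraded condition is True) can be checked mechanically. The following is only a minimal sketch, not part of ramen or csi addons: the helper names stuck_finalizers and replication_error are hypothetical, and the sample dictionaries are trimmed copies of the objects in this report, assumed to have been obtained in JSON form (for example with the -o json output option instead of -oyaml).

```python
# Hypothetical triage helpers; they only inspect already-fetched objects
# and make no calls to the cluster.

def stuck_finalizers(obj: dict) -> list:
    """Return the finalizers still present on an object that is marked
    for deletion. An empty list means the object is not being deleted
    (or nothing is blocking its removal)."""
    meta = obj.get("metadata", {})
    if not meta.get("deletionTimestamp"):
        return []
    return list(meta.get("finalizers") or [])


def replication_error(vr: dict) -> str:
    """Return the top-level status message of a VolumeReplication when
    its Degraded condition is True, otherwise an empty string."""
    status = vr.get("status", {})
    degraded = any(
        c.get("type") == "Degraded" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    return status.get("message", "") if degraded else ""


# Trimmed sample data taken from the objects pasted in this report.
drpc = {
    "metadata": {
        "name": "rbd-appset-busybox1-cloned-placement-drpc",
        "deletionTimestamp": "2024-08-26T13:47:01Z",
        "finalizers": ["drpc.ramendr.openshift.io/finalizer"],
    },
    "status": {"phase": "Deleting", "progression": "Deleting"},
}

vr = {
    "metadata": {"name": "busybox-pvc-41-clone"},
    "status": {
        "conditions": [
            {"type": "Completed", "status": "False", "reason": "FailedToPromote"},
            {"type": "Degraded", "status": "True", "reason": "Error"},
            {"type": "Resyncing", "status": "False", "reason": "NotResyncing"},
        ],
        "message": (
            "system is not in a state required for the operation's execution: "
            "failed to enable mirroring on image "
            '"ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5": '
            "parent image "
            '"ocs-storagecluster-cephblockpool/csi-vol-65f92216-ad7b-4dfb-bef3-34cd363363c5-temp" '
            "is not enabled for mirroring"
        ),
        "state": "Unknown",
    },
}

print("DRPC finalizers blocking deletion:", stuck_finalizers(drpc))
print("VolumeReplication error:", replication_error(vr))
```

The checks are deliberately read-only; they do not attempt to remove finalizers, since the report's point is that the controllers should be able to complete the deletion themselves.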
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:8676