Bug 2207632

Summary: [RDR] Relocate operation fails with AMQ streams
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: kmanohar
Component: odf-dr
Assignee: Benamar Mekhissi <bmekhiss>
odf-dr sub component: ramen
QA Contact: krishnaram Karthick <kramdoss>
Status: ASSIGNED ---
Docs Contact:
Severity: high
Priority: high
CC: kramdoss, kseeger, muagarwa, odf-bz-bot, rtalur, srangana
Version: 4.13
Flags: kramdoss: needinfo+
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2087782

Description kmanohar 2023-05-16 11:47:43 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Relocate operation fails with AMQ workload

Steps to reproduce:-
===================

1) Deploy AMQ workload (deployed through ACM UI via Appset)
        path: rdr
        repoURL: https://github.com/prsurve/strimzi-kafka-operator
        source cluster - c2
        target cluster - c1

2) Perform relocate from source to target

Actual Results -
==============
    
     PVCs are stuck in the "Terminating" state; the VRs, VRGs, and images have not been deleted from the primary site.

Related info -
============

 DRPC status
 ------------

oc get drpc -n openshift-gitops
NAME                         AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
amq-dr-demo-placement-drpc   4d17h   kmanohar-clu1                        Relocate       Relocating

 drpc yaml
 -----------

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  creationTimestamp: "2023-05-11T17:11:39Z"
  finalizers:
  - drpc.ramendr.openshift.io/finalizer
  generation: 2
  labels:
    cluster.open-cluster-management.io/backup: resource
  name: amq-dr-demo-placement-drpc
  namespace: openshift-gitops
  resourceVersion: "14606610"
  uid: 907ff349-86dd-4a8f-a860-9b7e87baaa09
spec:
  action: Relocate
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: dr-policy-10m
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: amq-dr-demo-placement
    namespace: openshift-gitops
  preferredCluster: kmanohar-clu1
  pvcSelector:
    matchLabels:
      app.kubernetes.io/instance: my-cluster
status:
  actionStartTime: "2023-05-15T08:21:48Z"
  conditions:
  - lastTransitionTime: "2023-05-15T08:22:18Z"
    message: waiting for VRGs to move to secondaries everywhere
    observedGeneration: 2
    reason: Relocating
    status: "False"
    type: Available
  - lastTransitionTime: "2023-05-11T17:11:39Z"
    message: Ready
    observedGeneration: 2
    reason: Success
    status: "True"
    type: PeerReady
  lastGroupSyncTime: "2023-05-15T08:10:00Z"
  lastUpdateTime: "2023-05-16T10:06:46Z"
  phase: Relocating
  preferredDecision:
    clusterName: kmanohar-clu2
    clusterNamespace: amq-dr
  progression: EnsuringVolumesAreSecondary
  resourceConditions:
    conditions:
    - lastTransitionTime: "2023-05-15T08:21:33Z"
      message: PVCs in the VolumeReplicationGroup are ready for use
      observedGeneration: 1
      reason: Ready
      status: "True"
      type: DataReady
    - lastTransitionTime: "2023-05-11T17:11:44Z"
      message: VolumeReplicationGroup is replicating
      observedGeneration: 1
      reason: Replicating
      status: "False"
      type: DataProtected
    - lastTransitionTime: "2023-05-11T17:11:39Z"
      message: Restored cluster data
      observedGeneration: 1
      reason: Restored
      status: "True"
      type: ClusterDataReady
    - lastTransitionTime: "2023-05-12T05:14:38Z"
      message: Kube objects protected
      observedGeneration: 1
      reason: Uploaded
      status: "True"
      type: ClusterDataProtected
    resourceMeta:
      generation: 1
      kind: VolumeReplicationGroup
      name: amq-dr-demo-placement-drpc
      namespace: amq-dr
      protectedpvcs:
      - data-my-cluster-zookeeper-1
      - data-my-cluster-zookeeper-0
      - data-my-cluster-zookeeper-2
      - data-0-my-cluster-kafka-0
      - data-0-my-cluster-kafka-1
      - data-0-my-cluster-kafka-2
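
The DRPC dump above carries two sets of conditions (status.conditions and status.resourceConditions.conditions). As a rough triage aid, the sketch below lists the ones that are not "True". It assumes the resource was fetched as JSON (for example with `oc get drpc amq-dr-demo-placement-drpc -n openshift-gitops -o json`); the function name `unmet_conditions` is hypothetical, and the sample is a trimmed copy of the values reported above, not a full DRPC object.

```python
import json


def unmet_conditions(drpc_json: str) -> list[tuple[str, str, str]]:
    """Return (type, reason, message) for every condition whose status is not "True"."""
    status = json.loads(drpc_json).get("status", {})
    # Top-level DRPC conditions plus the conditions mirrored from the VRG.
    conditions = list(status.get("conditions", []))
    conditions += status.get("resourceConditions", {}).get("conditions", [])
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


# Trimmed sample built from the DRPC status in this report (assumed JSON shape).
sample_drpc = json.dumps({
    "status": {
        "conditions": [
            {"type": "Available", "status": "False", "reason": "Relocating",
             "message": "waiting for VRGs to move to secondaries everywhere"},
            {"type": "PeerReady", "status": "True", "reason": "Success",
             "message": "Ready"},
        ],
        "resourceConditions": {
            "conditions": [
                {"type": "DataReady", "status": "True", "reason": "Ready",
                 "message": "PVCs in the VolumeReplicationGroup are ready for use"},
                {"type": "DataProtected", "status": "False", "reason": "Replicating",
                 "message": "VolumeReplicationGroup is replicating"},
            ]
        },
    }
})

for cond_type, reason, message in unmet_conditions(sample_drpc):
    print(f"{cond_type}: {reason} ({message})")
```

For the values in this report, only Available and DataProtected are listed.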


  PVC Status
  ----------- 
    
oc get pvc -n amq-dr

NAME                          STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
data-0-my-cluster-kafka-0     Terminating   pvc-477f390b-1767-4e85-b5cf-ab51a60eb517   100Gi      RWO            ocs-storagecluster-ceph-rbd   4d17h
data-0-my-cluster-kafka-1     Terminating   pvc-dd33dca5-c225-4ba5-91e8-9adf739aceaa   100Gi      RWO            ocs-storagecluster-ceph-rbd   4d17h
data-0-my-cluster-kafka-2     Terminating   pvc-4fa25c3b-848d-43d6-b285-cff9a514f70e   100Gi      RWO            ocs-storagecluster-ceph-rbd   4d17h
data-my-cluster-zookeeper-0   Terminating   pvc-5a310319-8b25-4d43-b72e-829a510f98c7   100Gi      RWO            ocs-storagecluster-ceph-rbd   4d17h
data-my-cluster-zookeeper-1   Terminating   pvc-2c4c6df0-c637-4cd1-8ce2-e928890cc5b0   100Gi      RWO            ocs-storagecluster-ceph-rbd   4d17h
data-my-cluster-zookeeper-2   Terminating   pvc-b1031f88-6be4-464e-926d-baff6909e980   100Gi      RWO            ocs-storagecluster-ceph-rbd   4d17h


  PVC yaml
  --------

oc get pvc data-0-my-cluster-kafka-0 -n amq-dr -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    replication.storage.openshift.io/volume-replication-name: data-0-my-cluster-kafka-0
    strimzi.io/delete-claim: "true"
    volumereplicationgroups.ramendr.openshift.io/vr-protected: protected
  creationTimestamp: "2023-05-11T16:56:25Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-05-15T08:22:47Z"
  finalizers:
  - volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection
  labels:
    app.kubernetes.io/instance: my-cluster
    app.kubernetes.io/managed-by: strimzi-cluster-operator
    app.kubernetes.io/name: kafka
    app.kubernetes.io/part-of: strimzi-my-cluster
    strimzi.io/cluster: my-cluster
    strimzi.io/component-type: kafka
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka
  name: data-0-my-cluster-kafka-0
  namespace: amq-dr
  ownerReferences:
  - apiVersion: kafka.strimzi.io/v1beta2
    blockOwnerDeletion: false
    controller: false
    kind: Kafka
    name: my-cluster
    uid: ebcda051-b805-4fa0-85ff-f641625ac650
  resourceVersion: "8411251"
  uid: 477f390b-1767-4e85-b5cf-ab51a60eb517
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
  volumeName: pvc-477f390b-1767-4e85-b5cf-ab51a60eb517
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
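
The PVC above has a deletionTimestamp but still carries a finalizer, which is why it remains in Terminating. A minimal sketch for listing such PVCs and the finalizers still holding them follows. It assumes the list was fetched as JSON (for example with `oc get pvc -n amq-dr -o json`); the function name `stuck_pvcs` is hypothetical, the first sample item is trimmed from the PVC shown above, and the second item is invented purely for contrast.

```python
import json


def stuck_pvcs(pvc_list_json: str) -> dict[str, list[str]]:
    """Map each PVC that has a deletionTimestamp to its remaining finalizers."""
    stuck = {}
    for item in json.loads(pvc_list_json).get("items", []):
        meta = item.get("metadata", {})
        # A set deletionTimestamp means deletion was requested; any finalizer
        # still listed keeps the object from being removed.
        if "deletionTimestamp" in meta:
            stuck[meta.get("name", "")] = meta.get("finalizers", [])
    return stuck


sample_pvcs = json.dumps({
    "items": [
        {   # trimmed from the PVC YAML in this report
            "metadata": {
                "name": "data-0-my-cluster-kafka-0",
                "deletionTimestamp": "2023-05-15T08:22:47Z",
                "finalizers": [
                    "volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection"
                ],
            }
        },
        {   # hypothetical healthy PVC, not from the report
            "metadata": {"name": "example-healthy-pvc", "finalizers": []}
        },
    ]
})

for name, finalizers in stuck_pvcs(sample_pvcs).items():
    print(name, finalizers)
```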


 VR,VRG state
--------------

oc get vr,vrg -n amq-dr

NAME                                                                             AGE     VOLUMEREPLICATIONCLASS                 PVCNAME                       DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/data-0-my-cluster-kafka-0     4d18h   rbd-volumereplicationclass-473128587   data-0-my-cluster-kafka-0     secondary      Primary
volumereplication.replication.storage.openshift.io/data-0-my-cluster-kafka-1     4d18h   rbd-volumereplicationclass-473128587   data-0-my-cluster-kafka-1     secondary      Primary
volumereplication.replication.storage.openshift.io/data-0-my-cluster-kafka-2     4d18h   rbd-volumereplicationclass-473128587   data-0-my-cluster-kafka-2     secondary      Primary
volumereplication.replication.storage.openshift.io/data-my-cluster-zookeeper-0   4d18h   rbd-volumereplicationclass-473128587   data-my-cluster-zookeeper-0   secondary      Primary
volumereplication.replication.storage.openshift.io/data-my-cluster-zookeeper-1   4d18h   rbd-volumereplicationclass-473128587   data-my-cluster-zookeeper-1   secondary      Primary
volumereplication.replication.storage.openshift.io/data-my-cluster-zookeeper-2   4d18h   rbd-volumereplicationclass-473128587   data-my-cluster-zookeeper-2   secondary      Primary

NAME                                                                     DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/amq-dr-demo-placement-drpc   secondary      Primary
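
In the output above, every VR and the VRG show DESIREDSTATE secondary while CURRENTSTATE is still Primary. The sketch below flags such rows from the plain-text table. It assumes whitespace-separated columns with no empty cells before CURRENTSTATE, as in the output pasted here; the function name `state_mismatches` is hypothetical, and the comparison is case-insensitive because the two columns use different capitalization.

```python
def state_mismatches(table: str) -> list[tuple[str, str, str]]:
    """Return (name, desired, current) for rows whose two state columns differ."""
    mismatches = []
    header = None
    for line in table.splitlines():
        cols = line.split()
        if not cols:
            header = None  # a blank line separates the VR and VRG tables
            continue
        if cols[0] == "NAME":
            header = cols  # each table brings its own header row
            continue
        if header is None:
            continue
        row = dict(zip(header, cols))
        desired = row.get("DESIREDSTATE", "")
        current = row.get("CURRENTSTATE", "")
        if desired.lower() != current.lower():
            mismatches.append((row["NAME"], desired, current))
    return mismatches


# Two rows copied from the output in this report.
sample_table = """\
NAME                                                                             AGE     VOLUMEREPLICATIONCLASS                 PVCNAME                       DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/data-0-my-cluster-kafka-0     4d18h   rbd-volumereplicationclass-473128587   data-0-my-cluster-kafka-0     secondary      Primary

NAME                                                                     DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/amq-dr-demo-placement-drpc   secondary      Primary
"""

for name, desired, current in state_mismatches(sample_table):
    print(f"{name}: desired={desired} current={current}")
```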




ODF Version - 4.13-184
ACM Version - 2.7.3
MCO - 4.13-184

Logs - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/AMQ-RDR/






Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?

Comment 5 Mudit Agarwal 2023-05-30 11:26:56 UTC
Not a 4.13 blocker

Comment 10 Karolin Seeger 2024-01-11 10:57:45 UTC
Moving this one out to 4.16; confirmed by QE.