Bug 2296264 - [MDR] Not able to disable Disaster Recovery for ACM discovered applications after primary is down and Failing over to secondary
Summary: [MDR] Not able to disable Disaster Recovery for ACM discovered applications after primary is down and Failing over to secondary
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.18.0
Assignee: Raghavendra Talur
QA Contact: avdhoot
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-07-08 10:54 UTC by avdhoot
Modified: 2024-10-18 07:28 UTC (History)
5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
asagare: needinfo+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OCSBZM-8622 0 None None None 2024-09-16 13:08:00 UTC

Description avdhoot 2024-07-08 10:54:30 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Hi,

I am trying to recover to a replacement cluster with MDR for discovered applications, following the steps mentioned in [1].

[1]
https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.15/html/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/metro-dr-solution#recovering-to-a-replacement-cluster-with-mdr_manage-mdr 


I then followed the steps below to disable DR for the discovered apps:
[2]
https://docs.google.com/document/d/1BoqbEqDBLCQZXp2qvd7Hw5mvg59njv1dqlrH6Hy7L58/edit#heading=h.1yx58g1ouy2 

When I try to delete the DRPC, it gets stuck in the Deleting state. Below is the DRPC YAML output.

➜  hub oc get drpc imperative-1 -n openshift-dr-ops -oyaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/app-namespace: openshift-dr-ops
    drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: asagare-sec
  creationTimestamp: "2024-07-04T07:39:18Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-07-08T05:25:48Z"
  finalizers:
  - drpc.ramendr.openshift.io/finalizer
  generation: 6
  labels:
    cluster.open-cluster-management.io/backup: ramen
  name: imperative-1
  namespace: openshift-dr-ops
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Placement
    name: imperative-1-placement-1
    uid: 0e05a1af-f89a-48b1-aa0f-28cb89b14344
  resourceVersion: "12464211"
  uid: 2f372e03-fb46-4ec5-96d2-e517f9f88d09
spec:
  action: Failover
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: odr-policy-mdr
  failoverCluster: asagare-sec
  kubeObjectProtection:
    captureInterval: 2m0s
    kubeObjectSelector:
      matchExpressions:
      - key: appname
        operator: In
        values:
        - busybox
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: imperative-1-placement-1
    namespace: openshift-dr-ops
  preferredCluster: asagare-pri
  protectedNamespaces:
  - busybox-discovered
  pvcSelector:
    matchExpressions:
    - key: appname
      operator: In
      values:
      - busybox
status:
  actionStartTime: "2024-07-05T11:28:26Z"
  conditions:
  - lastTransitionTime: "2024-07-05T11:28:27Z"
    message: Completed
    observedGeneration: 5
    reason: FailedOver
    status: "True"
    type: Available
  - lastTransitionTime: "2024-07-05T11:28:26Z"
    message: cleaning secondaries
    observedGeneration: 5
    reason: Cleaning
    status: "False"
    type: PeerReady
  - lastTransitionTime: "2024-07-05T11:29:57Z"
    message: VolumeReplicationGroup (openshift-dr-ops/imperative-1) on cluster asagare-sec
      is reporting errors (Cluster data of one or more PVs are unprotectedVRG Kube
      object protect errorunable to ListKeys in DeleteObjects from endpoint https://s3-openshift-storage.apps.asagare-pri.qe.rh-ocs.com
      bucket odrbucket-84427fcbc7ce keyPrefix openshift-dr-ops/imperative-1/kube-objects/1/velero/backups/)
      protecting workload resources, retrying till ClusterDataProtected condition
      is met
    observedGeneration: 5
    reason: Error
    status: "False"
    type: Protected
  lastKubeObjectProtectionTime: "2024-07-05T11:20:51Z"
  lastUpdateTime: "2024-07-07T19:49:50Z"
  observedGeneration: 6
  phase: Deleting
  preferredDecision:
    clusterName: asagare-pri
    clusterNamespace: asagare-pri
  progression: Deleting
  resourceConditions:
    conditions:
    - lastTransitionTime: "2024-07-05T11:29:52Z"
      message: PVCs in the VolumeReplicationGroup are ready for use
      observedGeneration: 1
      reason: Ready
      status: "True"
      type: DataReady
    - lastTransitionTime: "2024-07-05T11:29:52Z"
      message: VolumeReplicationGroup is replicating
      observedGeneration: 1
      reason: Replicating
      status: "False"
      type: DataProtected
    - lastTransitionTime: "2024-07-05T11:29:16Z"
      message: Restored PVs and PVCs
      observedGeneration: 1
      reason: Restored
      status: "True"
      type: ClusterDataReady
    - lastTransitionTime: "2024-07-05T11:29:52Z"
      message: Cluster data of one or more PVs are unprotectedVRG Kube object protect
        errorunable to ListKeys in DeleteObjects from endpoint https://s3-openshift-storage.apps.asagare-pri.qe.rh-ocs.com
        bucket odrbucket-84427fcbc7ce keyPrefix openshift-dr-ops/imperative-1/kube-objects/1/velero/backups/
      observedGeneration: 1
      reason: UploadError
      status: "False"
      type: ClusterDataProtected
    resourceMeta:
      generation: 1
      kind: VolumeReplicationGroup
      name: imperative-1
      namespace: openshift-dr-ops
      protectedpvcs:
      - busybox-pvc
      resourceVersion: "7885562"

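The Protected condition above points at the likely root cause: Ramen's kube-object cleanup is still going through the S3 endpoint hosted on the powered-off primary cluster (asagare-pri), so ListKeys/DeleteObjects can never succeed and the finalizer never releases the DRPC. A minimal sketch of pulling the failing endpoint out of that condition message (run here against the error string copied from the YAML above; on a live hub you would read it with `oc get drpc imperative-1 -n openshift-dr-ops -o yaml` instead):

```shell
# Error message copied from the Protected condition in the DRPC status above.
msg='unable to ListKeys in DeleteObjects from endpoint https://s3-openshift-storage.apps.asagare-pri.qe.rh-ocs.com bucket odrbucket-84427fcbc7ce keyPrefix openshift-dr-ops/imperative-1/kube-objects/1/velero/backups/'

# Extract the S3 endpoint host; note it resolves to the primary cluster,
# which was powered off in step 3 of the reproducer.
endpoint=$(printf '%s\n' "$msg" | grep -o 'https://[^ ]*')
echo "cleanup is blocked on: $endpoint"
```

This suggests deletion will stay stuck for as long as the DRPolicy's s3 profile points at the unreachable cluster.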

Version of all relevant components (if applicable):

OCP: 4.16.0-0.nightly-2024-06-27-091410
ODF: 4.16.0-134
ACM: 2.11.0-140
CEPH: 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)
OADP: 1.4.0
GitOps: 1.12.4



Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?
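Not confirmed, but since the DRPC is held only by the drpc.ramendr.openshift.io/finalizer, a last-resort unblock would be clearing that finalizer once engineering confirms it is safe. This is only a sketch, not a recommendation (removing a finalizer skips Ramen's normal cleanup of per-cluster DR resources; the resource names are the ones from this bug):

```shell
# Merge patch that clears all finalizers from the stuck DRPC (last resort;
# this bypasses Ramen's cleanup, so orphaned VRG/backup objects may remain).
patch='{"metadata":{"finalizers":null}}'

# Sanity-check the patch is valid JSON before applying it on the hub with:
#   oc patch drpc imperative-1 -n openshift-dr-ops --type=merge -p "$patch"
printf '%s' "$patch" | python3 -m json.tool
```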


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Upgraded the MDR setup from 4.15.4 to 4.16.
2. Deployed discovered apps on the primary cluster and applied a DRPolicy to them.
3. Powered off the primary cluster.
4. Followed the replace-cluster steps mentioned in doc [1].
5. Disabled DR for the protected apps using doc [2].
6. DRPC deletion got stuck in the Deleting state.



Actual results:
DRPC deletion is stuck in the Deleting state.

Expected results:
The DRPC should be deleted.

Additional info:

