Bug 2241015 - [RDR][Ceph-FS] Relocation does not proceed, progression status stuck at WaitingForResourceRestore
Summary: [RDR][Ceph-FS] Relocation does not proceed, progression status stuck at WaitingForResourceRestore
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Benamar Mekhissi
QA Contact: kmanohar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-09-27 17:11 UTC by rakesh-gm
Modified: 2023-11-08 18:56 UTC
CC List: 4 users

Fixed In Version: 4.14.0-148
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:54:58 UTC
Embargoed:




Links
GitHub RamenDR/ramen pull 1087 (open): Rollback to the last snapshot on failover (only) - 2023-10-06 19:31:10 UTC
Red Hat Product Errata RHSA-2023:6832 - 2023-11-08 18:56:32 UTC

Description rakesh-gm 2023-09-27 17:11:36 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Relocation does not complete after a failover; the progression does not move
past WaitingForResourceRestore. This might be because not enough time is
allowed for all the data to be transferred after the relocate is requested.
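
A quick way to confirm that data has been transferred before requesting the
relocate is to check the DRPC's lastGroupSyncTime on the hub (the same field
the verification below inspects). Hedged example, assuming the DRPC name from
this report and the openshift-gitops namespace used for ApplicationSet-based
apps:

$ oc get drpc dev-qe-volsync-placement-drpc -n openshift-gitops -o yaml | grep lastGroupSyncTime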

Version of all relevant components (if applicable):
ODF 4.14


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Can this issue be reproducible?
yes 

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an application based on an ApplicationSet and perform a failover
2. Perform a relocate (example commands after this list)
3. DRPC progression is stuck at WaitingForResourceRestore

The app has multiple PVCs/PVs.
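
Example commands (a hedged sketch; relocate is normally initiated from the ACM
console, which sets spec.action on the DRPC, and the patch below sets the same
field directly, using the DRPC name/namespace from this report):

$ oc patch drpc dev-qe-volsync-placement-drpc -n openshift-gitops \
    --type merge -p '{"spec":{"action":"Relocate"}}'

$ oc get drpc dev-qe-volsync-placement-drpc -n openshift-gitops -o wide -w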


Actual results:
Output of DRPC status 

NAME                            AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION                 START TIME             DURATION   PEER READY
dev-qe-volsync-placement-drpc   21h   prsurve-dev-1      prsurve-dev-2     Relocate       Relocating     WaitingForResourceRestore   2023-09-27T08:08:56Z              False
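
For diagnosis: WaitingForResourceRestore generally means the hub is waiting for
the VolumeReplicationGroup on the preferred (target) cluster to restore the
PVs/PVCs and report ClusterDataReady. A hedged way to inspect it on that
managed cluster (the VRG carries the same name as the DRPC, and <app-namespace>
is a placeholder for the application namespace):

$ oc get vrg -A

$ oc get vrg dev-qe-volsync-placement-drpc -n <app-namespace> \
    -o jsonpath='{.status.conditions[?(@.type=="ClusterDataReady")]}{"\n"}'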

Expected results:
The relocate should complete.

Additional info:

Comment 2 krishnaram Karthick 2023-10-03 05:11:36 UTC
Marking this bug as a blocker as this is a basic positive workflow.

Comment 5 Benamar Mekhissi 2023-10-06 19:31:10 UTC
Details are in the PR: https://github.com/RamenDR/ramen/pull/1087

Comment 6 kmanohar 2023-11-02 17:20:45 UTC
VERIFICATION COMMENTS
=====================

Steps to Reproduce:
-------------------

1. Deploy an application based on an ApplicationSet and perform a failover
2. Perform a relocate
3. DRPC progression is stuck at WaitingForResourceRestore


Verification output after performing relocate:
-------------------------------------------

Output on the new primary:

$ pods

NAME                       READY   STATUS    RESTARTS   AGE
dd-io-1-5dbcfccf76-rcvfb   1/1     Running   0          65m
dd-io-2-684fc84b64-m7clh   1/1     Running   0          65m
dd-io-3-68bf99586d-vpfjs   1/1     Running   0          65m
dd-io-4-757c8d8b7b-45rt9   1/1     Running   0          65m
dd-io-5-74768ccf84-9lqg5   1/1     Running   0          65m
dd-io-6-68d5769c76-cjrcd   1/1     Running   0          65m
dd-io-7-67d87688b4-r7wfv   1/1     Running   0          65m

$ pvc

NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
dd-io-pvc-1   Bound    pvc-9d8128d6-cbb5-41f3-84df-4e1559db5036   117Gi      RWO            ocs-storagecluster-cephfs   9h
dd-io-pvc-2   Bound    pvc-6078b90b-170b-4e0b-8985-2f5edff84b4b   143Gi      RWO            ocs-storagecluster-cephfs   9h
dd-io-pvc-3   Bound    pvc-c1134ea7-3ee0-4924-a08e-d1dd14a52932   134Gi      RWO            ocs-storagecluster-cephfs   9h
dd-io-pvc-4   Bound    pvc-b9370af1-bac4-4990-90fd-54fda2ee56d2   106Gi      RWO            ocs-storagecluster-cephfs   9h
dd-io-pvc-5   Bound    pvc-3a7e387f-71c4-4543-971d-6e59d4e837b8   115Gi      RWO            ocs-storagecluster-cephfs   9h
dd-io-pvc-6   Bound    pvc-eac2029d-93c4-483f-877d-1f2d282c5a9a   129Gi      RWO            ocs-storagecluster-cephfs   9h
dd-io-pvc-7   Bound    pvc-f0bfbdbd-09e8-428a-bda2-2bdc12825965   149Gi      RWO            ocs-storagecluster-cephfs   9h

$ oc get vrg

NAME                                 DESIREDSTATE   CURRENTSTATE
busybox-1-cephfs-c1-placement-drpc   primary        Primary

$ oc get replicationsources.volsync.backube

NAME          SOURCE        LAST SYNC              DURATION          NEXT SYNC
dd-io-pvc-1   dd-io-pvc-1   2023-11-02T14:41:25Z   1m25.883277818s   2023-11-02T14:50:00Z
dd-io-pvc-2   dd-io-pvc-2   2023-11-02T14:41:12Z   1m12.190563634s   2023-11-02T14:50:00Z
dd-io-pvc-3   dd-io-pvc-3   2023-11-02T14:41:08Z   1m8.781483412s    2023-11-02T14:50:00Z
dd-io-pvc-4   dd-io-pvc-4   2023-11-02T14:41:09Z   1m9.538156757s    2023-11-02T14:50:00Z
dd-io-pvc-5   dd-io-pvc-5   2023-11-02T14:41:16Z   1m16.026772249s   2023-11-02T14:50:00Z
dd-io-pvc-6   dd-io-pvc-6   2023-11-02T14:41:15Z   1m15.76896785s    2023-11-02T14:50:00Z
dd-io-pvc-7   dd-io-pvc-7   2023-11-02T14:41:05Z   1m5.365572687s    2023-11-02T14:50:00Z
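
A hedged one-liner to watch sync freshness across all ReplicationSources at
once (field names assumed from the VolSync status backing the printer columns
above; the namespace is the application namespace from the DRPC below):

$ oc get replicationsources.volsync.backube -n appset-busybox-1-cephfs-c1 \
    -o custom-columns=NAME:.metadata.name,LAST_SYNC:.status.lastSyncTime,NEXT_SYNC:.status.nextSyncTime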


On the hub:


$ oc get drpc busybox-1-cephfs-c1-placement-drpc -o yaml

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/app-namespace: appset-busybox-1-cephfs-c1
    drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: kmanohar-clu2
  creationTimestamp: "2023-11-02T04:43:51Z"
  finalizers:
  - drpc.ramendr.openshift.io/finalizer
  generation: 2
  labels:
    cluster.open-cluster-management.io/backup: resource
  name: busybox-1-cephfs-c1-placement-drpc
  namespace: openshift-gitops
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Placement
    name: busybox-1-cephfs-c1-placement
    uid: 8064c466-f386-4bfb-b339-f12cded188a1
  resourceVersion: "22589985"
  uid: 0fbb0832-f2de-4dc4-b7bd-802d3f6b9113
spec:
  action: Relocate
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: dr-policy-10m
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: busybox-1-cephfs-c1-placement
    namespace: openshift-gitops
  preferredCluster: kmanohar-clu2
  pvcSelector:
    matchLabels:
      appname: busybox-cephfs
status:
  actionDuration: 3m27.162568163s
  actionStartTime: "2023-11-02T13:33:07Z"
  conditions:
  - lastTransitionTime: "2023-11-02T13:36:04Z"
    message: Completed
    observedGeneration: 2
    reason: Relocated
    status: "True"
    type: Available
  - lastTransitionTime: "2023-11-02T13:36:34Z"
    message: Ready
    observedGeneration: 2
    reason: Success
    status: "True"
    type: PeerReady
  lastGroupSyncDuration: 1m25.883277818s
  lastGroupSyncTime: "2023-11-02T14:41:05Z"
  lastUpdateTime: "2023-11-02T14:41:48Z"
  phase: Relocated
  preferredDecision:
    clusterName: kmanohar-clu1
    clusterNamespace: kmanohar-clu1
  progression: Completed
  resourceConditions:
    conditions:
    - lastTransitionTime: "2023-11-02T13:36:04Z"
      message: All VolSync PVCs are ready
      observedGeneration: 4
      reason: Ready
      status: "True"
      type: DataReady
    - lastTransitionTime: "2023-11-02T13:37:47Z"
      message: All VolSync PVCs are protected
      observedGeneration: 4
      reason: DataProtected
      status: "True"
      type: DataProtected
    - lastTransitionTime: "2023-11-02T13:36:04Z"
      message: Restored cluster data
      observedGeneration: 4
      reason: Restored
      status: "True"
      type: ClusterDataReady
    - lastTransitionTime: "2023-11-02T13:37:47Z"
      message: All VolSync PVCs are protected
      observedGeneration: 4
      reason: DataProtected
      status: "True"
      type: ClusterDataProtected
    resourceMeta:
      generation: 4
      kind: VolumeReplicationGroup
      name: busybox-1-cephfs-c1-placement-drpc
      namespace: appset-busybox-1-cephfs-c1
      protectedpvcs:
      - dd-io-pvc-4
      - dd-io-pvc-1
      - dd-io-pvc-5
      - dd-io-pvc-7
      - dd-io-pvc-3
      - dd-io-pvc-2
      - dd-io-pvc-6


$ oc get drpc busybox-1-cephfs-c1-placement-drpc -o yaml | grep lastGroupSyncTime
  lastGroupSyncTime: "2023-11-02T14:41:05Z"


$ oc get drpc
NAME                                          AGE    PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-1-c1-placement-drpc                   16d    kmanohar-clu1                                       Deployed
busybox-1-cephfs-c1-placement-drpc            12h    kmanohar-clu1                        Relocate       Relocated
busybox-1-cephfs-c2-placement-drpc            5h4m   kmanohar-clu2                                       Deployed
busybox-2-cephfs-c1-creation-placement-drpc   12h    kmanohar-clu1                                       Deployed

Verified On
-----------

ODF Version - 4.14.0-150
OCP - 4.14.0-0.nightly-2023-10-15-164249
Submariner - 0.16.0(594788)
ACM - 2.9.0(2.9.0-DOWNSTREAM-2023-10-03-20-08-35)
Ceph version - ceph version 17.2.6-146.el9cp (1d01c2b30b5fd39787bb8804707c4b2e52e30137) quincy (stable)


Must gather for verification
----------------------------

C1 - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/bz-v/bz-CephFS/c1/

C2 - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/bz-v/bz-CephFS/c2/

HUB - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/bz-v/bz-CephFS/hub/

Comment 9 errata-xmlrpc 2023-11-08 18:54:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

