Bug 2319334 - [RDR] Relocate of ceph fs is stuck in WaitForReadiness
Summary: [RDR] Relocate of ceph fs is stuck in WaitForReadiness
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.18.0
Assignee: Benamar Mekhissi
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 2281703 2320289 2321510
Reported: 2024-10-17 09:13 UTC by Pratik Surve
Modified: 2024-11-01 11:35 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Relocation of CephFS gets stuck in WaitForReadiness

There is a scenario where the DRPC progression gets stuck in WaitForReadiness. If it remains in this state for an extended period, a known issue might have occurred that prevents Ramen from updating the PlacementDecision with the new Primary. As a result, the relocation does not complete and the workload is left undeployed on the new primary cluster. Recovery is delayed until the user intervenes.

Workaround: Manually update the PlacementDecision to point to the new Primary.

* For workloads using PlacementRule:

1. Edit the PlacementRule:
   oc edit placementrule --subresource=status -n [namespace] [name of the placementrule]
   Example:
   oc edit placementrule --subresource=status -n busybox-workloads-cephfs-2 busybox-placement
2. Add the following to the PlacementRule status:
```
status:
  decisions:
  - clusterName: [primary cluster name]
    reason: [primary cluster name]
```

* For workloads using Placement:

1. Edit the PlacementDecision:
   oc edit placementdecision --subresource=status -n [namespace] [name of the placementdecision]
   Example:
   oc edit placementdecision --subresource=status -n openshift-gitops busybox-3-placement-cephfs-decision-1
2. Add the following to the PlacementDecision status:
```
status:
  decisions:
  - clusterName: [primary cluster name]
    reason: [primary cluster name]
```

As a result, the PlacementDecision is updated and the workload is deployed on the Primary cluster.
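The status stanza in the workaround above is easy to mistype when the cluster name has to be entered twice. The following is a minimal sketch, not part of Ramen or ODF: `decision_stanza` is a hypothetical helper that prints the stanza for a given cluster name so it can be pasted into the `oc edit --subresource=status` session. The cluster name `prsurve-5c2` is the relocation target from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ramen or ODF): print the status stanza
# from the workaround for the cluster that should become the new primary.
decision_stanza() {
    cluster="$1"
    if [ -z "$cluster" ]; then
        echo "usage: decision_stanza <primary-cluster-name>" >&2
        return 1
    fi
    # clusterName and reason both carry the cluster name, as in the workaround.
    printf 'status:\n  decisions:\n  - clusterName: %s\n    reason: %s\n' \
        "$cluster" "$cluster"
}

# Example: stanza for the relocation target named in this bug.
decision_stanza prsurve-5c2
```

The helper only generates text; applying it still goes through the manual edit step described in the workaround.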
Clone Of:
Environment:
Last Closed:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | RamenDR ramen pull 1605 | 0 | None | open | Fix rdspec and protectedpvcs condition | 2024-11-01 11:35:08 UTC
Red Hat Issue Tracker | OCSBZM-9398 | 0 | None | None | None | 2024-10-17 09:16:41 UTC

Description Pratik Surve 2024-10-17 09:13:48 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[RDR] Relocate of ceph fs is stuck in WaitForReadiness


Version of all relevant components (if applicable):
OCS operator	4.16.3-2
Cluster Version	4.16.0-0.nightly-2024-10-12-102620
acm_version	2.11.3
gitops_version	1.14.0
submariner_version	0.18.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

2
Is this issue reproducible?
yes

Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy a 4.16.3 RDR cluster
2. Deploy CephFS workloads
3. Relocate the CephFS workloads
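Step 3 does not say how the relocation was triggered. One possible way to do it from the CLI is to patch the DRPC, sketched below under assumptions: `relocate_patch` is a hypothetical helper, and the JSON field names `action` and `preferredCluster` are inferred from the Spec section of the `oc describe drpc` output in this report.

```shell
#!/bin/sh
# Hypothetical helper: build a merge patch that sets the DRPC action to
# Relocate and names the preferred cluster. Field names are inferred from
# the Spec section of the `oc describe drpc` output in this report.
relocate_patch() {
    printf '{"spec":{"action":"Relocate","preferredCluster":"%s"}}\n' "$1"
}

# Print the patch for the target cluster used in this bug.
relocate_patch prsurve-5c2

# Assumed usage against a live hub cluster (not executed here):
#   oc patch drpc busybox-3-placement-cephfs-drpc -n openshift-gitops \
#       --type merge -p "$(relocate_patch prsurve-5c2)"
```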


Actual results:

oc describe drpc busybox-3-placement-cephfs-drpc -n openshift-gitops
Name:         busybox-3-placement-cephfs-drpc
Namespace:    openshift-gitops
Labels:       cluster.open-cluster-management.io/backup=ramen
Annotations:  drplacementcontrol.ramendr.openshift.io/app-namespace: appset-busybox-3-cephfs
              drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: prsurve-5c1
API Version:  ramendr.openshift.io/v1alpha1
Kind:         DRPlacementControl
Metadata:
  Creation Timestamp:  2024-10-16T13:21:33Z
  Finalizers:
    drpc.ramendr.openshift.io/finalizer
  Generation:  2
  Owner References:
    API Version:           cluster.open-cluster-management.io/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Placement
    Name:                  busybox-3-placement-cephfs
    UID:                   c08571cd-03c5-46f0-a1c5-4f77bea158fd
  Resource Version:        2853670
  UID:                     b63d684d-74b4-4c83-83e3-17a9829b5bc9
Spec:
  Action:  Relocate
  Dr Policy Ref:
    API Version:  ramendr.openshift.io/v1alpha1
    Kind:         DRPolicy
    Name:         odr-policy-5m
  Placement Ref:
    API Version:      cluster.open-cluster-management.io/v1beta1
    Kind:             Placement
    Name:             busybox-3-placement-cephfs
    Namespace:        openshift-gitops
  Preferred Cluster:  prsurve-5c2
  Pvc Selector:
    Match Labels:
      Appname:  busybox_app3_cephfs
Status:
  Action Start Time:  2024-10-16T13:30:33Z
  Conditions:
    Last Transition Time:    2024-10-16T13:30:43Z
    Message:                 Waiting for App resources to be restored...)
    Observed Generation:     2
    Reason:                  Relocating
    Status:                  False
    Type:                    Available
    Last Transition Time:    2024-10-16T13:34:43Z
    Message:                 Relocation in progress to cluster "prsurve-5c2"
    Observed Generation:     2
    Reason:                  NotStarted
    Status:                  False
    Type:                    PeerReady
    Last Transition Time:    2024-10-16T13:34:44Z
    Message:                 VolumeReplicationGroup (appset-busybox-3-cephfs/busybox-3-placement-cephfs-drpc) on cluster prsurve-5c2 is progressing on readying workload data (Not all VolSync PVCs are ready), retrying till DataReady condition is met
    Observed Generation:     2
    Reason:                  Progressing
    Status:                  False
    Type:                    Protected
  Last Group Sync Duration:  36.74055203s
  Last Group Sync Time:      2024-10-16T13:34:34Z
  Last Update Time:          2024-10-16T14:15:48Z
  Observed Generation:       2
  Phase:                     Relocating
  Preferred Decision:
    Cluster Name:       prsurve-5c1
    Cluster Namespace:  prsurve-5c1
  Progression:          WaitForReadiness
  Resource Conditions:
    Conditions:
      Last Transition Time:  2024-10-16T13:34:44Z
      Message:               Not all VolSync PVCs are ready
      Observed Generation:   3
      Reason:                Progressing
      Status:                False
      Type:                  DataReady
      Last Transition Time:  2024-10-16T13:34:44Z
      Message:               Not all VolSync PVCs are protected
      Observed Generation:   3
      Reason:                Progressing
      Status:                False
      Type:                  DataProtected
      Last Transition Time:  2024-10-16T13:34:44Z
      Message:               Not all VolSync PVCs are protected
      Observed Generation:   3
      Reason:                Progressing
      Status:                False
      Type:                  ClusterDataProtected
      Last Transition Time:  2024-10-16T13:34:44Z
      Message:               Restored PVs and PVCs
      Observed Generation:   3
      Reason:                Restored
      Status:                True
      Type:                  ClusterDataReady
    Resource Meta:
      Generation:  3
      Kind:        VolumeReplicationGroup
      Name:        busybox-3-placement-cephfs-drpc
      Namespace:   appset-busybox-3-cephfs
      Protectedpvcs:
        busybox-pvc-7
        busybox-pvc-6
        busybox-pvc-10
        busybox-pvc-5
        busybox-pvc-4
        busybox-pvc-3
        busybox-pvc-1
        busybox-pvc-8
        busybox-pvc-2
        busybox-pvc-9
      Resource Version:  3633777
Events:
  Type     Reason             Age                 From                           Message
  ----     ------             ----                ----                           -------
  Normal   DRPCDeploying      54m (x8 over 54m)   controller_DRPlacementControl  Deploying the application and VRG
  Normal   DRPCDeploySuccess  54m (x8 over 54m)   controller_DRPlacementControl  Successfully deployed the application and VRG
  Warning  unknown state      45m (x14 over 54m)  controller_DRPlacementControl  next state not known
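
In the output above, the spec names `prsurve-5c2` as the Preferred Cluster while the Preferred Decision in the status still shows `prsurve-5c1`, and Progression remains WaitForReadiness. A minimal sketch for pulling those three fields out of saved `oc describe drpc` output follows. `drpc_summary` is a hypothetical helper that relies only on the field labels shown in this report; it is not a Ramen tool.

```shell
#!/bin/sh
# Hypothetical helper: summarize `oc describe drpc` output read from stdin.
# It matches the field labels shown in this report and nothing else.
drpc_summary() {
    awk '
        /^ *Preferred Cluster:/ { preferred = $NF }    # spec: relocation target
        /^ *Cluster Name:/      { decided = $NF }      # status: current decision
        /^ *Progression:/       { progression = $NF }  # status: progression step
        END {
            printf "preferred=%s decided=%s progression=%s\n", preferred, decided, progression
        }
    '
}

# Example with the relevant lines from this report.
drpc_summary <<'EOF'
  Preferred Cluster:  prsurve-5c2
  Preferred Decision:
    Cluster Name:       prsurve-5c1
    Cluster Namespace:  prsurve-5c1
  Progression:          WaitForReadiness
EOF
# prints: preferred=prsurve-5c2 decided=prsurve-5c1 progression=WaitForReadiness
```

Against a live hub cluster, the same function could read from `oc describe drpc busybox-3-placement-cephfs-drpc -n openshift-gitops` through a pipe.
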
Expected results:

Relocation should complete successfully.

Additional info:

