Bug 2144180

Summary: [RDR][CEPHFS] Relocate operation is taking a lot of time to be completed
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Pratik Surve <prsurve>
Component: odf-dr
Assignee: Benamar Mekhissi <bmekhiss>
odf-dr sub component: ramen
QA Contact: Pratik Surve <prsurve>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
CC: bmekhiss, kramdoss, muagarwa, ocs-bugs, odf-bz-bot, srangana
Version: 4.12
Target Release: ODF 4.12.0
Fixed In Version: 4.12.0-143
Doc Type: Known Issue
Doc Text:
When a failover or relocation is performed, Ramen creates a writable copy of the application PVC. However, if the PVC contains a large number of files, it can take a long time for CephFS to create that copy from a snapshot. To address this issue, a setting has been introduced that lets the source synchronize directly to the application PVC at the destination. Additionally, the destination saves a snapshot after the last successful synchronization, as a backup in case the application PVC becomes corrupted. To use the Direct copy method, add the following entry to the Ramen ConfigMap:

```
data:
  ramen_manager_config.yaml: |
    ...
    volsync:
      destinationCopyMethod: Direct
```
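A minimal sketch of applying this setting from the command line. The ConfigMap name and namespace used here (ramen-hub-operator-config in openshift-operators) are assumptions and may differ per deployment:

```
# Sketch: add the volsync section shown above to the Ramen operator config.
# ConfigMap name and namespace are assumptions for illustration.
kubectl -n openshift-operators edit configmap ramen-hub-operator-config

# Confirm the setting landed in the rendered config.
kubectl -n openshift-operators get configmap ramen-hub-operator-config \
  -o jsonpath='{.data.ramen_manager_config\.yaml}' | grep -A1 volsync
```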
Last Closed: 2023-02-08 14:06:28 UTC
Type: Bug

Description Pratik Surve 2022-11-19 17:04:54 UTC
Description of problem (please be as detailed as possible and provide log snippets):

[RDR][CEPHFS] Relocate operation is taking a lot of time to be completed

Pods were taking a long time to reach the Running state.

Version of all relevant components (if applicable):

OCP version:- 4.12.0-0.nightly-2022-11-10-033725
ODF version:- 4.12.0-111
CEPH version:- ceph version 16.2.10-72.el8cp (3311949c2d1edf5cabcc20ba0f35b4bfccbf021e) pacific (stable)
ACM version:- 2.7.0
SUBMARINER version:- 0.14.0-rc3
VOLSYNC version:- volsync-product.v0.6.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy RDR cluster 
2. Run cephfs workload
3. After 3-4 days, perform the Relocate operation (a timing sketch follows this list)
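
A minimal sketch of measuring how long pods take to reach the Running state after the relocate is triggered, assuming kubectl access and a hypothetical workload namespace dd-io:

```
# Sketch: watch workload pods after the relocate and print elapsed time
# whenever a pod line reports Running. Namespace "dd-io" is an assumption.
start=$(date +%s)
kubectl -n dd-io get pods -w | while read -r line; do
  if echo "$line" | grep -q "Running"; then
    echo "$(( $(date +%s) - start ))s elapsed: $line"
  fi
done
```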


Actual results:

Pods were taking a long time to reach the Running state.

There were around 7 pods in that workload.
The first pod reached the Running state 95 minutes after the relocate operation, and the last pod took 7 hours.

These are the times taken for the first sync to complete after the relocate operation, per workload pod (dd-io-N):

dd-io-1: 9h39m58.274429532s
dd-io-2: 8h40m51.764587212s
dd-io-3: 7h12m32.390207043s
dd-io-4: 6h57m3.588518191s
dd-io-5: 12h28m29.509488719s
dd-io-6: 11h29m4.715456059s
dd-io-7: 12h
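
Sync durations like these can be read from the VolSync ReplicationDestination status; a sketch, assuming the same hypothetical dd-io namespace:

```
# Sketch: list the last sync duration per ReplicationDestination.
# Namespace "dd-io" is an assumption for illustration.
kubectl -n dd-io get replicationdestination \
  -o custom-columns=NAME:.metadata.name,LAST_SYNC:.status.lastSyncDuration
```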


Expected results:
Relocate operations should not take this long.

Additional info:

##### Space consumed in the mounted filesystem, per pod

Pod name pod/dd-io-1-5857bfdcd9-zldkq
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-6748380f-e45d-4786-bbc3-132d8ad6b9ee/c4a9d00d-8ec0-43c8-9855-d90203efe963  117G   35G   83G  30% /mnt/test
Pod name pod/dd-io-2-bcd6d9f65-k9nbc
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-e698f006-7bbb-49f1-8437-4a4fe38b60f7/83542667-07aa-46fd-bbcd-3acd264100c6  143G   35G  109G  25% /mnt/test
Pod name pod/dd-io-3-5d6b4b84df-c6k66
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-17e47d6b-674a-40d7-881d-434076ea1d6a/36241a87-682e-4d3e-81da-28d5a4821073  134G   26G  109G  20% /mnt/test
Pod name pod/dd-io-4-6f6db89fbf-cxmfz
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-51022aad-b9bb-4e16-a650-d3a1b9cc44b5/59db039a-dc75-4863-ad05-6954f662402c  106G   36G   71G  34% /mnt/test
Pod name pod/dd-io-5-7868bc6b5c-7szzp
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-0672eb30-f2a3-4e0c-a4c4-8416227eb7e6/bb70394e-0157-42b1-ac29-9d9ca5a002e7  115G   34G   82G  30% /mnt/test
Pod name pod/dd-io-6-58c98598d5-rnqgn
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-6e587b6a-7645-4bf8-b2e4-755c646b4aec/d93d82e1-dde1-4474-800e-3389cb792a29  129G   28G  102G  22% /mnt/test
Pod name pod/dd-io-7-694958ff97-sg8bj
Filesystem                                                                                                                                             Size  Used Avail Use% Mounted on
172.30.117.79:6789,172.30.90.1:6789,172.30.45.133:6789:/volumes/csi/csi-vol-b9ad4559-b07c-4688-8a47-0d625af72fcd/cc77a697-e970-4a02-aede-cba00f565ced  149G   45G  105G  30% /mnt/test


##### File count in the mount point, per pod

Pod name pod/dd-io-1-5857bfdcd9-zldkq
1757
Pod name pod/dd-io-2-bcd6d9f65-k9nbc
1775
Pod name pod/dd-io-3-5d6b4b84df-c6k66
1310
Pod name pod/dd-io-4-6f6db89fbf-cxmfz
1841
Pod name pod/dd-io-5-7868bc6b5c-7szzp
1733
Pod name pod/dd-io-6-58c98598d5-rnqgn
1400
Pod name pod/dd-io-7-694958ff97-sg8bj
2260
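
For reference, a sketch of how the two data sets above can be collected. The /mnt/test path comes from the df output above; the dd-io namespace is an assumption:

```
# Sketch: per-pod filesystem usage and file count at the mount point.
# Namespace "dd-io" is an assumption; /mnt/test comes from the df output.
for pod in $(kubectl -n dd-io get pods -o name); do
  echo "Pod name $pod"
  kubectl -n dd-io exec "$pod" -- df -h /mnt/test
  kubectl -n dd-io exec "$pod" -- sh -c 'find /mnt/test -type f | wc -l'
done
```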

Comment 4 Mudit Agarwal 2022-11-22 07:44:20 UTC
Benamar, is this planned for 4.12?