Bug 2318399

Summary: [4.16.2] CephFS - Creating large PV from volumesnapshotcontent takes a long time
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: David Vaanunu <dvaanunu>
Component: ceph Assignee: Venky Shankar <vshankar>
ceph sub component: CephFS QA Contact: Elad <ebenahar>
Status: NEW --- Docs Contact:
Severity: high    
Priority: unspecified CC: bniver, muagarwa, sostapov
Version: 4.16   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Vaanunu 2024-10-13 10:55:32 UTC
Created attachment 2051837
csi-cephfsplugin-provisioner-5677fd74c7-62q2c pod log

Description of problem (please be detailed as possible and provide log
snippets):

Create a single namespace with 1 pod that uses a large PV on CephFS. The tested PV sizes were 500G/1T/2T/4T, with usage of 100G/500G/1.5T/3T respectively.

Use OADP to create a CSI snapshot backup of the namespace. A volumesnapshot and a volumesnapshotcontent are created during the backup; the volumesnapshot is deleted when the backup completes.

When the namespace is restored from the volumesnapshotcontent, the PVC is created quickly but PV creation takes a long time.

Version of all relevant components (if applicable):

OCP - 4.16.2
ODF - 4.16.2-rhodf

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Maybe

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Use the OADP operator for backup and restore
2. Create a single namespace with 1 pod
3. Using OADP, create a CSI snapshot backup
4. Using OADP, restore from the CSI snapshot backup


Actual results:
PV creation on CephFS takes a long time (from about 33 minutes to over 16 hours in the tests below, depending on usage)

Expected results:
PV creation on CephFS takes a few seconds, as it does on CephRBD

Additional info:


CephFS vs. CephRBD restore duration (h:mm:ss):

PV Size   Usage    CephFS      CephRBD
500G      100G     0:33:43     0:00:13
1000G     500G     2:44:43     0:00:13
2000G     1500G    8:13:32     0:00:23
4000G     3000G    16:32:42    0:00:23
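
To make the scaling in these numbers easier to see, the durations can be converted into an effective restore rate. The following is a minimal Python sketch, not part of the original test tooling. It assumes the usage figures are decimal gigabytes (1 G = 1000 MB); the helper names to_seconds and cephfs_rate_mb_per_s are hypothetical.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (usage in GB, CephFS duration, CephRBD duration), copied from the table
# in this report. Assumption: "g"/"G" means decimal gigabytes.
RESULTS = [
    (100, "0:33:43", "0:00:13"),
    (500, "2:44:43", "0:00:13"),
    (1500, "8:13:32", "0:00:23"),
    (3000, "16:32:42", "0:00:23"),
]


def cephfs_rate_mb_per_s(usage_gb: int, duration: str) -> float:
    """Effective restore rate: used data divided by restore time."""
    return usage_gb * 1000 / to_seconds(duration)


for usage_gb, cephfs, rbd in RESULTS:
    rate = cephfs_rate_mb_per_s(usage_gb, cephfs)
    slowdown = to_seconds(cephfs) / to_seconds(rbd)
    print(f"usage {usage_gb:>4}G: CephFS ~{rate:.1f} MB/s, "
          f"{slowdown:.0f}x slower than CephRBD")
```

Under that unit assumption, every row works out to roughly 50 MB/s for CephFS, so the restore time grows about linearly with the amount of used data. That pattern would be consistent with the data being fully copied during PV creation, while the CephRBD times stay nearly constant regardless of usage.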

Comment 6 David Vaanunu 2024-10-13 12:52:14 UTC
The time gap between PVC creation and PV creation is visible in the AGE column below (PV: 29h, PVC: 45h):

# oc get pv | grep datagen-1pod-3000g-fs
pvc-25682db5-1da6-46f9-8e48-13332eaca35e   4000Gi     RWO            Delete           Bound    datagen-1pod-3000g-fs/pvc-busy-data-fs-1pod-3000g-1        ocs-storagecluster-cephfs     <unset>                          29h


# oc get pvc -ndatagen-1pod-3000g-fs
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE
pvc-busy-data-fs-1pod-3000g-1   Bound    pvc-25682db5-1da6-46f9-8e48-13332eaca35e   4000Gi     RWO            ocs-storagecluster-cephfs   <unset>                 45h
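
The gap can be worked out from the two AGE values in the output above. This is a small Python sketch under stated assumptions: the helper age_to_hours is hypothetical and handles only the d/h/m/s suffixes, and AGE values are rounded by the CLI, so the result is approximate.

```python
import re

# Hours represented by each AGE suffix handled by this sketch.
UNIT_HOURS = {"d": 24.0, "h": 1.0, "m": 1 / 60, "s": 1 / 3600}


def age_to_hours(age: str) -> float:
    """Parse an AGE value such as '45h' or '2d5h' into hours."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject input that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unsupported AGE value: {age!r}")
    return sum(int(n) * UNIT_HOURS[u] for n, u in parts)


pvc_age = "45h"  # AGE of pvc-busy-data-fs-1pod-3000g-1, from the output above
pv_age = "29h"   # AGE of pvc-25682db5-1da6-46f9-8e48-13332eaca35e

gap_hours = age_to_hours(pvc_age) - age_to_hours(pv_age)
print(f"PV was created about {gap_hours:.0f}h after the PVC")
```

The resulting gap of about 16 hours lines up with the 16:32:42 CephFS restore duration reported for the 4000G PV with 3000G usage.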

Comment 8 David Vaanunu 2024-10-14 10:29:18 UTC
Can this bug stay open until the fix is available?
It is impacting my test cases, and I need a relevant bug to track it.

Comment 10 Sunil Kumar Acharya 2024-10-15 10:07:11 UTC
Moving the non-blocker BZs out of ODF-4.17.0. If this is a blocker, please feel free to propose it as a blocker for ODF-4.17.0 with a justification note.