Bug 2222020 - Slow data removal for PVC with ODF VolSync enabled
Summary: Slow data removal for PVC with ODF VolSync enabled
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Benamar Mekhissi
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-07-11 14:36 UTC by Elvir Kuric
Modified: 2023-08-09 17:00 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Elvir Kuric 2023-07-11 14:36:52 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In this test we created 100 pods, each writing 10 GB; this was a randrw test running for 10 h (with --time-based=1).
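
For reference, a minimal sketch of the kind of per-pod invocation used, assuming the workload was an fio randrw job; the mount path, block size, ioengine, and directness are assumptions, not the exact parameters from this run:

  # run a time-based 10 GB randrw workload against the PVC mount (placeholder path /mnt/data)
  fio --name=randrw-test --directory=/mnt/data \
      --rw=randrw --size=10g --bs=4k --ioengine=libaio --direct=1 \
      --time_based --runtime=10h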

Writing to the Ceph backend was fine, and VolSync replicated data between cluster1 and cluster2.

After the test was done, we deleted the pods/PVCs/ReplicationSources and VolumeReplicationGroups on the primary and secondary clusters (we also deleted the ReplicationDestinations on the secondary cluster), and all objects were deleted.
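
A rough sketch of the cleanup, with the application namespace as a placeholder and --all standing in for the actual resource names (not the exact commands from the test):

  # on the primary and secondary clusters, in the application namespace
  oc -n <app-namespace> delete pods --all
  oc -n <app-namespace> delete replicationsources.volsync.backube --all
  oc -n <app-namespace> delete volumereplicationgroups.ramendr.openshift.io --all
  oc -n <app-namespace> delete pvc --all

  # on the secondary cluster only
  oc -n <app-namespace> delete replicationdestinations.volsync.backube --all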

However, "ceph df" on primary cluster did not shown that all storage space is reclaimed back. 20h after pods / pvc are deleted there is still 120 GB ( even there is nobody/nothing using this cluster ) 

This is the problematic pool:

"ocs-storagecluster-cephfilesystem-data0                12  128  112 GiB   57.30k  336 GiB   2.31    4.6 TiB"

$ ceph df
--- RAW STORAGE ---
CLASS    SIZE   AVAIL     USED  RAW USED  %RAW USED
ssd    18 TiB  17 TiB  1.2 TiB   1.2 TiB       6.55
TOTAL  18 TiB  17 TiB  1.2 TiB   1.2 TiB       6.55
 
--- POOLS ---
POOL                                                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                                                    1    1   54 MiB       15  162 MiB      0    4.6 TiB
ocs-storagecluster-cephblockpool                        2  512  265 GiB  130.13k  794 GiB   5.29    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.otp              3    8      0 B        0      0 B      0    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.control          4    8      0 B        8      0 B      0    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index    5    8    541 B       11  1.6 KiB      0    4.6 TiB
.rgw.root                                               6    8  5.7 KiB       16  180 KiB      0    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec   7    8      0 B        0      0 B      0    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.log              8    8  1.7 MiB      340  7.0 MiB      0    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.meta             9    8  4.6 KiB       14  126 KiB      0    4.6 TiB
ocs-storagecluster-cephfilesystem-metadata             10   16  769 MiB      322  2.3 GiB   0.02    4.6 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data    11  128  1.0 KiB        2   24 KiB      0    4.6 TiB
ocs-storagecluster-cephfilesystem-data0                12  128  112 GiB   57.30k  336 GiB   2.31    4.6 TiB
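
The output above is from "ceph df". Assuming the rook-ceph toolbox deployment is enabled on the cluster, the numbers can be re-checked periodically with something like:

  # space usage per pool
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph df
  # MDS state while the CephFS purge is in progress
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph fs status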


Version of all relevant components (if applicable):
ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes
Is there any workaround available to the best of your knowledge?

NA
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
I believe yes, but it was tested only once.

Can this issue be reproduced from the UI?
NA


If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Create a VolSync setup between cluster1 and cluster2 with ODF v4.13.
2. Create 100 pods, each writing 10 GB to its own PVC; test duration approximately 10 h (a sketch of this step is shown after the list).
3. Delete the pods/PVCs/ReplicationSources and VolumeReplicationGroups on the primary and secondary clusters.
4. Check "ceph df" on the primary cluster.

The secondary cluster is fine; "ceph df" there shows that the test data has been deleted.


Actual results:
Data purging from the CephFS volume is slow.

Expected results:
Data deletion should be faster.

Additional info:

