Bug 2308550

Summary: ODF should alert when csi-clones exceed clone limits
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jenifer Abrams <jhopper>
Component: ceph-monitoring
Assignee: arun kumar mohan <amohan>
Status: NEW ---
QA Contact: Harish NV Rao <hnallurv>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.14
CC: alitke, etamir, nthomas, odf-bz-bot
Target Milestone: ---
Keywords: RFE
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jenifer Abrams 2024-08-29 19:19:40 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
It is a known issue that hundreds of csi-clones of a single source PVC can make clone performance exponentially slower because of clone/flattening limitations. The ODF recommendation is to update the docs to recommend the VolumeSnapshot cloning method at this scale. However, there should also be an alert for users when the cluster has exceeded these cloning limits, since currently the only symptom may be extremely slow clone performance.
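As a rough illustration of what such an alert would need to detect, the Python sketch below counts csi-clones per source PVC from PVC objects (for example the JSON returned by `oc get pvc -A -o json`) and flags sources at or above a threshold. The threshold of 200 is an assumption borrowed from the reproducer below, not a documented ODF limit, and the function names are hypothetical. A real ODF alert would more likely be a Prometheus rule driven by Ceph or ceph-csi metrics; this sketch only sees clones that still exist as PVCs.

```python
from collections import Counter

# Hypothetical alert threshold, taken from the "200+ csi-clones" reproducer;
# the real limit depends on the Ceph/ceph-csi configuration.
CLONE_ALERT_THRESHOLD = 200


def count_csi_clones(pvcs):
    """Count PVC-to-PVC (CSI) clones per source PVC.

    `pvcs` is a list of PVC objects as dicts, for example the "items" array
    returned by `oc get pvc -A -o json`. A PVC counts as a csi-clone when its
    spec.dataSource has kind "PersistentVolumeClaim".
    """
    counts = Counter()
    for pvc in pvcs:
        namespace = pvc.get("metadata", {}).get("namespace", "default")
        source = pvc.get("spec", {}).get("dataSource") or {}
        if source.get("kind") == "PersistentVolumeClaim":
            # With spec.dataSource the source PVC is in the clone's namespace.
            counts[(namespace, source["name"])] += 1
    return counts


def sources_over_limit(pvcs, threshold=CLONE_ALERT_THRESHOLD):
    """Return {(namespace, source_pvc): clone_count} for sources at or over the threshold."""
    return {src: n for src, n in count_csi_clones(pvcs).items() if n >= threshold}


if __name__ == "__main__":
    # Made-up example: one "golden" source PVC with 250 clones.
    pvcs = [{"metadata": {"namespace": "vms", "name": "golden"}, "spec": {}}]
    pvcs += [
        {
            "metadata": {"namespace": "vms", "name": f"clone-{i}"},
            "spec": {"dataSource": {"kind": "PersistentVolumeClaim", "name": "golden"}},
        }
        for i in range(250)
    ]
    for (ns, src), n in sources_over_limit(pvcs).items():
        # prints: vms/golden has 250 csi-clones; consider the VolumeSnapshot clone method
        print(f"{ns}/{src} has {n} csi-clones; consider the VolumeSnapshot clone method")
```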

More background here: https://issues.redhat.com/browse/CNV-41845

Version of all relevant components (if applicable):
4.x


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Without an alert, users have no way to understand why clones may take many hours to complete.

Is there any workaround available to the best of your knowledge?
The docs are being updated to recommend the VolumeSnapshot clone method, but it is not the default clone strategy.
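A minimal sketch of the VolumeSnapshot method, assuming it means taking one VolumeSnapshot of the source PVC and restoring each copy from that snapshot instead of cloning the PVC directly. The function, object names, size, and access mode are hypothetical; the class names in the usage block are assumed ODF defaults and should be checked on the cluster. The requested size must be at least the source PVC's size.

```python
import json


def snapshot_clone_manifests(namespace, source_pvc, count, size,
                             snapshot_class, storage_class, prefix="clone"):
    """Build a v1 List: one VolumeSnapshot of `source_pvc` plus `count` PVCs restored from it."""
    snap_name = f"{source_pvc}-snap"  # hypothetical naming scheme
    snapshot = {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": snap_name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": source_pvc},
        },
    }
    restores = [
        {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": f"{prefix}-{i}", "namespace": namespace},
            "spec": {
                "storageClassName": storage_class,
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": size}},
                # Restore from the snapshot rather than cloning the source PVC.
                "dataSource": {
                    "apiGroup": "snapshot.storage.k8s.io",
                    "kind": "VolumeSnapshot",
                    "name": snap_name,
                },
            },
        }
        for i in range(count)
    ]
    return {"apiVersion": "v1", "kind": "List", "items": [snapshot] + restores}


if __name__ == "__main__":
    # Class names are assumed ODF defaults; check `oc get volumesnapshotclass,storageclass`.
    manifests = snapshot_clone_manifests(
        "vms", "golden", 200, "30Gi",
        snapshot_class="ocs-storagecluster-rbdplugin-snapclass",
        storage_class="ocs-storagecluster-ceph-rbd",
    )
    print(json.dumps(manifests, indent=2))  # pipe into `oc apply -f -`
```

The restored PVCs stay Pending until the snapshot is ready to use, so applying everything in one `List` is acceptable.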

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Y

Can this issue be reproduced from the UI?
Y

If this is a regression, please provide more details to justify this:
N

Steps to Reproduce:
1. Create a source PVC.
2. Create 200+ csi-clones of it.
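The reproducer can be scripted with a Python sketch like the one below, which emits a Kubernetes `List` holding a source PVC plus N PVC-to-PVC clones, suitable for piping into `oc apply -f -`. The names, size, access mode, and storage class are assumptions, and the source PVC is left empty; the report does not say what data the source held.

```python
import json


def csi_clone_repro_manifests(namespace, source_pvc, count, size,
                              storage_class, prefix="clone"):
    """Build a v1 List with a source PVC followed by `count` csi-clones of it."""

    def pvc(name, clone_of=None):
        spec = {
            # CSI cloning only works within a single StorageClass.
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        }
        if clone_of is not None:
            # A dataSource of kind PersistentVolumeClaim requests a CSI volume clone.
            spec["dataSource"] = {"kind": "PersistentVolumeClaim", "name": clone_of}
        return {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": name, "namespace": namespace},
            "spec": spec,
        }

    items = [pvc(source_pvc)]  # step 1: the source PVC
    items += [pvc(f"{prefix}-{i}", clone_of=source_pvc) for i in range(count)]  # step 2
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # 200 clones, matching the "200+" in the steps; the storage class is an assumed ODF default.
    print(json.dumps(csi_clone_repro_manifests(
        "vms", "golden", 200, "30Gi", "ocs-storagecluster-ceph-rbd")))
```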


Actual results:
Once the clone limit is reached, clones can take exponentially longer to complete.

Expected results:
Users are alerted when this limit is reached and advised to use the VolumeSnapshot clone method.

Additional info:
Doc recommendation draft: https://docs.google.com/document/d/1_I5ayeVHtvP5Has1dpdGNkpPOUESZL0Gb-IfSi6O8sE/edit