Bug 2238406

Summary: Handle limits when cloning CephFS snapshots
Product: Red Hat OpenStack
Reporter: Goutham Pacha Ravi <gouthamr>
Component: openstack-manila
Assignee: OpenStack Manila Bugzilla Bot <openstack-manila-bugs>
Status: CLOSED MIGRATED
QA Contact: vhariria
Severity: medium
Priority: medium
Version: 17.1 (Wallaby)
CC: ashrodri
Keywords: Triaged
Hardware: All
OS: All
Last Closed: 2024-12-11 19:17:54 UTC
Type: Bug
Bug Depends On: 2196829

Description Goutham Pacha Ravi 2023-09-11 17:31:18 UTC
Description of problem:

Creating CephFS snapshot clones (manila shares created from snapshots) can involve a long-running clone operation on the Ceph cluster. Ceph can run only a limited number of these clone operations concurrently. Before RHCS 6, clone requests beyond that limit are queued.

In RHCS 6 and later, this queuing behavior is turned off by default. When the limit of concurrent clones is reached, the "ceph fs subvolume snapshot clone" command returns a retryable error, "EAGAIN", instead of queuing the clone operation:

https://github.com/ceph/ceph/pull/52670/
https://tracker.ceph.com/issues/59714
https://bugzilla.redhat.com/show_bug.cgi?id=2196829

The CephFS driver in manila must handle this error appropriately and either perform retries within reason or return an error to the end user.
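
One way to "perform retries within reason" is a bounded retry with exponential backoff that gives up with an error once the attempts are exhausted. The following is a minimal sketch, not the actual manila driver code: `CloneBusyError`, `retry_on_eagain`, and the `create_clone` callable are hypothetical names, and the sketch assumes the Ceph error reaches the caller as an `OSError` whose `errno` is `EAGAIN`.

```python
import errno
import time


class CloneBusyError(OSError):
    """Hypothetical exception: the clone limit was still hit after all retries."""


def retry_on_eagain(create_clone, max_attempts=5, initial_delay=1.0,
                    backoff=2.0, sleep=time.sleep):
    """Call create_clone(), retrying only when it fails with EAGAIN.

    create_clone is a hypothetical zero-argument callable that issues the
    clone request. Any other error is re-raised immediately.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return create_clone()
        except OSError as exc:
            # Only EAGAIN is treated as retryable in this sketch.
            if exc.errno != errno.EAGAIN:
                raise
            if attempt == max_attempts:
                # Retries exhausted: surface an error to the caller so the
                # share can be marked as failed.
                raise CloneBusyError(
                    errno.EAGAIN,
                    "clone limit still reached after %d attempts" % attempt,
                ) from exc
            sleep(delay)
            delay *= backoff
```

Bounding the number of attempts keeps a busy cluster from tying up the share manager indefinitely; the values chosen for `max_attempts` and `initial_delay` here are illustrative only.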



Version-Release number of selected component (if applicable): RHOSP 17.1 and beyond (earlier versions of RHOSP do not support snapshot cloning) 


How reproducible: 


Steps to Reproduce:
1. Create a manila share
2. Mount the share, write some data to the share
3. Create a snapshot of the share
4. Create more than four shares from the snapshot at the same time (the default limit for concurrent clone operations on Ceph is 4).
5. Observe that the fifth and subsequent shares are set to "error" and that the share manager log contains an "EAGAIN" error from the Ceph cluster.


Additional Info:

The ``max_concurrent_clones`` limit (default: 4) can be changed via configuration:

`ceph config set mgr mgr/volumes/max_concurrent_clones <value>`

The clone queue can be re-enabled with:

`ceph config set mgr mgr/volumes/snapshot_clone_no_wait false`
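
To make the effect of these two settings concrete, here is a toy model of the behavior described in this report. It is only an illustration under simplifying assumptions, not Ceph code: the `ToyCloner` class, its methods, and the error message are hypothetical, and only the two option names and the default of 4 come from the report.

```python
import errno
from collections import deque


class ToyCloner:
    """Toy stand-in for the clone scheduler; not Ceph code."""

    def __init__(self, max_concurrent_clones=4, snapshot_clone_no_wait=True):
        self.max_concurrent_clones = max_concurrent_clones
        self.snapshot_clone_no_wait = snapshot_clone_no_wait
        self.in_progress = []
        self.pending = deque()

    def clone(self, name):
        """Start a clone, queue it, or reject it with EAGAIN."""
        if len(self.in_progress) < self.max_concurrent_clones:
            self.in_progress.append(name)
            return "in-progress"
        if self.snapshot_clone_no_wait:
            # Limit reached and queuing disabled: retryable error.
            raise OSError(errno.EAGAIN, "clone limit reached")
        # Limit reached and queuing enabled: wait for a free slot.
        self.pending.append(name)
        return "pending"

    def finish(self, name):
        """Complete a running clone and promote the oldest queued one."""
        self.in_progress.remove(name)
        if self.pending:
            self.in_progress.append(self.pending.popleft())


if __name__ == "__main__":
    cloner = ToyCloner()
    for i in range(1, 6):
        try:
            print(f"share-{i}:", cloner.clone(f"share-{i}"))
        except OSError as exc:
            print(f"share-{i}: rejected, EAGAIN =", exc.errno == errno.EAGAIN)
```

With the defaults, the first four requests start and the fifth is rejected with EAGAIN, matching step 5 above. Constructing the model with `snapshot_clone_no_wait=False` queues the fifth request instead.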