Bug 2238406 - Handle limits when cloning CephFS snapshots
Summary: Handle limits when cloning CephFS snapshots
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-manila
Version: 17.1 (Wallaby)
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: OpenStack Manila Bugzilla Bot
QA Contact: vhariria
URL:
Whiteboard:
Depends On: 2196829
Blocks:
Reported: 2023-09-11 17:31 UTC by Goutham Pacha Ravi
Modified: 2023-09-27 14:14 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-28489 0 None None None 2023-09-11 17:37:54 UTC

Description Goutham Pacha Ravi 2023-09-11 17:31:18 UTC
Description of problem:

Creating a CephFS snapshot clone (a manila share created from a snapshot) may involve a long-running clone operation on the Ceph cluster. Ceph can only run a limited number of these clone operations concurrently. If more clone requests arrive than the limit allows, the extra requests are queued.

In RHCS 6 and later, this queuing behavior is turned off by default. Once the limit of concurrent clones has been reached, the "ceph fs subvolume snapshot clone" command fails with a retryable error, "EAGAIN", instead of queuing the clone operation:

https://github.com/ceph/ceph/pull/52670/
https://tracker.ceph.com/issues/59714
https://bugzilla.redhat.com/show_bug.cgi?id=2196829

The CephFS driver in manila must handle this error appropriately and either perform retries within reason or return an error to the end user.
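
As an illustration of "retries within reason", here is a minimal Python sketch of a bounded retry with exponential backoff. It is not the manila driver code: `clone_with_retries`, `start_clone`, `max_attempts` and `base_delay` are hypothetical names, and the sketch assumes the clone call surfaces the cluster's EAGAIN as an `OSError` with `errno.EAGAIN`.

```python
import errno
import time


def clone_with_retries(start_clone, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call start_clone() until it succeeds or the attempts run out.

    start_clone is a hypothetical callable that raises OSError with
    errno.EAGAIN when the cluster is at its concurrent-clone limit.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return start_clone()
        except OSError as exc:
            # Only EAGAIN is treated as retryable; other errors propagate.
            if exc.errno != errno.EAGAIN:
                raise
            if attempt == max_attempts:
                # Out of retries: re-raise so the caller can set the
                # share to "error" and report the failure to the user.
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Capping the number of attempts keeps a busy cluster from holding a share in "creating" indefinitely; the right cap and delay would depend on how long clones typically take in a given deployment.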



Version-Release number of selected component (if applicable): RHOSP 17.1 and beyond (earlier versions of RHOSP do not support snapshot cloning) 


How reproducible: 


Steps to Reproduce:
1. Create a manila share
2. Mount the share, write some data to the share
3. Create a snapshot of the share
4. Create more than four shares from the snapshot (the default limit for concurrent clone operations on Ceph is 4).
5. The fifth and subsequent shares end up in the "error" state, and the share manager log contains an "EAGAIN" error from the Ceph cluster
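
The behavior in steps 4 and 5 can be pictured with a toy model. This is only a sketch of the limit described in this report, not the Ceph mgr implementation: `ToyCloneManager` and its attributes are hypothetical, with `no_wait=True` standing in for the RHCS 6 default and `no_wait=False` for the older queuing behavior.

```python
import errno


class ToyCloneManager:
    """Toy stand-in for the cluster-side clone limit (not Ceph code)."""

    def __init__(self, max_concurrent_clones=4, no_wait=True):
        self.max_concurrent_clones = max_concurrent_clones
        self.no_wait = no_wait
        self.in_progress = []
        self.queued = []

    def request_clone(self, name):
        # Below the limit: the clone starts right away.
        if len(self.in_progress) < self.max_concurrent_clones:
            self.in_progress.append(name)
            return "in-progress"
        # At the limit with queuing off: reject with a retryable error.
        if self.no_wait:
            raise OSError(errno.EAGAIN, "clone limit reached")
        # At the limit with queuing on: hold the request for later.
        self.queued.append(name)
        return "pending"


mgr = ToyCloneManager()
results = []
for i in range(1, 7):
    try:
        results.append(mgr.request_clone(f"share-{i}"))
    except OSError as exc:
        results.append(errno.errorcode[exc.errno])
print(results)
```

With the default limit of 4, the first four requests start and the fifth and sixth are rejected with EAGAIN; constructing the model with `no_wait=False` queues them instead.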


Additional Info:

The ``max_concurrent_clones`` limit (default: 4) can be changed via configuration:

`ceph config set mgr mgr/volumes/max_concurrent_clones <value>`

The clone queue can be re-enabled with:

`ceph config set mgr mgr/volumes/snapshot_clone_no_wait false`

