2238406 – Handle limits when cloning CephFS snapshots

This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.

Summary: Handle limits when cloning CephFS snapshots

Keywords:
Status:	CLOSED MIGRATED
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-manila
Sub Component:
Version:	17.1 (Wallaby)
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	OpenStack Manila Bugzilla Bot
QA Contact:	vhariria
Docs Contact:
URL:
Whiteboard:
Depends On:	2196829
Blocks:

Reported:	2023-09-11 17:31 UTC by Goutham Pacha Ravi
Modified:	2024-12-11 19:18 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-12-11 19:17:54 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-28489	0	None	None	None	2024-12-11 19:17:54 UTC
Red Hat Issue Tracker	OSP-33211	0	None	None	None	2024-12-11 19:18:33 UTC

Description Goutham Pacha Ravi 2023-09-11 17:31:18 UTC

Description of problem:

Creating a CephFS snapshot clone (a manila share created from a snapshot) may involve a long-running clone operation on the Ceph cluster. Ceph can handle only a limited number of these clone operations concurrently. If too many clone requests arrive, the excess requests are queued.

In RHCS 6 and beyond, this queuing behavior is turned off by default. When the limit of concurrent clones has been hit, the "ceph fs subvolume snapshot clone" command returns a retryable error, "EAGAIN", instead of queuing the clone operation:

https://github.com/ceph/ceph/pull/52670/
https://tracker.ceph.com/issues/59714
https://bugzilla.redhat.com/show_bug.cgi?id=2196829

The CephFS driver in manila must handle this error appropriately: either retry a bounded number of times or return an error to the end user.
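
To illustrate what "retries within reason" could look like, here is a minimal Python sketch of a bounded retry loop with exponential backoff. It is not the manila driver's code. `clone_with_retries`, `CloneLimitReached`, and the `start_clone` callable are hypothetical names, and the sketch assumes the EAGAIN condition surfaces as an `OSError` whose `errno` is `errno.EAGAIN`.

```python
import errno
import time


class CloneLimitReached(Exception):
    """Hypothetical exception: retries were exhausted while Ceph kept returning EAGAIN."""


def clone_with_retries(start_clone, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call start_clone() until it succeeds or max_attempts is reached.

    start_clone is a hypothetical callable that asks Ceph to start the clone
    and raises OSError(errno.EAGAIN, ...) when the concurrent clone limit is hit.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return start_clone()
        except OSError as exc:
            if exc.errno != errno.EAGAIN:
                raise  # not a retryable error, surface it unchanged
            if attempt == max_attempts:
                # Give up so the caller can report an error to the end user.
                raise CloneLimitReached(
                    "clone limit still reached after %d attempts" % max_attempts
                ) from exc
            # Back off exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With this shape, the caller decides what "within reason" means through `max_attempts` and `base_delay`, and a `CloneLimitReached` exception is the signal to put the share into an error state with a meaningful message.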



Version-Release number of selected component (if applicable): RHOSP 17.1 and beyond (earlier versions of RHOSP do not support snapshot cloning) 


How reproducible: 


Steps to Reproduce:
1. Create a manila share
2. Mount the share, write some data to the share
3. Create a snapshot of the share
4. Create more than four shares from the snapshot in quick succession, so that the clone operations overlap (the default limit for concurrent clone operations on Ceph is 4).
5. The fifth and subsequent shares are set to "error", and the share manager log contains the "EAGAIN" error from the Ceph cluster.
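
The behavior these steps exercise can be pictured with a toy model. The Python sketch below is not Ceph code. `CloneSlots` is a hypothetical class that only mimics the two modes described in this report: with no-wait enabled, a request beyond the limit fails with EAGAIN, and with it disabled, the request is queued instead.

```python
import errno


class CloneSlots:
    """Toy model of the concurrent clone limit (hypothetical, for illustration only)."""

    def __init__(self, max_concurrent_clones=4, no_wait=True):
        self.max_concurrent_clones = max_concurrent_clones
        self.no_wait = no_wait
        self.running = []   # clones currently in progress
        self.pending = []   # clones waiting for a free slot (queue mode only)

    def submit(self, name):
        """Request a clone; return its state or raise OSError(EAGAIN)."""
        if len(self.running) < self.max_concurrent_clones:
            self.running.append(name)
            return "in-progress"
        if self.no_wait:
            # Limit reached and queuing is off: reject with a retryable error.
            raise OSError(errno.EAGAIN, "too many clones in progress")
        self.pending.append(name)
        return "pending"


if __name__ == "__main__":
    slots = CloneSlots()
    for i in range(1, 7):
        try:
            print("share-%d:" % i, slots.submit("share-%d" % i))
        except OSError as exc:
            print("share-%d: rejected (%s)" % (i, errno.errorcode[exc.errno]))
```

In this model the first four requests start immediately and the fifth and sixth are rejected, which mirrors the "error" shares described in step 5.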


Additional Info:

The ``max_concurrent_clones`` limit can be changed from its default via configuration:

`ceph config set mgr mgr/volumes/max_concurrent_clones <value>`

The clone queue can be re-enabled with:

`ceph config set mgr mgr/volumes/snapshot_clone_no_wait false`
