Description of problem:
Allow krbd to delete images or snapshots when the cluster is full, in order to free space.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Fill the cluster until the full alert fires
2. Delete an image or a snapshot to free space

Actual results:
Error

Expected results:
The image or snapshot is deleted successfully

Additional info:
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
krbd doesn't delete images -- it's just a block-layer IO driver, and the rbd CLI already sets this flag for deletion operations. I suspect this is really a request to set the flag in the MGR when processing background deletions? This would also require changes in ceph-csi, since it too will get blocked by a full cluster. It's also important to point out that this flag does not provide a mechanism to delete data once the cluster has already passed the configured full ratio (i.e. it's not a magic bullet).
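For reference, a minimal way to reproduce the scenario on a test cluster without filling it with data is to lower the full ratio artificially and then attempt the deletion. This is a hedged sketch: the pool and image names are examples, and 0.95 is assumed to be the cluster's original full ratio (the upstream default).

```shell
# Force the cluster into the FULL state quickly on a test cluster
# by lowering the full ratio (example value).
ceph osd set-full-ratio 0.2

# The rbd CLI sets the full-try operation flag for deletions,
# so this removal should still be attempted despite the FULL state.
rbd rm mypool/myimage

# Restore the full ratio afterwards (0.95 assumed as the prior value).
ceph osd set-full-ratio 0.95
```

Note the caveat above still applies: the full-try flag lets deletion requests through the full check, but it is not guaranteed to help once the cluster is far past the configured full ratio.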
(In reply to Jason Dillaman from comment #2)
> krbd doesn't delete images -- it's just a block-layer IO driver and the rbd
> CLI already sets this flag for deletion operations. I suspect this is really
> a request to set the flag in the MGR when processing background deletions?

It should have been about the RBD CLI, and since the flag is already set there, all is good. I opened a separate BZ for the MGR changes: https://bugzilla.redhat.com/show_bug.cgi?id=1910272

> This would also require changes in ceph-csi since it too will get blocked by
> a full cluster.

I can move this BZ to Ceph-CSI or open a new BZ -- which do you prefer?

> It's also important to point out that this flag does not
> provide a mechanism to delete data when the cluster has already passed the
> configured full ratio (i.e. it's not a magic bullet)

I didn't expect that, but it is good to clarify. Thanks, Orit
I'll use this BZ for tracking the MGR changes (since it's an RBD issue). You can take the existing MGR one and move it to CephFS, since otherwise it most likely won't get addressed by the correct team (the CephFS team is responsible for their MGR module).
This looks like it is going to be a much larger issue to address: when the cluster is full it's very easy to block the MGR command-processing pathway, so an incoming "rbd task add trash remove <image-spec>" command can get stuck before it ever reaches any code where the FULL_TRY flag has been applied. Additionally, any MGR module command can block, and thereby block the processing of other, unrelated MGR module commands.
Any updates on this bug? It's assigned but I don't see any substantive updates since January.
We've just seen this bug in the lab, and it forced us to rebuild the cluster. I believe its severity should be raised to high or even urgent.
Note there *is* a documented workaround to delete PVs and restore storage, though we have not tested it recently: https://access.redhat.com/solutions/6387181
*** Bug 2227750 has been marked as a duplicate of this bug. ***