Bug 1910288

Summary: Set CEPH_OSD_FLAG_FULL_TRY to enable deletion when cluster is full
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RBD
Version: 4.2
Reporter: Orit Wasserman <owasserm>
Assignee: Ram Raja <rraja>
QA Contact: Sunil Angadi <sangadi>
Status: NEW
Type: Bug
Severity: high
Priority: unspecified
Keywords: RFE
Target Milestone: ---
Target Release: 9.1
Hardware: Unspecified
OS: Unspecified
CC: acaro, alitke, ceph-eng-bugs, danken, idryomov, sangadi, soh, vereddy, vumrao
Bug Blocks: 1897351

Description Orit Wasserman 2020-12-23 10:25:28 UTC
Description of problem:
Allow krbd to delete images or snapshots when cluster is full to free space.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Fill the cluster until the full alert is raised
2. Delete an image or a snapshot to free space (see the sketch below)
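
A minimal reproduction sketch using the librados/librbd Python bindings; the pool name ('rbd') and image name ('testimg') are assumptions, and the exact failure mode (a hang versus -ENOSPC) depends on release and pool flags:

    # Minimal sketch, assuming pool 'rbd' and image 'testimg' exist and
    # the cluster has hit its full ratio. Without the full-try flag the
    # delete is expected to fail (or block) instead of freeing space.
    import errno
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')        # assumed pool name
        try:
            rbd.RBD().remove(ioctx, 'testimg')   # assumed image name
        except rbd.OSError as e:
            # On a full cluster the removal is rejected with -ENOSPC
            # (or the request blocks, depending on version/pool flags).
            assert e.errno == errno.ENOSPC
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()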

Actual results:
The deletion fails with an error and no space is freed.

Expected results:
Successfully delete the image or snapshot

Additional info:

Comment 1 RHEL Program Management 2020-12-23 10:25:32 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Jason Dillaman 2020-12-23 14:07:26 UTC
krbd doesn't delete images -- it's just a block-layer IO driver, and the rbd CLI already sets this flag for deletion operations. I suspect this is really a request to set the flag in the MGR when processing background deletions? This would also require changes in ceph-csi, since it too will get blocked by a full cluster. It's also important to point out that this flag does not provide a mechanism to delete data when the cluster has already passed the configured full ratio (i.e. it's not a magic bullet).
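
For reference, a minimal sketch of what this comment describes the rbd CLI doing internally: marking the IoCtx so ops carry the full-try flag before issuing the delete. On Nautilus-era librados the Python binding is set_osdmap_full_try(); newer releases expose it as set_pool_full_try(). Pool and image names are assumptions:

    # Minimal sketch: set the full-try flag on the IoCtx (the librados
    # equivalent of CEPH_OSD_FLAG_FULL_TRY), then delete the image.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')       # assumed pool name
        try:
            ioctx.set_osdmap_full_try()         # allow ops despite the full flag
            rbd.RBD().remove(ioctx, 'testimg')  # assumed image name; per the
                                                # comment above, this still fails
                                                # once the configured full ratio
                                                # has already been passed
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()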

Comment 4 Orit Wasserman 2020-12-23 14:20:19 UTC
(In reply to Jason Dillaman from comment #2)
> krbd doesn't delete images -- it's just an block-layer IO driver and the rbd
> CLI already sets this flag for deletion operations. I suspect this is really
> a request to set the flag in the MGR when processing background deletions?

It should have been the rbd CLI, and since the flag is already set there, all is good.

I opened a separate BZ for the MGR changes: https://bugzilla.redhat.com/show_bug.cgi?id=1910272

> This would also require changes in ceph-csi since it too will get blocked by
> a full cluster. 

I can move this BZ to Ceph-CSI or open a new BZ, what do you prefer?

> It's also important to point out that this flag does not
> provide a mechanism to delete data when the cluster has already passed the
> configured full ratio (i.e. it's not a magic bullet)

I didn't expect it to, but it is good to clarify that.
Thanks,
Orit

Comment 5 Jason Dillaman 2020-12-23 14:23:25 UTC
I'll use this BZ for tracking the MGR changes (since it's an RBD issue). You can take the existing MGR one and move it to CephFS since otherwise it most likely won't get assessed by the correct team (CephFS team is responsible for their MGR module).

Comment 6 Jason Dillaman 2020-12-23 19:39:44 UTC
This looks like it is going to be a much larger issue to address, since it's very easy to block the MGR command-processing pathway when the cluster is full, thereby blocking an incoming "rbd task add trash remove <image-spec>" command before it's able to reach any code where the FULL_TRY flag has been applied. Plus, any MGR module command can block, and therefore block the processing of other unrelated MGR module commands.
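
To illustrate the pathway in question, a hedged sketch of submitting the task through the generic MGR command interface of the Python bindings (the argument name image_id_spec and its value are assumptions; mgr_command() is the generic binding, not an rbd-specific API):

    # Sketch of the command path described above: the trash-remove task
    # is submitted as an MGR command, and on a full cluster the
    # rbd_support module can block here, before any FULL_TRY-aware
    # deletion code runs.
    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        cmd = json.dumps({
            'prefix': 'rbd task add trash remove',  # command quoted in this comment
            'image_id_spec': 'rbd/...',             # assumed argument name;
                                                    # image spec deliberately elided
        })
        # This call can hang if the MGR module's command processing is
        # blocked by an unrelated full-cluster operation.
        ret, outbuf, outs = cluster.mgr_command(cmd, b'')
    finally:
        cluster.shutdown()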

Comment 11 Adam Litke 2021-08-02 20:56:03 UTC
Any updates on this bug?  It's assigned but I don't see any substantive updates since January.

Comment 13 Dan Kenigsberg 2022-08-21 07:19:53 UTC
We've just seen this bug in the lab and it has forced us to rebuild the cluster. I believe that its severity should be marked as high or even urgent.

Comment 14 Dan Kenigsberg 2022-08-21 07:31:55 UTC
Note there *is* a documented workaround to delete PVs and restore storage, but we have not tested it recently: https://access.redhat.com/solutions/6387181

Comment 28 Mudit Agarwal 2024-05-08 07:26:10 UTC
*** Bug 2227750 has been marked as a duplicate of this bug. ***