Description of problem:
Allow krbd to delete images or snapshots when the cluster is full, in order to free space.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Fill the cluster until the full alert fires
2. Delete an image or a snapshot to free space

Actual results:
Error

Expected results:
The image or snapshot is deleted successfully

Additional info:
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
krbd doesn't delete images -- it's just a block-layer IO driver, and the rbd CLI already sets this flag for deletion operations. I suspect this is really a request to set the flag in the MGR when processing background deletions? This would also require changes in ceph-csi, since it too will get blocked by a full cluster. It's also important to point out that this flag does not provide a mechanism to delete data once the cluster has already passed the configured full ratio (i.e. it's not a magic bullet).
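For reference, a minimal way to reproduce the scenario on a test cluster without filling it with data is to lower the full ratio artificially and then attempt the deletion. This is a hedged sketch: the pool and image names are examples, and 0.95 is assumed to be the cluster's original full ratio (the upstream default).

```shell
# Force the cluster into the FULL state quickly on a test cluster
# by lowering the full ratio (example value).
ceph osd set-full-ratio 0.2

# The rbd CLI sets the full-try operation flag for deletions,
# so this removal should still be attempted despite the FULL state.
rbd rm mypool/myimage

# Restore the full ratio afterwards (0.95 assumed as the prior value).
ceph osd set-full-ratio 0.95
```

Note the caveat above still applies: the full-try flag lets deletion requests through the full check, but it is not guaranteed to help once the cluster is far past the configured full ratio.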
(In reply to Jason Dillaman from comment #2)
> krbd doesn't delete images -- it's just a block-layer IO driver and the rbd
> CLI already sets this flag for deletion operations. I suspect this is really
> a request to set the flag in the MGR when processing background deletions?

It should have been about the RBD CLI, and since the flag is already set there, all is good. I opened a separate BZ for the MGR changes: https://bugzilla.redhat.com/show_bug.cgi?id=1910272

> This would also require changes in ceph-csi since it too will get blocked by
> a full cluster.

I can move this BZ to Ceph-CSI or open a new BZ -- which do you prefer?

> It's also important to point out that this flag does not
> provide a mechanism to delete data when the cluster has already passed the
> configured full ratio (i.e. it's not a magic bullet)

I didn't expect that, but it is good to clarify. Thanks, Orit
I'll use this BZ for tracking the MGR changes (since it's an RBD issue). You can take the existing MGR one and move it to CephFS, since otherwise it most likely won't get addressed by the correct team (the CephFS team is responsible for their MGR module).
This looks like it is going to be a much larger issue to address: when the cluster is full it's very easy to block the MGR command-processing pathway, so an incoming "rbd task add trash remove <image-spec>" command can get stuck before it ever reaches any code where the FULL_TRY flag has been applied. Additionally, any MGR module command can block, and thereby block the processing of other, unrelated MGR module commands.
Any updates on this bug? It's assigned but I don't see any substantive updates since January.
We've just seen this bug in the lab, and it forced us to rebuild the cluster. I believe its severity should be raised to high or even urgent.
Note there *is* a documented workaround to delete PVs and restore storage, though we have not tested it recently: https://access.redhat.com/solutions/6387181
*** Bug 2227750 has been marked as a duplicate of this bug. ***