Bug 1810525 - [GSS][RFE] [Tracker for Ceph # BZ # 1910272] Deletion of data is not allowed after the Ceph cluster reaches osd-full-ratio threshold.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: Kotresh HR
QA Contact: Anna Sandler
URL:
Whiteboard:
Depends On: 1897351
Blocks: 1841426 2011326
 
Reported: 2020-03-05 12:40 UTC by Ashish Singh
Modified: 2023-08-09 16:37 UTC
CC List: 19 users

Fixed In Version: v4.9.0-182.ci
Doc Type: Enhancement
Doc Text:
.Deletion of data is allowed when the storage cluster is full
Previously, when the storage cluster was full, the Ceph Manager hung on checking pool permissions while reading the configuration file. The Ceph Metadata Server (MDS) did not allow write operations when the Ceph OSD was full, resulting in an `ENOSPACE` error. When the storage cluster hit the full ratio, users could not delete data to free space using the Ceph Manager volume plugin. With this release, the new FULL capability is introduced. With the FULL capability, the Ceph Manager bypasses the Ceph OSD full check. The `client_check_pool_permission` option is now disabled by default, whereas in previous releases it was enabled. With the Ceph Manager having FULL capabilities, the MDS no longer blocks Ceph Manager calls. As a result, the Ceph Manager can free up space by deleting subvolumes and snapshots when a storage cluster is full (see the command sketch after the header fields below).
Clone Of:
Environment:
Last Closed: 2021-12-13 17:44:23 UTC
Embargoed:
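
A minimal sketch of the behavior described in the Doc Text above, assuming access to the rook-ceph toolbox pod on an ODF 4.9 cluster; the filesystem, subvolume group, subvolume, and snapshot names are hypothetical placeholders, not taken from this bug:

  # Even while `ceph -s` reports full OSDs, the mgr volumes plugin can now
  # delete snapshots and subvolumes to free up space.
  ceph fs subvolume snapshot rm <fs-name> <subvolume> <snapshot> --group_name <group>
  ceph fs subvolume rm <fs-name> <subvolume> --group_name <group>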




Links:
  Red Hat Product Errata RHSA-2021:5086 (last updated 2021-12-13 17:44:44 UTC)

Description Ashish Singh 2020-03-05 12:40:29 UTC
* Description of problem (please be as detailed as possible and provide log
snippets):

Once the Ceph cluster reaches the 'osd-full-ratio' threshold, deletion of data to free up space is not allowed.
This works as expected, since deletion also requires "write" access, which is denied when the cluster is full.

However, a user shouldn't need to increase the full-ratio in order to allow deletion. This is not a good user experience.
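
For context, the configured thresholds and current usage can be inspected with a couple of standard ceph commands (a minimal sketch, assuming access to the rook-ceph toolbox pod):

  # Show the configured nearfull/backfillfull/full ratios.
  ceph osd dump | grep ratio

  # Show raw and per-pool usage to see how close the cluster is to the full ratio.
  ceph df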


* Version of all relevant components (if applicable):
RHOCS 4.2.2


* Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Partially


* Is there any workaround available to the best of your knowledge?
Yes, increase the threshold manually and delete data.
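
A minimal sketch of that workaround, assuming access to the rook-ceph toolbox pod; the 0.92 and 0.85 values are illustrative placeholders, so read the cluster's actual ratio first and restore that value afterwards:

  # Note the current threshold before changing it.
  ceph osd dump | grep full_ratio

  # Temporarily raise the full ratio so that deletes (which need write access) succeed.
  ceph osd set-full-ratio 0.92

  # ...delete data to free space...

  # Restore the original threshold.
  ceph osd set-full-ratio 0.85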


* Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3


* Is this issue reproducible?
Yes

* Can this issue be reproduced from the UI?
No

* If this is a regression, please provide more details to justify this:
No

* Steps to Reproduce:
NA

* Actual results:
NA

* Expected results:
NA

* Additional info:
The same behavior is present in RHCS as well.
https://access.redhat.com/solutions/3001761

Comment 2 Michael Adam 2020-03-05 16:47:50 UTC
As Ashish Singh is from the GSS team, re-adding the [GSS] tag. Or should it only be used for BZs with customer cases attached?

Also moving this out of 4.3

Comment 3 Elad 2020-03-16 11:16:36 UTC
We can keep the title as is.

Comment 4 Michael Adam 2020-05-04 07:42:42 UTC
Is this something we could or would have to address in ceph itself?

Comment 5 Josh Durgin 2020-05-06 22:02:00 UTC
(In reply to Michael Adam from comment #4)
> Is this something we could or would have to address in ceph itself?

Ceph itself can't add more capacity; OCS may be able to, so it should be addressed there.

Comment 6 Josh Durgin 2020-05-06 22:03:53 UTC
FYI deletion in Ceph is allowed when it is full. What version and which commands are not working?

Comment 8 Mudit Agarwal 2020-09-28 05:36:49 UTC
This RFE doesn't seem to have been prioritized for 4.6, and now that we are approaching dev freeze I don't think we have a chance to fix it.
Moving it out; please retarget if someone thinks otherwise.

Also, we are in the early phase of 4.7, so if we don't want this BZ to drag on further, now is the time to prioritize it.

Comment 9 Orit Wasserman 2020-12-23 14:34:08 UTC
In order to enable deletion when the cluster is full, we need to enable it in the Ceph MGR: https://bugzilla.redhat.com/show_bug.cgi?id=1910272
We require a change in Ceph-CSI as well.
As we won't need any changes in the OCS operator, I am moving the BZ to Ceph-CSI.

Comment 11 Humble Chirammal 2021-05-05 06:17:53 UTC
(In reply to Orit Wasserman from comment #9)
> In order to enable deletion when the cluster is full, we need to enable it in the Ceph
> MGR: https://bugzilla.redhat.com/show_bug.cgi?id=1910272
> We require a change in Ceph-CSI as well.
> As we won't need any changes in the OCS operator, I am moving the BZ to Ceph-CSI.

Marking this bug against the Ceph CSI component as a tracker until the issue is addressed in the Ceph core components.

Comment 14 Mudit Agarwal 2021-09-22 08:58:33 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1910272 is acked for 5.0z1

Comment 16 Mudit Agarwal 2021-09-28 13:01:43 UTC
This is getting fixed in 5.0z1

Comment 24 Anna Sandler 2021-12-02 18:29:17 UTC
Tested the workflow:
wrote data to the cluster using the ocs-ci function write_data_via_fio() until the cluster was almost full:
sh-4.4$ ceph -s
  cluster:
    id:     1ae93eb4-edd9-4942-a27e-13dba341f1f2
    health: HEALTH_ERR
            3 full osd(s)
            3 pool(s) full


then deleted the data using delete_fio_data();
the data was deleted as expected:

sh-4.4$ ceph -s
  cluster:
    id:     1ae93eb4-edd9-4942-a27e-13dba341f1f2
    health: HEALTH_OK
 


Moving to Verified.
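
For reference, a minimal shell sketch of that verification flow, assuming a pod with the CephFS-backed PVC mounted at /mnt/cephfs and access to the rook-ceph toolbox; the mount path, job name, and sizes are placeholders and are not taken from ocs-ci:

  # Fill the volume with fio until the cluster reports full OSDs.
  fio --name=fill --directory=/mnt/cephfs --rw=write --bs=4M --size=50G --numjobs=4 --group_reporting

  # From the toolbox: confirm HEALTH_ERR with "N full osd(s)".
  ceph -s

  # Delete the fio data files and confirm the cluster returns to HEALTH_OK.
  rm -f /mnt/cephfs/fill.*
  ceph -s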


Comment 27 errata-xmlrpc 2021-12-13 17:44:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086

