Bug 1810525

Summary: [GSS][RFE] [Tracker for Ceph # BZ # 1910272] Deletion of data is not allowed after the Ceph cluster reaches osd-full-ratio threshold.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Ashish Singh <assingh>
Component: ceph    Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA QA Contact: Anna Sandler <asandler>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2    CC: bkunal, bniver, ebenahar, etamir, gmeno, hchiramm, jdurgin, kbg, khiremat, kramdoss, madam, mrajanna, muagarwa, ndevos, ocs-bugs, odf-bz-bot, owasserm, rcyriac, sostapov
Target Milestone: ---    Keywords: FutureFeature, Tracking
Target Release: ODF 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.9.0-182.ci Doc Type: Enhancement
Doc Text:
.Deletion of data is allowed when the storage cluster is full
Previously, when the storage cluster was full, the Ceph Manager hung on checking pool permissions while reading the configuration file, and the Ceph Metadata Server (MDS) did not allow write operations when a Ceph OSD was full, resulting in an `ENOSPACE` error. As a result, when the storage cluster hit the full ratio, users could not free space by deleting data through the Ceph Manager volumes plugin. With this release, a new FULL capability is introduced. With the FULL capability, the Ceph Manager bypasses the Ceph OSD full check. The `client_check_pool_permission` option is now disabled by default, whereas in previous releases it was enabled. Because the Ceph Manager has the FULL capability, the MDS no longer blocks Ceph Manager calls, so the Ceph Manager can free up space by deleting subvolumes and snapshots even when the storage cluster is full.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-12-13 17:44:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1897351    
Bug Blocks: 1841426, 2011326    
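
Illustrative note on the Doc Text above: with the FULL capability in place, space can be reclaimed through the Ceph Manager volumes plugin even when the cluster reports full OSDs. A minimal sketch of such a check from the toolbox follows; the filesystem, subvolume, and group names are assumptions for illustration, not taken from this bug.

  # Confirm the cluster is full, then delete a subvolume via the mgr volumes plugin.
  ceph -s
  ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi
  ceph fs subvolume rm ocs-storagecluster-cephfilesystem <subvolume-name> --group_name csi
  ceph -s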

Description Ashish Singh 2020-03-05 12:40:29 UTC
* Description of problem (please be detailed as possible and provide log
snippests):

Once the Ceph cluster reaches the 'osd-full-ratio' threshold, deletion of data is not allowed, so space cannot be freed up.
This works as designed, since deletion also requires "write" access, which is denied when the cluster is full.

However, a user should not need to increase the full ratio just to allow deletion. This is a poor user experience. A sketch of how to inspect the configured ratios is shown below.
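
The thresholds involved can be read from the OSD map. This is only a sketch; the actual values depend on the ODF/Rook defaults for the cluster, so no output is shown here.

  # Show the configured full, backfillfull, and nearfull ratios.
  ceph osd dump | grep -i ratio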


* Version of all relevant components (if applicable):
RHOCS 4.2.2


* Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Partially


* Is there any workaround available to the best of your knowledge?
Yes, increase the threshold manually and delete the data; see the sketch below.
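
A hedged sketch of that manual workaround; the ratio values 0.92 and 0.85 are examples only, and the cluster's actual configured value should be restored afterwards.

  # Temporarily raise the full ratio so writes (and therefore deletes) are allowed again.
  ceph osd set-full-ratio 0.92
  # ... delete data to free up space ...
  # Restore the original full ratio afterwards.
  ceph osd set-full-ratio 0.85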


* Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3


* Can this issue reproducible?
Yes

* Can this issue reproduce from the UI?
No

* If this is a regression, please provide more details to justify this:
No

* Steps to Reproduce:
NA

* Actual results:
NA

* Expected results:
NA

* Additional info:
The same behavior is present in RHCS as well.
https://access.redhat.com/solutions/3001761

Comment 2 Michael Adam 2020-03-05 16:47:50 UTC
As Ashish Singh is from the GSS team, re-adding the [GSS] tag. Or should it only be used for BZs with customer cases attached?

Also moving this out of 4.3

Comment 3 Elad 2020-03-16 11:16:36 UTC
We can keep the title as is.

Comment 4 Michael Adam 2020-05-04 07:42:42 UTC
Is this something we could or would have to address in ceph itself?

Comment 5 Josh Durgin 2020-05-06 22:02:00 UTC
(In reply to Michael Adam from comment #4)
> Is this something we could or would have to address in ceph itself?

Ceph itself can't add more capacity; OCS may be able to, so it should be addressed there.

Comment 6 Josh Durgin 2020-05-06 22:03:53 UTC
FYI deletion in Ceph is allowed when it is full. What version and which commands are not working?

Comment 8 Mudit Agarwal 2020-09-28 05:36:49 UTC
It doesn't seem like this RFE was prioritized in 4.6, and now that we are approaching dev freeze, I don't think we have a chance to fix it.
Moving it out; please retarget if someone thinks otherwise.

Also, we are in the early phase of 4.7, so if we don't want this BZ to drag on further, now is the time to prioritize it.

Comment 9 Orit Wasserman 2020-12-23 14:34:08 UTC
In order to enable deletion when the cluster is full, we need to enable it in the Ceph MGR: https://bugzilla.redhat.com/show_bug.cgi?id=1910272
We require changes in Ceph-CSI as well.
As we won't need any changes in the OCS operator, I am moving the BZ to Ceph-CSI.

Comment 11 Humble Chirammal 2021-05-05 06:17:53 UTC
(In reply to Orit Wasserman from comment #9)
> In order to enable deletion when the cluster is full, we need to enable it in
> the Ceph MGR: https://bugzilla.redhat.com/show_bug.cgi?id=1910272
> We require changes in Ceph-CSI as well.
> As we won't need any changes in the OCS operator, I am moving the BZ to Ceph-CSI.

Marking this bug against the Ceph CSI component as a tracker until it is addressed in the Ceph core components.

Comment 14 Mudit Agarwal 2021-09-22 08:58:33 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1910272 is acked for 5.0z1

Comment 16 Mudit Agarwal 2021-09-28 13:01:43 UTC
This is getting fixed in 5.0z1

Comment 24 Anna Sandler 2021-12-02 18:29:17 UTC
Tested the workflow:
wrote data to the cluster using the ocs-ci function write_data_via_fio() until the cluster was almost full
sh-4.4$ ceph -s
  cluster:
    id:     1ae93eb4-edd9-4942-a27e-13dba341f1f2
    health: HEALTH_ERR
            3 full osd(s)
            3 pool(s) full


then deleted the data using delete_fio_data()
the data was deleted as expected

sh-4.4$ ceph -s
  cluster:
    id:     1ae93eb4-edd9-4942-a27e-13dba341f1f2
    health: HEALTH_OK
 


Moving to verified. A rough manual equivalent of this workflow is sketched below.
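
For reference, a rough manual equivalent of the ocs-ci steps above; the pod name, mount path, and fio parameters are illustrative assumptions, not what ocs-ci actually runs.

  # Fill a CephFS-backed volume from an app pod until the cluster reports full OSDs.
  oc rsh <app-pod> fio --name=fill --rw=write --bs=4M --size=<size> --filename=/mnt/cephfs/fillfile

  # Confirm the cluster is full, delete the data, and re-check health.
  ceph -s
  oc rsh <app-pod> rm /mnt/cephfs/fillfile
  ceph -s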

Comment 27 errata-xmlrpc 2021-12-13 17:44:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086