Bug 1467352 - [RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit full_ratio
Status: ON_QA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 3.0
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: 3.0
Assigned To: Jason Dillaman
Jason Dillaman
Keywords: FutureFeature
Depends On:
Blocks: 1494421
Reported: 2017-07-03 09:30 EDT by Vimal Kumar
Modified: 2017-10-18 08:18 EDT
CC List: 8 users

See Also:
Fixed In Version: RHEL: ceph-12.1.2-1.el7cp Ubuntu: ceph_12.1.2-2redhat1xenial
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vimal Kumar 2017-07-03 09:30:48 EDT
RFE/Bugzilla:


a) Description of problem:

When an OSD hits its full_ratio, the cluster stops accepting I/O. From an OpenStack perspective, an OSD hitting the full_ratio pauses the VMs.

In order to delete objects and free space, manual intervention is needed: set cluster flags such as norebalance, raise the full_ratio (default 0.95) slightly so that I/O is allowed again, and then delete objects from the OSP side.
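
For illustration only, a rough sketch of the manual intervention described above. The set-full-ratio form shown is the Luminous-era syntax (Jewel-era clusters used "ceph pg set_full_ratio" instead), and 0.97 is just an example value:

~~~
# Stop data movement while working on the full OSD(s)
ceph osd set norebalance

# Temporarily raise the full ratio so client I/O (and deletes) can proceed
# Luminous syntax shown; 0.97 is an example value
ceph osd set-full-ratio 0.97

# ... delete the unneeded volumes/images from the OpenStack side ...

# Restore the default and re-enable rebalancing once space has been freed
ceph osd set-full-ratio 0.95
ceph osd unset norebalance
~~~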

Since this involves quite a few steps, recovery is not straightforward. We need a simpler solution that is easier to follow, ideally on both the Ceph and OpenStack sides.


b) Version-Release number of selected component (if applicable):

RHCS2.x

c) How reproducible:

Always

d) Steps to Reproduce:
    1. Use OpenStack with Ceph as backing storage.
    2. Fill the OSDs until one of them hits the full_ratio (see the sketch below).
    3. Confirm that further writes fail with an I/O error.
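
A rough sketch of one way to drive a test cluster to the full_ratio, filling the pool directly with rados bench rather than through OpenStack (the pool name "volumes" is only an example):

~~~
# Write objects into the pool without cleaning them up afterwards
rados -p volumes bench 600 write --no-cleanup

# Watch for the full flag / full OSDs
ceph health detail
ceph df
~~~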


e) Additional info:

It would also be good to understand how to address this from the OpenStack perspective; such a feature would let administrators recover from the OSP side without having to intervene directly in the Ceph cluster.
Comment 2 Josh Durgin 2017-07-18 20:25:08 EDT
This is really an rbd feature - to use the librados FORCE_FULL_TRY functionality for deletes - and removing an rbd image or snapshot when the cluster is full is possible in luminous.
Comment 7 Harish NV Rao 2017-07-21 03:59:44 EDT
@Jason, are there any specific config settings or steps to be done before deleting an rbd image or snapshot when the cluster is full?
Comment 8 Jason Dillaman 2017-07-21 07:49:49 EDT
@Harish: negative -- it *should* just allow you to run the following commands when the cluster is full: "rbd remove", "rbd snap rm", "rbd snap unprotect", and "rbd snap purge".
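
For example, on a cluster that is flagged full, the expectation is that deletions like the following still complete (the pool, image, and snapshot names are hypothetical):

~~~
rbd remove volumes/vm-disk-1
rbd snap unprotect volumes/vm-disk-2@snap1
rbd snap rm volumes/vm-disk-2@snap1
rbd snap purge volumes/vm-disk-2
~~~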
