Bug 1467352 - [RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit full_ratio
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.0
Assigned To: Jason Dillaman
Keywords: FutureFeature
Depends On:
Blocks: 1494421
Reported: 2017-07-03 09:30 EDT by Vimal Kumar
Modified: 2017-12-05 18:35 EST
CC: 8 users

See Also:
Fixed In Version: RHEL: ceph-12.1.2-1.el7cp Ubuntu: ceph_12.1.2-2redhat1xenial
Doc Type: Enhancement
Doc Text:
.Deleting images and snapshots from full clusters is now easier

When a cluster reaches its `full_ratio`, the following commands can be used to remove Ceph Block Device images and snapshots:

* `rbd remove`
* `rbd snap rm`
* `rbd snap unprotect`
* `rbd snap purge`
Story Points: ---
Clone Of:
Last Closed: 2017-12-05 18:35:34 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Vimal Kumar 2017-07-03 09:30:48 EDT


a) Description of problem:

When an OSD hits its full_ratio, the cluster stops all incoming I/O. From an OpenStack perspective, an OSD hitting the full_ratio pauses the VMs.

In order to delete objects and free space, manual intervention is needed: set cluster flags such as `norebalance`, raise the full_ratio (default 0.95) slightly so that I/O is allowed again, and then delete objects from the OSP side.
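For reference, the manual workaround described above looks roughly like the following. This is only a sketch, assuming a Luminous-era cluster; the 0.97 value is an example, not a recommendation.

```shell
# Pause rebalancing while the ratio is temporarily raised:
ceph osd set norebalance

# Raise the full ratio slightly above the default 0.95 so I/O resumes.
# Luminous and later:
ceph osd set-full-ratio 0.97
# (pre-Luminous clusters used: ceph pg set_full_ratio 0.97)

# ... delete objects / volumes from the OpenStack side ...

# Restore the defaults once enough space has been freed:
ceph osd set-full-ratio 0.95
ceph osd unset norebalance
```

These commands require a live cluster with admin credentials, which is exactly why an easier path is being requested here.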

Since this involves quite a few steps, it is not straightforward to carry out. We need a simpler solution that is easier to follow, ideally on both the Ceph and OpenStack sides.

b) Version-Release number of selected component (if applicable):


c) How reproducible:


d) Steps to Reproduce:
    1. Use OpenStack with Ceph as backing storage.
    2. Fill the OSDs until one of them hits the full_ratio.
    3. Confirm that further writes fail with an I/O error.

e) Additional info:

It would also be good to understand how to address this from an OpenStack perspective; such a feature would let administrators free space from the OSP side without having to meddle with the Ceph cluster directly.
Comment 2 Josh Durgin 2017-07-18 20:25:08 EDT
This is really an rbd feature - to use the librados FORCE_FULL_TRY functionality for deletes - and removing an rbd image or snapshot when the cluster is full is possible in luminous.
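For context, the librados behavior Josh mentions is exposed to clients roughly as follows. This is a hedged sketch against the Luminous-era librados C API with error handling omitted; whether rbd uses exactly this call path internally is an assumption.

```c
#include <rados/librados.h>  /* librados C API */

/* Sketch: delete an object even when the cluster is flagged full.
 * Marking the ioctx "full try" asks the OSDs to attempt the op
 * despite the full flag; deletes free space, so they are let through. */
int delete_when_full(const char *pool, const char *oid)
{
    rados_t cluster;
    rados_ioctx_t io;

    rados_create(&cluster, NULL);         /* connect as client.admin */
    rados_conf_read_file(cluster, NULL);  /* default ceph.conf search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, pool, &io);

    rados_set_osdmap_full_try(io);        /* the FORCE_FULL_TRY behavior */
    int r = rados_remove(io, oid);        /* returns 0 on success */

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return r;
}
```

Running this requires a reachable cluster and librados headers/libraries; it is shown only to illustrate where the full-try hook sits in the client stack.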
Comment 7 Harish NV Rao 2017-07-21 03:59:44 EDT
@Jason, any specific config settings or steps to be done before deleting rbd image or snapshot when the cluster is full?
Comment 8 Jason Dillaman 2017-07-21 07:49:49 EDT
@Harish: negative -- it *should* just allow you to run the following commands when the cluster is full: "rbd remove", "rbd snap rm", "rbd snap unprotect", and "rbd snap purge".
Comment 12 Jason Dillaman 2017-10-24 21:26:45 EDT
$ ceph health
HEALTH_ERR full flag(s) set; 3 full osd(s)

$ ceph df
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED 
    30911M      623M       30288M         97.98 
POOLS:
    NAME     ID     USED      %USED      MAX AVAIL     OBJECTS 
    rbd      1      9000M     100.00             0        2262 

$ rbd snap ls foo 
SNAPID NAME     SIZE TIMESTAMP                
     4 1    10240 MB Tue Oct 24 21:08:11 2017 
     5 2    10240 MB Tue Oct 24 21:08:29 2017 
     6 3    10240 MB Tue Oct 24 21:08:51 2017 
     7 4    10240 MB Tue Oct 24 21:09:32 2017 
     8 5    10240 MB Tue Oct 24 21:10:01 2017 
     9 6    10240 MB Tue Oct 24 21:11:10 2017 
    10 7    10240 MB Tue Oct 24 21:14:01 2017 

$ rbd snap unprotect foo@7
$ rbd snap unprotect foo@1
$ rbd snap rm foo@7
$ rbd snap rm foo@1

$ rbd snap purge foo
Removing all snapshots: 100% complete...done.

$ ceph health
HEALTH_ERR full flag(s) set; 3 full osd(s)

$ rbd rm foo
Removing image: 100% complete...done.

$ ceph health
Comment 15 errata-xmlrpc 2017-12-05 18:35:34 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

