Bug 1467352 - [RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit full_ratio
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.0
Assigned To: Jason Dillaman
Keywords: FutureFeature
Depends On:
Blocks: 1494421
Reported: 2017-07-03 09:30 EDT by Vimal Kumar
Modified: 2017-12-05 18:35 EST
CC: 8 users

See Also:
Fixed In Version: RHEL: ceph-12.1.2-1.el7cp Ubuntu: ceph_12.1.2-2redhat1xenial
Doc Type: Enhancement
Doc Text:
.Deleting images and snapshots from full clusters is now easier

When a cluster reaches its `full_ratio`, the following commands can be used to remove Ceph Block Device images and snapshots:

* `rbd remove`
* `rbd snap rm`
* `rbd snap unprotect`
* `rbd snap purge`
Story Points: ---
Clone Of:
Last Closed: 2017-12-05 18:35:34 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Vimal Kumar 2017-07-03 09:30:48 EDT


a) Description of problem:

When an OSD hits its full_ratio, the cluster stops all incoming I/O. From an OpenStack perspective, an OSD hitting the full_ratio pauses the VMs.

In order to delete objects and free space, manual intervention is needed: set cluster flags such as `norebalance`, raise the full_ratio (default 0.95) slightly so that I/O is allowed again, and then delete objects from the OSP side.
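For reference, the manual workaround described above looks roughly like the following. This is only a sketch, assuming a Luminous-era cluster; the 0.97 value is an example, not a recommendation.

```shell
# Pause rebalancing while the ratio is temporarily raised:
ceph osd set norebalance

# Raise the full ratio slightly above the default 0.95 so I/O resumes.
# Luminous and later:
ceph osd set-full-ratio 0.97
# (pre-Luminous clusters used: ceph pg set_full_ratio 0.97)

# ... delete objects / volumes from the OpenStack side ...

# Restore the defaults once enough space has been freed:
ceph osd set-full-ratio 0.95
ceph osd unset norebalance
```

These commands require a live cluster with admin credentials, which is exactly why an easier path is being requested here.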

Since this involves quite a few steps, it is not straightforward to carry out. We need a simpler solution that is easier to follow, ideally on both the Ceph and OpenStack sides.

b) Version-Release number of selected component (if applicable):


c) How reproducible:


d) Steps to Reproduce:
    1. Use OpenStack with Ceph as backing storage.
    2. Fill the OSDs until one of them hits the full_ratio.
    3. Confirm that further writes fail with an I/O error.

e) Additional info:

It would also be good to understand how to address this from an OpenStack perspective; such a feature would let administrators free space from the OSP side without having to meddle with the Ceph cluster directly.
Comment 2 Josh Durgin 2017-07-18 20:25:08 EDT
This is really an rbd feature - to use the librados FORCE_FULL_TRY functionality for deletes - and removing an rbd image or snapshot when the cluster is full is possible in luminous.
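For context, the librados behavior Josh mentions is exposed to clients roughly as follows. This is a hedged sketch against the Luminous-era librados C API with error handling omitted; whether rbd uses exactly this call path internally is an assumption.

```c
#include <rados/librados.h>  /* librados C API */

/* Sketch: delete an object even when the cluster is flagged full.
 * Marking the ioctx "full try" asks the OSDs to attempt the op
 * despite the full flag; deletes free space, so they are let through. */
int delete_when_full(const char *pool, const char *oid)
{
    rados_t cluster;
    rados_ioctx_t io;

    rados_create(&cluster, NULL);         /* connect as client.admin */
    rados_conf_read_file(cluster, NULL);  /* default ceph.conf search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, pool, &io);

    rados_set_osdmap_full_try(io);        /* the FORCE_FULL_TRY behavior */
    int r = rados_remove(io, oid);        /* returns 0 on success */

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return r;
}
```

Running this requires a reachable cluster and librados headers/libraries; it is shown only to illustrate where the full-try hook sits in the client stack.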
Comment 7 Harish NV Rao 2017-07-21 03:59:44 EDT
@Jason, any specific config settings or steps to be done before deleting rbd image or snapshot when the cluster is full?
Comment 8 Jason Dillaman 2017-07-21 07:49:49 EDT
@Harish: negative -- it *should* just allow you to run the following commands when the cluster is full: "rbd remove", "rbd snap rm", "rbd snap unprotect", and "rbd snap purge".
Comment 12 Jason Dillaman 2017-10-24 21:26:45 EDT
$ ceph health
HEALTH_ERR full flag(s) set; 3 full osd(s)

$ ceph df
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED 
    30911M      623M       30288M         97.98 
POOLS:
    NAME     ID     USED      %USED      MAX AVAIL     OBJECTS 
    rbd      1      9000M     100.00             0        2262 

$ rbd snap ls foo 
SNAPID NAME     SIZE TIMESTAMP                
     4 1    10240 MB Tue Oct 24 21:08:11 2017 
     5 2    10240 MB Tue Oct 24 21:08:29 2017 
     6 3    10240 MB Tue Oct 24 21:08:51 2017 
     7 4    10240 MB Tue Oct 24 21:09:32 2017 
     8 5    10240 MB Tue Oct 24 21:10:01 2017 
     9 6    10240 MB Tue Oct 24 21:11:10 2017 
    10 7    10240 MB Tue Oct 24 21:14:01 2017 

$ rbd snap unprotect foo@7
$ rbd snap unprotect foo@1
$ rbd snap rm foo@7
$ rbd snap rm foo@1

$ rbd snap purge foo
Removing all snapshots: 100% complete...done.

$ ceph health
HEALTH_ERR full flag(s) set; 3 full osd(s)

$ rbd rm foo
Removing image: 100% complete...done.

$ ceph health
Comment 15 errata-xmlrpc 2017-12-05 18:35:34 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

