1467352 – [RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit full_ratio

Bug 1467352 - [RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit full_ratio

Summary: [RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit fu...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RBD
Sub Component:
Version:	3.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	3.0
Assignee:	Jason Dillaman
QA Contact:	Jason Dillaman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1494421
TreeView+	depends on / blocked

Reported:	2017-07-03 13:30 UTC by Vimal Kumar
Modified:	2021-03-11 15:23 UTC (History)
CC List:	9 users (show)
Fixed In Version:	RHEL: ceph-12.1.2-1.el7cp Ubuntu: ceph_12.1.2-2redhat1xenial
Doc Type:	Enhancement
Doc Text:	.Deleting images and snapshots from full clusters is now easier When a cluster reaches its `full_ratio`, the following commands can be used to remove Ceph Block Device images and snapshots: * `rbd remove` * `rbd snap rm` * `rbd snap unprotect` * `rbd snap purge`
Clone Of:
Environment:
Last Closed:	2017-12-05 23:35:34 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:3387	0	normal	SHIPPED_LIVE	Red Hat Ceph Storage 3.0 bug fix and enhancement update	2017-12-06 03:03:45 UTC

Description Vimal Kumar 2017-07-03 13:30:48 UTC

RFE/Bugzilla:

~~~

a) Description of problem:

When an OSD hit its full_ratio, the cluster stops any I/O coming in. From an OpenStack perspective, an OSD hitting the full_ratio will pause the VMs. 

In order to delete objects and free space, a manual intervention is needed to set cluster flags such as norebalance, increase the full_ratio (0.95) a bit higher to allow I/O, and then delete objects from the OSP side. 

Since this involves quite a few steps, it is not the easiest to fix. We need a better solution which is easier to follow, perhaps from both Ceph and OpenStack.


b) Version-Release number of selected component (if applicable):

RHCS2.x

c) How reproducible:

Always

d) Steps to Reproduce:
    1. Use OpenStack with Ceph as backing storage.
    2. Fill the OSDs until it hit the full_ratio.
    3. Make sure an I/O error is hit for further writes.


e) Additional info:

It may be good to understand how to fix this from an OpenStack perspective as well, and such a feature would help administrators not to meddle with the Ceph cluster but rather get it fixed from the OSP side.

Comment 2 Josh Durgin 2017-07-19 00:25:08 UTC

This is really an rbd feature - to use the librados FORCE_FULL_TRY functionality for deletes - and removing an rbd image or snapshot when the cluster is full is possible in luminous.

Comment 7 Harish NV Rao 2017-07-21 07:59:44 UTC

@Jason, any specific config settings or steps to be done before deleting rbd image or snapshot when the cluster is full?

Comment 8 Jason Dillaman 2017-07-21 11:49:49 UTC

@Harish: negative -- it *should* just allow you to run the following commands when the cluster is full: "rbd remove", "rbd snap rm", "rbd snap unprotect", and "rbd snap purge".

Comment 12 Jason Dillaman 2017-10-25 01:26:45 UTC

$ ceph health
HEALTH_ERR full flag(s) set; 3 full osd(s)

$ ceph df
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED 
    30911M      623M       30288M         97.98 
POOLS:
    NAME     ID     USED      %USED      MAX AVAIL     OBJECTS 
    rbd      1      9000M     100.00             0        2262 

$ rbd snap ls foo 
SNAPID NAME     SIZE TIMESTAMP                
     4 1    10240 MB Tue Oct 24 21:08:11 2017 
     5 2    10240 MB Tue Oct 24 21:08:29 2017 
     6 3    10240 MB Tue Oct 24 21:08:51 2017 
     7 4    10240 MB Tue Oct 24 21:09:32 2017 
     8 5    10240 MB Tue Oct 24 21:10:01 2017 
     9 6    10240 MB Tue Oct 24 21:11:10 2017 
    10 7    10240 MB Tue Oct 24 21:14:01 2017 

$ rbd snap unprotect foo@7
$ rbd snap unprotect foo@1
$ rbd snap rm foo@7
$ rbd snap rm foo@1

$ rbd snap purge foo
Removing all snapshots: 100% complete...done.

$ ceph health
HEALTH_ERR full flag(s) set; 3 full osd(s)

$ rbd rm foo
Removing image: 100% complete...done.

$ ceph health
HEALTH_OK

Comment 15 errata-xmlrpc 2017-12-05 23:35:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

Note You need to log in before you can comment on or make changes to this bug.