Red Hat Bugzilla – Bug 1467352
[RFE] Enable an easy method to delete objects in a Ceph pool if an OSD hit full_ratio
Last modified: 2017-08-08 17:50:12 EDT
a) Description of problem:
When an OSD hits its full_ratio, the cluster stops all incoming I/O. From an OpenStack perspective, an OSD hitting the full_ratio pauses the VMs.
To delete objects and free space, manual intervention is needed: set cluster flags such as norebalance, raise the full_ratio (0.95) slightly to allow I/O again, and then delete objects from the OSP side.
Because this involves several steps, recovery is not straightforward. We need a simpler, easier-to-follow solution, perhaps on both the Ceph and OpenStack sides.
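For reference, the manual workaround described above could be sketched roughly as follows. This is only an illustration, not a verified procedure: the helper name full_osd_workaround and the temporary ratio 0.97 are hypothetical, the restore steps afterwards are an assumption that is not stated in this report, and the sketch assumes "ceph osd set-full-ratio" on Luminous and "ceph pg set_full_ratio" on older releases. It only builds the command lines and does not run anything.

```python
def full_osd_workaround(temp_ratio=0.97, original_ratio=0.95, luminous=True):
    """Build the ceph CLI commands for the manual workaround (nothing is run).

    Returns (before, after): commands to run before deleting data from the
    OSP side, and commands to run once space has been freed.
    temp_ratio is a hypothetical example value; choose one for your cluster.
    """
    if not 0.0 < original_ratio < temp_ratio < 1.0:
        raise ValueError("need 0 < original_ratio < temp_ratio < 1")

    def set_ratio(ratio):
        # Assumption: Luminous uses 'osd set-full-ratio';
        # earlier releases used 'pg set_full_ratio'.
        if luminous:
            return ["ceph", "osd", "set-full-ratio", str(ratio)]
        return ["ceph", "pg", "set_full_ratio", str(ratio)]

    before = [
        ["ceph", "osd", "set", "norebalance"],  # avoid data movement meanwhile
        set_ratio(temp_ratio),                  # let I/O through temporarily
    ]
    after = [
        set_ratio(original_ratio),              # assumed cleanup, not in the report
        ["ceph", "osd", "unset", "norebalance"],
    ]
    return before, after


if __name__ == "__main__":
    before, after = full_osd_workaround()
    for cmd in before:
        print(" ".join(cmd))
    print("# ... delete volumes/images from the OSP side here ...")
    for cmd in after:
        print(" ".join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to try; an administrator would still review each command before running it on a full cluster.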
b) Version-Release number of selected component (if applicable):
c) How reproducible:
d) Steps to Reproduce:
1. Use OpenStack with Ceph as backing storage.
2. Fill the OSDs until one of them hits the full_ratio.
3. Verify that further writes hit an I/O error.
e) Additional info:
It would also be good to understand how to fix this from an OpenStack perspective. Such a feature would let administrators resolve the problem from the OSP side rather than having to meddle with the Ceph cluster.
This is really an rbd feature: using the librados FORCE_FULL_TRY functionality for deletes. Removing an rbd image or snapshot when the cluster is full is possible in Luminous.
@Jason, are there any specific config settings or steps required before deleting an rbd image or snapshot when the cluster is full?
@Harish: negative -- it *should* just allow you to run the following commands when the cluster is full: "rbd remove", "rbd snap rm", "rbd snap unprotect", and "rbd snap purge".
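As a rough illustration of how the four commands listed above might be combined to free space by removing one image, here is a minimal sketch. The helper name rbd_cleanup_commands and the pool, image, and snapshot names are hypothetical, and the ordering (unprotect, then purge, then remove) is an assumption about a typical cleanup rather than something stated in this report. The sketch only builds the command lines; note that "rbd snap unprotect" is expected to fail if a snapshot still has clones.

```python
def rbd_cleanup_commands(pool, image, protected_snaps=()):
    """Build rbd CLI commands to remove an image and its snapshots.

    Nothing is executed. protected_snaps lists snapshot names that must be
    unprotected before 'rbd snap purge' can delete them.
    """
    spec = f"{pool}/{image}"
    # Unprotect each protected snapshot first.
    cmds = [["rbd", "snap", "unprotect", f"{spec}@{snap}"]
            for snap in protected_snaps]
    # Delete all snapshots of the image, then the image itself.
    cmds.append(["rbd", "snap", "purge", spec])
    cmds.append(["rbd", "remove", spec])
    return cmds


if __name__ == "__main__":
    # 'volumes', 'volume-1234', and 'snap1' are placeholder names.
    for cmd in rbd_cleanup_commands("volumes", "volume-1234", ["snap1"]):
        print(" ".join(cmd))
```

To remove a single snapshot instead of all of them, "rbd snap rm" with the same pool/image@snap form would take the place of the purge step.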