Description of problem (please be as detailed as possible and provide log snippets):

Use the ceph 'osd safe-to-destroy' and 'osd ok-to-stop' features in the OSD purge job [1].

[1] mgr: implement 'osd safe-to-destroy' and 'osd ok-to-stop' commands
https://github.com/ceph/ceph/pull/16976

An OSD is safe to destroy if:
- we have osd_stat for it
- osd_stat indicates no pgs stored
- all pgs are known
- no pgs map to it
i.e., overall data durability will not be affected.

An OSD is ok to stop if:
- we have the pg stats we need
- no PGs will drop below min_size
i.e., availability won't be immediately compromised.
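As a rough sketch of how a purge job could gate removal on these checks, the following shell function runs both commands and only proceeds when each exits zero. Note this is an illustration, not the actual job implementation; the `ceph` CLI is stubbed here so the sketch runs without a cluster, and the function name `purge_osd` is hypothetical.

```shell
#!/bin/sh
# Stub standing in for the real ceph CLI (always reports success here);
# remove this line when running against an actual cluster.
ceph() { return 0; }

# Hypothetical gate: refuse to purge an OSD until Ceph says it is both
# ok to stop (availability) and safe to destroy (durability).
purge_osd() {
  id="$1"
  # 'ceph osd ok-to-stop' exits non-zero if stopping would drop PGs below min_size.
  if ! ceph osd ok-to-stop "$id" >/dev/null 2>&1; then
    echo "osd.$id is not ok to stop; retry later"
    return 1
  fi
  # 'ceph osd safe-to-destroy' exits non-zero while the OSD still holds PG data.
  if ! ceph osd safe-to-destroy "$id" >/dev/null 2>&1; then
    echo "osd.$id is not safe to destroy yet; retry later"
    return 1
  fi
  echo "osd.$id is safe to destroy"
}

purge_osd 3
```

In a real job these checks would typically be retried in a loop, since an OSD only becomes safe to destroy after its PGs have been backfilled elsewhere.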
Not a blocker for 4.9. Moving out to 4.10, but could be considered for 4.9.z if needed.
Should I add the 'osd safe-to-destroy' and 'osd ok-to-stop' checks to the OSD removal job? Please provide more details about the exact steps needed to test this.
I think you first need to mark the OSD as safe to destroy and then pass the flag accordingly in the `oc process` command.
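For illustration, the invocation might look like the following, assuming the ODF `ocs-osd-removal` template and a force-removal parameter (both assumptions from the product documentation, not from this bug). The `oc` CLI is stubbed so the example runs without a cluster.

```shell
#!/bin/sh
# Stub for the OpenShift CLI that just echoes its arguments;
# remove this line to run against a real cluster.
oc() { echo "oc $*"; }

# Hypothetical example: process the OSD removal template for a failed OSD,
# passing the flag that controls forced removal.
FAILED_OSD_ID=3
oc process -n openshift-storage ocs-osd-removal \
  -p FAILED_OSD_IDS="$FAILED_OSD_ID" -p FORCE_OSD_REMOVAL=false
```

With a real cluster the processed template would then be piped into `oc create -f -` to start the removal job.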
According to my comment https://bugzilla.redhat.com/show_bug.cgi?id=2027826#c16 in bz https://bugzilla.redhat.com/show_bug.cgi?id=2027826, I am moving this bug to Verified as well.
Please add doc text.
This is fine, thanks Shilpi.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372