Bug 2026007 - Use ceph 'osd safe-to-destroy' feature in OSD purge job
Summary: Use ceph 'osd safe-to-destroy' feature in OSD purge job
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Sébastien Han
QA Contact: Itzhak
URL:
Whiteboard:
Depends On:
Blocks: 2027396 2056571 2106025 2106026 2106027
 
Reported: 2021-11-23 16:01 UTC by Vikhyat Umrao
Modified: 2023-08-09 17:03 UTC
CC List: 14 users

Fixed In Version: 4.10.0-132
Doc Type: Enhancement
Doc Text:
.OSDs are safe when multiple removal jobs are fired
Previously, when multiple OSD removal jobs were fired in parallel, there was a risk of losing data because the job forcefully removed the OSD. With this update, the removal job first checks whether the OSD is ok-to-stop and only then proceeds. The check waits indefinitely, retrying every minute, which keeps the OSD safe from data loss.
Clone Of:
Clones: 2027396 2106026 2106027
Environment:
Last Closed: 2022-04-13 18:50:37 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-ci pull 6114 0 None Merged Added FORCE_OSD_REMOVAL flag on ocs-osd-removal-job 2022-07-27 10:04:06 UTC
Github rook rook pull 9230 0 None open osd: check if osd is ok-to-stop before removal 2021-11-23 17:53:13 UTC
Red Hat Product Errata RHSA-2022:1372 0 None None None 2022-04-13 18:51:04 UTC

Description Vikhyat Umrao 2021-11-23 16:01:30 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Use the ceph 'osd safe-to-destroy' and 'osd ok-to-stop' features in the OSD purge job.

[1] mgr: implement 'osd safe-to-destroy' and 'osd ok-to-stop' commands
    https://github.com/ceph/ceph/pull/16976

An OSD is safe to destroy if:
- we have osd_stat for it
- osd_stat indicates no PGs stored
- all PGs are known
- no PGs map to it
i.e., overall data durability will not be affected.

An OSD is ok to stop if:
- we have the PG stats we need
- no PGs will drop below min_size
i.e., availability won't be immediately compromised.
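
A minimal Go sketch of the gating logic described above, assuming the ceph CLI is reachable from the process; this is an illustration only, not Rook's actual code (the real change is in the linked rook PR 9230):

    // Illustration: wait until `ceph osd ok-to-stop <id>` succeeds,
    // retrying every minute, before letting the purge proceed.
    package main

    import (
        "fmt"
        "os/exec"
        "time"
    )

    // okToStop shells out to `ceph osd ok-to-stop <id>`; the command
    // exits non-zero if stopping the OSD would drop a PG below min_size.
    func okToStop(osdID int) bool {
        return exec.Command("ceph", "osd", "ok-to-stop", fmt.Sprint(osdID)).Run() == nil
    }

    func main() {
        const osdID = 0 // hypothetical OSD targeted for removal
        // Wait indefinitely, retrying every minute, as the doc text describes.
        for !okToStop(osdID) {
            fmt.Printf("OSD.%d is not yet ok-to-stop; retrying in 1m\n", osdID)
            time.Sleep(time.Minute)
        }
        fmt.Printf("OSD.%d is ok-to-stop; proceeding with purge\n", osdID)
    }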

Comment 5 Travis Nielsen 2021-11-23 17:12:45 UTC
Not a blocker for 4.9. Moving out to 4.10, but could be considered for 4.9.z if needed.

Comment 17 Itzhak 2022-03-13 13:25:57 UTC
Should I add the 'osd safe-to-destroy' and 'osd ok-to-stop' parameters to the OSD removal job?
Please provide more details about the exact steps needed to test this.

Comment 18 Subham Rai 2022-03-14 09:54:57 UTC
I think you first need to mark the OSD safe to destroy and then pass the flag accordingly in the 'oc process' command.
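
For illustration, a hypothetical invocation (FORCE_OSD_REMOVAL is the flag added by the linked ocs-ci PR 6114, FAILED_OSD_IDS is the removal template's usual parameter; exact names can vary by release):

    oc process -n openshift-storage ocs-osd-removal \
        -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false | oc create -f -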

Comment 19 Itzhak 2022-03-15 10:05:54 UTC
According to my comment https://bugzilla.redhat.com/show_bug.cgi?id=2027826#c16 in bug https://bugzilla.redhat.com/show_bug.cgi?id=2027826, I am moving this bug to Verified as well.

Comment 20 Mudit Agarwal 2022-03-31 15:02:14 UTC
Please add doc text.

Comment 22 Sébastien Han 2022-04-11 08:21:13 UTC
This is fine, thanks Shilpi.

Comment 24 errata-xmlrpc 2022-04-13 18:50:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

