As discussed with support, fixing these kinds of issues requires running ceph-objectstore-tool against the OSD disk while the OSD is offline. Sebastien, is there a way to do this in this version of OCS?
Yes, we need to:
* remove the liveness probe: oc patch deployment rook-ceph-osd-<OSD_ID> --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
* replace the osd container's command with sleep, so the pod stays up without starting the OSD daemon: oc patch deployment rook-ceph-osd-<OSD_ID> -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}'
* exec into the container: oc exec -ti deploy/rook-ceph-osd-<OSD_ID> -- bash
* run the "ceph-objectstore-tool" command against the OSD block device
* once maintenance is done, restart the rook-ceph operator; it will reconcile the OSD deployment and revert the changes above
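A minimal sketch that wraps the steps above in one script. It assumes the default OCS namespace openshift-storage and adds a hypothetical DRY_RUN switch that only prints the oc commands; the start/shell/end actions are also hypothetical. It deliberately does not invoke ceph-objectstore-tool itself, since the right invocation depends on the repair being done.

```shell
#!/usr/bin/env bash
# Sketch of the OSD maintenance steps described above.
# Assumptions (not part of the original steps): the OCS namespace is
# openshift-storage, and DRY_RUN=1 is a hypothetical switch that only
# prints the oc commands instead of running them.
set -euo pipefail

NAMESPACE="${NAMESPACE:-openshift-storage}"
DRY_RUN="${DRY_RUN:-0}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Keep the OSD pod running without the ceph-osd daemon.
osd_maintenance_start() {
  local deploy="rook-ceph-osd-$1"
  # Remove the liveness probe so the kubelet does not restart the
  # container while ceph-osd is not running.
  run oc -n "$NAMESPACE" patch deployment "$deploy" --type=json \
    -p '[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]'
  # Replace the osd container's command with "sleep infinity".
  run oc -n "$NAMESPACE" patch deployment "$deploy" \
    -p '{"spec":{"template":{"spec":{"containers":[{"name":"osd","command":["sleep","infinity"],"args":[]}]}}}}'
}

# Open a shell in the OSD container; run ceph-objectstore-tool from there.
osd_shell() {
  run oc -n "$NAMESPACE" exec -ti "deploy/rook-ceph-osd-$1" -- bash
}

# Restart the operator so it reconciles the OSD deployment back to normal.
osd_maintenance_end() {
  run oc -n "$NAMESPACE" rollout restart deployment/rook-ceph-operator
}

main() {
  case "${1:-}" in
    start) osd_maintenance_start "${2:?missing OSD_ID}" ;;
    shell) osd_shell "${2:?missing OSD_ID}" ;;
    end)   osd_maintenance_end ;;
    *)     echo "usage: $0 start|shell|end [OSD_ID]" >&2; return 1 ;;
  esac
}

# Dispatch only when arguments are given, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

For example, running the script with DRY_RUN=1 and the arguments "start 3" prints the two patch commands for rook-ceph-osd-3 without touching the cluster, which makes it easy to review them before running them for real.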
Thanks.
Comment 13 Scott Ostapovicz
2021-09-07 14:03:37 UTC