Bug 1959171

Summary: [GSS] manually repairing inconsistent objects in OCS
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: kelwhite
Component: ceph
Assignee: kelwhite
Status: CLOSED NOTABUG
QA Contact: Harish NV Rao <hnallurv>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.6
CC: assingh, bkunal, bniver, hnallurv, jdurgin, kelwhite, khover, linuxkidd, madam, mamccoma, mhackett, muagarwa, ocs-bugs, odf-bz-bot, r.martinez, roemerso, sbaldwin, shan, vumrao
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-31 16:32:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 3 Josh Durgin 2021-05-14 22:28:03 UTC
As discussed with support, fixing these kinds of issues requires running ceph-objectstore-tool against the OSD disk while the OSD is offline. Sebastien, is there a way to do this in this version of OCS?

Comment 4 Sébastien Han 2021-05-17 08:51:00 UTC
Yes, we need to:

* remove the livenessProbe with: oc patch deployment rook-ceph-osd-<OSD_ID>  --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
* change the osd container command with sleep: oc patch deployment rook-ceph-osd-<OSD_ID> -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}'
* exec into the container: oc exec -ti deploy/rook-ceph-osd-<OSD_ID> -- bash
* run the "ceph-objectstore-tool" command against the OSD block dev
* once maintenance is done, restart the rook-ceph operator; the OSD deployment changes will be reverted (a consolidated sketch of the full sequence follows this list)
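
For reference, a consolidated sketch of the sequence above, using OSD id 0 as an example and assuming the usual openshift-storage namespace. The ceph-objectstore-tool data path and operation are placeholders (the real operation depends on the inconsistency being repaired, and the data path should be verified inside the pod), and "oc rollout restart" is just one way to restart the operator:

  # 1. remove the liveness probe so the patched pod is not restarted
  oc -n openshift-storage patch deployment rook-ceph-osd-0 --type='json' \
    -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'

  # 2. replace the osd container command with sleep so ceph-osd stays offline
  oc -n openshift-storage patch deployment rook-ceph-osd-0 \
    -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}'

  # 3. exec into the now-idle OSD pod
  oc -n openshift-storage exec -ti deploy/rook-ceph-osd-0 -- bash

  # 4. inside the pod, run ceph-objectstore-tool against the OSD data path
  #    (example path and read-only op shown; choose the op needed for the repair)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list

  # 5. when maintenance is done, restart the rook-ceph operator so it
  #    reverts the patched OSD deployment
  oc -n openshift-storage rollout restart deployment rook-ceph-operator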

Thanks.

Comment 13 Scott Ostapovicz 2021-09-07 14:03:37 UTC
Still waiting for an update.