Bug 1895796
Summary: | Update node replacement procedure for local storage devices for local volume set changes, upgraded cluster scenario, OCS 4.6 job update | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Kusuma <kbg>
Component: | documentation | Assignee: | Laura Bailey <lbailey>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Pratik Surve <prsurve>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 4.6 | CC: | asriram, ebenahar, ikave, lbailey, nberry, ocs-bugs, olakra, prsurve, rohgupta, rojoseph, sabose, sdudhgao
Target Milestone: | --- | |
Target Release: | OCS 4.6.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-08-25 14:55:03 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1882363 | |
Comment 9
Servesha
2020-11-18 07:00:46 UTC
@Laura ack, thanks. I didn't get to test the steps from the doc on a cluster yet, but from what I remember when trying the node replacement procedure, I have a few comments about the doc:

1. I think we need to run the ocs-osd-removal job before deleting the PV (steps 15 and 16). After executing the ocs-osd-removal job, the PV will be in the 'Released' state, and then we can delete it safely (see the sketch below).
2. In step 19, there is no need to delete "rook-ceph-operator" in 4.6. Maybe we can write something like what is in the device replacement doc https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/replacing_devices/index?lb_target=preview#replacing-operational-or-failed-storage-devices-on-clusters-backed-by-local-storage-devices_rhocs: "If the new OSD does not show as Running after a few minutes, restart the rook-ceph-operator pod to force a reconciliation."

Other than these 2 comments, the doc looks good to me.

> After executing the ocs-osd-removal job, the pv will be in status 'Released', and then we can delete it safely.

Also, the PV should eventually be deleted after it is released, as the ReclaimPolicy on the LSO storage class is "Delete".
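Since the ordering here is the key point (run the removal job first, then deal with the Released PV), here is a minimal command sketch of that sequence as I understand it for OCS 4.6. The OSD ID, the `localblock` storage class name, and the PV name are placeholders based on a typical local-storage deployment, not values taken from this bug:

```
# Run the OSD removal job for the failed OSD first (the OSD ID below is a placeholder).
osd_id_to_remove=1
oc process -n openshift-storage ocs-osd-removal \
  -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -n openshift-storage -f -

# Wait for the removal job pod to complete before touching the PV.
oc get pod -n openshift-storage | grep ocs-osd-removal

# Check the reclaim policy on the local storage class (assumed to be named "localblock");
# with "Delete", a Released PV is expected to be cleaned up automatically.
oc get sc localblock -o jsonpath='{.reclaimPolicy}'

# Only after the job has completed should the PV be Released and safe to delete.
oc get pv -L kubernetes.io/hostname | grep localblock | grep Released
oc delete pv <persistent-volume-name>
```

The sketch is only about the ordering; as noted below, on failed nodes the PV may still need to be deleted manually rather than relying on the reclaim policy.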
> 1. I think we need to run the ocs-osd-removal job before deleting the pv(steps 15, 16).
> After executing the ocs-osd-removal job, the pv will be in status 'Released', and then we can delete it safely.
+1
> Asking Rohan in chat now whether his comment 21 means I should delete "Delete the PV associated with the failed node" or just add a sentence about the ReclaimPolicy meaning that the PV will eventually be deleted automatically.
I made a mistake. The PV will not get cleaned up on failed nodes.
I tested doc section 3.1.1; the other sections I didn't test yet. There are 2 things we may need to fix in the doc:

1. In step 18.1 - the ocs-osd-removal job deletes the PVC, so we can't get the PVC after executing the ocs-osd-removal job. Instead, we need to perform these steps (see the sketch below):
   - Figure out the PV by the PVC (and don't delete the PV or the PVC yet).
   - Execute the ocs-osd-removal job.
   - Delete the PV.
2. In step 20 - I don't think we need to delete the rook-ceph-operator in 4.6.

Another note: one of the mons was in a Pending state for a short time, and then went back to the "Running" state. Other than that, the doc looks good to me. The Ceph health was back to OK at the end.
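To make the proposed step 18.1 ordering concrete, here is a rough sketch of that flow (record the PV bound to the failed OSD's PVC first, then run the removal job, then delete the PV), plus the step 20 point about only restarting rook-ceph-operator if the new OSD does not come up. The PVC name, OSD ID, and label selectors are assumptions based on a typical OCS 4.6 local-storage deployment, not values from this bug:

```
# Resolve the PV bound to the failed OSD's PVC *before* running the removal job,
# because the job deletes the PVC. The PVC name below is an illustrative placeholder.
failed_osd_pvc=ocs-deviceset-localblock-0-data-0-xxxxx
pv_name=$(oc get pvc ${failed_osd_pvc} -n openshift-storage -o jsonpath='{.spec.volumeName}')
echo "PV to delete after the removal job: ${pv_name}"

# Run the OSD removal job for the failed OSD ID noted earlier in the procedure.
oc process -n openshift-storage ocs-osd-removal \
  -p FAILED_OSD_IDS=<failed-osd-id> | oc create -n openshift-storage -f -

# After the job completes, delete the PV recorded above.
oc delete pv ${pv_name}

# Verify the new OSD comes up on its own; only if it is not Running after a few
# minutes, restart the rook-ceph-operator pod to force a reconciliation,
# rather than deleting the operator pod as a routine step.
oc get pods -n openshift-storage -l app=rook-ceph-osd
oc delete pod -n openshift-storage -l app=rook-ceph-operator
```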