Describe the issue:
Documentation for replacing nodes on IBM Z is incomplete.

Describe the task you were trying to accomplish:
Steps are missing to reset Ceph.

Suggestions for improvement:

Document URL:

Chapter/Section Number and Title:
2.2.1

Product Version:
4.11

Environment Details:
IBM Z

Any other versions of this document that also need this update:

Additional information:
For this section, the documentation for IBM Z is incomplete:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/replacing_nodes/openshift_data_foundation_deployed_using_local_storage_devices#replacing-operational-nodes-on-ibmz-infrastructure_ibm-z

It should have instructions for cleaning up Ceph similar to those provided for bare metal infrastructure (2.2.1):
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/replacing_nodes/openshift_data_foundation_deployed_using_local_storage_devices#replacing-failed-storage-nodes-on-ibm-power-infrastructure_ibm-power

In particular:
- Steps 1-6 (2.2.1) as described in the bare metal section are missing and need to be added.
- Step 7 would be called "Get a new zSystem storage node as replacement".
- After step 7, add CSR approval as described in steps 9-10 (2.2.1); see the CSR sketch below.
- Steps 12-19 (2.2.1) need to be added as well, in order to cleanly remove the OSD from ODF; see the OSD removal sketch below.

There should also be a troubleshooting section, especially for step 18 (2.2.1), to verify that the ocs-osd-removal-job pod worked correctly (see the verification sketch below). It may be necessary to manually clean up the removed OSD (for example, OSD ID 2) as follows:

ceph osd crush remove osd.REMOVED_OSD_ID
ceph osd rm REMOVED_OSD_ID
ceph auth del osd.REMOVED_OSD_ID
ceph osd crush rm REMOVED_NODE

ODF should now be able to replace the node; check via ceph status and the rook-ceph-osd-prepare pod (see the status check sketch below).

Hint: You can speed up rebalancing after adding the replacement node with the following Ceph commands. Please make sure to return them to their default values on a production cluster (see the reset sketch below):

ceph tell 'osd.*' injectargs --osd-max-backfills=16 --osd-recovery-max-active=4
ceph tell 'osd.*' config set osd_recovery_sleep_hdd 0
ceph tell 'osd.*' config set osd_recovery_sleep_ssd 0
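CSR sketch: a minimal example of the approval flow referenced above, using standard OpenShift commands; the exact wording in steps 9-10 of the bare metal section may differ:

# List certificate signing requests; a new node shows Pending entries
oc get csr
# Approve each pending CSR for the replacement node by name
oc adm certificate approve <csr_name>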
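OSD removal sketch: roughly how the removal (steps 12-19) is driven in the 4.10 bare metal instructions. The template name ocs-osd-removal and the FAILED_OSD_IDS parameter are taken from those instructions; verify them against the release in use:

# Scale down the deployment of the OSD to be removed (example: ID 2)
oc scale -n openshift-storage deployment rook-ceph-osd-2 --replicas=0
# Run the removal job from the ocs-osd-removal template
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=2 | oc create -n openshift-storage -f -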
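Verification sketch for step 18: check that the removal job completed. The label selector and log message are assumptions based on the 4.10 docs:

# The job pod should reach Completed status
oc get pod -n openshift-storage -l job-name=ocs-osd-removal-job
# The log should report that the OSD was removed
oc logs -n openshift-storage -l job-name=ocs-osd-removal-job --tail=-1 | grep -i 'completed removal'

If the job did not clean up everything, fall back to the manual ceph cleanup commands listed above.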
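Status check sketch: the ceph commands above have to run where the Ceph CLI is available, typically the rook-ceph-tools pod (the label app=rook-ceph-tools is an assumption and may differ by release):

# Open a shell in the toolbox pod
oc rsh -n openshift-storage $(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name | head -n 1)
# Inside the pod: confirm cluster health and that the new OSD joined
ceph status
ceph osd tree
# Back on the workstation: the prepare pod for the new node should be Completed
oc get pods -n openshift-storage | grep rook-ceph-osd-prepare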
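Reset sketch: values to restore once rebalancing finishes. The defaults below are the upstream Ceph defaults as I know them; please confirm them for the deployed Ceph version, for example with 'ceph config help osd_max_backfills':

# Assumed defaults: osd_max_backfills=1, osd_recovery_max_active=3,
# osd_recovery_sleep_hdd=0.1, osd_recovery_sleep_ssd=0
ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=3
ceph tell 'osd.*' config set osd_recovery_sleep_hdd 0.1
ceph tell 'osd.*' config set osd_recovery_sleep_ssd 0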
Manuel Gotin has approved the changes as per MR: https://gitlab.cee.redhat.com/red-hat-openshift-container-storage-documentation/openshift-data-foundation-documentation-4.11/-/merge_requests/119
Manuel and I have verified the content. Looks good, thanks.