Created attachment 1780258 [details]
disk_replacement_error_ui

Description of problem:
After a successful disk replacement from the UI for a particular node, the procedure (section 4.2) was re-initiated for a different OSD on another node. Observation: the status of the disk was "Replacement Ready" before "Start Replacement" was clicked.

Version-Release number of selected component (if applicable):
OCP Version: 4.7.0-0.nightly-2021-04-29-101239
OCS Version: 4.7.0-372.ci
Provider: vmware
Setup: LSO cluster

How reproducible:

Steps to Reproduce:
1. Replace OSD-1 on the compute-1 node via the UI. [pass]
2. Replace OSD-3 on the compute-3 node via the UI:
   a. Scale down osd-3:
      $ oc scale -n openshift-storage deployment rook-ceph-osd-3 --replicas=0
      deployment.apps/rook-ceph-osd-3 scaled
   b. Click Troubleshoot in the "Disk <disk1> not responding" or "Disk <disk1> not accessible" alert.
   c. From the Action (⋮) menu of the failed disk, click Start Disk Replacement.
      The status of the disk was "Replacement Ready" before "Start Replacement" was clicked. [Failed!!]

Screenshot attached with more details: https://docs.google.com/document/d/1KuUjbP25i9vBR7Wp_dKwLAULi5LDFLeIXi4kZY4wLjc/edit

Actual results:
The status of the disk is "Replacement Ready" before clicking "Start Replacement".

Expected results:
The status of the disk is "Not Responding" before clicking "Start Replacement".

Additional info:
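The scale-down step in the reproduction can be parameterized per OSD id. A minimal sketch, assuming a hypothetical helper `osd_scale_down_cmd` (not part of `oc` or OCS) that only builds the command string; the string would then be executed against a cluster with the openshift-storage namespace present:

```shell
#!/bin/sh
# osd_scale_down_cmd is a hypothetical helper: it builds the same
# "oc scale" command used in step 2a for a given OSD id, so the
# reproduction can be repeated for osd-0, osd-1, osd-3, etc.
osd_scale_down_cmd() {
  osd_id="$1"
  echo "oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id} --replicas=0"
}

# Example: the command for OSD-3 from step 2a.
osd_scale_down_cmd 3
```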
Disk did not move to the "ReplacementReady" state.

SetUp:
OCP Version: 4.8.0-0.nightly-2021-05-09-105430
OCS Version: 4.8.0-374.ci
Provider: vmware, LSO cluster

Test Procedure:
1. Scale down osd-0:
   $ oc scale -n openshift-storage deployment rook-ceph-osd-0 --replicas=0
2. From the Action (⋮) menu of the failed disk, click "Start Disk Replacement".
3. The disk moved to the "Available" state.

Expected: The disk moves to the "ReplacementReady" state.

For more details: https://docs.google.com/document/d/1G26hBY908AihauaTdxNbktQY67W-1XyF3eP8tl2hDtk/edit
Bug reconstructed.

SetUp:
OCP Version: 4.8.0-0.nightly-2021-05-21-233425
OCS Version: ocs-operator.v4.8.0-399.ci
LSO Version: 4.7.0-202102110027.p0
Provider: vmware
Type: LSO cluster

Test Procedure:
1. Replace OSD-2 on the compute-2 node via the UI. [pass]
2. Replace OSD-2 on the compute-2 node via the UI. [Failed!!]
   Danger alert: "An error occurred. replacement disallowed: disk /dev/sdb is ReplacementReady"

For more details: https://docs.google.com/document/d/1KuUjbP25i9vBR7Wp_dKwLAULi5LDFLeIXi4kZY4wLjc/edit, section: Device Replacement 4.8 LSO via UI
Bug fixed.

SetUp:
OCP Version: 4.8.0-0.nightly-2021-05-29-114625
OCS Version: ocs-operator.v4.8.0-404.ci
LSO Version: 4.7.0-202102110027.p0
Provider: vmware
Type: LSO cluster

Test Procedure:
1. Replace OSD-0 on the compute-2 node via the UI. [pass]
2. Replace OSD-0 on the compute-2 node via the UI. [pass]
3. Replace OSD-1 on the compute-0 node via the UI. [pass]

For more details: https://docs.google.com/document/d/1KuUjbP25i9vBR7Wp_dKwLAULi5LDFLeIXi4kZY4wLjc/edit
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438