Bug 1958875

Summary: [OCS tracker for OCP bug 1958873] Device Replacement UI: The status of the disk is "replacement ready" before I clicked on "start replacement"
Product: Red Hat OpenShift Container Storage Reporter: Neha Berry <nberry>
Component: management-console Assignee: afrahman
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: high    
Version: 4.7 CC: afrahman, aos-bugs, dwalveka, ebenahar, jefbrown, madam, muagarwa, nberry, nthomas, ocs-bugs, oviner, ratamir
Target Milestone: --- Keywords: AutomationBackLog
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: Known Issue
Doc Text:
.The status of the disk is `replacement ready` before `start replacement` is clicked
The user interface cannot differentiate between a newly failed disk, on the same or a different node, and a previously failed disk if both disks have the same name. Because of this name collision, disk replacement is not allowed: the user interface considers the newly failed disk to be already replaced. To work around this issue, follow these steps:
1. On the OpenShift Container Platform Web Console, click *Administrator*.
2. Click *Home* --> *Search*.
3. In the *resources dropdown*, search for `TemplateInstance`.
4. Select `TemplateInstance` and make sure the `openshift-storage` namespace is selected.
5. Delete all template instances.
Story Points: ---
Clone Of: 1958873
: 1967628 Environment:
Last Closed: 2021-09-07 13:53:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1957756, 1958873    
Bug Blocks: 1967628    

Description Neha Berry 2021-05-10 11:05:00 UTC
Created a clone to track in OCS 4.7 

+++ This bug was initially created as a clone of Bug #1958873 +++

+++ This bug was initially created as a clone of Bug #1957756 +++

Description of problem:
After a successful disk replacement from the UI on one node, I re-initiated the procedure (section 4.2) for a different OSD on another node and observed the following:
The status of the disk is "replacement ready" before I clicked on start replacement

Version-Release number of selected component (if applicable):
OCP Version:4.7.0-0.nightly-2021-04-29-101239
OCS Version:4.7.0-372.ci
Provider: vmware
Setup: lso cluster

How reproducible:

Steps to Reproduce:
1. Replace OSD-1 on compute-1 node, via UI [pass]
2. Replace OSD-3 on compute-3 node, via UI
a. Scale down osd-3
$ oc scale -n openshift-storage deployment rook-ceph-osd-3 --replicas=0
deployment.apps/rook-ceph-osd-3 scaled
b. Click Troubleshoot in the Disk <disk1> not responding or the Disk <disk1> not accessible alert.
c. From the Action (⋮) menu of the failed disk, click Start Disk Replacement.
The status of the disk is "replacement ready" before I clicked on start replacement [Failed!!]

*Attached screenshot

for more details:

Actual results:
The status of the disk is "replacement ready" before I clicked on "start replacement"

Expected results:
The status of the disk is "Not Responding" before I clicked on "start replacement"

Additional info:

--- Additional comment from  on 2021-05-06 12:10:08 UTC ---

missing severity @Oded

--- Additional comment from OpenShift Automated Release Tooling on 2021-05-06 23:25:20 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from  on 2021-05-07 12:56:28 UTC ---

It does not allow initiating osd replacement due to this message.
Hence one cannot do disk replacement.

--- Additional comment from  on 2021-05-07 13:03:29 UTC ---

Hi Neha,

Here is the latest z-stream information for backporting this BZ: the 4.6.29 (May 20) and 4.7.11 (May 19) windows have opened.

--- Additional comment from  on 2021-05-10 10:41:20 UTC ---

Workaround for now:

Run `oc delete templateinstance -n openshift-storage --all` when this issue is encountered.
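
A minimal sketch of how the workaround above might be wrapped so the existing template instances are listed before they are deleted. The `DRY_RUN` variable and the `run` and `cleanup_template_instances` helpers are hypothetical names introduced for this example; only the `oc delete` command itself comes from this comment. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: DRY_RUN, run, and cleanup_template_instances are
# hypothetical names for this example, not part of oc or OCS.
NAMESPACE="openshift-storage"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup_template_instances() {
  # List what exists first, so the deletion is not done blind.
  run oc get templateinstance -n "$NAMESPACE"
  # Same command as the workaround in this comment.
  run oc delete templateinstance -n "$NAMESPACE" --all
}

cleanup_template_instances
```

Running the script with `DRY_RUN=0` executes the two commands against the cluster the current `oc` session is logged in to.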

Comment 7 Oded 2021-06-16 20:30:57 UTC
Bug Fixed

OCP Version:4.7.0-0.nightly-2021-06-12-151209
OCS Version:ocs-operator.v4.7.1-410.ci
LSO Version: 4.7.0-202105210300.p0
Provider: vmware
type: lso cluster

Test Procedure:
1. Replace OSD-0 on compute-0 node via UI [pass]
2. Replace OSD-0 on compute-0 node via UI [pass]
3. Replace OSD-1 on compute-1 node via UI [pass]

for more details: