Bug 1906002

Summary: [Cloned in OCS as tracker] [RFE] There is no option to replace failed storage device via UI on encrypted cluster in LSO
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: management-console
Assignee: Afreen <afrahman>
Status: CLOSED DEFERRED
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.6
CC: afrahman, anbehl, aos-bugs, edonnell, etamir, jefbrown, madam, muagarwa, nthomas, ocs-bugs, oviner, ratamir
Target Milestone: ---
Keywords: AutomationBackLog, FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Device replacement action cannot be performed via the UI for an encrypted OpenShift Container Storage cluster
On an encrypted OpenShift Container Storage cluster, the discovery result CR reports the device backing a Ceph OSD (Object Storage Daemon) differently from the device reported in the Ceph alerts. When clicking the alert, the user is presented with a `Disk not found` message. Because of this mismatch, the console UI cannot enable the disk replacement option for an OCS user. To work around this issue, use the CLI procedure for failed device replacement.
Story Points: ---
Clone Of: 1905963
Environment:
Last Closed: 2021-06-03 18:14:18 UTC
Bug Depends On: 1905963    
Bug Blocks: 1882359    

Description Neha Berry 2020-12-09 14:12:26 UTC
Cloning this BZ to OCS to track the inclusion of a KNOWN ISSUE in the OCS 4.6 Release Notes.



+++ This bug was initially created as a clone of Bug #1905963 +++

Description of problem:
There is no option to replace a failed storage device via the UI on an encrypted cluster.

Version-Release number of selected component (if applicable):
Provider: VMware (LSO)
OCP Version: 4.6.0-0.nightly-2020-12-08-021151
OCS Version: ocs-operator.v4.6.0-189.ci

How reproducible:


Steps to Reproduce:
1. Check cluster status (OCS + OCP), for example with the commands below.
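
A minimal sketch of the check from the CLI (resource names assume a default openshift-storage deployment):

$ oc get clusterversion
$ oc get nodes
$ oc -n openshift-storage get cephcluster
$ oc -n openshift-storage get pods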

2. Check Ceph status:
sh-4.4# ceph health
HEALTH_OK

3. Verify the OSDs are encrypted:
[root@compute-1 /]# lsblk
NAME                                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0                                                     7:0    0   100G  0 loop  
sda                                                       8:0    0   120G  0 disk  
|-sda1                                                    8:1    0   384M  0 part  /boot
|-sda2                                                    8:2    0   127M  0 part  /boot/efi
|-sda3                                                    8:3    0     1M  0 part  
`-sda4                                                    8:4    0 119.5G  0 part  
  `-coreos-luks-root-nocrypt                            253:0    0 119.5G  0 dm    /sysroot
sdb                                                       8:16   0   100G  0 disk  
`-ocs-deviceset-localblock-1-data-0-5mdwg-block-dmcrypt 253:1    0   100G  0 crypt 
[root@compute-1 /]# dmsetup ls
ocs-deviceset-localblock-1-data-0-5mdwg-block-dmcrypt	(253:1)
coreos-luks-root-nocrypt	(253:0)


4. Scale down the OSD deployment for the OSD to be replaced (via the CLI).
[odedviner@localhost ~]$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
rook-ceph-osd-0-8649856684-bvbn8   1/1     Running   0          5h52m   10.129.2.63   compute-1   <none>           <none>
rook-ceph-osd-1-84c75fd56c-5vhx5   1/1     Running   0          5h52m   10.128.2.36   compute-0   <none>           <none>
rook-ceph-osd-2-559c675859-8cbdl   1/1     Running   0          5h52m   10.131.0.30   compute-2   <none>           <none>

Choose OSD-0:
$ oc scale -n openshift-storage deployment rook-ceph-osd-0 --replicas=0
deployment.apps/rook-ceph-osd-0 scaled

$ oc get -n openshift-storage pods -l ceph-osd-id=0
No resources found in openshift-storage namespace.


5. Check the Persistent Storage dashboard for the "Disk not responding" alert.
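
The firing alert can also be cross-checked from the CLI via Alertmanager (a sketch; the pod name assumes a default openshift-monitoring stack, and CephOSDDiskNotResponding is assumed to be the alert firing here):

$ oc -n openshift-monitoring exec alertmanager-main-0 -c alertmanager -- amtool alert --alertmanager.url=http://localhost:9093 | grep -i CephOSD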

6. Add an HDD disk to node compute-1 via vCenter.
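
If scripting the disk addition instead of using the vCenter UI, the govc CLI offers an equivalent (a sketch; assumes govc is installed with GOVC_URL and credentials exported, and the VM path and disk name are hypothetical):

$ govc vm.disk.create -vm compute-1 -name compute-1/hdd-1 -size 100G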

7. Click Troubleshoot in the "Disk not responding" alert (in the UI).

8. Click the Disks tab. From the Action (⋮) menu of the failed disk, click Start Disk Replacement.
There is no option to replace the device via the UI. [Failed]
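
Per the doc text above, the root cause is a mismatch between the device reported by the LSO discovery result CR and the device in the Ceph alert. One way to inspect what the CR discovered (a sketch; the CR kind follows LSO 4.6 auto-discovery conventions and the per-node object name is an assumption):

$ oc -n openshift-local-storage get localvolumediscoveryresults
$ oc -n openshift-local-storage get localvolumediscoveryresult discovery-result-compute-1 -o yaml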


Detailed test procedure:
https://docs.google.com/document/d/1rIGJ3lFh7yXpVQ6rR4rNAqby11MxQRvuQHVUnunwa9s/edit

Actual results:
The Start Disk Replacement action is not available for the failed disk in the UI; clicking the alert leads to a `Disk not found` message.

Expected results:
The failed disk can be replaced via the UI (Start Disk Replacement), as on a non-encrypted cluster.

Additional info:
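For reference, the CLI workaround mentioned in the doc text follows the standard OCS failed-device replacement flow. A minimal sketch for OSD 0 from the reproduction above (the ocs-osd-removal template and FAILED_OSD_ID parameter follow OCS 4.6 conventions; verify against the official procedure before use):

$ oc scale -n openshift-storage deployment rook-ceph-osd-0 --replicas=0
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_ID=0 | oc create -n openshift-storage -f -
$ oc -n openshift-storage get pods | grep ocs-osd-removal

After the removal job completes, delete the released PV, replace or wipe the physical device, and let LSO and rook-ceph provision a new OSD on the replacement disk.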