Bug 1897830

Summary: [GSS] Unable to deploy OCS 4.5.2 on OCP 4.6.1, cannot `Create OCS Cluster Service`
Product: OpenShift Container Platform Reporter: jasonzhu
Component: Console Storage Plugin Assignee: Afreen <afrahman>
Status: CLOSED ERRATA QA Contact: Pratik Surve <prsurve>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.6 CC: afrahman, aos-bugs, assingh, dwojewod, hekumar, jsafrane, madam, mdunnett, muagarwa, nberry, nthomas, ocs-bugs, pdhange, rojoseph, sapillai, sarora, smordech, sorkim, sostapov, ygalanti
Target Milestone: --- Keywords: Reopened
Target Release: 4.7.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 4.7.0-0.nightly-2020-12-17-001141 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1908749 (view as bug list) Environment:
Last Closed: 2021-02-24 15:33:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1908749    
Attachments:
Description Flags
screenshots and LocalVolume CR yaml file none

Description jasonzhu 2020-11-14 21:27:57 UTC
Created attachment 1729362
screenshots and LocalVolume CR yaml file

Description of problem (please be as detailed as possible and provide log
snippets):
We are unable to deploy OCS 4.5 on bare metal infrastructure via the OCS Operator in OCP 4.6.1.
The options in the OCS Operator UI are slightly different from the documentation we followed (linked in the Additional info section).
In `Internal` mode, the `Storage Class` dropdown list is missing from `Create OCS Cluster Service` under the `Storage Cluster` tab of the OpenShift Container Storage operator.
In `Internal - Attached Devices` mode, the `Storage Class` dropdown list is back. However, the `Nodes` list is empty.

Version of all relevant components (if applicable):
* OpenShift Container Platform 4.6.1 (UPI)
* Local Storage Operator 4.6.0-202010311441.p0
* OpenShift Container Storage Operator 4.5.2

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, we're unable to present the complete OCP+OCS solution to the customers.

Is there any workaround available to the best of your knowledge?
A possible workaround is to deploy OCS in external mode.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Partially

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.6.1 (UPI)
2. Label 3 worker nodes (1.1)
3. Install Red Hat OpenShift Container Storage Operator from the OperatorHub (1.3)
4. Install Red Hat Local Storage Operator from the OperatorHub (1.4)
5. Find available storage devices (1.5) and create the LocalVolume CR for block PVs (1.6.1)
6. Select `Create OCS Cluster Service` under the `Storage Cluster` tab in the OpenShift Container Storage operator (1.6.6)
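
For readers without access to the attachment, step 5 can be sketched as follows. This is a minimal illustration and not the CR attached to this report: the `local-block` name, the `local-storage` namespace, the node label key `cluster.ocs.openshift.io/openshift-storage`, and the device path are assumptions based on the OCS 4.5 bare metal guide. Only the `localblock` storage class name appears in this report.

```python
import json


def build_local_volume(device_paths, name="local-block",
                       namespace="local-storage", storage_class="localblock"):
    """Return a LocalVolume manifest (as a dict) that exposes raw block devices.

    name, namespace and the node label key are assumed values, not taken
    from the CR attached to this bug.
    """
    if not device_paths:
        raise ValueError("at least one device path is required")
    return {
        "apiVersion": "local.storage.openshift.io/v1",
        "kind": "LocalVolume",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Restrict the CR to the worker nodes labelled in step 2 (assumed label key).
            "nodeSelector": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": "cluster.ocs.openshift.io/openshift-storage",
                        "operator": "In",
                        "values": [""],
                    }]
                }]
            },
            "storageClassDevices": [{
                "storageClassName": storage_class,
                "volumeMode": "Block",
                "devicePaths": list(device_paths),
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical device path; replace with the by-id paths found in step 5.
    manifest = build_local_volume(["/dev/disk/by-id/EXAMPLE-DISK-1"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f <file>`, since `oc` accepts JSON manifests as well as YAML.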


Actual results:
In `Internal` mode, the `Storage Class` dropdown list is missing, as the attached screenshot `ocp4.6.1_ocs4.5.2_mode1_missing_StorageClass.png` shows. Clicking `Create` leads to the error message `No StorageClass selected`.
In `Internal - Attached Devices` mode, the `Storage Class` dropdown list is back, but the `Nodes` list is empty.

Expected results:
In `Internal` mode, the `Storage Class` dropdown list should display `localblock`, as the attached screenshot `document_1.6.6.5.png` shows. Clicking `Create` should then deploy OCS.

Additional info:
The documentation we followed:
https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html-single/deploying_openshift_container_storage_using_bare_metal_infrastructure/index

Comment 2 Mudit Agarwal 2020-11-17 08:02:38 UTC
Looks quite similar to https://bugzilla.redhat.com/show_bug.cgi?id=1895263.

Rohan, PTAL

Comment 3 Rohan CJ 2020-11-18 09:09:03 UTC
OCS 4.6 UI is different, and we should be using [Internal - attached devices] and LocalVolumeSet instead of LocalVolume.


Here are the unreleased preview docs for OCS 4.6 installs: https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_openshift_container_storage_using_bare_metal_infrastructure/index?lb_target=preview#installing-local-storage-operator_rhocs 


No trouble for fresh installs, but the flow is still blocked for upgraded LSO because it tends to be in a different namespace for pre-4.6 LSO installs.

> In the `Internal - Attached Devices` mode, the `Storage Class` dropdown list is back, but the `Nodes` list is empty

This has the same root cause as 1895263, and the same fix, so I'm marking this as a dupe.

*** This bug has been marked as a duplicate of bug 1895263 ***

Comment 10 jasonzhu 2020-11-24 17:41:20 UTC
Created the case #02805997 via https://connect.redhat.com/support/technology-partner/ directed by ecosystem-partners-oem

Include more details here.

In the `OpenShift Container Storage` Operator => `Storage Cluster` tab => `Create Storage Cluster` => `Internal - Attached Devices` mode => create `Storage Class` => `Create new volume set instance`:
1. The SSD drives connected to the worker nodes without a RAID controller can be seen.
2. The SSD drives connected to the worker nodes with a RAID controller cannot be seen.
3. The HDD drives connected to the worker nodes without a RAID controller cannot be seen.

Deployment with the SSD drives that have no RAID controller works. However, cases 2 and 3 did not work. We expect the flow to also support HDDs and virtual drives presented by a RAID controller. We can see these kinds of disks in the `Local Storage` operator => `Local Volume Discovery Result`.
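
As a point of comparison with the UI flow, a LocalVolumeSet that does not filter out rotational disks could be sketched as below. This is an illustration under assumptions: the `openshift-local-storage` namespace and the resource name are hypothetical choices, and nothing in this report establishes that the device filter is why HDDs or RAID virtual drives are hidden in the UI.

```python
import json


def build_local_volume_set(name, storage_class, include_rotational=True):
    """Return a LocalVolumeSet manifest (as a dict) selecting whole disks.

    With include_rotational=True the filter admits both SSDs (NonRotational)
    and HDDs (Rotational).
    """
    mechanical = ["NonRotational"]
    if include_rotational:
        mechanical.append("Rotational")
    return {
        "apiVersion": "local.storage.openshift.io/v1alpha1",
        "kind": "LocalVolumeSet",
        # Namespace is an assumption; upgraded LSO installs may use another one.
        "metadata": {"name": name, "namespace": "openshift-local-storage"},
        "spec": {
            "storageClassName": storage_class,
            "volumeMode": "Block",
            "deviceInclusionSpec": {
                "deviceTypes": ["disk"],
                "deviceMechanicalProperties": mechanical,
            },
        },
    }


if __name__ == "__main__":
    # "example-volumeset" is a hypothetical name; "localblock" is from this report.
    print(json.dumps(build_local_volume_set("example-volumeset", "localblock"), indent=2))
```

Whether a RAID controller's virtual drive is reported as rotational depends on how the controller presents the device, so this sketch may not cover case 2.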

Comment 12 Rohan CJ 2020-11-30 07:35:51 UTC
> Investigating the issue found that the UI is looking for a particular label on PVs provisioned by LSO, "kubernetes.io/hostname", to show the list of nodes used while creating the storage class.
> This label is not added to the PV when the storage class is created via a LocalVolume CR, but it is present when created via a LocalVolumeSet.

I think you mean creating the StorageCluster, because the PVs wouldn't exist when creating the storage class (via LocalVolume/LocalVolumeSet).

So, IIUC, this only applies to the OCS deploy flow, which only uses LocalVolumeSet, so I think this won't be an OCS blocker.

We could patch LSO to provide that label on provisioned PVs, but it wouldn't apply retroactively, as the PV creation is done by the upstream provisioner, which does not do update operations.
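
To see which existing PVs would be affected, the label check described above can be sketched as follows. The function name is hypothetical, and the input is assumed to be the parsed output of `oc get pv -o json`.

```python
HOSTNAME_LABEL = "kubernetes.io/hostname"


def pvs_missing_hostname_label(pv_list, storage_class):
    """Return names of PVs in storage_class that lack the hostname label.

    pv_list is the dict parsed from `oc get pv -o json` (a PersistentVolumeList).
    """
    missing = []
    for pv in pv_list.get("items", []):
        if pv.get("spec", {}).get("storageClassName") != storage_class:
            continue
        # metadata.labels may be absent or null on PVs without labels.
        labels = pv.get("metadata", {}).get("labels") or {}
        if HOSTNAME_LABEL not in labels:
            missing.append(pv["metadata"]["name"])
    return missing
```

A PV listed by this function would not contribute a node to the `Nodes` list if the UI relies only on the label.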

Comment 13 Hemant Kumar 2020-11-30 17:28:09 UTC
We could fix the label, but the UI should not be looking for labels. Are these labels needed to decide which node a PV is available on? If yes, they should be picked up from the nodeAffinity field.
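
Reading the node from nodeAffinity rather than from labels could look roughly like this. It is a sketch of the idea, not the console's actual code (which is not shown in this report); it assumes a PV object parsed from JSON and only handles `In` expressions on `kubernetes.io/hostname`.

```python
HOSTNAME_KEY = "kubernetes.io/hostname"


def nodes_from_node_affinity(pv):
    """Return the sorted hostnames a PV is pinned to via spec.nodeAffinity.

    Only `In` match expressions on the hostname key are considered; other
    operators and keys are ignored in this sketch.
    """
    affinity = pv.get("spec", {}).get("nodeAffinity") or {}
    required = affinity.get("required") or {}
    nodes = set()
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == HOSTNAME_KEY and expr.get("operator") == "In":
                nodes.update(expr.get("values", []))
    return sorted(nodes)
```

This works whether or not the PV carries the hostname label, so it would cover PVs created through either LocalVolume or LocalVolumeSet.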

Comment 14 Santosh Pillai 2020-12-04 04:55:54 UTC
Removing NeedInfo as it's already answered in comment 12 and comment 13.

Comment 15 Afreen 2020-12-04 10:15:30 UTC
*** Bug 1904169 has been marked as a duplicate of this bug. ***

Comment 18 Rohan CJ 2020-12-15 08:42:44 UTC
I'm not familiar with the console code, so I'm removing the needinfo on me. @Afreen PTAL, thanks!

Comment 31 errata-xmlrpc 2021-02-24 15:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633