Bug 1897830 - [GSS] Unable to deploy OCS 4.5.2 on OCP 4.6.1, cannot `Create OCS Cluster Service`
Summary: [GSS] Unable to deploy OCS 4.5.2 on OCP 4.6.1, cannot `Create OCS Cluster Ser...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Console Storage Plugin
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: afrahman
QA Contact: Pratik Surve
URL:
Whiteboard:
Duplicates: 1904169
Depends On:
Blocks: 1908749
Reported: 2020-11-14 21:27 UTC by jasonzhu
Modified: 2021-02-24 15:33 UTC (History)
CC List: 20 users

Fixed In Version: 4.7.0-0.nightly-2020-12-17-001141
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1908749
Environment:
Last Closed: 2021-02-24 15:33:27 UTC
Target Upstream Version:


Attachments
screenshots and LocalVolume CR yaml file (277.63 KB, application/zip)
2020-11-14 21:27 UTC, jasonzhu


Links
System ID Private Priority Status Summary Last Updated
Github openshift console pull 7552 0 None closed Bug 1897830: Fix cluster creation when using localvolume 2021-02-08 04:00:51 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:33:55 UTC

Description jasonzhu 2020-11-14 21:27:57 UTC
Created attachment 1729362 [details]
screenshots and LocalVolume CR yaml file

Description of problem (please be as detailed as possible and provide log
snippets):
We're unable to deploy OCS 4.5 on bare metal infrastructure via the OCS Operator in OCP 4.6.1.
The options in the OCS Operator UI are slightly different from the documentation we followed (see the Additional info section).
The `Storage Class` dropdown list is missing in `Internal` mode in `Create OCS Cluster Service` under the `Storage Cluster` tab in the OpenShift Container Storage operator.
In the `Internal - Attached Devices` mode, the `Storage Class` dropdown list is back. However, the `Nodes` list is empty.

Version of all relevant components (if applicable):
* OpenShift Container Platform 4.6.1 (UPI)
* Local Storage Operator 4.6.0-202010311441.p0
* OpenShift Container Storage Operator 4.5.2

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, we're unable to present the complete OCP+OCS solution to the customers.

Is there any workaround available to the best of your knowledge?
A possible workaround is to deploy OCS in external mode.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Partially

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.6.1 (UPI)
2. Label 3 worker nodes (1.1)
3. Install Red Hat OpenShift Container Storage Operator from the OperatorHub (1.3)
4. Install Red Hat Local Storage Operator from the OperatorHub (1.4)
5. Find available storage devices (1.5) and Create the LocalVolume CR for block PVs (1.6.1)
6. `Create OCS Cluster Service` under `Storage Cluster` tab in OpenShift Container Storage operator (1.6.6)


Actual results:
In the `Internal` mode, the `Storage Class` dropdown list is missing, as the attached screenshot `ocp4.6.1_ocs4.5.2_mode1_missing_StorageClass.png` shows. Clicking `Create` leads to the error message `No StorageClass selected`.
In the `Internal - Attached Devices` mode, the `Storage Class` dropdown list is back, but the `Nodes` list is empty.

Expected results:
In the `Internal` mode, the `Storage Class` dropdown list should display `localblock`, as the attached screenshot `document_1.6.6.5.png` shows. Clicking `Create` should deploy OCS.

Additional info:
The documentation we followed:
https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html-single/deploying_openshift_container_storage_using_bare_metal_infrastructure/index

Comment 2 Mudit Agarwal 2020-11-17 08:02:38 UTC
Looks quite similar to https://bugzilla.redhat.com/show_bug.cgi?id=1895263.

Rohan, PTAL

Comment 3 Rohan CJ 2020-11-18 09:09:03 UTC
OCS 4.6 UI is different, and we should be using [Internal - attached devices] and LocalVolumeSet instead of LocalVolume.


Here are the unreleased preview docs for OCS 4.6 installs: https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_openshift_container_storage_using_bare_metal_infrastructure/index?lb_target=preview#installing-local-storage-operator_rhocs 


No trouble for fresh installs, but the flow is still blocked for upgraded LSO because it tends to be in a different namespace for pre-4.6 LSO installs.

> In the `Internal - Attached Devices` mode, the `Storage Class` dropdown list is back, but the `Nodes` list is empty

This has the same root cause as 1895263, and the same fix, so I'm marking this as a dupe.

*** This bug has been marked as a duplicate of bug 1895263 ***

Comment 10 jasonzhu 2020-11-24 17:41:20 UTC
Created the case #02805997 via https://connect.redhat.com/support/technology-partner/ directed by ecosystem-partners-oem@redhat.com

Include more details here.

In the `OpenShift Container Storage` Operator => `Storage Cluster` tab => `Create Storage Cluster` => `Internal - Attached Devices` mode => create `Storage Class` => `Create new volume set instance`:
1. The SSD drives connected to the worker nodes without a RAID controller can be seen.
2. The SSD drives connected to the worker nodes with a RAID controller cannot be seen.
3. The HDD drives connected to the worker nodes without a RAID controller cannot be seen.

Deployment with the SSD drives without a RAID controller works. However, cases 2 and 3 did not work. We expect the virtual drives from a RAID controller and the HDDs to be supported as well. We can see these kinds of disks in the `Local Storage` operator => `Local Volume Discovery Result`.

Comment 12 Rohan CJ 2020-11-30 07:35:51 UTC
> Investigating the issue found that the UI is looking for a particular label on PVs provisioned by LSO, "kubernetes.io/hostname", to show the list of nodes used while creating the storage class.
> This label is not added to the PV when the storage class is created via a LocalVolume CR, but it is present when created via a LocalVolumeSet.

I think you mean when creating the StorageCluster, because the PVs wouldn't exist when creating the storage class (via LocalVolume/LocalVolumeSet).

So, IIUC, this only applies to the OCS deploy flow, which only uses LocalVolumeSet, so I think this won't be an OCS blocker.

We could patch LSO to provide that label on provisioned PVs, but it wouldn't apply retroactively, as the PV creation is done by the upstream provisioner, which does not do update operations.
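
The label lookup discussed in this comment can be illustrated with a minimal Python sketch. This is not the console's code: the function name `nodes_from_pv_labels` and the sample PV dictionaries (names, node names) are hypothetical, and each PV is reduced to the fields that matter here. It only shows why a lookup keyed on the `kubernetes.io/hostname` label yields an empty node list for PVs that lack the label.

```python
HOSTNAME_LABEL = "kubernetes.io/hostname"


def nodes_from_pv_labels(pvs, storage_class):
    """Collect node names for a storage class using only the PV hostname label."""
    nodes = set()
    for pv in pvs:
        # Skip PVs that belong to another storage class.
        if pv.get("spec", {}).get("storageClassName") != storage_class:
            continue
        labels = pv.get("metadata", {}).get("labels") or {}
        hostname = labels.get(HOSTNAME_LABEL)
        if hostname:
            nodes.add(hostname)
    return sorted(nodes)


# Hypothetical PV shaped like one provisioned via LocalVolumeSet (label present).
pv_from_local_volume_set = {
    "metadata": {"name": "local-pv-aaaa", "labels": {HOSTNAME_LABEL: "worker-0"}},
    "spec": {"storageClassName": "localblock"},
}

# Hypothetical PV shaped like one provisioned via LocalVolume (label absent).
pv_from_local_volume = {
    "metadata": {"name": "local-pv-bbbb", "labels": {}},
    "spec": {"storageClassName": "localblock"},
}

print(nodes_from_pv_labels([pv_from_local_volume_set], "localblock"))  # ['worker-0']
print(nodes_from_pv_labels([pv_from_local_volume], "localblock"))      # []
```

With only LocalVolume-provisioned PVs in the cluster, a lookup of this shape has nothing to show, which matches the empty `Nodes` list reported above.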

Comment 13 Hemant Kumar 2020-11-30 17:28:09 UTC
We could fix the label, but the UI should not be looking for labels. Are these labels needed for deciding the node where the PV is available? If yes, they should be picked from the nodeAffinity field.
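
A minimal Python sketch of the nodeAffinity approach suggested here, under stated assumptions: the function name `nodes_from_pv_affinity` and the sample PV are hypothetical, and this is not the fix that was merged in the console. It reads node names from the standard PersistentVolume field `spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[]` and assumes the local PV pins itself to a node with an `In` expression on `kubernetes.io/hostname`.

```python
HOSTNAME_KEY = "kubernetes.io/hostname"


def nodes_from_pv_affinity(pv):
    """Return the node names a PV is pinned to, read from spec.nodeAffinity."""
    required = pv.get("spec", {}).get("nodeAffinity", {}).get("required", {})
    nodes = set()
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            # Only "In" expressions on the hostname key name concrete nodes.
            if expr.get("key") == HOSTNAME_KEY and expr.get("operator") == "In":
                nodes.update(expr.get("values", []))
    return sorted(nodes)


# Hypothetical local PV with no hostname label but with node affinity set.
pv_without_label = {
    "metadata": {"name": "local-pv-bbbb", "labels": {}},
    "spec": {
        "storageClassName": "localblock",
        "nodeAffinity": {
            "required": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": HOSTNAME_KEY,
                                "operator": "In",
                                "values": ["worker-1"],
                            }
                        ]
                    }
                ]
            }
        },
    },
}

print(nodes_from_pv_affinity(pv_without_label))  # ['worker-1']
```

Because the lookup does not depend on PV labels, it would behave the same for PVs created through LocalVolume and LocalVolumeSet, provided the affinity field is populated as assumed.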

Comment 14 Santosh Pillai 2020-12-04 04:55:54 UTC
Removing NeedInfo as it's already answered in comment 12 and comment 13.

Comment 15 afrahman 2020-12-04 10:15:30 UTC
*** Bug 1904169 has been marked as a duplicate of this bug. ***

Comment 18 Rohan CJ 2020-12-15 08:42:44 UTC
I'm not familiar with the console code, so removing needinfo on me. @Afreen PTAL, thanks!

Comment 31 errata-xmlrpc 2021-02-24 15:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

