Bug 2060509

Summary: Incorrect installation of ibmcloud vpc csi driver in IBM Cloud ROKS 4.10
Product: OpenShift Container Platform
Reporter: Jeff Nowicki <jnowicki>
Component: Storage
Assignee: Jonathan Dobson <jdobson>
Storage sub component: Storage
QA Contact: Chao Yang <chaoyang>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, arahamad, chaoyang, cschaefe, jdobson, jsafrane, rtheis
Version: 4.10
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2060557
Environment:
Last Closed: 2022-08-10 10:52:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2060557, 2061483

Description Jeff Nowicki 2022-03-03 16:31:30 UTC
Description of problem:
OpenShift 4.10 IPI install should ensure that "ibmcloud vpc csi driver" is only installed for IBM Cloud when "controlPlaneTopology" (see infrastructure resource) is set to internal (or NOT external).
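
To make the condition above concrete, here is a minimal Go sketch of the check, not the actual operator code. It assumes the cluster Infrastructure resource has been dumped as JSON (for example with oc get infrastructure cluster -o json) and reads status.controlPlaneTopology, whose documented values are HighlyAvailable, SingleReplica and External, together with status.platformStatus.type. The helper name wantsVPCDriver and the trimmed-down struct are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infrastructure mirrors only the fields of the Infrastructure resource
// that this sketch needs; the real API type has many more fields.
type infrastructure struct {
	Status struct {
		ControlPlaneTopology string `json:"controlPlaneTopology"`
		PlatformStatus       struct {
			Type string `json:"type"`
		} `json:"platformStatus"`
	} `json:"status"`
}

// wantsVPCDriver is a hypothetical helper: it reports whether the IBM Cloud
// VPC CSI driver should be installed, i.e. the platform is IBMCloud and the
// control plane topology is not External.
func wantsVPCDriver(raw []byte) (bool, error) {
	var infra infrastructure
	if err := json.Unmarshal(raw, &infra); err != nil {
		return false, fmt.Errorf("decode infrastructure: %w", err)
	}
	isIBMCloud := infra.Status.PlatformStatus.Type == "IBMCloud"
	external := infra.Status.ControlPlaneTopology == "External"
	return isIBMCloud && !external, nil
}

func main() {
	// Two illustrative documents: a managed (ROKS-like) cluster with an
	// external control plane, and a self-managed IPI cluster.
	samples := []struct {
		name string
		doc  string
	}{
		{"managed", `{"status":{"controlPlaneTopology":"External","platformStatus":{"type":"IBMCloud"}}}`},
		{"ipi", `{"status":{"controlPlaneTopology":"HighlyAvailable","platformStatus":{"type":"IBMCloud"}}}`},
	}
	for _, s := range samples {
		install, err := wantsVPCDriver([]byte(s.doc))
		if err != nil {
			fmt.Println(s.name, "error:", err)
			continue
		}
		fmt.Printf("%s: install driver = %v\n", s.name, install)
	}
}
```

The sample JSON documents are invented for the sketch; only the field names and the topology values come from the OpenShift config API.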

This was discovered during IBM ROKS 4.10 bringup (PR tests were breaking due to installation errors related to this issue).

The following components were installed (incorrectly) on a "classic infrastructure" IBM ROKS 4.10 cluster.
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-controller-7f6958b-l66mb                0/5     ContainerCreating   0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-driver-operator-56bf948469-8fscf        1/1     Running             0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-node-d6rts                              0/3     Init:0/1            0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-node-lf48n                              0/3     Init:0/1            0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-node-q72kc                              0/3     Init:0/1            0             46h


Version-Release number of selected component (if applicable):
4.10

How reproducible:
IBM Cloud ROKS 4.10 PR testing - please work with IBM (jnowicki) to recreate/validate.

Steps to Reproduce:
1. Run IBM Cloud ROKS 4.10 PR tests

Actual results:
PR tests are failing.

Expected results:
PR tests succeed.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Jeff Nowicki 2022-03-03 16:40:51 UTC
Discussion thread in CoreOS/ipi-upi-ibm-cloud slack channel: https://coreos.slack.com/archives/C01U40AM37F/p1646318513793049

Suggestion from Jan (in slack thread):
We could add some hook to CSIOperatorConfig and call it in shouldRunController with the current infrastructure. The hook for IBMCloud would allow installation of the driver only when the platform != external.
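
The suggestion above could look roughly like the following Go sketch. Only the names CSIOperatorConfig and shouldRunController come from the thread; the struct fields, the simplified Infrastructure type, the ShouldInstall hook field and ibmCloudHook are assumptions made for illustration and are not the operator's real definitions. The sketch also assumes that "platform != external" refers to the controlPlaneTopology value External.

```go
package main

import "fmt"

// Infrastructure is a simplified stand-in for the cluster infrastructure
// status (assumption: only these two values matter for the decision).
type Infrastructure struct {
	Platform             string
	ControlPlaneTopology string
}

// CSIOperatorConfig is a reduced, hypothetical version of the config
// mentioned in the thread.
type CSIOperatorConfig struct {
	Name     string
	Platform string
	// ShouldInstall is the optional per-driver hook; nil means the driver
	// has no extra condition beyond the platform match.
	ShouldInstall func(infra Infrastructure) bool
}

// shouldRunController decides whether the operator for cfg should run on
// the given infrastructure, consulting the hook when one is set.
func shouldRunController(cfg CSIOperatorConfig, infra Infrastructure) bool {
	if cfg.Platform != infra.Platform {
		return false
	}
	if cfg.ShouldInstall != nil {
		return cfg.ShouldInstall(infra)
	}
	return true
}

// ibmCloudHook allows the driver only when the control plane is not External.
func ibmCloudHook(infra Infrastructure) bool {
	return infra.ControlPlaneTopology != "External"
}

func main() {
	cfg := CSIOperatorConfig{
		Name:          "ibm-vpc-block",
		Platform:      "IBMCloud",
		ShouldInstall: ibmCloudHook,
	}
	managed := Infrastructure{Platform: "IBMCloud", ControlPlaneTopology: "External"}
	ipi := Infrastructure{Platform: "IBMCloud", ControlPlaneTopology: "HighlyAvailable"}

	fmt.Println("managed cluster:", shouldRunController(cfg, managed))
	fmt.Println("IPI cluster:", shouldRunController(cfg, ipi))
}
```

Keeping the hook optional means drivers for other platforms need no change: a nil hook falls back to the plain platform match.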

Comment 2 Jonathan Dobson 2022-03-03 18:45:04 UTC
Discussed with Jeff: we will not call this a blocker for 4.10, but it is a priority fix for 4.10.1. They can work around it for now.

Comment 5 Jeff Nowicki 2022-03-08 16:43:01 UTC
@chaoyang Would you be able to prioritize verifying this BZ (marking it verified so we can get the 4.10 cherry-pick PR merged)?

The RH verification test should be to verify that the fix did not break an IPI install.

@jdobson verified: see https://coreos.slack.com/archives/C01U40AM37F/p1646672454908649?thread_ts=1646318513.793049&cid=C01U40AM37F
(from jonathan) "I did at least do an IPI install with those changes on 4.11, made sure the operator/driver got deployed and could provision PV's. QE could certainly do something similar to verify it doesn't break unmanaged openshift."

IBM Cloud ROKS (managed openshift) can only test once this fix gets into a release build.

Thank you.

Comment 6 Jonathan Dobson 2022-03-08 17:10:26 UTC
Adding needinfo for Chao on Jeff's question above.

Comment 7 Chao Yang 2022-03-09 12:04:39 UTC
oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS    RESTARTS       AGE
ibm-vpc-block-csi-controller-786656b5ff-f2cgt       5/5     Running   4 (110m ago)   120m
ibm-vpc-block-csi-driver-operator-cd9cc677c-hmjht   1/1     Running   0              120m
ibm-vpc-block-csi-node-8v9kr                        3/3     Running   0              114m
ibm-vpc-block-csi-node-cbhk7                        3/3     Running   0              120m
ibm-vpc-block-csi-node-d9gkf                        3/3     Running   0              113m
ibm-vpc-block-csi-node-mhcnm                        3/3     Running   0              113m
ibm-vpc-block-csi-node-xbdq8                        3/3     Running   0              120m
ibm-vpc-block-csi-node-z9gf7                        3/3     Running   0              120m

Regression test passed.

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-08-191358   True        False         98m     Cluster version is 4.11.0-0.nightly-2022-03-08-191358

Comment 8 Richard Theis 2022-03-18 13:01:33 UTC
Thank you.  We have verified the fix on Red Hat OpenShift on IBM Cloud version 4.10.

Comment 10 errata-xmlrpc 2022-08-10 10:52:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069