Bug 2060509 - Incorrect installation of ibmcloud vpc csi driver in IBM Cloud ROKS 4.10
Summary: Incorrect installation of ibmcloud vpc csi driver in IBM Cloud ROKS 4.10
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jonathan Dobson
QA Contact: Chao Yang
Depends On:
Blocks: 2060557 2061483
Reported: 2022-03-03 16:31 UTC by Jeff Nowicki
Modified: 2022-08-10 10:52 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2060557
Last Closed: 2022-08-10 10:52:11 UTC
Target Upstream Version:

System ID Private Priority Status Summary Last Updated
Github openshift cluster-storage-operator pull 264 0 None Merged Bug 2060509: Incorrect installation of ibmcloud vpc csi driver in IBM… 2022-03-09 15:49:47 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:52:41 UTC

Description Jeff Nowicki 2022-03-03 16:31:30 UTC
Description of problem:
OpenShift 4.10 IPI install should ensure that the "ibmcloud vpc csi driver" is installed on IBM Cloud only when "controlPlaneTopology" (see the infrastructure resource) is NOT set to External, i.e. only when the control plane is internal to the cluster.

This was discovered during IBM ROKS 4.10 bringup (PR tests were breaking due to installation errors related to this issue).

The following components were installed (incorrectly) on a "classic infrastructure" IBM ROKS 4.10 cluster.
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-controller-7f6958b-l66mb                0/5     ContainerCreating   0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-driver-operator-56bf948469-8fscf        1/1     Running             0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-node-d6rts                              0/3     Init:0/1            0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-node-lf48n                              0/3     Init:0/1            0             46h
openshift-cluster-csi-drivers                      ibm-vpc-block-csi-node-q72kc                              0/3     Init:0/1            0             46h

Version-Release number of selected component (if applicable):

How reproducible:
IBM Cloud ROKS 4.10 PR testing - please work with IBM (jnowicki) to recreate/validate.

Steps to Reproduce:
1. Run IBM Cloud ROKS 4.10 PR tests

Actual results:
PR tests are failing.

Expected results:
PR tests succeed.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Jeff Nowicki 2022-03-03 16:40:51 UTC
Discussion thread in CoreOS/ipi-upi-ibm-cloud slack channel: https://coreos.slack.com/archives/C01U40AM37F/p1646318513793049

Suggestion from Jan (in slack thread):
We could add some hook to CSIOperatorConfig and call it in shouldRunController with the current infrastructure. The hook for IBMCloud would allow installation of the driver only when the platform != external.
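
A minimal sketch of the hook Jan describes, using simplified stand-in types. Only the names CSIOperatorConfig and shouldRunController come from the suggestion above; the Infrastructure struct, the ShouldRun field, and ibmCloudHook are hypothetical and are not taken from cluster-storage-operator. The topology strings mirror the values of status.controlPlaneTopology on the Infrastructure resource.

```go
package main

import "fmt"

// TopologyMode mirrors the string values reported in
// status.controlPlaneTopology of the cluster Infrastructure resource.
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	SingleReplica   TopologyMode = "SingleReplica"
	External        TopologyMode = "External"
)

// Infrastructure is a hypothetical, reduced stand-in for the cluster
// Infrastructure status; the real resource carries many more fields.
type Infrastructure struct {
	Platform             string
	ControlPlaneTopology TopologyMode
}

// CSIOperatorConfig is a simplified, hypothetical version of the operator
// config. ShouldRun is the optional per-driver hook proposed in the comment.
type CSIOperatorConfig struct {
	Name      string
	Platform  string
	ShouldRun func(infra Infrastructure) bool
}

// shouldRunController first matches the platform, then defers to the
// driver-specific hook if one is set.
func shouldRunController(cfg CSIOperatorConfig, infra Infrastructure) bool {
	if cfg.Platform != infra.Platform {
		return false
	}
	if cfg.ShouldRun != nil {
		return cfg.ShouldRun(infra)
	}
	return true
}

// ibmCloudHook allows the VPC block CSI driver only when the control plane
// is not hosted outside the cluster.
func ibmCloudHook(infra Infrastructure) bool {
	return infra.ControlPlaneTopology != External
}

func main() {
	cfg := CSIOperatorConfig{
		Name:      "ibm-vpc-block-csi-driver-operator",
		Platform:  "IBMCloud",
		ShouldRun: ibmCloudHook,
	}

	ipi := Infrastructure{Platform: "IBMCloud", ControlPlaneTopology: HighlyAvailable}
	roks := Infrastructure{Platform: "IBMCloud", ControlPlaneTopology: External}

	fmt.Println("IPI install, run driver operator:", shouldRunController(cfg, ipi))
	fmt.Println("ROKS (external control plane), run driver operator:", shouldRunController(cfg, roks))
}
```

Keeping the hook optional means drivers for other platforms, which set no hook, keep their existing behaviour in this sketch.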

Comment 2 Jonathan Dobson 2022-03-03 18:45:04 UTC
Discussed with Jeff that we'll not call it a blocker for 4.10, but a priority fix for 4.10.1. They can work around it for now.

Comment 5 Jeff Nowicki 2022-03-08 16:43:01 UTC
@chaoyang Would you be able to prioritize verifying this BZ (marking it verified so we can get the 4.10 cherry-pick PR merged)?

The RH verification test should be to verify that the fix did not break an IPI install.

@jdobson verified: see https://coreos.slack.com/archives/C01U40AM37F/p1646672454908649?thread_ts=1646318513.793049&cid=C01U40AM37F
(from jonathan) "I did at least do an IPI install with those changes on 4.11, made sure the operator/driver got deployed and could provision PV's. QE could certainly do something similar to verify it doesn't break unmanaged openshift."

IBM Cloud ROKS (managed openshift) can only test once this fix gets into a release build.

Thank you.

Comment 6 Jonathan Dobson 2022-03-08 17:10:26 UTC
Adding needinfo for Chao on Jeff's question above.

Comment 7 Chao Yang 2022-03-09 12:04:39 UTC
oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS    RESTARTS       AGE
ibm-vpc-block-csi-controller-786656b5ff-f2cgt       5/5     Running   4 (110m ago)   120m
ibm-vpc-block-csi-driver-operator-cd9cc677c-hmjht   1/1     Running   0              120m
ibm-vpc-block-csi-node-8v9kr                        3/3     Running   0              114m
ibm-vpc-block-csi-node-cbhk7                        3/3     Running   0              120m
ibm-vpc-block-csi-node-d9gkf                        3/3     Running   0              113m
ibm-vpc-block-csi-node-mhcnm                        3/3     Running   0              113m
ibm-vpc-block-csi-node-xbdq8                        3/3     Running   0              120m
ibm-vpc-block-csi-node-z9gf7                        3/3     Running   0              120m

Regression test passed.

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-08-191358   True        False         98m     Cluster version is 4.11.0-0.nightly-2022-03-08-191358

Comment 8 Richard Theis 2022-03-18 13:01:33 UTC
Thank you.  We have verified the fix on Red Hat OpenShift on IBM Cloud version 4.10.

Comment 10 errata-xmlrpc 2022-08-10 10:52:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

