Description of problem:

The OpenShift 4.10 IPI install should ensure that the "ibmcloud vpc csi driver" is only installed for IBM Cloud when "controlPlaneTopology" (see the infrastructure resource) is set to internal (i.e. NOT external). This was discovered during IBM ROKS 4.10 bringup (PR tests were breaking due to installation errors related to this issue).

The following components were installed (incorrectly) on a "classic infrastructure" IBM ROKS 4.10 cluster:

openshift-cluster-csi-drivers   ibm-vpc-block-csi-controller-7f6958b-l66mb           0/5   ContainerCreating   0   46h
openshift-cluster-csi-drivers   ibm-vpc-block-csi-driver-operator-56bf948469-8fscf   1/1   Running             0   46h
openshift-cluster-csi-drivers   ibm-vpc-block-csi-node-d6rts                         0/3   Init:0/1            0   46h
openshift-cluster-csi-drivers   ibm-vpc-block-csi-node-lf48n                         0/3   Init:0/1            0   46h
openshift-cluster-csi-drivers   ibm-vpc-block-csi-node-q72kc                         0/3   Init:0/1            0   46h

Version-Release number of selected component (if applicable):
4.10

How reproducible:
IBM Cloud ROKS 4.10 PR testing - please work with IBM (jnowicki) to recreate/validate.

Steps to Reproduce:
1. Run IBM Cloud ROKS 4.10 PR tests

Actual results:
PR tests are failing.

Expected results:
PR tests succeed.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
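For reference, here is a minimal Go sketch (not part of the original report) of how the controlPlaneTopology field mentioned above can be read from the cluster-scoped Infrastructure resource. It assumes openshift/client-go and a local kubeconfig; the client setup is illustrative only.

package main

import (
	"context"
	"fmt"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default local kubeconfig (illustrative setup).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The Infrastructure resource is cluster-scoped and always named "cluster".
	infra, err := client.ConfigV1().Infrastructures().Get(context.TODO(), "cluster", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// On a ROKS/managed cluster this reports "External"; per this bug, the
	// IBM VPC block CSI driver should only be installed when it is NOT External.
	fmt.Println("controlPlaneTopology:", infra.Status.ControlPlaneTopology)
}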
Discussion thread in the CoreOS/ipi-upi-ibm-cloud slack channel: https://coreos.slack.com/archives/C01U40AM37F/p1646318513793049

Suggestion from Jan (in the slack thread): we could add a hook to CSIOperatorConfig and call it in shouldRunController with the current infrastructure. The hook for IBM Cloud would allow installation of the driver only when the control plane topology is not external.
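To make the suggestion concrete, here is a minimal sketch of the proposed hook. CSIOperatorConfig and shouldRunController are the names from the slack thread, but the AllowInstall hook name, struct layout, and wiring below are assumptions for illustration, not the actual cluster-storage-operator code.

package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

// csiOperatorConfig stands in for CSIOperatorConfig; AllowInstall is the
// proposed hook (the field name is illustrative).
type csiOperatorConfig struct {
	Platform configv1.PlatformType
	// AllowInstall, when non-nil, can veto installation of the driver based
	// on the cluster's Infrastructure status.
	AllowInstall func(infra *configv1.Infrastructure) bool
}

// shouldRunController sketches the decision: the platform must match, and the
// optional hook must not veto the install.
func shouldRunController(cfg csiOperatorConfig, infra *configv1.Infrastructure) bool {
	if infra.Status.PlatformStatus == nil || infra.Status.PlatformStatus.Type != cfg.Platform {
		return false
	}
	if cfg.AllowInstall != nil && !cfg.AllowInstall(infra) {
		return false
	}
	return true
}

func main() {
	// Hook for the IBM Cloud VPC block driver: skip installation when the
	// control plane topology is External (e.g. ROKS / managed OpenShift).
	ibmCloudCfg := csiOperatorConfig{
		Platform: configv1.IBMCloudPlatformType,
		AllowInstall: func(infra *configv1.Infrastructure) bool {
			return infra.Status.ControlPlaneTopology != configv1.ExternalTopologyMode
		},
	}

	roks := &configv1.Infrastructure{
		Status: configv1.InfrastructureStatus{
			ControlPlaneTopology: configv1.ExternalTopologyMode,
			PlatformStatus:       &configv1.PlatformStatus{Type: configv1.IBMCloudPlatformType},
		},
	}

	// Prints "false": on a ROKS-style cluster the driver would be skipped.
	fmt.Println(shouldRunController(ibmCloudCfg, roks))
}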
Discussed with Jeff: we won't call this a blocker for 4.10, but it is a priority fix for 4.10.1. They can work around it for now.
@chaoyang Would you be able to prioritize verifying this BZ (marking it verified so we can get the 4.10 cherry-pick PR merged)? The RH verification test should be to verify that the fix did not break an IPI install.

@jdobson verified: see https://coreos.slack.com/archives/C01U40AM37F/p1646672454908649?thread_ts=1646318513.793049&cid=C01U40AM37F (from jonathan): "I did at least do an IPI install with those changes on 4.11, made sure the operator/driver got deployed and could provision PV's. QE could certainly do something similar to verify it doesn't break unmanaged openshift."

IBM Cloud ROKS (managed OpenShift) can only test once this fix gets into a release build. Thank you.
Adding needinfo for Chao on Jeff's question above.
oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS    RESTARTS       AGE
ibm-vpc-block-csi-controller-786656b5ff-f2cgt       5/5     Running   4 (110m ago)   120m
ibm-vpc-block-csi-driver-operator-cd9cc677c-hmjht   1/1     Running   0              120m
ibm-vpc-block-csi-node-8v9kr                        3/3     Running   0              114m
ibm-vpc-block-csi-node-cbhk7                        3/3     Running   0              120m
ibm-vpc-block-csi-node-d9gkf                        3/3     Running   0              113m
ibm-vpc-block-csi-node-mhcnm                        3/3     Running   0              113m
ibm-vpc-block-csi-node-xbdq8                        3/3     Running   0              120m
ibm-vpc-block-csi-node-z9gf7                        3/3     Running   0              120m

Regression test passed.

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-08-191358   True        False         98m     Cluster version is 4.11.0-0.nightly-2022-03-08-191358
Thank you. We have verified the fix on Red Hat OpenShift on IBM Cloud version 4.10.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069