Description of problem:
The CCM for IBM Cloud (cloud-provider-ibm) is not able to operate properly in OCP due to missing IAM permissions in the CredentialsRequest in the cluster-cloud-controller-manager-operator (CCCMO). Without these permissions, IPI installations fail.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
All current IPI installs on IBM Cloud

Steps to Reproduce:
1. Create the IPI manifests (openshift-install create manifests)
2. Create the Secrets via ccoctl (ccoctl ibmcloud create-service-id ...)
3. Create the IPI cluster (openshift-install create cluster)

Actual results:
IPI installation fails when the IBM Cloud CCM is not able to initialize master nodes
```
I1101 20:42:38.368482 1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.372191 1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
I1101 20:42:38.373877 1 shared_informer.go:247] Caches are synced for service
I1101 20:42:38.378082 1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.393695 1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
E1101 20:42:38.399216 1 node_controller.go:241] Error getting instance metadata for node addresses: error fetching node by provider ID: unimplemented, and error by node name: node is missing labels
I1101 20:42:38.404330 1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.406667 1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
I1101 20:42:38.427005 1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.429646 1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
```

Expected results:
An operational cluster is deployed

Additional info:
IBM Cloud is currently working on an update to CCCMO's CredentialsRequest for IBM Cloud to address this issue.
Update on the error details: the previous error log was due to missing changes in the openshift/cloud-provider-ibm fork.

The permissions are still not correct and require an update to the CCCMO's CredentialsRequest, as the CCM cannot list resource groups:
```
I1118 16:15:07.691626 1 ibm_vpc_loadbalancer.go:421] Processing line: ERROR: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
E1118 16:15:07.691632 1 ibm_vpc_loadbalancer.go:518] ERROR: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:07.691635 1 ibm_vpc_loadbalancer.go:421] Processing line:
I1118 16:15:07.691637 1 ibm_vpc_loadbalancer.go:421] Processing line:
I1118 16:15:08.680968 1 ibm_vpc_loadbalancer.go:289] [16:15:06.4564] Entering UpdateLoadBalancer(kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17, openshift-ingress/router-default)
E1118 16:15:08.680987 1 ibm_vpc_loadbalancer.go:284] GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:08.681047 1 ibm_vpc_loadbalancer.go:115] GetLoadBalancer(kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17, kubernetes)
I1118 16:15:08.681124 1 event.go:285] Event(v1.ObjectReference{Kind:"Service", Namespace:"openshift-ingress", Name:"router-default", UID:"6a8bed10-200b-45ba-a87d-ddec5b901f17", APIVersion:"v1", ResourceVersion:"15594", FieldPath:""}): type: 'Warning' reason: 'UpdatingCloudLoadBalancerFailed' Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed updating LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:09.951926 1 ibm_vpc_loadbalancer.go:138] [16:15:08.6944] Entering StatusLoadBalancer(kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17)
E1118 16:15:09.951955 1 ibm_vpc_loadbalancer.go:133] GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
E1118 16:15:09.952005 1 controller.go:824] failed to check if load balancer exists for service openshift-ingress/router-default: Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed getting LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
E1118 16:15:09.952064 1 controller.go:765] failed to update load balancer hosts for service openshift-ingress/router-default: Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed updating LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:09.952058 1 event.go:285] Event(v1.ObjectReference{Kind:"Service", Namespace:"openshift-ingress", Name:"router-default", UID:"6a8bed10-200b-45ba-a87d-ddec5b901f17", APIVersion:"v1", ResourceVersion:"15594", FieldPath:""}): type: 'Warning' reason: 'GettingCloudLoadBalancerFailed' Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed getting LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
```
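For anyone triaging similar CCM logs, a rough sketch of how the error lines could be tallied by source location, which makes it easier to tell the node-initialization failure apart from the resource-group lookup failure. It assumes the standard klog header layout (severity letter, mmdd, time, thread id, file:line]); the function name `summarize_errors` is made up for this illustration and is not part of any OpenShift tooling.

```python
import re
from collections import Counter

# klog header: severity letter, mmdd, hh:mm:ss.micros, thread id, file:line], message
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\s:]+):(?P<line>\d+)\] ?(?P<msg>.*)$"
)


def summarize_errors(log_text):
    """Count error-severity (E) klog lines, keyed by 'file:line'.

    Hypothetical helper for illustration; lines that do not look like
    klog output are skipped.
    """
    counts = Counter()
    for raw in log_text.splitlines():
        match = KLOG_RE.match(raw.strip())
        if match is None or match.group("sev") != "E":
            continue
        counts[f"{match.group('src')}:{match.group('line')}"] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the logs in this report.
    sample = "\n".join([
        "I1101 20:42:38.373877 1 shared_informer.go:247] Caches are synced for service",
        "E1118 16:15:08.680987 1 ibm_vpc_loadbalancer.go:284] GetCloudProviderVpc failed: "
        "0 resource groups match name: ipi-dev-test-66-lrmht",
    ])
    for location, count in summarize_errors(sample).most_common():
        print(location, count)
```

Grouping by file and line rather than by message text keeps the tally stable even though each message embeds cluster-specific names and UIDs.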
This was already resolved in a PR I've just linked. Moving on to QA, as the fix has been in the nightlies since November.
Verified on 4.10.0-0.nightly-2022-01-07-004348 using the automated templates at https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_10/ipi-on-ibmcloud/versioned-installer (which cover the reproduction steps). Installation in the "eu-gb" region succeeded: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/64969/console
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056