Bug 2019219 - [IBMCLOUD]: cloud-provider-ibm missing IAM permissions in CCCMO CredentialRequest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Joel Speed
QA Contact: Huali Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-01 21:31 UTC by Christopher J Schaefer
Modified: 2022-04-11 08:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:24:30 UTC
Target Upstream Version:
Embargoed:




Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift cluster-cloud-controller-manager-operator pull 147 | 0 | None | Merged | [Bug 2019219] IBMCloud: Add RG IAM permissions | 2022-01-06 15:55:11 UTC |
| Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:24:44 UTC |

Description Christopher J Schaefer 2021-11-01 21:31:31 UTC
Description of problem:
The CCM for IBM Cloud (cloud-provider-ibm) is not able to operate properly in OCP because IAM permissions are missing from the CredentialsRequest in the cluster-cloud-controller-manager-operator (CCCMO). Without these permissions, IPI installations fail.

Version-Release number of selected component (if applicable):
4.10


How reproducible:
All current IPI installs on IBM Cloud

Steps to Reproduce:
1. Create the IPI manifests (openshift-install create manifests)
2. Create the Secrets via ccoctl (ccoctl ibmcloud create-service-id ...)
3. Create the IPI cluster (openshift-install create cluster)

Actual results:
The IPI installation fails when the IBM Cloud CCM is not able to initialize the master nodes:
```
I1101 20:42:38.368482       1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.372191       1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
I1101 20:42:38.373877       1 shared_informer.go:247] Caches are synced for service 
I1101 20:42:38.378082       1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.393695       1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
E1101 20:42:38.399216       1 node_controller.go:241] Error getting instance metadata for node addresses: error fetching node by provider ID: unimplemented, and error by node name: node is missing labels
I1101 20:42:38.404330       1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.406667       1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
I1101 20:42:38.427005       1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.429646       1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
```
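
For triage, a log excerpt like the one above can be condensed by grouping the error-severity lines. The following is a minimal sketch, assuming the standard klog line header (severity letter, MMDD, time, thread ID, `file:line]`); the names `KLOG_LINE`, `summarize_errors`, and `SAMPLE_LOG` are hypothetical helpers for illustration and are not part of the CCM or CCCMO.

```python
import re
from collections import Counter

# Assumed klog header layout: <severity><MMDD> <time> <thread id> <file>:<line>] <message>
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<source>[^\s:]+:\d+)\] (?P<message>.*)$"
)


def summarize_errors(log_text):
    """Count error-severity (E) lines, grouped by source location and message."""
    counts = Counter()
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line.strip())
        if match and match.group("severity") == "E":
            counts[(match.group("source"), match.group("message"))] += 1
    return counts


# Lines copied from the excerpt in this report.
SAMPLE_LOG = """\
I1101 20:42:38.368482       1 node_controller.go:390] Initializing node ipi-dev-test-60-gdjhw-master-0 with cloud provider
E1101 20:42:38.372191       1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
E1101 20:42:38.393695       1 node_controller.go:212] error syncing 'ipi-dev-test-60-gdjhw-master-0': failed to get provider ID for node ipi-dev-test-60-gdjhw-master-0 at cloudprovider: failed to get instance ID from cloud provider: node is missing labels, requeuing
E1101 20:42:38.399216       1 node_controller.go:241] Error getting instance metadata for node addresses: error fetching node by provider ID: unimplemented, and error by node name: node is missing labels
"""

if __name__ == "__main__":
    # Print each distinct error once, most frequent first.
    for (source, message), count in summarize_errors(SAMPLE_LOG).most_common():
        print(f"{count}x {source}: {message}")
```

Grouping by source location and message makes it easier to see that the excerpt is the same requeue error repeating rather than several distinct failures.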

Expected results:

An operational cluster is deployed

Additional info:

IBM Cloud is currently working on an update to CCCMO's CredentialsRequest for IBM Cloud to address this issue.

Comment 2 Christopher J Schaefer 2021-11-18 16:18:26 UTC
Update on the error details: the previous error log was caused by missing changes in the openshift/cloud-provider-ibm fork.

The permissions are still incorrect and require an update to the CCCMO's CredentialsRequest, because the CCM cannot list resource groups.

```
I1118 16:15:07.691626       1 ibm_vpc_loadbalancer.go:421] Processing line: ERROR: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
E1118 16:15:07.691632       1 ibm_vpc_loadbalancer.go:518] ERROR: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:07.691635       1 ibm_vpc_loadbalancer.go:421] Processing line: 
I1118 16:15:07.691637       1 ibm_vpc_loadbalancer.go:421] Processing line: 
I1118 16:15:08.680968       1 ibm_vpc_loadbalancer.go:289] [16:15:06.4564] Entering UpdateLoadBalancer(kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17, openshift-ingress/router-default)
E1118 16:15:08.680987       1 ibm_vpc_loadbalancer.go:284] GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:08.681047       1 ibm_vpc_loadbalancer.go:115] GetLoadBalancer(kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17, kubernetes)
I1118 16:15:08.681124       1 event.go:285] Event(v1.ObjectReference{Kind:"Service", Namespace:"openshift-ingress", Name:"router-default", UID:"6a8bed10-200b-45ba-a87d-ddec5b901f17", APIVersion:"v1", ResourceVersion:"15594", FieldPath:""}): type: 'Warning' reason: 'UpdatingCloudLoadBalancerFailed' Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed updating LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:09.951926       1 ibm_vpc_loadbalancer.go:138] [16:15:08.6944] Entering StatusLoadBalancer(kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17)
E1118 16:15:09.951955       1 ibm_vpc_loadbalancer.go:133] GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
E1118 16:15:09.952005       1 controller.go:824] failed to check if load balancer exists for service openshift-ingress/router-default: Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed getting LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
E1118 16:15:09.952064       1 controller.go:765] failed to update load balancer hosts for service openshift-ingress/router-default: Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed updating LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
I1118 16:15:09.952058       1 event.go:285] Event(v1.ObjectReference{Kind:"Service", Namespace:"openshift-ingress", Name:"router-default", UID:"6a8bed10-200b-45ba-a87d-ddec5b901f17", APIVersion:"v1", ResourceVersion:"15594", FieldPath:""}): type: 'Warning' reason: 'GettingCloudLoadBalancerFailed' Error on cloud load balancer kube-ipi-dev-test-66-lrmht-6a8bed10200b45baa87dddec5b901f17 for service openshift-ingress/router-default with UID 6a8bed10-200b-45ba-a87d-ddec5b901f17: Failed getting LoadBalancer: GetCloudProviderVpc failed: 0 resource groups match name: ipi-dev-test-66-lrmht
```
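
The "0 resource groups match name" message is consistent with a name-based lookup over a list that comes back empty. The sketch below illustrates that failure mode under a stated assumption: that the resource group list call returns only the groups the caller's credentials are allowed to view, rather than failing with an authorization error. All names here (`list_visible_resource_groups`, `find_resource_group_id`, the `rg-1` ID) are hypothetical and do not reflect the actual cloud-provider-ibm code.

```python
class ResourceGroupLookupError(Exception):
    """Raised when a resource group name does not resolve to exactly one group."""


def list_visible_resource_groups(all_groups, can_view_resource_groups):
    # Hypothetical stand-in for the cloud API call. Assumption: results are
    # filtered by IAM, so a caller without view access sees an empty list.
    return list(all_groups) if can_view_resource_groups else []


def find_resource_group_id(visible_groups, name):
    """Resolve a resource group name to its ID, requiring exactly one match."""
    matches = [group for group in visible_groups if group["name"] == name]
    if len(matches) != 1:
        raise ResourceGroupLookupError(
            f"{len(matches)} resource groups match name: {name}"
        )
    return matches[0]["id"]


if __name__ == "__main__":
    # "rg-1" is a made-up ID; the name is taken from the log above.
    groups = [{"id": "rg-1", "name": "ipi-dev-test-66-lrmht"}]
    for can_view in (True, False):
        visible = list_visible_resource_groups(groups, can_view)
        try:
            print(find_resource_group_id(visible, "ipi-dev-test-66-lrmht"))
        except ResourceGroupLookupError as err:
            print(f"lookup failed: {err}")
```

Under that assumption, the group exists but is invisible to the CCM's credentials, so the lookup reports zero matches instead of a permission error, which is why the fix belongs in the CredentialsRequest rather than in the lookup itself.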

Comment 3 Joel Speed 2022-01-06 15:55:12 UTC
This was already resolved in the PR I've just linked. Moving to QA, as the fix has been in the nightlies since November.

Comment 4 Huali Liu 2022-01-10 02:33:42 UTC
Verified on 4.10.0-0.nightly-2022-01-07-004348
using the automated templates at https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_10/ipi-on-ibmcloud/versioned-installer (which cover the reproduction steps). Installation in the "eu-gb" region succeeded.
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/64969/console

Comment 7 errata-xmlrpc 2022-03-10 16:24:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

