Bug 1992777

Summary: [IBMCLOUD] Default "ibm_iam_authorization_policy" is not working as expected in all scenarios
Product: OpenShift Container Platform Reporter: Pedro Amoedo <pamoedom>
Component: Installer Assignee: Jeremiah Stuever <jstuever>
Installer sub component: openshift-installer QA Contact: Pedro Amoedo <pamoedom>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: mstaeble
Version: 4.9   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
New feature
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:05:09 UTC Type: Bug
Attachments:
Description Flags
openshift_install.log none

Description Pedro Amoedo 2021-08-11 17:56:50 UTC
Created attachment 1813233 [details]
openshift_install.log

Version:

Using a locally compiled version to enable ibmcloud testing:

$ ./openshift-install-local version
./openshift-install-local unreleased-master-4913-ga78f4df65e96de7e0ea20caa91928665243c26d6-dirty
built from commit a78f4df65e96de7e0ea20caa91928665243c26d6
release image registry.ci.openshift.org/origin/release:4.8

But using the 4.9.0-0.nightly-2021-08-07-175228 payload, as follows:

$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.ci.openshift.org/ocp/release@sha256:62f43b04f6af74ea568c135367b33f7cc8212ca4aa760388ea7884ebccbfb1f1


Platform:

IBMCloud
API key of type "user" in the PowerUser access group.
CIS instance in place with base domain.

Please specify:

IPI

What happened?

Installation aborts with the following error when Terraform tries to create "ibm_is_image":

~~~
ERROR                                              
ERROR Error: [DEBUG] Image creation err The request is not authorized to access the Cloud Object Storage resource. 
ERROR {                                            
ERROR     "StatusCode": 403,                       
ERROR     "Headers": {                             
ERROR         "Cache-Control": [                   
ERROR             "max-age=0, no-cache, no-store, must-revalidate" 
ERROR         ],                                   
ERROR         "Cf-Cache-Status": [                 
ERROR             "DYNAMIC"                        
ERROR         ],                                   
ERROR         "Cf-Ray": [                          
ERROR             "67d0184a48f014f5-MAD"           
ERROR         ],                                   
ERROR         "Connection": [                      
ERROR             "keep-alive"                     
ERROR         ],                                   
ERROR         "Content-Type": [                    
ERROR             "application/json; charset=utf-8" 
ERROR         ],                                   
ERROR         "Date": [                            
ERROR             "Wed, 11 Aug 2021 08:29:14 GMT"  
ERROR         ],                                   
ERROR         "Expect-Ct": [                       
ERROR             "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"" 
ERROR         ],                                   
ERROR         "Expires": [                         
ERROR             "-1"                             
ERROR         ],                                   
ERROR         "Pragma": [                          
ERROR             "no-cache"                       
ERROR         ],                                   
ERROR         "Server": [                          
ERROR             "cloudflare"                     
ERROR         ],                                   
ERROR         "Strict-Transport-Security": [       
ERROR             "max-age=31536000; includeSubDomains" 
ERROR         ],                                   
ERROR         "Transaction-Id": [                  
ERROR             "5b1d586232c8691f21fc194e360e412d" 
ERROR         ],                                   
ERROR         "Vary": [                            
ERROR             "Accept-Encoding"                
ERROR         ],                                   
ERROR         "X-Content-Type-Options": [          
ERROR             "nosniff"                        
ERROR         ],                                   
ERROR         "X-Request-Id": [                    
ERROR             "5b1d586232c8691f21fc194e360e412d" 
ERROR         ],                                   
ERROR         "X-Xss-Protection": [                
ERROR             "1; mode=block"                  
ERROR         ]                                    
ERROR     },                                       
ERROR     "Result": {                              
ERROR         "errors": [                          
ERROR             {                                
ERROR                 "code": "cos_not_authorized", 
ERROR                 "message": "The request is not authorized to access the Cloud Object Storage resource.", 
ERROR                 "more_info": "http://cloud.ibm.com/docs/vpc-on-classic?topic=vpc-on-classic-rias-error-messages#cos_not_authorized", 
ERROR                 "target": {                  
ERROR                     "name": "file.href",     
ERROR                     "type": "field"          
ERROR                 }                            
ERROR             }                                
ERROR         ],                                   
ERROR         "trace": "5b1d586232c8691f21fc194e360e412d" 
ERROR     },                                       
ERROR     "RawResult": null                        
ERROR }                                            
ERROR                                              
ERROR                                              
ERROR   on ../../../../tmp/openshift-install--244302831/image/main.tf line 28, in resource "ibm_is_image" "image": 
ERROR   28: resource "ibm_is_image" "image" {      
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
~~~

(see attached "openshift_install.log" for more details if needed)
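
For triage, the useful parts of the payload above are the status code, the error code, and the trace ID. A minimal Go sketch of pulling those out follows. The `parseFailure` helper and the `failure` and `apiError` types are hypothetical names used for illustration only, and the sketch assumes the `ERROR` log prefixes have already been stripped so that the input is plain JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors one entry of the "errors" array in the payload above.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// failure keeps only the fields of the payload that help with triage.
// Other fields (Headers, RawResult) are ignored by the decoder.
type failure struct {
	StatusCode int `json:"StatusCode"`
	Result     struct {
		Errors []apiError `json:"errors"`
		Trace  string     `json:"trace"`
	} `json:"Result"`
}

// parseFailure decodes the JSON body of the error (hypothetical helper).
// Assumption: the "ERROR" log prefixes have already been removed.
func parseFailure(raw string) (failure, error) {
	var f failure
	err := json.Unmarshal([]byte(raw), &f)
	return f, err
}

func main() {
	// Shortened copy of the payload from the log above.
	raw := `{
  "StatusCode": 403,
  "Result": {
    "errors": [
      {
        "code": "cos_not_authorized",
        "message": "The request is not authorized to access the Cloud Object Storage resource."
      }
    ],
    "trace": "5b1d586232c8691f21fc194e360e412d"
  },
  "RawResult": null
}`
	f, err := parseFailure(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, e := range f.Result.Errors {
		fmt.Printf("%d %s (trace %s)\n", f.StatusCode, e.Code, f.Result.Trace)
	}
}
```

The trace ID matches the "Transaction-Id" and "X-Request-Id" headers in the log, so it is probably the most useful value to quote when asking IBM Cloud support about the 403.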

What did you expect to happen?

With the "ibm_iam_authorization_policy" in place and the proper "target_resource_instance_id" set, access should be allowed.

How to reproduce it (as minimally and precisely as possible)?

IBMCloud IPI installation on a new account with all necessary permissions in place for the API-key.

Anything else we need to know?

I already have a workaround in place that does the trick. Instead of using "target_resource_instance_id" within the "ibm_iam_authorization_policy", I modified "data/data/ibmcloud/image/main.tf" to use "target_resource_group_id", as follows:

~~~
resource "ibm_iam_authorization_policy" "policy" {
  source_service_name         = "is"
  source_resource_type        = "image"
  target_service_name         = "cloud-object-storage"
  target_resource_group_id    = var.resource_group_id
  roles                       = ["Reader"]
}
~~~

This makes the policy broader, because it grants Reader access to all COS instances in the same resource group, but it is a possible solution to consider.

Best Regards.

Comment 1 Pedro Amoedo 2021-08-12 20:16:30 UTC
[UPDATE]

I've realized that widening the policy to "target_resource_group_id" is problematic because it could collide with a previously existing policy and abort the installation.

While investigating how the "ibm_iam_authorization_policy" is represented internally, as dumped by the ibmcloud CLI, I noticed that the default policy, which uses "target_resource_instance_id = var.cos_resource_instance_id", shows the full CRN of the COS object in the authorization web interface but not in the CLI output. For example:

~~~
$ ibmcloud iam authorization-policies

ID:                        03189115-465e-44f4-a06d-bf3c6831c41c   
Source service name:       is   
Source service instance:   All instances   
Source resource type:      image   
Target service name:       cloud-object-storage   
Target service instance:   All instances   
Roles:                     Reader
~~~

And this policy, despite matching "All instances", is the one that fails to allow access when the "ibm_is_image" is being created.

After running several tests, I concluded that the string that does the trick is just the instance ID, without the full CRN. Example of a working policy:

~~~
ID:                        53dede15-4207-4a47-a447-246a3a6c0988   
Source service name:       is   
Source service instance:   All instances   
Source resource type:      image   
Target service name:       cloud-object-storage   
Target service instance:   91410c53-6bfa-4366-89ed-d148ecd7653b   
Roles:                     Reader
~~~
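
The difference between the two listings above is the "Target service instance" field. A minimal Go sketch for checking that field in saved CLI output follows. `parsePolicy` and `targetsSingleInstance` are hypothetical helpers, not part of the installer or of any IBM tooling, and the sketch assumes the "Key:   value" layout shown above. It only inspects text and does not call the IBM Cloud API.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePolicy reads "Key:   value" lines, as printed in the listings above,
// into a map. Lines without a colon (such as the shell prompt) are skipped.
func parsePolicy(text string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 {
			continue
		}
		fields[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return fields
}

// targetsSingleInstance reports whether the policy names a specific target
// instance instead of "All instances", the symptom seen in this report.
func targetsSingleInstance(fields map[string]string) bool {
	target := fields["Target service instance"]
	return target != "" && target != "All instances"
}

func main() {
	failing := "Target service name:       cloud-object-storage\n" +
		"Target service instance:   All instances\n"
	working := "Target service name:       cloud-object-storage\n" +
		"Target service instance:   91410c53-6bfa-4366-89ed-d148ecd7653b\n"

	fmt.Println("failing policy pinned to one instance:", targetsSingleInstance(parsePolicy(failing)))
	fmt.Println("working policy pinned to one instance:", targetsSingleInstance(parsePolicy(working)))
}
```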

If you create the policy via the web UI, the assistant has an extra field, not available via Terraform, that lets you select the "Service Instance" by name from a drop-down menu while internally referencing it by its short ID.

Based on this, I've slightly modified the code (see https://github.com/openshift/installer/pull/5147) to extract the short ID from the CRN (if needed) and to properly remove the object when calling the destroy operation:

[data/data/ibmcloud/image/main.tf]

~~~
$ diff data/data/ibmcloud/image/main.tf ../upstream-repos/installer/data/data/ibmcloud/image/main.tf
24c24
<   target_resource_instance_id = length(split(":", var.cos_resource_instance_id)) >= 8 ? "${element(split(":", var.cos_resource_instance_id),7)}" : var.cos_resource_instance_id
---
>   target_resource_instance_id = var.cos_resource_instance_id
~~~

[pkg/destroy/ibmcloud/cloudobjectstorage.go]

~~~
$ diff pkg/destroy/ibmcloud/cloudobjectstorage.go ../upstream-repos/installer/pkg/destroy/ibmcloud/cloudobjectstorage.go
6d5
< 	"strings"
122,130c121,122
< 			split := strings.Split(instance.id, ":")
< 			if len(split) >= 8 { //CRN string
< 				o.cosInstanceID = split[7]
< 			} else {
< 				o.cosInstanceID = instance.id
< 			}
< 			return o.cosInstanceID, nil
---
> 			o.cosInstanceID = instance.id
> 			return instance.id, nil
~~~
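
The extraction logic in both diffs above can be sketched as a standalone Go function. `shortInstanceID` is a hypothetical name, not one from the installer. The sketch assumes, as the patch does, that a CRN has at least 8 colon-separated fields and that the service instance is the eighth (index 7). The account ID in the sample CRN is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// shortInstanceID returns the service-instance segment of a CRN, or the
// input unchanged when it does not look like a CRN.
// Assumption: index 7 of the colon-separated fields holds the instance ID,
// mirroring the split/element logic in the patch above.
func shortInstanceID(id string) string {
	parts := strings.Split(id, ":")
	if len(parts) >= 8 {
		return parts[7]
	}
	return id
}

func main() {
	// Hypothetical CRN built around the instance ID from the working policy.
	crn := "crn:v1:bluemix:public:cloud-object-storage:global:a/0123456789abcdef:91410c53-6bfa-4366-89ed-d148ecd7653b::"

	fmt.Println(shortInstanceID(crn))
	fmt.Println(shortInstanceID("91410c53-6bfa-4366-89ed-d148ecd7653b"))
}
```

Returning the input unchanged when it is already a short ID keeps the function safe to call on either form, which is why the Terraform expression and the Go change both guard on the field count.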

Best Regards.

Comment 2 Pedro Amoedo 2021-08-12 20:39:23 UTC
BTW, I forgot to mention that the installation went ahead with this patch: the policy worked as expected and the nodes were created with the custom image. However, it eventually failed at another point because the ibmcloud implementation is still missing some pieces, such as the ingress LB, worker nodes, etc.

FWIW, I'm posting the diff again in a proper format (git diff) against the upstream repo for better visibility of the changes:

~~~
[pamoedo@p50 installer] $ git diff upstream/master pamoedom-bz1992777
diff --git a/data/data/ibmcloud/image/main.tf b/data/data/ibmcloud/image/main.tf
index a8502d6c6..c2397c1af 100644
--- a/data/data/ibmcloud/image/main.tf
+++ b/data/data/ibmcloud/image/main.tf
@@ -21,7 +21,7 @@ resource "ibm_iam_authorization_policy" "policy" {
   source_service_name         = "is"
   source_resource_type        = "image"
   target_service_name         = "cloud-object-storage"
-  target_resource_instance_id = var.cos_resource_instance_id
+  target_resource_instance_id = length(split(":", var.cos_resource_instance_id)) >= 8 ? "${element(split(":", var.cos_resource_instance_id),7)}" : var.cos_resource_instance_id
   roles                       = ["Reader"]
 }
 
diff --git a/pkg/destroy/ibmcloud/cloudobjectstorage.go b/pkg/destroy/ibmcloud/cloudobjectstorage.go
index 6d1f33dca..5ab7f142c 100644
--- a/pkg/destroy/ibmcloud/cloudobjectstorage.go
+++ b/pkg/destroy/ibmcloud/cloudobjectstorage.go
@@ -3,6 +3,7 @@ package ibmcloud
 import (
        "fmt"
        "net/http"
+       "strings"
 
        "github.com/pkg/errors"
 )
@@ -118,8 +119,13 @@ func (o *ClusterUninstaller) COSInstanceID() (string, error) {
        // Locate the installer's COS instance by name.
        for _, instance := range instanceList {
                if instance.name == fmt.Sprintf("%s-cos", o.InfraID) {
-                       o.cosInstanceID = instance.id
-                       return instance.id, nil
+                        split := strings.Split(instance.id, ":")
+                        if len(split) >= 8 { //CRN string
+                                o.cosInstanceID = split[7]
+                        } else {
+                                o.cosInstanceID = instance.id
+                        }
+                        return o.cosInstanceID, nil
                }
        }
        return "", errors.Errorf("COS instance not found")
~~~

Regards.

Comment 3 Pedro Amoedo 2021-08-26 14:31:47 UTC
[UPDATE]

Still failing with "4.9.0-0.nightly-2021-08-26-040328":

~~~
ERROR                                              
ERROR Error: [DEBUG] Image creation err The request is not authorized to access the Cloud Object Storage resource. 
ERROR {                                            
ERROR     "StatusCode": 403,                       
ERROR     "Headers": {                             
ERROR         "Cache-Control": [                   
ERROR             "max-age=0, no-cache, no-store, must-revalidate" 
ERROR         ],                                   
ERROR         "Cf-Cache-Status": [                 
ERROR             "DYNAMIC"                        
ERROR         ],                                   
ERROR         "Cf-Ray": [                          
ERROR             "684db622dfb3115d-MAD"           
ERROR         ],                                   
ERROR         "Connection": [                      
ERROR             "keep-alive"                     
ERROR         ],                                   
ERROR         "Content-Type": [                    
ERROR             "application/json; charset=utf-8" 
ERROR         ],                                   
ERROR         "Date": [                            
ERROR             "Thu, 26 Aug 2021 14:22:20 GMT"  
ERROR         ],                                   
ERROR         "Expect-Ct": [                       
ERROR             "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"" 
ERROR         ],                                   
ERROR         "Expires": [                         
ERROR             "-1"                             
ERROR         ],                                   
ERROR         "Pragma": [                          
ERROR             "no-cache"                       
ERROR         ],                                   
ERROR         "Server": [                          
ERROR             "cloudflare"                     
ERROR         ],                                   
ERROR         "Strict-Transport-Security": [       
ERROR             "max-age=31536000; includeSubDomains" 
ERROR         ],                                   
ERROR         "Transaction-Id": [                  
ERROR             "a1880a9c221af03a604f816b4da644be" 
ERROR         ],                                   
ERROR         "Vary": [                            
ERROR             "Accept-Encoding"                
ERROR         ],                                   
ERROR         "X-Content-Type-Options": [          
ERROR             "nosniff"                        
ERROR         ],                                   
ERROR         "X-Request-Id": [                    
ERROR             "a1880a9c221af03a604f816b4da644be" 
ERROR         ],                                   
ERROR         "X-Xss-Protection": [                
ERROR             "1; mode=block"                  
ERROR         ]                                    
ERROR     },                                       
ERROR     "Result": {                              
ERROR         "errors": [                          
ERROR             {                                
ERROR                 "code": "cos_not_authorized", 
ERROR                 "message": "The request is not authorized to access the Cloud Object Storage resource.", 
ERROR                 "more_info": "http://cloud.ibm.com/docs/vpc-on-classic?topic=vpc-on-classic-rias-error-messages#cos_not_authorized", 
ERROR                 "target": {                  
ERROR                     "name": "file.href",     
ERROR                     "type": "field"          
ERROR                 }                            
ERROR             }                                
ERROR         ],                                   
ERROR         "trace": "a1880a9c221af03a604f816b4da644be" 
ERROR     },                                       
ERROR     "RawResult": null                        
ERROR }                                            
ERROR                                              
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-network-235349676/image/main.tf line 28, in resource "ibm_is_image" "image": 
ERROR   28: resource "ibm_is_image" "image" {      
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
~~~

Comment 9 errata-xmlrpc 2022-03-10 16:05:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056