Bug 2036029
| Summary: | Newly added cloud-network-config operator doesn't support AWS STS format credentials | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | wang lin <lwan> |
| Component: | Networking | Assignee: | Casey Callendrello <cdc> |
| Networking sub component: | openshift-sdn | QA Contact: | wang lin <lwan> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | anbhat, bpickard, lwan |
| Version: | 4.10 | Keywords: | TestBlocker |
| Target Milestone: | --- | Flags: | lwan: needinfo- |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:36:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description: wang lin, 2021-12-29 11:13:03 UTC

The issue blocks all STS-related cluster testing; adding the TestBlocker keyword.

Does this only apply to AWS? If so, this will be fixed by PR: https://github.com/openshift/cloud-network-config-controller/pull/13

Token credentials are currently supported only on AWS and GCP, and only the AWS platform has this issue; will verify it with the fix PR.

Verified with a cluster-bot image with the fix PR merged: the crash issue has been fixed, but the operator hits another issue:

####
E0104 02:53:03.460524 1 controller.go:165] error syncing 'ip-10-0-169-247.us-east-2.compute.internal': error retrieving the private IP configuration for node: ip-10-0-169-247.us-east-2.compute.internal, err: error: cannot list ec2 instance for node: ip-10-0-169-247.us-east-2.compute.internal, err: WebIdentityErr: failed fetching WebIdentity token:
caused by: WebIdentityErr: unable to read file at /var/run/secrets/openshift/serviceaccount/token
caused by: open /var/run/secrets/openshift/serviceaccount/token: no such file or directory, requeuing in node workqueue
####

@jdiaz is the developer of the Cloud Credential Operator; I think he can give some information if you need help.

Based on the error message, it looks like the projected ServiceAccount token has not been mounted into /var/run/secrets/openshift/serviceaccount/token. Looking at the GitHub repo, I didn't see where the Deployment manifest is defined, but I would expect the Deployment for running this 'cloud-network-config-controller' software to have a volume mount that looks like https://github.com/openshift/cluster-image-registry-operator/blob/master/manifests/07-operator.yaml#L75-L77 (see the sketch after the log below).

The Hypershift fix also includes PR: https://github.com/openshift/cluster-network-operator/pull/1268/files which loads the projected service account token. I suspect that should fix the problem completely, and maybe we should retest once that PR merges.

I saw other error logs, please check:
###
E0113 01:50:55.536951 1 controller.go:165] error syncing 'ip-10-0-184-222.us-east-2.compute.internal': error retrieving the private IP configuration for node: ip-10-0-184-222.us-east-2.compute.internal, err: error: cannot list ec2 instance for node: ip-10-0-184-222.us-east-2.compute.internal, err: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
status code: 403, request id: 3572c767-04c8-42c0-86c1-ba0f2c5d5091, requeuing in node workqueue
E0113 01:50:55.554710 1 controller.go:165] error syncing 'ip-10-0-202-212.us-east-2.compute.internal': error retrieving the private IP configuration for node: ip-10-0-202-212.us-east-2.compute.internal, err: error: cannot list ec2 instance for node: ip-10-0-202-212.us-east-2.compute.internal, err: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
status code: 403, request id: 896bdc0d-23dd-44fa-be42-52e5a4bf7893, requeuing in node workqueue
I0113 01:53:39.559676 1 controller.go:160] Dropping key 'ip-10-0-143-215.us-east-2.compute.internal' from the node workqueue
I0113 01:53:39.578314 1 controller.go:160] Dropping key 'ip-10-0-202-212.us-east-2.compute.internal' from the node workqueue
I0113 01:53:39.578350 1 controller.go:160] Dropping key 'ip-10-0-191-236.us-east-2.compute.internal' from the node workqueue
I0113 01:53:39.578356 1 controller.go:160] Dropping key 'ip-10-0-198-79.us-east-2.compute.internal' from the node workqueue
I0113 01:53:39.578363 1 controller.go:160] Dropping key 'ip-10-0-184-222.us-east-2.compute.internal' from the node workqueue
I0113 01:53:39.578365 1 controller.go:160] Dropping key 'ip-10-0-145-35.us-east-2.compute.internal' from the node workqueue
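For reference, the projected ServiceAccount token mount described above would look roughly like this in the controller's Deployment. This is a minimal sketch following the cluster-image-registry-operator pattern linked earlier; the container name is a hypothetical placeholder and the actual manifest may differ:

###
# Minimal sketch of a projected ServiceAccount token volume. The audience
# "openshift" matches the "aud" claim seen in the decoded token later in
# this bug.
spec:
  template:
    spec:
      containers:
      - name: controller            # hypothetical container name
        volumeMounts:
        - name: bound-sa-token
          mountPath: /var/run/secrets/openshift/serviceaccount
          readOnly: true
      volumes:
      - name: bound-sa-token
        projected:
          sources:
          - serviceAccountToken:
              audience: openshift
              path: token           # yields .../serviceaccount/token
###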
The CredentialsRequest I used for the cloud-network-config-controller is:
cat > "${creds_dir}/0000_50_cloud-network_00_credentials-request.yaml" <<EOF
---
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-cloud-network-config-controller
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - action:
      - ec2:DescribeInstances
      - ec2:DescribeInstanceStatus
      - ec2:DescribeInstanceTypes
      - ec2:UnassignPrivateIpAddresses
      - ec2:AssignPrivateIpAddresses
      - ec2:UnassignIpv6Addresses
      - ec2:AssignIpv6Addresses
      - ec2:DescribeSubnets
      - ec2:DescribeNetworkInterfaces
      effect: Allow
      resource: '*'
  secretRef:
    name: cloud-credentials
    namespace: openshift-cloud-network-config-controller
  serviceAccountNames:
  - cloud-network-config-controllers
EOF
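For context, ccoctl consumes this CredentialsRequest to create the matching IAM role. A sketch of the invocation per the STS install docs; the --name value is a hypothetical example, and the identity provider ARN is the one from this cluster's trust policy below:

###
# Create IAM roles from the CredentialsRequest manifests above.
ccoctl aws create-iam-roles \
  --credentials-requests-dir="${creds_dir}" \
  --identity-provider-arn=arn:aws:iam::301721915996:oidc-provider/lwansts0118-14958-oidc.s3.us-east-2.amazonaws.com \
  --name=lwansts0118 \
  --region=us-east-2
###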
Attached must-gather logs.
@Casey Could you please have a look at this bug? I'm not sure how STS credentials should work and what might be missing. PR: https://github.com/openshift/cluster-network-operator/pull/1268/files mentioned that AssumeRoleWithWebIdentity should work on AWS, but it seems it doesn't.

Tested using payload 4.10.0-0.nightly-2022-01-17-182202 with PR 1275 merged: the same issue.
@jdiaz Could you help take a look? I checked; the s3/key.json is:
###
{
"keys": [
{
"use": "sig",
"kty": "RSA",
"kid": "0zCe_XCNey2bBDh9fYuVVCitvEYP7bbKgFb_jOpuTEU",
"alg": "RS256",
"n": "uN0XAwS4shiBQinahpsbrHOCgAT4jssPWjEDs2jerBOGPETySxDlc295HkuZCb8tVLZdu3PKQlxXDEIzBuTO1DW2yWTHONVu7tnEywtuOJVgnNTu-95LU1oIFLZMnxjVmuaZSygTgDQI_p_K9wmiDmQvGudJUHVokd0N0rM63Qf1JE1UmDOgdS6jfHbDKgBLI3lZfMi7Xfdb9YMQ8JG8fIh86yB6iPWPyasNqwSAUiIoTPdnjMm6s2mFM-AVMof3z7Fs4MThFRpyNkQUKfxhBOZRsLwgoK1H0Z1N9F5VAcRTC6tIkBFYY_ty74KQtzTy1neZNam8mRVPavjGdnUbse9J3Fn7kS5d9iylGg-2WDFNnrjwUTU3Cqh8BGkFV5UubAWLjald-0-yJUEC4EuBGH0sJkYnVHTONUNFmyO4F1nbhwISZQNtjayROKDRRdSYuEkC1KKWYZiNbmn8bUCjDqD3f5Sy_D0DQEqdLbv9t5Bry0stugwwjmdS_NamMSVpZtCfVcP0B0seqMU8kIhDEE_sYQpG72NgLLgg1JZEMV-8FafoL2z3_uJ3ZEJc7ECc-nM_TEDfEPObEO6orZhK87tXfPr4bomAMZGchSEE3lZhaj5y61iRYtUPZ6aYR3n5TZKY9PRzX6Z0-oABt_bYCJsJ80bw5pkeP8Q5XOoVWF8",
"e": "AQAB"
}
]
}
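As a sanity check, the discovery document and JWKS that AWS STS fetches from the issuer can be pulled directly. A sketch, assuming the standard ccoctl object layout (the exact object keys are an assumption):

###
# Fetch the OIDC discovery document and the JWKS from the S3-hosted issuer.
curl -s https://lwansts0118-14958-oidc.s3.us-east-2.amazonaws.com/.well-known/openid-configuration
curl -s https://lwansts0118-14958-oidc.s3.us-east-2.amazonaws.com/keys.json
###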
The token for the CCNC is (note its header "kid" matches the JWKS "kid" above):
###
$ oc rsh cloud-network-config-controller-7dbdf9bb85-4xct7
sh-4.4$ cat /var/run/secrets/openshift/serviceaccount/token | awk -F. '{ print $1 }' | base64 -d
{"alg":"RS256","kid":"0zCe_XCNey2bBDh9fYuVVCitvEYP7bbKgFb_jOpuTEU"}base64: invalid input
sh-4.4$ cat /var/run/secrets/openshift/serviceaccount/token | awk -F. '{ print $2 }' | base64 -d
{"aud":["openshift"],"exp":1642478458,"iat":1642474858,"iss":"https://lwansts0118-14958-oidc.s3.us-east-2.amazonaws.com","kubernetes.io":{"namespace":"openshift-cloud-network-config-controller","pod":{"name":"cloud-network-config-controller-7dbdf9bb85-4xct7","uid":"6f9dfb1b-f9b5-480b-ae80-70cca4b87ecc"},"serviceaccount":{"name":"cloud-network-config-controller","uid":"4b630e33-3e18-45ed-bd77-5e07f94573c5"}},"nbf":1642474858,"sub":"system:serviceaccount:openshift-cloud-network-config-controller:cloud-network-config-controller"}
I can't see any issue with it, but the pod shows this error:
###
E0118 03:08:20.806257 1 controller.go:165] error syncing 'ip-10-0-213-232.us-east-2.compute.internal': error retrieving the private IP configuration for node: ip-10-0-213-232.us-east-2.compute.internal, err: error: cannot list ec2 instance for node: ip-10-0-213-232.us-east-2.compute.internal, err: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
status code: 403, request id: a7f6aa50-fb30-40cc-99b9-ed92c6497aee, requeuing in node workqueue
Also, the Trust Relationship is:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::301721915996:oidc-provider/lwansts0118-14958-oidc.s3.us-east-2.amazonaws.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"lwansts0118-14958-oidc.s3.us-east-2.amazonaws.com:sub": "system:serviceaccount:openshift-cloud-network-config-controller:cloud-network-config-controllers"
}
}
}
]
}
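To cross-check the trust policy's "sub" condition against the token's "sub" claim decoded above, the policy can also be fetched with the AWS CLI; the role name here is a hypothetical example:

###
# Fetch the role's trust policy and compare its StringEquals sub condition
# with the token's "sub" claim.
aws iam get-role \
  --role-name lwansts0118-openshift-cloud-network-config-controller \
  --query 'Role.AssumeRolePolicyDocument' \
  --output json
###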
@cdc A GCP workload identity cluster has a similar issue; the installation can succeed, so I didn't notice the issue before. Do you want to track it with this one bug, or should I create another bug for the GCP platform?

###
E0118 05:22:55.659257 1 controller.go:165] error syncing 'lwanstsg0118-5lwqc-worker-c-b2g8k.c.openshift-qe.internal': error retrieving the private IP configuration for node: lwanstsg0118-5lwqc-worker-c-b2g8k.c.openshift-qe.internal, err: error retrieving instance associated with node, err: Get "https://compute.googleapis.com/compute/v1/projects/openshift-qe/zones/us-central1-c/instances/lwanstsg0118-5lwqc-worker-c-b2g8k?alt=json&prettyPrint=false": oauth2/google: status code 403: { "error": { "code": 403, "message": "The caller does not have permission", "status": "PERMISSION_DENIED" } }, requeuing in node workqueue
###

No, PR 1283 can't fix the issue. When I create an STS cluster, I already manually add the serviceAccountNames field to the CredentialsRequest. I use the ccoctl tool to create the STS-related manifests following this official doc: https://docs.openshift.com/container-platform/4.9/authentication/managing_cloud_provider_credentials/cco-mode-sts.html#sts-mode-installing . If there is no serviceAccountNames field, the ccoctl tool hits an issue like:

#####
2021/12/07 11:02:07 Failed to process IAM Roles: Failed while processing each CredentialsRequest: error while creating Role policy document for openshift-cluster-api-aws: CredentialsRequest must provide ServieAccounts to bind the Role policy to
#####

So I had manually added serviceAccountNames before I launched the cluster.

@jdiaz, sorry for the trouble above. I checked with Casey: I gave serviceAccountNames the value "cloud-network-config-controllers", but it should be "cloud-network-config-controller" without the trailing 's'. Please ignore the comments above. The permission issue has been fixed; I will move the bug to verified.

#### pod logs ####
I0118 11:34:35.683463 1 leaderelection.go:258] successfully acquired lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock
I0118 11:34:35.684267 1 controller.go:88] Starting node controller
I0118 11:34:35.684278 1 controller.go:91] Waiting for informer caches to sync for node workqueue
I0118 11:34:35.684761 1 controller.go:88] Starting secret controller
I0118 11:34:35.684771 1 controller.go:91] Waiting for informer caches to sync for secret workqueue
I0118 11:34:35.684780 1 controller.go:88] Starting cloud-private-ip-config controller
I0118 11:34:35.684786 1 controller.go:91] Waiting for informer caches to sync for cloud-private-ip-config workqueue
I0118 11:34:35.687749 1 controller.go:182] Assigning key: ip-10-0-194-96.us-east-2.compute.internal to node workqueue
I0118 11:34:35.688394 1 controller.go:182] Assigning key: ip-10-0-141-229.us-east-2.compute.internal to node workqueue
I0118 11:34:35.688412 1 controller.go:182] Assigning key: ip-10-0-186-235.us-east-2.compute.internal to node workqueue
I0118 11:34:35.784845 1 controller.go:96] Starting cloud-private-ip-config workers
I0118 11:34:35.784862 1 controller.go:96] Starting node workers
I0118 11:34:35.784872 1 controller.go:102] Started cloud-private-ip-config workers
I0118 11:34:35.784880 1 controller.go:102] Started node workers
I0118 11:34:35.784896 1 controller.go:96] Starting secret workers
I0118 11:34:35.784922 1 controller.go:102] Started secret workers

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
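For completeness, the corrected field from the resolution above; only the trailing 's' changes, so the entry matches the actual ServiceAccount name:

###
# Corrected serviceAccountNames entry in the CredentialsRequest:
serviceAccountNames:
- cloud-network-config-controller
###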