Bug 1974716
Summary: Using bound SA tokens causes failures to query cluster resources, especially in an STS cluster

Product: OpenShift Container Platform
Component: apiserver-auth
Reporter: wang lin <lwan>
Assignee: Standa Laznicka <slaznick>
QA Contact: liyao
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 4.8
Target Release: 4.9.0
Keywords: TestBlocker
Whiteboard: EmergencyRequest
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Type: Bug
CC: aos-bugs, jdiaz, jialiu, lwan, mfojtik, mifiedle, surbania, tsze, wking, wsun, wzheng, xtian, yapei, yunjiang
Clones: 1974773, 1974788 (view as bug list)
Bug Blocks: 1974773, 1974788
Last Closed: 2021-10-18 17:35:57 UTC
Description

wang lin, 2021-06-22 11:30:36 UTC
** WARNING ** This BZ claims that this bug is of urgent severity and priority. Note that urgent priority means that you have just declared an emergency within engineering. Engineers are asked to stop whatever they are doing, including putting important release work on hold, potentially risking the release of OCP while working on this case. Be prepared to have a good justification ready, and make sure your own and engineering management are aware and have approved this. Urgent bugs are very expensive and have maximal management visibility. NOTE: This bug was assigned to an engineering manager with severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Root cause analysis revealed that we had failed to set api-audiences in the API server configuration. The necessary change was made quite some time ago (https://github.com/openshift/cluster-kube-apiserver-operator/pull/1050), but it never took effect because the settings were pruned away. This is fixed in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1158.

I was able to reproduce this just by providing my own Authentication CR in the installer manifests (no need to install the cluster with a full STS configuration):

```
./openshift-install create manifests
cp /path/to/custom/cluster-authentication-02-config.yaml ./manifests/  # the name of the file matters
./openshift-install create cluster
```

The file contents look like:

```
[jdiaz@minigoomba os-install-4.8-nightly]$ cat cco/manifests/cluster-authentication-02-config.yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  name: cluster
spec:
  serviceAccountIssuer: https://jdiaz-a1-oidc.s3.us-east-1.amazonaws.com
```

You don't really need to set up an S3 bucket or anything like that. You should be able to use the above file contents literally.
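The failure mode can be sketched in a few lines of Python. This is purely illustrative and not the actual kube-apiserver validation code: a bound service account token carries an audience claim, and the API server only accepts it when at least one of the token's audiences matches the configured api-audiences, so a pruned (empty) api-audiences setting rejects every bound token with "Unauthorized".

```python
def validate_bound_token(token_audiences, api_audiences):
    """Illustrative sketch (not the real kube-apiserver code): a bound
    service-account token is accepted only if at least one of its audiences
    matches the API server's configured api-audiences."""
    return any(aud in api_audiences for aud in token_audiences)

# Bound SA tokens now include an audience claim.
token_audiences = ["https://kubernetes.default.svc"]

# The bug: the api-audiences setting was pruned away, leaving it empty,
# so every bound token fails validation -> "Unauthorized".
print(validate_bound_token(token_audiences, []))  # False

# With the fix, api-audiences is populated again and the token validates.
print(validate_bound_token(token_audiences, ["https://kubernetes.default.svc"]))  # True
```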
Yes, simply setting the service account issuer will provoke the issue, because api-audiences remains unset, which makes validation of bound SA tokens fail, as those tokens now include audiences.

We built a cluster using cluster bot with the fix. The cluster installed, but kube-apiserver is degraded:

```
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: a73c5589-0de8-46a3-8ae0-2f7ee293f933
ClusterVersion: Stable at "4.8.0-0.ci.test-2021-06-22-144200-ci-ln-t7q8sc2-latest"
ClusterOperators:
	clusteroperator/kube-apiserver is degraded because InstallerPodContainerWaitingDegraded: Pod "installer-5-ci-ln-t7q8sc2-f76d1-2vw48-master-1" on node "ci-ln-t7q8sc2-f76d1-2vw48-master-1" container "installer" is waiting since 2021-06-22 15:00:36 +0000 UTC because ContainerCreating
```

As discussed on Slack, the failures seen in https://bugzilla.redhat.com/show_bug.cgi?id=1974716#c7 are not related to this bug.

Verified on 4.8.0-0.nightly-2021-06-23-232238:

1. Provide my own Authentication CR in the cluster manifests:

```
$ oc get Authentication cluster -o json | jq -r ".spec"
{
  "oauthMetadata": {
    "name": ""
  },
  "serviceAccountIssuer": "https://a-lwansts-480-021932120336748-oidc.s3.us-east-2.amazonaws.com",
  "type": "",
  "webhookTokenAuthenticator": {
    "kubeConfig": {
      "name": "webhook-authentication-integrated-oauth"
    }
  }
}
```

2. Launch an install.
3. There are no longer any "Unauthorized" keywords.

Sorry, the above info is for the 4.8 verification; moving this one to ON_QA first, and will verify it with a 4.9 payload.

Tested with cluster 4.9.0-0.nightly-2021-06-23-160041:

1. Launch a 4.9 STS cluster and check the result; as expected, the cluster installs successfully and is available.
2. Check the must-gather logs; no "Unauthorized" keyword is found, which is expected.
3. Monitor the cluster status and check the result; no exceptions found.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759