Created attachment 1792992 [details]
must-gather

Description of problem:
Because of the recent SA token mechanism change, when installing an STS cluster, almost all components hit "Unauthorized" errors when querying cluster resources, so the installation fails. An STS cluster install differs from a normal install in that, before the install, it modifies the authentications.config.openshift.io "cluster" object's spec.serviceAccountIssuer to an S3 bucket URL and injects a bound-service-account-signing-key.

# authentications CR
oc get authentications cluster -o json | jq -r ".spec"
{
  "oauthMetadata": {
    "name": ""
  },
  "serviceAccountIssuer": "https://a-lwansts49-0622-021958659976987-oidc.s3.us-east-2.amazonaws.com",
  "type": "",
  "webhookTokenAuthenticator": {
    "kubeConfig": {
      "name": "webhook-authentication-integrated-oauth"
    }
  }
}

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-06-21-191858
4.8.0-0.nightly-2021-06-21-175537

How reproducible:
always

Steps to Reproduce:
1. Install an STS cluster; refer to https://docs.openshift.com/container-platform/4.8/authentication/managing_cloud_provider_credentials/cco-mode-sts.html#sts-mode-installing

Actual results:
The installation fails; many components hit "Unauthorized" errors.

Expected results:
The installation should succeed.
Additional info:
Several representative error messages:

namespaces/openshift-apiserver-operator/pods/openshift-apiserver-operator-7f4c8855f4-v7x57/openshift-apiserver-operator/openshift-apiserver-operator/logs/previous.log:2021-06-22T03:18:17.951577177Z W0622 03:18:17.951539 1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized

namespaces/openshift-cluster-node-tuning-operator/pods/tuned-tb9vc/tuned/tuned/logs/current.log:2021-06-22T02:36:47.644895574Z E0622 02:36:47.644796 7288 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Unauthorized

namespaces/openshift-oauth-apiserver/pods/apiserver-c9687d7b8-w7t62/oauth-apiserver/oauth-apiserver/logs/current.log:2021-06-22T03:04:57.275692572Z E0622 03:04:57.275642 1 webhook.go:202] Failed to make webhook authorizer request: Unauthorized

namespaces/openshift-oauth-apiserver/pods/apiserver-c9687d7b8-w7t62/oauth-apiserver/oauth-apiserver/logs/current.log:2021-06-22T03:04:59.059527999Z E0622 03:04:59.059480 1 reflector.go:138] k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Unauthorized

namespaces/openshift-config-operator/pods/openshift-config-operator-57c7c966d9-kgr8j/openshift-config-operator/openshift-config-operator/logs/previous.log:2021-06-22T03:16:24.630521031Z F0622 03:16:24.630484 1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized

namespaces/openshift-kube-controller-manager-operator/pods/kube-controller-manager-operator-d9b645cb8-k828k/kube-controller-manager-operator/kube-controller-manager-operator/logs/previous.log:2021-06-22T03:18:47.434633867Z W0622 03:18:47.434601 1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized

namespaces/openshift-kube-controller-manager-operator/pods/kube-controller-manager-operator-d9b645cb8-k828k/kube-controller-manager-operator/kube-controller-manager-operator/logs/previous.log:2021-06-22T03:19:17.764932006Z F0622 03:19:17.764878 1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized

namespaces/openshift-machine-config-operator/pods/machine-config-daemon-w7s8h/machine-config-daemon/machine-config-daemon/logs/current.log:2021-06-22T02:53:40.827737185Z E0622 02:53:40.827703 6350 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Unauthorized

namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-684684c77b-stn2t/kube-apiserver-operator/kube-apiserver-operator/logs/previous.log:2021-06-22T03:18:36.426764094Z W0622 03:18:36.426727 1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized

namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-684684c77b-stn2t/kube-apiserver-operator/kube-apiserver-operator/logs/previous.log:2021-06-22T03:19:06.936142534Z F0622 03:19:06.936091 1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized

namespaces/openshift-dns-operator/pods/dns-operator-5968756979-rqxk2/dns-operator/dns-operator/logs/current.log:2021-06-22T02:36:44.955384531Z E0622 02:36:44.952537 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1.DNS: failed to list *v1.DNS: Unauthorized

namespaces/openshift-controller-manager/pods/controller-manager-s9mqg/controller-manager/controller-manager/logs/current.log:2021-06-22T02:37:07.480859735Z E0622 02:37:07.480826 1 leaderelection.go:325] error retrieving resource lock openshift-controller-manager/openshift-master-controllers: Unauthorized

namespaces/openshift-controller-manager/pods/controller-manager-s9mqg/controller-manager/controller-manager/logs/previous.log:2021-06-22T02:36:43.626178769Z W0622 02:36:43.625993 1 client_builder_dynamic.go:197] get or create service account failed: Unauthorized

namespaces/openshift-controller-manager/pods/controller-manager-s9mqg/controller-manager/controller-manager/logs/previous.log:2021-06-22T02:36:43.626178769Z W0622 02:36:43.626000 1 client_builder_dynamic.go:197] get or create service account failed: Unauthorized

namespaces/openshift-kube-storage-version-migrator/pods/migrator-667f984cd-l56qq/migrator/migrator/logs/current.log:2021-06-22T02:36:44.891897026Z E0622 02:36:44.891855 1 reflector.go:138] k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.StorageVersionMigration: failed to list *v1alpha1.StorageVersionMigration: Unauthorized

namespaces/openshift-etcd-operator/pods/etcd-operator-f8b48cb5b-phjsc/etcd-operator/etcd-operator/logs/previous.log:2021-06-22T03:18:47.226949138Z F0622 03:18:47.226900 1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized

namespaces/openshift-sdn/pods/sdn-controller-bcgqd/sdn-controller/sdn-controller/logs/current.log:2021-06-22T02:36:44.294778738Z E0622 02:36:44.294736 1 leaderelection.go:325] error retrieving resource lock openshift-sdn/openshift-network-controller: Unauthorized

namespaces/openshift-monitoring/pods/cluster-monitoring-operator-97cb5bfb9-2trcq/cluster-monitoring-operator/cluster-monitoring-operator/logs/current.log:2021-06-22T02:36:45.495872596Z E0622 02:36:45.495829 1 reflector.go:138] github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:292: Failed to watch *v1.Infrastructure: failed to list *v1.Infrastructure: Unauthorized

namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-0-208-125.us-east-2.compute.internal/kube-apiserver/kube-apiserver/logs/current.log:2021-06-22T02:37:23.516768180Z E0622 02:37:23.516713 20 controller.go:116] loading OpenAPI spec for "v1.packages.operators.coreos.com" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 500, Body: Internal Server Error: "/openapi/v2": Unauthorized

namespaces/openshift-kube-scheduler/pods/installer-4-ip-10-0-157-106.us-east-2.compute.internal/installer-4-ip-10-0-157-106.us-east-2.compute.internal.yaml: W0622 02:35:41.683994 1 recorder.go:198] Error creating event &Event{ObjectMeta:{openshift-kube-scheduler.168ac6ee2130ec5a openshift-kube-scheduler 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Namespace,Namespace:openshift-kube-scheduler,Name:openshift-kube-scheduler,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:StaticPodInstallerFailed,Message:Installing revision 4: Unauthorized,Source:EventSource{Component:static-pod-installer,Host:,},FirstTimestamp:2021-06-22 02:35:41.681855578 +0000 UTC m=+0.419368009,LastTimestamp:2021-06-22 02:35:41.681855578 +0000 UTC m=+0.419368009,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}: Unauthorized

namespaces/openshift-operator-lifecycle-manager/pods/catalog-operator-75466b47d8-xbknh/catalog-operator/catalog-operator/logs/current.log:2021-06-22T02:36:43.971404761Z E0622 02:36:43.971385 1 queueinformer_operator.go:290] sync {"update" "openshift-marketplace/redhat-marketplace"} failed: couldn't ensure registry server - error ensuring service account: redhat-marketplace: Unauthorized

namespaces/openshift-operator-lifecycle-manager/pods/catalog-operator-75466b47d8-xbknh/catalog-operator/catalog-operator/logs/current.log:2021-06-22T02:36:44.063298556Z time="2021-06-22T02:36:44Z" level=error msg="error getting catalogsource - Unauthorized" id=pANKp source=redhat-marketplace

namespaces/openshift-kube-storage-version-migrator-operator/pods/kube-storage-version-migrator-operator-869b7c9756-2rtw4/kube-storage-version-migrator-operator/kube-storage-version-migrator-operator/logs/previous.log:2021-06-22T03:18:34.430324747Z W0622 03:18:34.430292 1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized

namespaces/openshift-cluster-machine-approver/pods/machine-approver-6b8dc75767-x2mck/machine-approver-controller/machine-approver-controller/logs/current.log:2021-06-22T02:36:44.581337132Z E0622 02:36:44.581297 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:241: Failed to watch *v1.CertificateSigningRequest: failed to list *v1.CertificateSigningRequest: Unauthorized

namespaces/openshift-cluster-csi-drivers/pods/aws-ebs-csi-driver-operator-775cdf957d-k8p4z/aws-ebs-csi-driver-operator/aws-ebs-csi-driver-operator/logs/previous.log:2021-06-22T03:19:34.297851008Z F0622 03:19:34.297786 1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized

namespaces/openshift-cluster-csi-drivers/pods/aws-ebs-csi-driver-controller-7f864cf74-zhxhg/csi-resizer/csi-resizer/logs/current.log:2021-06-22T02:36:53.305533645Z E0622 02:36:53.305492 1 leaderelection.go:325] error retrieving resource lock openshift-cluster-csi-drivers/external-resizer-ebs-csi-aws-com: Unauthorized

namespaces/openshift-multus/pods/network-metrics-daemon-sggzf/network-metrics-daemon/network-metrics-daemon/logs/current.log:2021-06-22T02:36:43.967856968Z E0622 02:36:43.967216 1 reflector.go:178] github.com/openshift/network-metrics-daemon/main.go:82: Failed to list *v1.Pod: Unauthorized

namespaces/openshift-multus/pods/network-metrics-daemon-sggzf/network-metrics-daemon/network-metrics-daemon/logs/current.log:2021-06-22T02:36:46.668861519Z E0622 02:36:46.668800 1 reflector.go:178] github.com/openshift/network-metrics-daemon/main.go:82: Failed to list *v1.Pod: Unauthorized

namespaces/openshift-multus/pods/network-metrics-daemon-sggzf/network-metrics-daemon/network-metrics-daemon/logs/current.log:2021-06-22T02:36:50.081617507Z E0622 02:36:50.081576 1 reflector.go:178] github.com/openshift/network-metrics-daemon/main.go:82: Failed to list *v1.Pod: Unauthorized

namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-188-234.us-east-2.compute.internal/cluster-policy-controller/cluster-policy-controller/logs/current.log:2021-06-22T02:37:05.108751903Z E0622 02:37:05.108706 1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.ImageStream: failed to list *v1.ImageStream: an error on the server ("Internal Server Error: \"/apis/image.openshift.io/v1/imagestreams?limit=500&resourceVersion=0\": Unauthorized") has prevented the request from succeeding (get imagestreams.image.openshift.io)

namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-188-234.us-east-2.compute.internal/cluster-policy-controller/cluster-policy-controller/logs/current.log:2021-06-22T02:37:12.249821119Z E0622 02:37:12.249746 1 reconciliation_controller.go:166] unable to retrieve the complete list of server APIs: apps.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/apps.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, authorization.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/authorization.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, build.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/build.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, image.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/image.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, oauth.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/oauth.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, packages.operators.coreos.com/v1: an error on the server ("Internal Server Error: \"/apis/packages.operators.coreos.com/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, project.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/project.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, quota.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/quota.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, route.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/route.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, security.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/security.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, template.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/template.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, user.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/user.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding

namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-188-234.us-east-2.compute.internal/cluster-policy-controller/cluster-policy-controller/logs/current.log:2021-06-22T02:37:29.768725217Z E0622 02:37:29.768688 1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.ImageStream: failed to list *v1.ImageStream: an error on the server ("Internal Server Error: \"/apis/image.openshift.io/v1/imagestreams?limit=500&resourceVersion=0\": Unauthorized") has prevented the request from succeeding (get imagestreams.image.openshift.io)
** WARNING ** This BZ claims that this bug is of urgent severity and priority. Note that urgent priority means you have just declared an emergency within engineering. Engineers are asked to stop whatever they are doing, including putting important release work on hold, potentially risking the release of OCP, while working on this case. Be prepared to have a good justification ready, and make sure you and engineering management are aware of and have approved this. Urgent bugs are very expensive and have maximal management visibility. NOTE: This bug was assigned to an engineering manager with severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.
Root cause analysis revealed that api-audiences was not being set in the API server configuration. The necessary change was made quite some time ago in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1050, but it never took effect because the settings were pruned away. This is fixed in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1158.
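For reference, the configuration the operator is expected to render after the fix looks roughly like the fragment below. This is a sketch based on the PRs above, not the exact rendered kube-apiserver config; the nesting and issuer URL are illustrative.

```yaml
# Hypothetical excerpt of the kube-apiserver arguments after the fix.
# Both flags must survive config observation/pruning; if api-audiences is
# pruned, bound SA tokens carrying an "aud" claim fail with Unauthorized.
apiServerArguments:
  service-account-issuer:
    - "https://a-lwansts49-0622-021958659976987-oidc.s3.us-east-2.amazonaws.com"
  api-audiences:
    - "https://a-lwansts49-0622-021958659976987-oidc.s3.us-east-2.amazonaws.com"
```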
I was able to repro this just by providing my own Authentication CR into the installer manifests (no need to install the cluster with a full STS configuration):

./openshift-install create manifests
cp /path/to/custom/cluster-authentication-02-config.yaml ./manifests/   # the name of the file matters
./openshift-install create cluster

The file contents look like:

[jdiaz@minigoomba os-install-4.8-nightly]$ cat cco/manifests/cluster-authentication-02-config.yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  name: cluster
spec:
  serviceAccountIssuer: https://jdiaz-a1-oidc.s3.us-east-1.amazonaws.com

You don't really need to set up an S3 bucket or anything like that. You should be able to literally use the above file contents.
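If you want to script the repro, the custom manifest from the steps above can be generated before running `openshift-install create cluster`. A minimal sketch (the issuer URL is just a placeholder; as noted, no real S3 bucket is needed):

```python
# Sketch: write the custom Authentication manifest into the installer's
# manifests/ directory. The file name matters: the installer overlays it
# on the generated cluster-authentication-02-config.yaml.
from pathlib import Path

MANIFEST = """\
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  name: cluster
spec:
  serviceAccountIssuer: {issuer}
"""

def write_auth_manifest(manifests_dir: str, issuer: str) -> Path:
    """Write the overlay manifest and return its path."""
    path = Path(manifests_dir) / "cluster-authentication-02-config.yaml"
    path.write_text(MANIFEST.format(issuer=issuer))
    return path
```

Run it between `create manifests` and `create cluster`, pointing it at the installer's `manifests/` directory.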
Yes, simply setting the service account issuer is enough to provoke the issue: api-audiences remains unset, so validation of bound SA tokens fails because those tokens now include audiences.
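The failure mode can be illustrated with a toy audience check (a deliberate simplification, not the actual kube-apiserver authenticator code): a bound token carries an `aud` claim, and if the server's accepted audience list is empty, nothing matches and the token is rejected.

```python
# Toy illustration: a bound SA token's "aud" claim must intersect the
# server's configured api-audiences, or validation fails (Unauthorized).
def validate_audiences(token_audiences, api_audiences):
    """Return True if at least one token audience is accepted."""
    return any(aud in api_audiences for aud in token_audiences)

issuer = "https://a-lwansts49-oidc.s3.us-east-2.amazonaws.com"
bound_token_aud = [issuer]  # bound tokens now always carry audiences

# api-audiences pruned away -> every bound token is rejected
broken = validate_audiences(bound_token_aud, [])
# with the fix, api-audiences is populated -> tokens validate
fixed = validate_audiences(bound_token_aud, [issuer])
```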
We built a cluster using cluster-bot with the fix. The cluster installed, but kube-apiserver is degraded:

When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: a73c5589-0de8-46a3-8ae0-2f7ee293f933
ClusterVersion: Stable at "4.8.0-0.ci.test-2021-06-22-144200-ci-ln-t7q8sc2-latest"
ClusterOperators:
clusteroperator/kube-apiserver is degraded because InstallerPodContainerWaitingDegraded: Pod "installer-5-ci-ln-t7q8sc2-f76d1-2vw48-master-1" on node "ci-ln-t7q8sc2-f76d1-2vw48-master-1" container "installer" is waiting since 2021-06-22 15:00:36 +0000 UTC because ContainerCreating
As discussed on Slack, the failures seen in https://bugzilla.redhat.com/show_bug.cgi?id=1974716#c7 are not related to this bug.
Verified on 4.8.0-0.nightly-2021-06-23-232238

1. Provide my own Authentication CR in the cluster manifests:

$ oc get Authentication cluster -o json | jq -r ".spec"
{
  "oauthMetadata": {
    "name": ""
  },
  "serviceAccountIssuer": "https://a-lwansts-480-021932120336748-oidc.s3.us-east-2.amazonaws.com",
  "type": "",
  "webhookTokenAuthenticator": {
    "kubeConfig": {
      "name": "webhook-authentication-integrated-oauth"
    }
  }
}

2. Launch an install.
3. There are no longer any "Unauthorized" errors.
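The keyword check in the last verification step can be automated by scanning the must-gather output. A simple sketch, assuming the usual must-gather layout where pod logs end in `.log`:

```python
# Sketch: recursively scan a must-gather directory for "Unauthorized"
# in pod logs, mirroring the manual verification step above.
from pathlib import Path

def find_unauthorized(must_gather_dir: str, keyword: str = "Unauthorized"):
    """Return (path, line) pairs for every log line containing the keyword."""
    hits = []
    for log in sorted(Path(must_gather_dir).rglob("*.log")):
        for line in log.read_text(errors="replace").splitlines():
            if keyword in line:
                hits.append((str(log), line))
    return hits
```

An empty result means the verification passed; any hits point directly at the affected component's log.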
Sorry, the info above is for the 4.8 verification. Moving this to ON_QA for now; will verify it with a 4.9 payload.
Tested with cluster 4.9.0-0.nightly-2021-06-23-160041
1. Launched a 4.9 STS cluster and checked the result; the cluster is available, as expected.
2. Checked the must-gather logs; no 'Unauthorized' keywords found, as expected.
3. Monitored the cluster status; no exceptions found.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759