Bug 1974716 - Using bound SA tokens causes failures to query cluster resources, especially in an STS cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Standa Laznicka
QA Contact: liyao
URL:
Whiteboard: EmergencyRequest
Depends On:
Blocks: 1974773 1974788
 
Reported: 2021-06-22 11:30 UTC by wang lin
Modified: 2021-10-18 17:36 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1974773 1974788
Environment:
Last Closed: 2021-10-18 17:35:57 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather (11.38 MB, application/gzip)
2021-06-22 11:30 UTC, wang lin


Links
Github openshift/cluster-kube-apiserver-operator pull 1158 (open): Bug 1974716: SA token issuer observer: fix observing api-audiences (last updated 2021-06-22 14:06:52 UTC)
Red Hat Product Errata RHSA-2021:3759 (last updated 2021-10-18 17:36:22 UTC)

Description wang lin 2021-06-22 11:30:36 UTC
Created attachment 1792992 [details]
must-gather

Description of problem:
Because of the recent SA token mechanism change, when installing an STS cluster, almost all components hit the Unauthorized issue when querying cluster resources, so the installation fails. What distinguishes an STS cluster install from a normal install is that, before installing, it sets spec.serviceAccountIssuer in the authentications.config.openshift.io "cluster" object to an S3 bucket URL and injects a bound-service-account-signing-key.

# authentications CR
oc get authentications cluster -o json | jq -r ".spec"
{
  "oauthMetadata": {
    "name": ""
  },
  "serviceAccountIssuer": "https://a-lwansts49-0622-021958659976987-oidc.s3.us-east-2.amazonaws.com",
  "type": "",
  "webhookTokenAuthenticator": {
    "kubeConfig": {
      "name": "webhook-authentication-integrated-oauth"
    }
  }
}
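
For reference, a minimal sketch of the pre-install customization described above, assuming the standard installer manifest layout (the install directory, issuer URL, and key path are placeholders):

$ openshift-install create manifests --dir=<install-dir>
$ cat <<EOF > <install-dir>/manifests/cluster-authentication-02-config.yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  name: cluster
spec:
  serviceAccountIssuer: https://<oidc-bucket>.s3.<region>.amazonaws.com
EOF
$ cp <path-to>/bound-service-account-signing-key.key <install-dir>/tls/bound-service-account-signing-key.key
$ openshift-install create cluster --dir=<install-dir>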


Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-06-21-191858
4.8.0-0.nightly-2021-06-21-175537

How reproducible:
always

Steps to Reproduce:
1. Install an STS cluster; refer to https://docs.openshift.com/container-platform/4.8/authentication/managing_cloud_provider_credentials/cco-mode-sts.html#sts-mode-installing


Actual results:
Installation fails; many components hit the Unauthorized issue.


Expected results:
The installation should succeed.

Additional info:
Several representative error messages:
##
namespaces/openshift-apiserver-operator/pods/openshift-apiserver-operator-7f4c8855f4-v7x57/openshift-apiserver-operator/openshift-apiserver-operator/logs/previous.log:2021-06-22T03:18:17.951577177Z W0622 03:18:17.951539       1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized

namespaces/openshift-cluster-node-tuning-operator/pods/tuned-tb9vc/tuned/tuned/logs/current.log:2021-06-22T02:36:47.644895574Z E0622 02:36:47.644796    7288 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Unauthorized


namespaces/openshift-oauth-apiserver/pods/apiserver-c9687d7b8-w7t62/oauth-apiserver/oauth-apiserver/logs/current.log:2021-06-22T03:04:57.275692572Z E0622 03:04:57.275642       1 webhook.go:202] Failed to make webhook authorizer request: Unauthorized
namespaces/openshift-oauth-apiserver/pods/apiserver-c9687d7b8-w7t62/oauth-apiserver/oauth-apiserver/logs/current.log:2021-06-22T03:04:59.059527999Z E0622 03:04:59.059480       1 reflector.go:138] k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Unauthorized

namespaces/openshift-config-operator/pods/openshift-config-operator-57c7c966d9-kgr8j/openshift-config-operator/openshift-config-operator/logs/previous.log:2021-06-22T03:16:24.630521031Z F0622 03:16:24.630484       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized
namespaces/openshift-kube-controller-manager-operator/pods/kube-controller-manager-operator-d9b645cb8-k828k/kube-controller-manager-operator/kube-controller-manager-operator/logs/previous.log:2021-06-22T03:18:47.434633867Z W0622 03:18:47.434601       1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized
namespaces/openshift-kube-controller-manager-operator/pods/kube-controller-manager-operator-d9b645cb8-k828k/kube-controller-manager-operator/kube-controller-manager-operator/logs/previous.log:2021-06-22T03:19:17.764932006Z F0622 03:19:17.764878       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized


namespaces/openshift-machine-config-operator/pods/machine-config-daemon-w7s8h/machine-config-daemon/machine-config-daemon/logs/current.log:2021-06-22T02:53:40.827737185Z E0622 02:53:40.827703    6350 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Unauthorized

namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-684684c77b-stn2t/kube-apiserver-operator/kube-apiserver-operator/logs/previous.log:2021-06-22T03:18:36.426764094Z W0622 03:18:36.426727       1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized
namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-684684c77b-stn2t/kube-apiserver-operator/kube-apiserver-operator/logs/previous.log:2021-06-22T03:19:06.936142534Z F0622 03:19:06.936091       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized


namespaces/openshift-dns-operator/pods/dns-operator-5968756979-rqxk2/dns-operator/dns-operator/logs/current.log:2021-06-22T02:36:44.955384531Z E0622 02:36:44.952537       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1.DNS: failed to list *v1.DNS: Unauthorized

namespaces/openshift-controller-manager/pods/controller-manager-s9mqg/controller-manager/controller-manager/logs/current.log:2021-06-22T02:37:07.480859735Z E0622 02:37:07.480826       1 leaderelection.go:325] error retrieving resource lock openshift-controller-manager/openshift-master-controllers: Unauthorized
namespaces/openshift-controller-manager/pods/controller-manager-s9mqg/controller-manager/controller-manager/logs/previous.log:2021-06-22T02:36:43.626178769Z W0622 02:36:43.625993       1 client_builder_dynamic.go:197] get or create service account failed: Unauthorized
namespaces/openshift-controller-manager/pods/controller-manager-s9mqg/controller-manager/controller-manager/logs/previous.log:2021-06-22T02:36:43.626178769Z W0622 02:36:43.626000       1 client_builder_dynamic.go:197] get or create service account failed: Unauthorized

namespaces/openshift-kube-storage-version-migrator/pods/migrator-667f984cd-l56qq/migrator/migrator/logs/current.log:2021-06-22T02:36:44.891897026Z E0622 02:36:44.891855       1 reflector.go:138] k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.StorageVersionMigration: failed to list *v1alpha1.StorageVersionMigration: Unauthorized

namespaces/openshift-etcd-operator/pods/etcd-operator-f8b48cb5b-phjsc/etcd-operator/etcd-operator/logs/previous.log:2021-06-22T03:18:47.226949138Z F0622 03:18:47.226900       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized

namespaces/openshift-sdn/pods/sdn-controller-bcgqd/sdn-controller/sdn-controller/logs/current.log:2021-06-22T02:36:44.294778738Z E0622 02:36:44.294736       1 leaderelection.go:325] error retrieving resource lock openshift-sdn/openshift-network-controller: Unauthorized

namespaces/openshift-monitoring/pods/cluster-monitoring-operator-97cb5bfb9-2trcq/cluster-monitoring-operator/cluster-monitoring-operator/logs/current.log:2021-06-22T02:36:45.495872596Z E0622 02:36:45.495829       1 reflector.go:138] github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:292: Failed to watch *v1.Infrastructure: failed to list *v1.Infrastructure: Unauthorized

namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-0-208-125.us-east-2.compute.internal/kube-apiserver/kube-apiserver/logs/current.log:2021-06-22T02:37:23.516768180Z E0622 02:37:23.516713      20 controller.go:116] loading OpenAPI spec for "v1.packages.operators.coreos.com" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 500, Body: Internal Server Error: "/openapi/v2": Unauthorized

namespaces/openshift-kube-scheduler/pods/installer-4-ip-10-0-157-106.us-east-2.compute.internal/installer-4-ip-10-0-157-106.us-east-2.compute.internal.yaml:          W0622 02:35:41.683994       1 recorder.go:198] Error creating event &Event{ObjectMeta:{openshift-kube-scheduler.168ac6ee2130ec5a  openshift-kube-scheduler    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},InvolvedObject:ObjectReference{Kind:Namespace,Namespace:openshift-kube-scheduler,Name:openshift-kube-scheduler,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:StaticPodInstallerFailed,Message:Installing revision 4: Unauthorized,Source:EventSource{Component:static-pod-installer,Host:,},FirstTimestamp:2021-06-22 02:35:41.681855578 +0000 UTC m=+0.419368009,LastTimestamp:2021-06-22 02:35:41.681855578 +0000 UTC m=+0.419368009,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}: Unauthorized


namespaces/openshift-operator-lifecycle-manager/pods/catalog-operator-75466b47d8-xbknh/catalog-operator/catalog-operator/logs/current.log:2021-06-22T02:36:43.971404761Z E0622 02:36:43.971385       1 queueinformer_operator.go:290] sync {"update" "openshift-marketplace/redhat-marketplace"} failed: couldn't ensure registry server - error ensuring service account: redhat-marketplace: Unauthorized
namespaces/openshift-operator-lifecycle-manager/pods/catalog-operator-75466b47d8-xbknh/catalog-operator/catalog-operator/logs/current.log:2021-06-22T02:36:44.063298556Z time="2021-06-22T02:36:44Z" level=error msg="error getting catalogsource - Unauthorized" id=pANKp source=redhat-marketplace

namespaces/openshift-kube-storage-version-migrator-operator/pods/kube-storage-version-migrator-operator-869b7c9756-2rtw4/kube-storage-version-migrator-operator/kube-storage-version-migrator-operator/logs/previous.log:2021-06-22T03:18:34.430324747Z W0622 03:18:34.430292       1 builder.go:209] unable to get owner reference (falling back to namespace): Unauthorized

namespaces/openshift-cluster-machine-approver/pods/machine-approver-6b8dc75767-x2mck/machine-approver-controller/machine-approver-controller/logs/current.log:2021-06-22T02:36:44.581337132Z E0622 02:36:44.581297       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:241: Failed to watch *v1.CertificateSigningRequest: failed to list *v1.CertificateSigningRequest: Unauthorized

namespaces/openshift-cluster-csi-drivers/pods/aws-ebs-csi-driver-operator-775cdf957d-k8p4z/aws-ebs-csi-driver-operator/aws-ebs-csi-driver-operator/logs/previous.log:2021-06-22T03:19:34.297851008Z F0622 03:19:34.297786       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Unauthorized
namespaces/openshift-cluster-csi-drivers/pods/aws-ebs-csi-driver-controller-7f864cf74-zhxhg/csi-resizer/csi-resizer/logs/current.log:2021-06-22T02:36:53.305533645Z E0622 02:36:53.305492       1 leaderelection.go:325] error retrieving resource lock openshift-cluster-csi-drivers/external-resizer-ebs-csi-aws-com: Unauthorized

namespaces/openshift-multus/pods/network-metrics-daemon-sggzf/network-metrics-daemon/network-metrics-daemon/logs/current.log:2021-06-22T02:36:43.967856968Z E0622 02:36:43.967216       1 reflector.go:178] github.com/openshift/network-metrics-daemon/main.go:82: Failed to list *v1.Pod: Unauthorized
namespaces/openshift-multus/pods/network-metrics-daemon-sggzf/network-metrics-daemon/network-metrics-daemon/logs/current.log:2021-06-22T02:36:46.668861519Z E0622 02:36:46.668800       1 reflector.go:178] github.com/openshift/network-metrics-daemon/main.go:82: Failed to list *v1.Pod: Unauthorized
namespaces/openshift-multus/pods/network-metrics-daemon-sggzf/network-metrics-daemon/network-metrics-daemon/logs/current.log:2021-06-22T02:36:50.081617507Z E0622 02:36:50.081576       1 reflector.go:178] github.com/openshift/network-metrics-daemon/main.go:82: Failed to list *v1.Pod: Unauthorized

namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-188-234.us-east-2.compute.internal/cluster-policy-controller/cluster-policy-controller/logs/current.log:2021-06-22T02:37:05.108751903Z E0622 02:37:05.108706       1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.ImageStream: failed to list *v1.ImageStream: an error on the server ("Internal Server Error: \"/apis/image.openshift.io/v1/imagestreams?limit=500&resourceVersion=0\": Unauthorized") has prevented the request from succeeding (get imagestreams.image.openshift.io)
namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-188-234.us-east-2.compute.internal/cluster-policy-controller/cluster-policy-controller/logs/current.log:2021-06-22T02:37:12.249821119Z E0622 02:37:12.249746       1 reconciliation_controller.go:166] unable to retrieve the complete list of server APIs: apps.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/apps.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, authorization.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/authorization.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, build.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/build.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, image.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/image.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, oauth.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/oauth.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, packages.operators.coreos.com/v1: an error on the server ("Internal Server Error: \"/apis/packages.operators.coreos.com/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, project.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/project.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, quota.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/quota.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, route.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/route.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, security.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/security.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, template.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/template.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding, user.openshift.io/v1: an error on the server ("Internal Server Error: \"/apis/user.openshift.io/v1?timeout=32s\": Unauthorized") has prevented the request from succeeding
namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-188-234.us-east-2.compute.internal/cluster-policy-controller/cluster-policy-controller/logs/current.log:2021-06-22T02:37:29.768725217Z E0622 02:37:29.768688       1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.ImageStream: failed to list *v1.ImageStream: an error on the server ("Internal Server Error: \"/apis/image.openshift.io/v1/imagestreams?limit=500&resourceVersion=0\": Unauthorized") has prevented the request from succeeding (get imagestreams.image.openshift.io)

Comment 1 Michal Fojtik 2021-06-22 11:50:55 UTC
** WARNING **

This BZ claims that this bug is of urgent severity and priority. Note that urgent priority means that you have just declared an emergency within engineering.
Engineers are asked to stop whatever they are doing, including putting important release work on hold, potentially risking the release of OCP while working on this case.

Be prepared to have a good justification ready, and ensure that your own and engineering management are aware and have approved this. Urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was assigned to engineering manager with severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 4 Sergiusz Urbaniak 2021-06-22 14:13:03 UTC
Root cause analysis revealed that we were failing to set api-audiences in the API server configuration. The necessary change was made quite some time ago (https://github.com/openshift/cluster-kube-apiserver-operator/pull/1050), but it never took effect because the setting was pruned away. This is fixed in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1158.
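
A quick way to confirm this on a live cluster (a hedged sketch, assuming the usual rendered-config layout in the openshift-kube-apiserver namespace) is to check that both arguments are now observed:

$ oc get configmap config -n openshift-kube-apiserver -o jsonpath='{.data.config\.yaml}' \
    | jq '.apiServerArguments["service-account-issuer"], .apiServerArguments["api-audiences"]'

With the fix, both should report the configured issuer; before it, api-audiences was missing.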

Comment 5 Joel Diaz 2021-06-22 14:38:44 UTC
I was able to repro this just by providing my own Authentication CR in the installer manifests (no need to install the cluster with the full STS configuration).

./openshift-install create manifests
cp /path/to/custom/cluster-authentication-02-config.yaml ./manifests/  # the name of the file matters
./openshift-install create cluster

the file contents look like:
[jdiaz@minigoomba os-install-4.8-nightly]$ cat cco/manifests/cluster-authentication-02-config.yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  name: cluster
spec:
  serviceAccountIssuer: https://jdiaz-a1-oidc.s3.us-east-1.amazonaws.com

You don't really need to set up an S3 bucket or anything like that. You should be able to literally use the above file contents.

Comment 6 Sergiusz Urbaniak 2021-06-22 14:51:12 UTC
Yes, simply setting the service account issuer will provoke the issue, because api-audiences remains unset, which fails validation of bound SA tokens now that they include audiences.
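
As a hypothetical illustration (not from this bug's testing), the validation failure can be observed directly with a TokenReview, whose status carries an error when the token's audience is not among the apiserver's accepted audiences:

$ oc create -o jsonpath='{.status.error}' -f - <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: <bound-sa-token>
EOF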

Comment 7 To Hung Sze 2021-06-22 17:50:40 UTC
We built a cluster using cluster bot with the fix.
The cluster installed, but kube-apiserver is degraded.

When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: a73c5589-0de8-46a3-8ae0-2f7ee293f933
ClusterVersion: Stable at "4.8.0-0.ci.test-2021-06-22-144200-ci-ln-t7q8sc2-latest"
ClusterOperators:
	clusteroperator/kube-apiserver is degraded because InstallerPodContainerWaitingDegraded: Pod "installer-5-ci-ln-t7q8sc2-f76d1-2vw48-master-1" on node "ci-ln-t7q8sc2-f76d1-2vw48-master-1" container "installer" is waiting since 2021-06-22 15:00:36 +0000 UTC because ContainerCreating

Comment 9 Sergiusz Urbaniak 2021-06-23 05:40:56 UTC
As discussed on Slack, the failures seen in https://bugzilla.redhat.com/show_bug.cgi?id=1974716#c7 are not related to this bug.

Comment 11 wang lin 2021-06-24 03:54:38 UTC
Verified on 4.8.0-0.nightly-2021-06-23-232238

1. Provided my own Authentication CR in the cluster manifests:
$ oc get Authentication cluster -o json | jq -r ".spec"
{
  "oauthMetadata": {
    "name": ""
  },
  "serviceAccountIssuer": "https://a-lwansts-480-021932120336748-oidc.s3.us-east-2.amazonaws.com",
  "type": "",
  "webhookTokenAuthenticator": {
    "kubeConfig": {
      "name": "webhook-authentication-integrated-oauth"
    }
  }
}
2. Launched an install.
3. The "Unauthorized" keyword no longer appears in the logs (see the quick check below).
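
For example, a quick scan over a fresh must-gather (the destination directory is illustrative):

$ oc adm must-gather --dest-dir=./must-gather
$ grep -r "Unauthorized" ./must-gather | wc -l
0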

Comment 12 wang lin 2021-06-24 05:43:41 UTC
Sorry, the above info is for the 4.8 verification; moving this one back to ON_QA for now, and it will be verified with a 4.9 payload.

Comment 13 liyao 2021-06-24 08:57:09 UTC
Tested with cluster 4.9.0-0.nightly-2021-06-23-160041

1. Launched a 4.9 STS cluster and checked the result; the cluster is available, as expected.
2. Checked the must-gather logs; no "Unauthorized" keywords were found, as expected.
3. Monitored the cluster status; no exceptions were found.

Comment 16 errata-xmlrpc 2021-10-18 17:35:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

