Bug 1832825
Summary: | "cannot create resource subjectaccessreviews/tokenreviews at the cluster scope" error info in node-exporter pod's kube-rbac-proxy container | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao>
Component: | Monitoring | Assignee: | Simon Pasquier <spasquie>
Status: | CLOSED DUPLICATE | QA Contact: | Junqi Zhao <juzhao>
Severity: | medium | Priority: | low
Version: | 4.5 | Target Release: | 4.7.0
Target Milestone: | --- | Keywords: | Regression, Reopened
Hardware: | Unspecified | OS: | Unspecified
Type: | Bug | Last Closed: | 2020-10-01 05:50:10 UTC
CC: | adeshpan, alegrand, anpicker, arghosh, christopher.obrien, cmarches, erooth, kakkoyun, lcosic, mloibl, naoto30, naygupta, pkrupa, spasquie, surbania | |
Description
Junqi Zhao
2020-05-07 10:35:04 UTC
4.5.0-0.nightly-2020-05-06-003431, node_exporter v0.18.1; reproduced with 4.5.0-0.nightly-2020-05-19-041951.

```
# oc -n openshift-monitoring logs node-exporter-c2rk7 -c kube-rbac-proxy
I0519 23:32:01.233030 32217 main.go:186] Valid token audiences:
I0519 23:32:01.233133 32217 main.go:248] Reading certificate files
I0519 23:32:01.233367 32217 main.go:281] Starting TCP socket on [10.0.190.192]:9100
I0519 23:32:01.233547 32217 main.go:288] Listening securely on [10.0.190.192]:9100
E0520 04:44:49.266432 32217 webhook.go:109] Failed to make webhook authenticator request: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:openshift-monitoring:node-exporter" cannot create resource "tokenreviews" in API group "authentication.k8s.io" at the cluster scope
E0520 04:44:49.266537 32217 proxy.go:73] Unable to authenticate the request due to an error: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:openshift-monitoring:node-exporter" cannot create resource "tokenreviews" in API group "authentication.k8s.io" at the cluster scope
E0520 04:59:49.267742 32217 webhook.go:109] Failed to make webhook authenticator request: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:openshift-monitoring:node-exporter" cannot create resource "tokenreviews" in API group "authentication.k8s.io" at the cluster scope
E0520 04:59:49.267772 32217 proxy.go:73] Unable to authenticate the request due to an error: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:openshift-monitoring:node-exporter" cannot create resource "tokenreviews" in API group "authentication.k8s.io" at the cluster scope
```

Listing these resources directly is not allowed (they only support create):

```
# oc get tokenreviews -A
Error from server (MethodNotAllowed): the server does not allow this method on the requested resource
# oc get subjectaccessreviews -A
Error from server (MethodNotAllowed): the server does not allow this method on the requested resource
```

Created attachment 1690279 [details]
kube-apiserver logs
Created attachment 1690280 [details]
sum by(code) (rate(apiserver_request_total{resource="tokenreviews",version="v1"}[5m]))
Created attachment 1690281 [details]
up{job="node-exporter"}
Created attachment 1690285 [details]
openshift-state-metrics logs
Created attachment 1690286 [details]
kube-state-metrics logs
I've investigated this issue more deeply. Note that the same class of failures had already been reported for the other monitoring components using oauth-proxy and/or kube-rbac-proxy:
https://bugzilla.redhat.com/show_bug.cgi?id=1832830
https://bugzilla.redhat.com/show_bug.cgi?id=1836836

Prometheus scrapes node-exporter through kube-rbac-proxy. When this happens, kube-rbac-proxy sends a POST request to /apis/authentication.k8s.io/v1/tokenreviews to validate the bearer token sent by Prometheus. Once in a while, the API server replies with "403 Forbidden" because it evaluates that node-exporter's service account can't create tokenreviews (i.e. it is the reviewer that is denied, not the token presented by Prometheus that is rejected). Before and after this event, the server happily authorizes the same service account for tokenreviews. As a result, Prometheus considers node-exporter down for that scrape (see the dips in attachment 1690281 [details]).

I've enabled the TraceAll log level in the API server:

```
oc patch kubeapiservers/cluster --type=json -p '[{"op": "replace", "path": "/spec/logLevel", "value": "TraceAll"}]'
```

I can see the kube-rbac-proxy request being rejected (search for "RBAC DENY" in attachment 1690279 [details]) but there's no explanation why.

I've seen similar failures for prometheus-adapter's kube-rbac-proxy and alertmanager's oauth-proxy, but also for prometheus-operator, kube-state-metrics and openshift-state-metrics when they query the Kubernetes API (see attachment 1690284 [details], attachment 1690285 [details], attachment 1690286 [details]). Looking at the audit logs, only the monitoring components experience these random failures (not sure if it's because they rely heavily on kube-rbac-proxy/oauth-proxy or because of something else).

Finally, the `sum by(code) (rate(apiserver_request_total{resource="tokenreviews",version="v1"}[5m]))` query returns data only for status code 201, not 403 (see attachment 1690280 [details]).
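For context, the delegated-authentication call that kube-rbac-proxy makes on each scrape is an ordinary TokenReview create. A minimal sketch of the request body (the token value is a placeholder; the real request is built by the Kubernetes client libraries inside kube-rbac-proxy):

```yaml
# POST /apis/authentication.k8s.io/v1/tokenreviews
# Issued with the node-exporter service account's own credentials.
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  # Bearer token presented by the Prometheus scrape (placeholder value)
  token: "<prometheus-serviceaccount-token>"
```

On success the API server answers 201 Created with `status.authenticated: true`. The intermittent failure described above is a 403 on this POST itself, which matches the observation that only 201s show up in the `apiserver_request_total` metric.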
*** Bug 1832830 has been marked as a duplicate of this bug. ***

*** Bug 1836836 has been marked as a duplicate of this bug. ***

*** Bug 1836087 has been marked as a duplicate of this bug. ***

Note that this also happens for the prometheus-adapter pod (https://bugzilla.redhat.com/show_bug.cgi?id=1836087).

Hitting this in 4.5, with prometheus-adapter:

```
Warning FailedGetResourceMetric 110m horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: an error on the server ("Internal Server Error: \"/apis/metrics.k8s.io/v1beta1/namespaces/namespace-name/pods?labelSelector=app%3Dapp-cem%2example%3D\": subjectaccessreviews.authorization.k8s.io is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-adapter\" cannot create resource \"subjectaccessreviews\" in API group \"authorization.k8s.io\" at the cluster scope") has prevented the request from succeeding (get pods.metrics.k8s.io)
```

Since it at first appears to be a permissions error, we tried to add the "view" cluster role to the mentioned service account:

```
$ oc adm policy add-cluster-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-adapter
```

It didn't work; this is preventing most (but not all) HPA objects from scraping metrics.

Is there a workaround to the issue described in c#12?

Closing out as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1863011; the symptoms here are the same, namely RBAC issues despite having the correct assets deployed by cluster-monitoring-operator.

*** This bug has been marked as a duplicate of bug 1863011 ***
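For reference, the "view" role tried above does not carry the permissions the proxy's service account actually needs, which are `create` on tokenreviews and subjectaccessreviews. A minimal sketch of the kind of RBAC assets cluster-monitoring-operator deploys for this (the names here are illustrative, not the operator's actual asset names):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-rbac-proxy-auth        # illustrative name
rules:
- apiGroups: ["authentication.k8s.io"]
  resources: ["tokenreviews"]       # authenticate incoming bearer tokens
  verbs: ["create"]
- apiGroups: ["authorization.k8s.io"]
  resources: ["subjectaccessreviews"]  # authorize the authenticated subject
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-rbac-proxy-auth        # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-rbac-proxy-auth
subjects:
- kind: ServiceAccount
  name: node-exporter
  namespace: openshift-monitoring
```

As the closing comment notes, these assets were already correctly deployed here; the denials were intermittent despite them, which is why this report was closed as a duplicate of bug 1863011 rather than fixed by an RBAC change.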