Description of problem: https://github.com/openshift/cluster-monitoring-operator/pull/1282 introduced the possibility for the metrics scraper to authenticate with a TLS client certificate and thereby avoid a TokenReview call to the kube-apiserver (which usually happens once every 30s per scraped component). The core components and operators should use this capability to lower the API server load and to make it possible to scrape metrics even when the kube-apiserver is down (though only if the scraped component uses static authorization for its /metrics endpoint).
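On the scrape side, this roughly corresponds to a ServiceMonitor endpoint that presents the client certificate instead of a bearer token. The sketch below is an assumption for illustration (hypothetical ServiceMonitor name and selector; the field names follow the prometheus-operator tlsConfig API, and the actual manifests generated by the PR may differ):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-operator            # hypothetical name
  namespace: openshift-monitoring
spec:
  endpoints:
  - port: https
    scheme: https
    tlsConfig:
      # Authenticate with the metrics client certificate instead of a
      # bearer token, so the scraped component does not need to issue a
      # TokenReview against the kube-apiserver for every scrape.
      cert:
        secret:
          name: metrics-client-certs
          key: tls.crt
      keySecret:
        name: metrics-client-certs
        key: tls.key
  selector:
    matchLabels:
      app: example-operator         # hypothetical selector
```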
Verified on:

  $ oc get clusterversion
  NAME      VERSION                               AVAILABLE   PROGRESSING   SINCE   STATUS
  version   4.9.0-0.nightly-2021-08-07-175228     True        False         2m10s   Cluster version is 4.9.0-0.nightly-2021-08-07-175228

Checked the metrics client certificate:

  $ oc get secret -n openshift-monitoring metrics-client-certs
  Opaque   2   22m

  $ oc get csr
  system:openshift:openshift-monitoring-gnqcs   30s   kubernetes.io/kube-apiserver-client   system:serviceaccount:openshift-monitoring:cluster-monitoring-operator   Approved,Issued

Checked the metrics client certificate again to confirm the new certificate:

  $ oc get secret -n openshift-monitoring metrics-client-certs
  Opaque   2   2m30s

Gathered Prometheus metrics by using curl with the client certificate for the operators below:

  openshift-apiserver-operator
  openshift-kube-apiserver-operator
  openshift-kube-controller-manager-operator
  openshift-kube-storage-version-migrator-operator

For example:

  $ oc rsh -n openshift-apiserver-operator openshift-apiserver-operator-7f7cd7d86c-5bm49
  $ curl -k --key /tmp/tls.key --cert /tmp/tls.crt https://localhost:8443/metrics > /tmp/metrics.txt

The curl commands succeeded and the /tmp/metrics.txt files were not empty. Checked the certificate with openssl; the user in the CN is prometheus-k8s:
  $ openssl x509 -in tls.crt -noout -text | grep CN
  Issuer: CN=kube-csr-signer_@1628567334
  Subject: CN=system:serviceaccount:openshift-monitoring:prometheus-k8s

Checked the kube-apiserver pods:

  $ oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
  NAME                                                READY   STATUS    RESTARTS   AGE   LABELS
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-0   5/5     Running   0         25m   apiserver=true,app=openshift-kube-apiserver,revision=5
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-1   5/5     Running   0         32m   apiserver=true,app=openshift-kube-apiserver,revision=5
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-2   5/5     Running   0         29m   apiserver=true,app=openshift-kube-apiserver,revision=5

Configured the audit profile from default to WriteRequestBodies in apiserver/cluster and waited for the kube-apiserver pods to restart:

  $ oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
  NAME                                                READY   STATUS    RESTARTS   AGE     LABELS
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-0   5/5     Running   0         95s     apiserver=true,app=openshift-kube-apiserver,revision=6
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-1   5/5     Running   0         8m18s   apiserver=true,app=openshift-kube-apiserver,revision=6
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-2   5/5     Running   0         5m5s    apiserver=true,app=openshift-kube-apiserver,revision=6

Waited 15 minutes after the kube-apiserver restart, then logged in to all masters and gathered the audit logs.
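The openssl CN check from the steps above can be reproduced offline. A minimal sketch with a throwaway self-signed certificate (the CN here merely mimics the prometheus-k8s identity; it is not a real cluster certificate):

```shell
set -e
workdir=$(mktemp -d)
# Generate a throwaway key and self-signed certificate whose subject
# mimics the scraper's service account identity.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$workdir/tls.key" -out "$workdir/tls.crt" -days 1 \
  -subj "/CN=system:serviceaccount:openshift-monitoring:prometheus-k8s" 2>/dev/null
# Extract the subject CN, as done in the verification step.
subject=$(openssl x509 -in "$workdir/tls.crt" -noout -subject)
echo "$subject"
rm -rf "$workdir"
```

The exact formatting of the subject line varies between OpenSSL versions, but the CN should always contain the service account name.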
Gathered the audit logs and checked for TokenReview requests:

  $ oc debug node/ci-ln-qvmriyb-f76d1-dt7gb-master-2 -T -- chroot /host grep '"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"' /var/log/kube-apiserver/audit.log > /tmp/all_tokenreviews_requests.log
  $ grep '"status":{"authenticated":true,"user":{"username":"system:serviceaccount:openshift-monitoring:prometheus-k8s"' /tmp/all_tokenreviews_requests.log > /tmp/all_tokenreviews_for_serviceaccount_prometheus-k8s.log
  $ jq '.user.username' /tmp/all_tokenreviews_for_serviceaccount_prometheus-k8s.log > /tmp/all_users_that_make_traffic_to_check_token_of_serviceaccount_prometheus-k8s.log
  $ sort /tmp/all_users_that_make_traffic_to_check_token_of_serviceaccount_prometheus-k8s.log | uniq -c | sort -rh > /tmp/users.txt

Checked that no token validation requests are sent to the kube-apiserver by the users below; the loop should produce no output:

  $ for i in kube-apiserver openshift-apiserver openshift-controller-manager kube-scheduler kubelet node-exporter kube-controller-manager etcd; do grep "$i" /tmp/users.txt; done
  1 "system:serviceaccount:openshift-controller-manager:openshift-controller-manager-sa"
  4 "system:kube-scheduler"

TokenReview requests are still seen from some targets for the prometheus service account; filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1991900. In addition, when the kube-apiserver is made unavailable, metrics cannot be gathered; filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1990281
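The aggregation step of the pipeline above can be exercised without a cluster. A sketch that runs the same jq/sort/uniq pipeline over a few hypothetical, abbreviated audit entries (assumes jq is installed; the sample entries are invented for illustration and only contain the fields the pipeline reads):

```shell
set -e
# Hypothetical sample standing in for
# /tmp/all_tokenreviews_for_serviceaccount_prometheus-k8s.log.
cat > /tmp/sample_tokenreviews.log <<'EOF'
{"user":{"username":"system:kube-scheduler"},"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"}
{"user":{"username":"system:kube-scheduler"},"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"}
{"user":{"username":"system:serviceaccount:openshift-controller-manager:openshift-controller-manager-sa"},"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"}
EOF
# Same aggregation as the verification step: the requesting user of each
# TokenReview, counted and sorted by frequency.
jq '.user.username' /tmp/sample_tokenreviews.log | sort | uniq -c | sort -rh > /tmp/sample_users.txt
cat /tmp/sample_users.txt
```

With this sample, the most frequent requester (system:kube-scheduler, 2 requests) sorts to the top, mirroring the per-user counts seen in /tmp/users.txt on the real cluster.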
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759