Our components with delegated authn/z should use a cache duration long enough that they do not query kube-apiserver for every /metrics, /healthz, /readyz, and /openapi/v2 request. This BZ applies to at least:
- openshift-apiserver
- oauth-apiserver
- oauth-server
- *-operator
Standa Laznicka, sorry, I was occupied with other daily work and have not been on this bug for long. I looked at the comment and code in https://github.com/openshift/library-go/pull/970, and it looks like the thing to test is that not every metrics request needs to pass an authn/z check. But I am still not sure how to verify this concretely from a functional angle; could you give some guidance? And how should each operator PR be tested? Thanks
Per discussion with Dev in Slack, I could try checking the tokenreview metrics rate. But after checking, I am not sure what the concrete metric name is. Dev also suggested checking the tokenreview rate in the audit log. Below are the results from a 4.7.0-0.nightly-2021-02-02-223803 env:

ssh to one master, then run:
# grep -h '"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews[^"]*","verb":"create"' /var/log/kube-apiserver/audit*.log > tokenreview_requests.json
# jq -c '.user.username + " " + "\(.requestReceivedTimestamp)"' tokenreview_requests.json | sed 's/"//g' | sort > tokenreview_users_and_timestamps.txt
# cat tokenreview_users_and_timestamps.txt # the file looks like:
system:kube-controller-manager 2021-02-03T02:59:13.598807Z
system:kube-controller-manager 2021-02-03T02:59:17.063235Z
...
system:serviceaccount:openshift-service-ca-operator:service-ca-operator 2021-02-03T09:29:00.206307Z
system:serviceaccount:openshift-service-ca-operator:service-ca-operator 2021-02-03T09:29:54.321532Z

# ALL_USERNAMES=`awk '{print $1}' tokenreview_users_and_timestamps.txt | uniq`
# BUG_PRS_USERNAMES="authentication-operator kube-apiserver-operator kube-controller-manager-operator kube-scheduler-operator openshift-apiserver-operator service-ca-operator"
# OTHER_COMPONENT_USERNAMES=`echo "$ALL_USERNAMES" | sed -E '/authentication-operator|kube-apiserver-operator|kube-controller-manager-operator|kube-scheduler-operator|openshift-apiserver-operator|service-ca-operator/d'`

Then parse and process the tokenreview request rate per user with this script:
# cat parse.sh
for USERNAME in "$@"
do
    # Keep the timestamps of the last 6 tokenreview requests of this user
    grep "$USERNAME" tokenreview_users_and_timestamps.txt | tail -n 6 | awk '{print $2}' > tmp_result.txt
    NUM=`cat tmp_result.txt | wc -l`
    TIME_PREV=`awk "NR==1" tmp_result.txt`
    T1=`date --date "$TIME_PREV" '+%s'`
    for N in `seq 2 $NUM`
    do
        let L=$N-1
        TIME_CURR=`awk "NR==$N" tmp_result.txt`
        T2=`date --date "$TIME_CURR" '+%s'`
        # Append the gap to the previous request at the end of line N
        DELTA=" $((T2 - T1)) seconds"
        sed -i "${N}s/$/$DELTA/" tmp_result.txt
        T1="$T2"
    done
    echo "tokenreview request timestamps of ${USERNAME}:"
    cat tmp_result.txt
    echo
done

# bash parse.sh $BUG_PRS_USERNAMES
tokenreview request timestamps of authentication-operator:
2021-02-03T08:55:23.835038Z
2021-02-03T08:56:05.657564Z 42 seconds
2021-02-03T08:56:53.835139Z 48 seconds
2021-02-03T08:57:35.655711Z 42 seconds
2021-02-03T08:58:23.835369Z 48 seconds
2021-02-03T08:59:05.656274Z 42 seconds

tokenreview request timestamps of kube-apiserver-operator:
2021-02-03T03:15:46.346340Z
2021-02-03T03:16:35.189988Z 49 seconds
2021-02-03T03:17:16.346899Z 41 seconds
2021-02-03T03:18:05.189432Z 49 seconds
2021-02-03T03:18:46.347157Z 41 seconds
2021-02-03T03:19:35.189775Z 49 seconds

tokenreview request timestamps of kube-controller-manager-operator:
2021-02-03T03:27:57.894032Z
2021-02-03T08:56:03.019302Z 19686 seconds
2021-02-03T08:56:57.200726Z 54 seconds
2021-02-03T08:57:33.020995Z 36 seconds
2021-02-03T08:58:27.201122Z 54 seconds
2021-02-03T08:59:03.019251Z 36 seconds

tokenreview request timestamps of kube-scheduler-operator:
2021-02-03T09:54:09.700867Z
2021-02-03T09:55:09.701434Z 60 seconds
2021-02-03T09:56:09.701323Z 60 seconds
2021-02-03T09:57:09.701707Z 60 seconds
2021-02-03T09:58:09.701237Z 60 seconds
2021-02-03T09:59:09.702766Z 60 seconds

tokenreview request timestamps of openshift-apiserver-operator:
2021-02-03T09:53:31.972808Z
2021-02-03T09:54:31.972251Z 60 seconds
2021-02-03T09:55:31.973336Z 60 seconds
2021-02-03T09:56:31.973056Z 60 seconds
2021-02-03T09:57:31.972270Z 60 seconds
2021-02-03T09:58:31.972364Z 60 seconds

tokenreview request timestamps of service-ca-operator:
2021-02-03T09:55:24.320109Z
2021-02-03T09:56:00.206067Z 36 seconds
2021-02-03T09:56:54.320085Z 54 seconds
2021-02-03T09:57:30.206237Z 36 seconds
2021-02-03T09:58:24.319930Z 54 seconds
2021-02-03T09:59:00.206212Z 36 seconds

We can see that, for all components covered by this bug's PRs, the interval between tokenreview requests is never less than the 35s from https://github.com/openshift/library-go/pull/970/files
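The 35s check above was done by reading the parse.sh output by eye. A minimal sketch of how it could be automated over the whole tokenreview_users_and_timestamps.txt file follows. The function name min_intervals, its threshold argument and its "ok"/"below" output format are hypothetical and were not part of the run above; it assumes bash 4+ (associative arrays) and GNU date, and that each user's lines are in chronological order, as they are after the sort step.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original parse.sh.
# Reads lines of "<username> <timestamp>" on stdin and prints, per user, the
# smallest gap in whole seconds between consecutive requests, followed by
# "below" if that gap is under the threshold (default 35) or "ok" otherwise.
# Users with a single request are not printed.
min_intervals() {
    local threshold="${1:-35}"
    local user ts epoch delta status
    declare -A prev min   # last-seen epoch and smallest gap, keyed by user

    while read -r user ts; do
        [ -n "$ts" ] || continue
        # GNU date parses the audit-log timestamp; skip lines it rejects
        epoch=$(date --date "$ts" '+%s' 2>/dev/null) || continue
        if [ -n "${prev[$user]:-}" ]; then
            delta=$(( epoch - ${prev[$user]} ))
            if [ -z "${min[$user]:-}" ] || [ "$delta" -lt "${min[$user]}" ]; then
                min[$user]=$delta
            fi
        fi
        prev[$user]=$epoch
    done

    for user in "${!min[@]}"; do
        if [ "${min[$user]}" -lt "$threshold" ]; then
            status="below"
        else
            status="ok"
        fi
        printf '%s %s %s\n' "$user" "${min[$user]}" "$status"
    done | LC_ALL=C sort
}
```

It could be run as: min_intervals 35 < tokenreview_users_and_timestamps.txt . Unlike parse.sh, this looks at every recorded request rather than the last 6, so a single long gap (such as the 19686 seconds seen for kube-controller-manager-operator) does not affect the result, while any short gap earlier in the log would.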
But for other components, the interval between tokenreview requests is sometimes less than 35s:
# echo "$OTHER_COMPONENT_USERNAMES"
system:kube-controller-manager
system:kube-scheduler
system:node:qe-chao23-czbzp-master-0.c.openshift-qe.internal
system:node:qe-chao23-czbzp-master-2.c.openshift-qe.internal
system:node:qe-chao23-czbzp-worker-a-2rw9m.c.openshift-qe.internal
system:serviceaccount:openshift-apiserver:openshift-apiserver-sa
system:serviceaccount:openshift-authentication:oauth-openshift
system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator
system:serviceaccount:openshift-cluster-machine-approver:machine-approver-sa
system:serviceaccount:openshift-cluster-storage-operator:cluster-storage-operator
system:serviceaccount:openshift-config-operator:openshift-config-operator
system:serviceaccount:openshift-console-operator:console-operator
system:serviceaccount:openshift-controller-manager:openshift-controller-manager-sa
system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator
system:serviceaccount:openshift-dns:dns
system:serviceaccount:openshift-dns-operator:dns-operator
system:serviceaccount:openshift-etcd-operator:etcd-operator
system:serviceaccount:openshift-ingress-operator:ingress-operator
system:serviceaccount:openshift-ingress:router
system:serviceaccount:openshift-insights:operator
system:serviceaccount:openshift-machine-api:cluster-autoscaler-operator
system:serviceaccount:openshift-machine-api:machine-api-controllers
system:serviceaccount:openshift-machine-api:machine-api-operator
system:serviceaccount:openshift-machine-config-operator:machine-config-daemon
system:serviceaccount:openshift-monitoring:alertmanager-main
system:serviceaccount:openshift-monitoring:cluster-monitoring-operator
system:serviceaccount:openshift-monitoring:grafana
system:serviceaccount:openshift-monitoring:kube-state-metrics
system:serviceaccount:openshift-monitoring:node-exporter
system:serviceaccount:openshift-monitoring:openshift-state-metrics
system:serviceaccount:openshift-monitoring:prometheus-adapter
system:serviceaccount:openshift-monitoring:prometheus-k8s
system:serviceaccount:openshift-monitoring:prometheus-operator
system:serviceaccount:openshift-monitoring:telemeter-client
system:serviceaccount:openshift-monitoring:thanos-querier
system:serviceaccount:openshift-multus:metrics-daemon-sa
system:serviceaccount:openshift-multus:multus
system:serviceaccount:openshift-sdn:sdn

# bash parse.sh $OTHER_COMPONENT_USERNAMES > other_component_usernames.parsed_result.txt
# cat other_component_usernames.parsed_result.txt
tokenreview request timestamps of system:kube-controller-manager:
2021-02-03T08:58:02.054496Z
2021-02-03T08:58:02.342213Z 0 seconds
2021-02-03T08:58:32.053797Z 30 seconds
2021-02-03T08:58:32.341769Z 0 seconds
2021-02-03T08:59:02.054225Z 30 seconds
2021-02-03T08:59:02.342292Z 0 seconds

tokenreview request timestamps of system:kube-scheduler:
2021-02-03T08:58:07.838701Z
2021-02-03T08:58:20.096050Z 13 seconds
2021-02-03T08:58:37.838006Z 17 seconds
2021-02-03T08:58:50.096504Z 13 seconds
2021-02-03T08:59:07.837946Z 17 seconds
2021-02-03T08:59:20.095181Z 13 seconds
...
tokenreview request timestamps of system:serviceaccount:openshift-apiserver:openshift-apiserver-sa:
2021-02-03T09:58:25.055096Z
2021-02-03T09:58:41.871500Z 16 seconds
2021-02-03T09:58:45.342003Z 4 seconds
2021-02-03T09:59:05.927009Z 20 seconds
2021-02-03T09:59:11.872047Z 6 seconds
2021-02-03T09:59:15.342777Z 4 seconds

tokenreview request timestamps of system:serviceaccount:openshift-authentication:oauth-openshift:
2021-02-03T09:58:00.615494Z
2021-02-03T09:58:13.302838Z 13 seconds
2021-02-03T09:58:30.615540Z 17 seconds
2021-02-03T09:58:43.301550Z 13 seconds
2021-02-03T09:59:00.615414Z 17 seconds
2021-02-03T09:59:13.304235Z 13 seconds
...
tokenreview request timestamps of system:serviceaccount:openshift-multus:metrics-daemon-sa:
2021-02-03T09:55:18.987849Z
2021-02-03T09:57:09.571177Z 111 seconds
2021-02-03T09:57:14.483839Z 5 seconds
2021-02-03T09:57:28.320110Z 14 seconds
2021-02-03T09:59:13.334018Z 105 seconds
2021-02-03T09:59:17.775370Z 4 seconds

tokenreview request timestamps of system:serviceaccount:openshift-multus:multus:
2021-02-03T09:46:56.917778Z
2021-02-03T09:49:01.493648Z 125 seconds
2021-02-03T09:49:34.943302Z 33 seconds
2021-02-03T09:51:26.908722Z 112 seconds
2021-02-03T09:51:49.233165Z 23 seconds
2021-02-03T09:58:34.942454Z 405 seconds

tokenreview request timestamps of system:serviceaccount:openshift-sdn:sdn:
2021-02-03T09:56:22.711947Z
2021-02-03T09:56:25.504492Z 3 seconds
2021-02-03T09:56:29.301308Z 4 seconds
2021-02-03T09:56:31.621889Z 2 seconds
2021-02-03T09:58:42.124772Z 131 seconds
2021-02-03T09:58:51.964934Z 9 seconds

The full output is uploaded at http://file.rdu.redhat.com/~xxia/other_component_usernames.parsed_result.txt . Do these components need a bump as well?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633