Bug 1887392 - openshift-apiserver: delegated authn/z should have ttl > metrics/healthz/readyz/openapi interval
Summary: openshift-apiserver: delegated authn/z should have ttl > metrics/healthz/read...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On: 1913325
Blocks:
Reported: 2020-10-12 10:57 UTC by Stefan Schimanski
Modified: 2021-02-24 15:25 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:24:44 UTC
Target Upstream Version:




Links
- GitHub openshift/cluster-authentication-operator pull 392 (closed): "Bug 1887392: bump kube to 0.20.1 and lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:07 UTC
- GitHub openshift/cluster-kube-apiserver-operator pull 1027 (closed): "Bug 1887392: bump kube to 0.20.1 and lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:07 UTC
- GitHub openshift/cluster-kube-controller-manager-operator pull 491 (closed): "Bug 1887392: bump lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:07 UTC
- GitHub openshift/cluster-kube-descheduler-operator pull 165 (closed): "Bug 1887392: bump lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:07 UTC
- GitHub openshift/cluster-kube-scheduler-operator pull 313 (closed): "Bug 1887392: bump kube to 1.201 and lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:09 UTC
- GitHub openshift/cluster-openshift-apiserver-operator pull 424 (closed): "Bug 1887392: bump kube to 0.20.1 and lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:09 UTC
- GitHub openshift/library-go pull 970 (closed): "Bug 1887392: allow configuring authn/z caches TTLs", last updated 2021-02-01 08:40:09 UTC
- GitHub openshift/service-ca-operator pull 134 (closed): "Bug 1887392: bump kube to 0.20.1 and lib-go to master to pick up authn/z caching", last updated 2021-02-01 08:40:10 UTC
- Red Hat Product Errata RHSA-2020:5633, last updated 2021-02-24 15:25:28 UTC

Description Stefan Schimanski 2020-10-12 10:57:07 UTC
Our components with delegated authn/z should cache authn/z results long enough that they do not ask kube-apiserver for every /metrics, /healthz, /readyz and /openapi/v2 request.

This BZ applies to at least:
- openshift-apiserver
- oauth-apiserver
- oauth-server
- *-operator
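Why the TTL has to be larger than the scrape interval can be illustrated with a small bash simulation. This is a hypothetical model, not code from library-go or Kubernetes: it assumes that a cached authn/z result stays valid for TTL seconds after it is stored (a cache hit does not extend it), that every scraper presents the same token, and that scrapes happen at a fixed interval. The function simulate_reviews and its arguments are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical model of a delegated authn/z cache (not library-go/Kubernetes code).
# simulate_reviews TTL INTERVAL DURATION [OFFSET...]
# Prints the times (in seconds) at which a TokenReview would reach kube-apiserver.
simulate_reviews() {
  local ttl=$1 interval=$2 duration=$3
  shift 3
  (($# > 0)) || set -- 0   # default: a single scraper starting at t=0
  local offset t expires=-1
  local -a scrapes=()
  # Each scraper requests /metrics at OFFSET, OFFSET+INTERVAL, ... < DURATION.
  for offset in "$@"; do
    for ((t = offset; t < duration; t += interval)); do
      scrapes+=("$t")
    done
  done
  # Replay all scrapes in time order. Assumption: an entry cached at time t is
  # valid until t+TTL and a hit does not extend it; a miss sends a TokenReview.
  for t in $(printf '%s\n' "${scrapes[@]}" | sort -n); do
    if ((t >= expires)); then
      echo "$t"
      expires=$((t + ttl))
    fi
  done
}
```

For example, assuming a 30s scrape interval over 300s, `simulate_reviews 10 30 300` prints a time for all 10 scrapes (every scrape is a cache miss), while `simulate_reviews 35 30 300` prints 0 60 120 180 240, i.e. one TokenReview per 60s, which matches the 60s gaps seen for kube-scheduler-operator and openshift-apiserver-operator in comment 8. With two scrapers offset by 12s, `simulate_reviews 35 30 180 0 12` prints 0 42 90 132, the same alternating 42s/48s pattern seen for authentication-operator; that two scrapers sharing one token cause this pattern is an assumption of the model, not something verified in this bug.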

Comment 7 Xingxing Xia 2021-01-28 11:46:47 UTC
Standa Laznicka,
  sorry, I was busy with other urgent daily work, so I have not looked at this bug for a while.
  I looked at the comments and code in https://github.com/openshift/library-go/pull/970. It looks like I need to test that not every metrics request has to pass an authn/z check against kube-apiserver.
  But I'm still not sure how to verify this concretely from a functional angle. Could you give some guidance? And how should I test each operator PR?
Thanks

Comment 8 Xingxing Xia 2021-02-03 10:10:41 UTC
Per discussion with Dev in Slack, one option is to check the TokenReview request rate via metrics, but after checking I was not sure what the concrete metric name is. Dev also suggested checking the TokenReview request rate in the audit log. Below are the results from a 4.7.0-0.nightly-2021-02-02-223803 env.
SSH to one master node, then run:
# grep -h '"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews[^"]*","verb":"create"' /var/log/kube-apiserver/audit*.log > tokenreview_requests.json
# jq -c '.user.username + "    " + "\(.requestReceivedTimestamp)"' tokenreview_requests.json | sed 's/"//g' | sort > tokenreview_users_and_timestamps.txt

# cat tokenreview_users_and_timestamps.txt # the file looks like:
system:kube-controller-manager    2021-02-03T02:59:13.598807Z
system:kube-controller-manager    2021-02-03T02:59:17.063235Z
...
system:serviceaccount:openshift-service-ca-operator:service-ca-operator    2021-02-03T09:29:00.206307Z
system:serviceaccount:openshift-service-ca-operator:service-ca-operator    2021-02-03T09:29:54.321532Z

# ALL_USERNAMES=`awk '{print $1}' tokenreview_users_and_timestamps.txt | uniq`
# BUG_PRS_USERNAMES="authentication-operator kube-apiserver-operator kube-controller-manager-operator kube-scheduler-operator openshift-apiserver-operator service-ca-operator"
# OTHER_COMPONENT_USERNAMES=`echo "$ALL_USERNAMES" | sed -E '/authentication-operator|kube-apiserver-operator|kube-controller-manager-operator|kube-scheduler-operator|openshift-apiserver-operator|service-ca-operator/d'`

Then compute the intervals between each user's most recent TokenReview requests with this script:
# cat parse.sh
for USERNAME in "$@"
do
  # The input is sorted by username, then timestamp, so tail keeps each user's 6 latest requests.
  grep -F -- "$USERNAME" tokenreview_users_and_timestamps.txt | tail -n 6 | awk '{print $2}' > tmp_result.txt
  NUM=$(wc -l < tmp_result.txt)
  TIME_PREV=$(awk 'NR==1' tmp_result.txt)
  T1=$(date --date "$TIME_PREV" '+%s')   # whole seconds; fractional seconds are dropped
  for N in $(seq 2 "$NUM")
  do
    TIME_CURR=$(awk "NR==$N" tmp_result.txt)
    T2=$(date --date "$TIME_CURR" '+%s')
    DELTA="    $((T2 - T1)) seconds"
    # Append the gap since the previous request to the end of line N.
    sed -i "${N}s/$/$DELTA/" tmp_result.txt
    T1="$T2"
  done
  echo "tokenreview request timestamps of ${USERNAME}:"
  cat tmp_result.txt
  echo
done

# bash parse.sh $BUG_PRS_USERNAMES
tokenreview request timestamps of authentication-operator:
2021-02-03T08:55:23.835038Z
2021-02-03T08:56:05.657564Z    42 seconds
2021-02-03T08:56:53.835139Z    48 seconds
2021-02-03T08:57:35.655711Z    42 seconds
2021-02-03T08:58:23.835369Z    48 seconds
2021-02-03T08:59:05.656274Z    42 seconds

tokenreview request timestamps of kube-apiserver-operator:
2021-02-03T03:15:46.346340Z
2021-02-03T03:16:35.189988Z    49 seconds
2021-02-03T03:17:16.346899Z    41 seconds
2021-02-03T03:18:05.189432Z    49 seconds
2021-02-03T03:18:46.347157Z    41 seconds
2021-02-03T03:19:35.189775Z    49 seconds

tokenreview request timestamps of kube-controller-manager-operator:
2021-02-03T03:27:57.894032Z
2021-02-03T08:56:03.019302Z    19686 seconds
2021-02-03T08:56:57.200726Z    54 seconds
2021-02-03T08:57:33.020995Z    36 seconds
2021-02-03T08:58:27.201122Z    54 seconds
2021-02-03T08:59:03.019251Z    36 seconds

tokenreview request timestamps of kube-scheduler-operator:
2021-02-03T09:54:09.700867Z
2021-02-03T09:55:09.701434Z    60 seconds
2021-02-03T09:56:09.701323Z    60 seconds
2021-02-03T09:57:09.701707Z    60 seconds
2021-02-03T09:58:09.701237Z    60 seconds
2021-02-03T09:59:09.702766Z    60 seconds

tokenreview request timestamps of openshift-apiserver-operator:
2021-02-03T09:53:31.972808Z
2021-02-03T09:54:31.972251Z    60 seconds
2021-02-03T09:55:31.973336Z    60 seconds
2021-02-03T09:56:31.973056Z    60 seconds
2021-02-03T09:57:31.972270Z    60 seconds
2021-02-03T09:58:31.972364Z    60 seconds

tokenreview request timestamps of service-ca-operator:
2021-02-03T09:55:24.320109Z
2021-02-03T09:56:00.206067Z    36 seconds
2021-02-03T09:56:54.320085Z    54 seconds
2021-02-03T09:57:30.206237Z    36 seconds
2021-02-03T09:58:24.319930Z    54 seconds
2021-02-03T09:59:00.206212Z    36 seconds

For all components covered by this bug's PRs, the interval between consecutive TokenReview requests is never less than the 35s TTL from https://github.com/openshift/library-go/pull/970/files
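parse.sh only looks at each user's last six requests and converts timestamps with `date '+%s'`, which drops fractional seconds, so a gap it prints as "0 seconds" (as for system:kube-controller-manager in comment 9) is really about 0.29s. A hypothetical helper that scans the whole tokenreview_users_and_timestamps.txt file with sub-second precision and flags users whose shortest gap is below 35s could look like this sketch. The name min_gaps and the 35s default threshold are assumptions taken from this comment; like parse.sh, it relies on GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original verification steps.
# min_gaps FILE [THRESHOLD_SECONDS]
# FILE holds "<username> <RFC3339 timestamp>" lines sorted by user, then time
# (e.g. tokenreview_users_and_timestamps.txt). For every user it prints the
# shortest gap between consecutive TokenReview requests, to the millisecond,
# and marks it BELOW when that gap is under the threshold (default 35s).
# Requires GNU date for -d and %N, just like parse.sh.
min_gaps() {
  local file=$1 threshold=${2:-35}
  local user ts t prev_user="" prev_t=""
  while read -r user ts; do
    [[ -n $ts ]] || continue          # skip blank or malformed lines
    t=$(date -u -d "$ts" '+%s.%N')    # epoch seconds with nanoseconds
    # Emit "user previous_time current_time" for consecutive requests of one user.
    if [[ $user == "$prev_user" ]]; then
      printf '%s %s %s\n' "$user" "$prev_t" "$t"
    fi
    prev_user=$user
    prev_t=$t
  done < "$file" |
    awk -v thr="$threshold" '
      { gap = $3 - $2
        if (!($1 in best) || gap < best[$1]) best[$1] = gap }
      END {
        for (u in best)
          printf "%s min_gap=%.3fs %s\n", u, best[u], (best[u] < thr ? "BELOW" : "ok")
      }' |
    sort
}
```

For example, `min_gaps tokenreview_users_and_timestamps.txt | grep BELOW` would list only the users whose TokenReview requests arrive closer together than 35s.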

Comment 9 Xingxing Xia 2021-02-03 10:24:02 UTC
However, for other components, some intervals between consecutive TokenReview requests are shorter than 35s:
# echo "$OTHER_COMPONENT_USERNAMES"
system:kube-controller-manager
system:kube-scheduler
system:node:qe-chao23-czbzp-master-0.c.openshift-qe.internal
system:node:qe-chao23-czbzp-master-2.c.openshift-qe.internal
system:node:qe-chao23-czbzp-worker-a-2rw9m.c.openshift-qe.internal
system:serviceaccount:openshift-apiserver:openshift-apiserver-sa
system:serviceaccount:openshift-authentication:oauth-openshift
system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator
system:serviceaccount:openshift-cluster-machine-approver:machine-approver-sa
system:serviceaccount:openshift-cluster-storage-operator:cluster-storage-operator
system:serviceaccount:openshift-config-operator:openshift-config-operator
system:serviceaccount:openshift-console-operator:console-operator
system:serviceaccount:openshift-controller-manager:openshift-controller-manager-sa
system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator
system:serviceaccount:openshift-dns:dns
system:serviceaccount:openshift-dns-operator:dns-operator
system:serviceaccount:openshift-etcd-operator:etcd-operator
system:serviceaccount:openshift-ingress-operator:ingress-operator
system:serviceaccount:openshift-ingress:router
system:serviceaccount:openshift-insights:operator
system:serviceaccount:openshift-machine-api:cluster-autoscaler-operator
system:serviceaccount:openshift-machine-api:machine-api-controllers
system:serviceaccount:openshift-machine-api:machine-api-operator
system:serviceaccount:openshift-machine-config-operator:machine-config-daemon
system:serviceaccount:openshift-monitoring:alertmanager-main
system:serviceaccount:openshift-monitoring:cluster-monitoring-operator
system:serviceaccount:openshift-monitoring:grafana
system:serviceaccount:openshift-monitoring:kube-state-metrics
system:serviceaccount:openshift-monitoring:node-exporter
system:serviceaccount:openshift-monitoring:openshift-state-metrics
system:serviceaccount:openshift-monitoring:prometheus-adapter
system:serviceaccount:openshift-monitoring:prometheus-k8s
system:serviceaccount:openshift-monitoring:prometheus-operator
system:serviceaccount:openshift-monitoring:telemeter-client
system:serviceaccount:openshift-monitoring:thanos-querier
system:serviceaccount:openshift-multus:metrics-daemon-sa
system:serviceaccount:openshift-multus:multus
system:serviceaccount:openshift-sdn:sdn

# bash parse.sh $OTHER_COMPONENT_USERNAMES > other_component_usernames.parsed_result.txt
# cat other_component_usernames.parsed_result.txt
tokenreview request timestamps of system:kube-controller-manager:
2021-02-03T08:58:02.054496Z
2021-02-03T08:58:02.342213Z    0 seconds
2021-02-03T08:58:32.053797Z    30 seconds
2021-02-03T08:58:32.341769Z    0 seconds
2021-02-03T08:59:02.054225Z    30 seconds
2021-02-03T08:59:02.342292Z    0 seconds

tokenreview request timestamps of system:kube-scheduler:
2021-02-03T08:58:07.838701Z
2021-02-03T08:58:20.096050Z    13 seconds
2021-02-03T08:58:37.838006Z    17 seconds
2021-02-03T08:58:50.096504Z    13 seconds
2021-02-03T08:59:07.837946Z    17 seconds
2021-02-03T08:59:20.095181Z    13 seconds
...
tokenreview request timestamps of system:serviceaccount:openshift-apiserver:openshift-apiserver-sa:
2021-02-03T09:58:25.055096Z
2021-02-03T09:58:41.871500Z    16 seconds
2021-02-03T09:58:45.342003Z    4 seconds
2021-02-03T09:59:05.927009Z    20 seconds
2021-02-03T09:59:11.872047Z    6 seconds
2021-02-03T09:59:15.342777Z    4 seconds

tokenreview request timestamps of system:serviceaccount:openshift-authentication:oauth-openshift:
2021-02-03T09:58:00.615494Z
2021-02-03T09:58:13.302838Z    13 seconds
2021-02-03T09:58:30.615540Z    17 seconds
2021-02-03T09:58:43.301550Z    13 seconds
2021-02-03T09:59:00.615414Z    17 seconds
2021-02-03T09:59:13.304235Z    13 seconds
...

tokenreview request timestamps of system:serviceaccount:openshift-multus:metrics-daemon-sa:
2021-02-03T09:55:18.987849Z
2021-02-03T09:57:09.571177Z    111 seconds
2021-02-03T09:57:14.483839Z    5 seconds
2021-02-03T09:57:28.320110Z    14 seconds
2021-02-03T09:59:13.334018Z    105 seconds
2021-02-03T09:59:17.775370Z    4 seconds

tokenreview request timestamps of system:serviceaccount:openshift-multus:multus:
2021-02-03T09:46:56.917778Z
2021-02-03T09:49:01.493648Z    125 seconds
2021-02-03T09:49:34.943302Z    33 seconds
2021-02-03T09:51:26.908722Z    112 seconds
2021-02-03T09:51:49.233165Z    23 seconds
2021-02-03T09:58:34.942454Z    405 seconds

tokenreview request timestamps of system:serviceaccount:openshift-sdn:sdn:
2021-02-03T09:56:22.711947Z
2021-02-03T09:56:25.504492Z    3 seconds
2021-02-03T09:56:29.301308Z    4 seconds
2021-02-03T09:56:31.621889Z    2 seconds
2021-02-03T09:58:42.124772Z    131 seconds
2021-02-03T09:58:51.964934Z    9 seconds

The full output is uploaded to http://file.rdu.redhat.com/~xxia/other_component_usernames.parsed_result.txt . Do these components need the bump too?

Comment 12 errata-xmlrpc 2021-02-24 15:24:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

