Description of problem: https://github.com/openshift/cluster-monitoring-operator/pull/1282 introduced the possibility for the metrics scraper to authenticate with a TLS client certificate and thereby avoid a TokenReview call to the kube-apiserver (which usually happens once every 30s per scraped component). The core components and operators should use this capability to lower the API server load and to make it possible to scrape metrics even when the kube-apiserver is down (though only if the scraped component uses static authorization for its /metrics endpoint).
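On the scrape side, this roughly corresponds to a ServiceMonitor endpoint that presents the client certificate instead of a bearer token. The sketch below is an assumption for illustration (hypothetical ServiceMonitor name and selector; the field names follow the prometheus-operator tlsConfig API, and the actual manifests generated by the PR may differ):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-operator            # hypothetical name
  namespace: openshift-monitoring
spec:
  endpoints:
  - port: https
    scheme: https
    tlsConfig:
      # Authenticate with the metrics client certificate instead of a
      # bearer token, so the scraped component does not need to issue a
      # TokenReview against the kube-apiserver for every scrape.
      cert:
        secret:
          name: metrics-client-certs
          key: tls.crt
      keySecret:
        name: metrics-client-certs
        key: tls.key
  selector:
    matchLabels:
      app: example-operator         # hypothetical selector
```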
Verified on:

  $ oc get clusterversion
  NAME      VERSION                               AVAILABLE   PROGRESSING   SINCE   STATUS
  version   4.9.0-0.nightly-2021-08-07-175228     True        False         2m10s   Cluster version is 4.9.0-0.nightly-2021-08-07-175228

Checked the metrics client certificate:

  $ oc get secret -n openshift-monitoring metrics-client-certs
  Opaque   2   22m

  $ oc get csr
  system:openshift:openshift-monitoring-gnqcs   30s   kubernetes.io/kube-apiserver-client   system:serviceaccount:openshift-monitoring:cluster-monitoring-operator   Approved,Issued

Checked the metrics client certificate again to confirm the new certificate:

  $ oc get secret -n openshift-monitoring metrics-client-certs
  Opaque   2   2m30s

Gathered Prometheus metrics by using curl with the client certificate for the operators below:

  openshift-apiserver-operator
  openshift-kube-apiserver-operator
  openshift-kube-controller-manager-operator
  openshift-kube-storage-version-migrator-operator

For example:

  $ oc rsh -n openshift-apiserver-operator openshift-apiserver-operator-7f7cd7d86c-5bm49
  $ curl -k --key /tmp/tls.key --cert /tmp/tls.crt https://localhost:8443/metrics > /tmp/metrics.txt

The curl commands succeeded and the /tmp/metrics.txt files were not empty. Checked the certificate with openssl; the user in the CN is prometheus-k8s:
  $ openssl x509 -in tls.crt -noout -text | grep CN
  Issuer: CN=kube-csr-signer_@1628567334
  Subject: CN=system:serviceaccount:openshift-monitoring:prometheus-k8s

Checked the kube-apiserver pods:

  $ oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
  NAME                                                READY   STATUS    RESTARTS   AGE   LABELS
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-0   5/5     Running   0         25m   apiserver=true,app=openshift-kube-apiserver,revision=5
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-1   5/5     Running   0         32m   apiserver=true,app=openshift-kube-apiserver,revision=5
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-2   5/5     Running   0         29m   apiserver=true,app=openshift-kube-apiserver,revision=5

Configured the audit profile from default to WriteRequestBodies in apiserver/cluster and waited for the kube-apiserver pods to restart:

  $ oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
  NAME                                                READY   STATUS    RESTARTS   AGE     LABELS
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-0   5/5     Running   0         95s     apiserver=true,app=openshift-kube-apiserver,revision=6
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-1   5/5     Running   0         8m18s   apiserver=true,app=openshift-kube-apiserver,revision=6
  kube-apiserver-ci-ln-qvmriyb-f76d1-dt7gb-master-2   5/5     Running   0         5m5s    apiserver=true,app=openshift-kube-apiserver,revision=6

Waited 15 minutes after the kube-apiserver restart, then logged in to all masters and gathered the audit logs.
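The openssl CN check from the steps above can be reproduced offline. A minimal sketch with a throwaway self-signed certificate (the CN here merely mimics the prometheus-k8s identity; it is not a real cluster certificate):

```shell
set -e
workdir=$(mktemp -d)
# Generate a throwaway key and self-signed certificate whose subject
# mimics the scraper's service account identity.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$workdir/tls.key" -out "$workdir/tls.crt" -days 1 \
  -subj "/CN=system:serviceaccount:openshift-monitoring:prometheus-k8s" 2>/dev/null
# Extract the subject CN, as done in the verification step.
subject=$(openssl x509 -in "$workdir/tls.crt" -noout -subject)
echo "$subject"
rm -rf "$workdir"
```

The exact formatting of the subject line varies between OpenSSL versions, but the CN should always contain the service account name.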
Gathered the audit logs and checked for TokenReview requests:

  $ oc debug node/ci-ln-qvmriyb-f76d1-dt7gb-master-2 -T -- chroot /host grep '"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"' /var/log/kube-apiserver/audit.log > /tmp/all_tokenreviews_requests.log
  $ grep '"status":{"authenticated":true,"user":{"username":"system:serviceaccount:openshift-monitoring:prometheus-k8s"' /tmp/all_tokenreviews_requests.log > /tmp/all_tokenreviews_for_serviceaccount_prometheus-k8s.log
  $ jq '.user.username' /tmp/all_tokenreviews_for_serviceaccount_prometheus-k8s.log > /tmp/all_users_that_make_traffic_to_check_token_of_serviceaccount_prometheus-k8s.log
  $ sort /tmp/all_users_that_make_traffic_to_check_token_of_serviceaccount_prometheus-k8s.log | uniq -c | sort -rh > /tmp/users.txt

Checked that no token validation requests are sent to the kube-apiserver by the users below; the loop should produce no output:

  $ for i in kube-apiserver openshift-apiserver openshift-controller-manager kube-scheduler kubelet node-exporter kube-controller-manager etcd; do grep "$i" /tmp/users.txt; done
  1 "system:serviceaccount:openshift-controller-manager:openshift-controller-manager-sa"
  4 "system:kube-scheduler"

TokenReview requests are still seen from some targets for the prometheus service account; filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1991900. In addition, when the kube-apiserver is made unavailable, metrics cannot be gathered; filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1990281
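The aggregation step of the pipeline above can be exercised without a cluster. A sketch that runs the same jq/sort/uniq pipeline over a few hypothetical, abbreviated audit entries (assumes jq is installed; the sample entries are invented for illustration and only contain the fields the pipeline reads):

```shell
set -e
# Hypothetical sample standing in for
# /tmp/all_tokenreviews_for_serviceaccount_prometheus-k8s.log.
cat > /tmp/sample_tokenreviews.log <<'EOF'
{"user":{"username":"system:kube-scheduler"},"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"}
{"user":{"username":"system:kube-scheduler"},"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"}
{"user":{"username":"system:serviceaccount:openshift-controller-manager:openshift-controller-manager-sa"},"requestURI":"/apis/authentication.k8s.io/v1/tokenreviews"}
EOF
# Same aggregation as the verification step: the requesting user of each
# TokenReview, counted and sorted by frequency.
jq '.user.username' /tmp/sample_tokenreviews.log | sort | uniq -c | sort -rh > /tmp/sample_users.txt
cat /tmp/sample_users.txt
```

With this sample, the most frequent requester (system:kube-scheduler, 2 requests) sorts to the top, mirroring the per-user counts seen in /tmp/users.txt on the real cluster.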
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759