Bug 1990281 - Client cert based metrics scraping when kube-apiserver is unavailable does not work in local authorization, still reaching kube-apiserver for subjectaccessreview validation
Summary: Client cert based metrics scraping when kube-apiserver is unavailable does no...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jan Chaloupka
QA Contact: zhou ying
URL:
Whiteboard: LifecycleStale
Duplicates: 1991900 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-08-05 07:39 UTC by Rahul Gangwar
Modified: 2023-01-16 10:03 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-16 10:03:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Rahul Gangwar 2021-08-05 07:39:37 UTC
Created attachment 1811132 [details]
must-gather logs

Description of problem:
We want to ensure that metrics scraping always works as long as the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable.
But that is not the case in my testing; see below.

Version-Release number of selected component (if applicable):
Latest 4.9 nightly payload

How reproducible:
Always

Steps to Reproduce:
1. Get the certificate:
oc extract secret/metrics-client-certs -n openshift-monitoring

2. Get IPs of metrics targets in advance for use in later steps:
oc get endpoints -A > filename.txt
oc get node -o wide > nodes_ip.txt
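The endpoints dump from step 2 can be turned into a per-namespace IP list with a small helper. This is a hypothetical convenience, not part of the bug report: the function name and the assumed column layout (NAMESPACE NAME ENDPOINTS AGE, with ENDPOINTS as "IP:port,IP:port,...") are illustrative and may need adjusting to the actual `oc get endpoints -A` output.

```shell
# Hypothetical helper: list the unique pod IPs recorded for one namespace
# in the "oc get endpoints -A" dump saved in step 2. Assumes the ENDPOINTS
# column holds comma-separated "IP:port" pairs.
endpoint_ips() {
  # $1 = namespace, $2 = dump file (filename.txt from step 2)
  awk -v ns="$1" '$1 == ns { print $3 }' "$2" | tr ',' '\n' | cut -d: -f1 | sort -u
}
```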

3. ssh to all masters and try below on all masters:
sudo mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp

Wait a while (about 135s) for the kube-apiserver container to shut down, watching with:
sudo crictl ps -a --name="^kube-apiserver$"
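Instead of re-running the crictl command by hand, the ~135s shutdown can be waited out with a small poll loop. This is a hedged sketch: the `CRICTL` override and the 5-second interval are assumptions, added so the loop can also be exercised off a master node.

```shell
# Sketch of a wait loop for step 3: poll until no running kube-apiserver
# container remains. CRICTL defaults to "sudo crictl" but can be overridden.
wait_for_apiserver_stop() {
  while ${CRICTL:-sudo crictl} ps --name='^kube-apiserver$' -q | grep -q .; do
    sleep 5
  done
  echo "kube-apiserver container stopped"
}
```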

4. Copy tls.crt and tls.key to /tmp on all masters.

5. Given that all kube-apiservers are unavailable after the previous step, gather metrics using the cert instead of a token.
Taking the target openshift-apiserver as an example, get its IPs from the earlier filename.txt and run the following on one master:
# curl -k --key /tmp/tls.key --cert /tmp/tls.crt "https://<openshift-apiserver pod IP>:8443/metrics" > /tmp/metrics.txt

Try the same for targets in these namespaces:
openshift-controller-manager, openshift-kube-scheduler, openshift-kube-controller-manager, openshift-etcd
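The per-target curl in step 5 can be repeated over every IP with a wrapper like the one below. The one-IP-per-line input file, the output paths, and the `DRYRUN` switch are assumptions for illustration; real runs need the extracted tls.crt/tls.key in /tmp on the master.

```shell
# Hypothetical wrapper around the step-5 curl: scrape every IP listed in a
# file (one per line) on a given metrics port, using the extracted client
# cert. DRYRUN=1 only prints the commands instead of contacting the targets.
scrape_all() {
  # $1 = file with one target IP per line, $2 = metrics port
  port="$2"
  while read -r ip; do
    cmd="curl -sk --key /tmp/tls.key --cert /tmp/tls.crt https://$ip:$port/metrics"
    if [ "${DRYRUN:-0}" = 1 ]; then
      echo "$cmd"
    else
      $cmd > "/tmp/metrics-$ip.txt"
    fi
  done < "$1"
}
```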

Actual results:
5. It still tries to reach the kube-apiserver for authorization, as shown below. This means local authorization is not in effect.
# curl -k --key /tmp/tls.key --cert /tmp/tls.crt "https://10.128.0.41:8443/metrics"
Internal Server Error: "/metrics": Post "https://172.30.0.1:443/apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s": dial tcp 172.30.0.1:443: connect: connection refused

Expected results:
Able to gather metrics.

Additional info:

Comment 1 Sergiusz Urbaniak 2021-08-09 14:11:30 UTC
Reassigning to the workloads team, as these components are not owned by auth.

Comment 2 Sergiusz Urbaniak 2021-08-11 07:20:20 UTC
*** Bug 1991900 has been marked as a duplicate of this bug. ***

Comment 4 Sergiusz Urbaniak 2021-08-11 09:37:37 UTC
To clarify: to fully implement client cert based metrics scraping, both subjectaccessreview (replaced with a local static authorizer) and tokenreview (replaced with client certs) must be addressed.
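For illustration, below is roughly the delegated SubjectAccessReview that a metrics endpoint POSTs to the kube-apiserver when authorization is not local — the same call that fails in the description once the apiserver is down. The user and attribute values shown are assumptions, not taken from the cluster; a local static authorizer would answer this question in-process instead of making this request.

```shell
# Emit an illustrative SubjectAccessReview body. With delegated authorization
# this JSON is POSTed to /apis/authorization.k8s.io/v1/subjectaccessreviews
# on the kube-apiserver; the scraper identity shown is an assumption.
sar_body() {
  cat <<'EOF'
{
  "apiVersion": "authorization.k8s.io/v1",
  "kind": "SubjectAccessReview",
  "spec": {
    "user": "system:serviceaccount:openshift-monitoring:prometheus-k8s",
    "nonResourceAttributes": { "path": "/metrics", "verb": "get" }
  }
}
EOF
}
```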

Comment 5 Rahul Gangwar 2021-09-08 04:50:14 UTC
Hi @ravig, 
Please update as soon as possible. If this bug is not fixed, the epic cannot be said to implement the local static authorizer correctly. Please provide a fix, as this is a blocker for the 4.9 release.

Comment 6 Rahul Gangwar 2021-09-08 06:04:45 UTC
Based on comment 5, setting the flag to blocker?

Comment 7 Sergiusz Urbaniak 2021-09-08 06:31:37 UTC
this is not a blocker, there is no degradation in functionality.

Comment 8 ravig 2021-09-08 12:29:36 UTC
Targeting this BZ to 4.10 as this is not a blocker for 4.9

Comment 9 Michal Fojtik 2021-10-09 15:58:34 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 10 Rahul Gangwar 2021-10-11 05:39:58 UTC
Hi Michal,
The bug is not fixed.

@ravig: Please update

Comment 13 Michal Fojtik 2021-11-26 15:58:56 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 14 Rahul Gangwar 2022-01-21 09:48:05 UTC
@rgudimet Please update on this.

Comment 15 ravig 2022-01-24 13:38:31 UTC
I did not get enough time to work on this during this release, considering it is not a blocker.

Comment 17 Jan Chaloupka 2022-08-01 15:19:16 UTC
Hi Rahul,

> We want to ensure that metrics scraping can always work if the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable.
> If the bug is not fixed, this epic can't be said it implements the static local function well

would you please point me to the epic?

Comment 19 Jan Chaloupka 2023-01-16 10:03:07 UTC
Ported into https://issues.redhat.com/browse/WRKLDS-648

