Bug 1734390

Summary: oauth-proxy is rejecting a valid service account
Product: OpenShift Container Platform
Component: apiserver-auth
Version: 4.2.0
Target Release: 4.2.0
Status: CLOSED NOTABUG
Type: Bug
Severity: medium
Priority: medium
Reporter: Pawel Krupa <pkrupa>
Assignee: Matt Rogers <mrogers>
QA Contact: Chuan Yu <chuyu>
CC: alegrand, anpicker, aos-bugs, erooth, gblomqui, mfojtik, mloibl, nagrawal, pkrupa, slaznick, sttts, surbania
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-09-02 15:07:22 UTC

Description Pawel Krupa 2019-07-30 12:28:45 UTC
Description of problem:
oauth-proxy is rejecting a valid service account, probably due to problems with certificate rotation.


Version-Release number of selected component (if applicable):
Tested on a cluster running version 4.2.0-0.ci-2019-07-30-062021; it was also noticed on previous versions.

This was noticed in the oauth-proxy container deployed in the Prometheus pod. Logs are available at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/2210/pull-ci-openshift-console-master-e2e-aws/6271/artifacts/e2e-aws/pods/openshift-monitoring_prometheus-k8s-0_prometheus-proxy.log

It happens consistently on new clusters.

Comment 1 Standa Laznicka 2019-08-01 12:13:54 UTC
Pawel, this is indeed weird behavior. Is this happening in all clusters today? Can you get me the config for that proxy? I was not able to get it by modifying the link.

Comment 2 Pawel Krupa 2019-08-07 14:30:56 UTC
This is happening on all clusters and in every e2e CI job. It can be observed, for example, in the logs gathered from the prometheus-proxy container in the prometheus-k8s pod. The configuration for that container is available at https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/prometheus-k8s/prometheus.yaml#L38-L70
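
For reference, the relevant container in that manifest looks roughly like the sketch below. This is paraphrased, not copied, so treat the exact flags and values as assumptions and check the linked file for the authoritative version:

    - name: prometheus-proxy
      image: quay.io/openshift/oauth-proxy:latest   # image reference approximate
      args:
      - -provider=openshift
      - -https-address=:9091                        # TLS listener the clients hit
      - -http-address=                              # plain HTTP disabled
      - -upstream=http://localhost:9090             # proxies to the Prometheus listener in the same pod
      - -openshift-service-account=prometheus-k8s
      - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
      - -tls-cert=/etc/tls/private/tls.crt
      - -tls-key=/etc/tls/private/tls.key
      - -cookie-secret-file=/etc/proxy/secrets/session_secret
      ports:
      - containerPort: 9091
        name: web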

Comment 3 Standa Laznicka 2019-08-09 13:45:22 UTC
Debugging progress: with request logging turned on, the request causing the behavior is:

prometheus-k8s.openshift-monitoring.svc:9091 GET localhost:9090 '/federate?match[]={__name__="up"}&match[]={__name__="cluster_version"}&match[]={__name__="cluster_version_available_updates"}&match[]={__name__="cluster_operator_up"}&match[]={__name__="cluster_operator_conditions"}&match[]={__name__="cluster_version_payload"}&match[]={__name__="cluster_installer"}&match[]={__name__="instance:etcd_object_counts:sum"}&match[]={__name__="ALERTS",alertstate="firing"}&match[]={__name__="code:apiserver_request_count:rate:sum"}&match[]={__name__="cluster:capacity_cpu_cores:sum"}&match[]={__name__="cluster:capacity_memory_bytes:sum"}&match[]={__name__="cluster:cpu_usage_cores:sum"}&match[]={__name__="cluster:memory_usage_bytes:sum"}&match[]={__name__="openshift:cpu_usage_cores:sum"}&match[]={__name__="openshift:memory_usage_bytes:sum"}&match[]={__name__="cluster:node_instance_type_count:sum"}&match[]={__name__="cnv:vmi_status_running:count"}&match[]={__name__="subscription_sync_total"}' HTTP/1.1 "Go-http-client/1.1" 200 5278 0.009

Suspicion falls on the telemeter-client.
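
Aside, for anyone reproducing this: the request log above came from running the proxy with request logging enabled. A minimal sketch, assuming oauth-proxy's request-logging flag behaves as its usage output describes (verify against your version), is to add this to the proxy args:

    args:
    - -request-logging=true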

Comment 4 Standa Laznicka 2019-08-09 13:55:50 UTC
I tried adding `- -skip-auth-regex=^/federate`, which seems to have fixed the problem for me.
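
In the prometheus-k8s manifest that amounts to something like this (a sketch; placement relative to the existing flags shown above):

    args:
    # existing proxy flags unchanged, plus:
    - -skip-auth-regex=^/federate   # bypass auth for any path matching ^/federate

Note this disables authentication entirely for paths matching `^/federate`, which is why the telemeter-client scrapes start succeeding.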

Comment 5 Standa Laznicka 2019-08-12 10:37:35 UTC
The next failing endpoint was `/api`, which we don't want to reveal. It turns out even `/federate` should not be visible, so there has to be another way around this.
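
One possible other way, sketched here as an assumption (flag name from the oauth-proxy README; the SAR resource and verb below are illustrative only): keep `/federate` behind the proxy, but accept the telemeter-client's service account token via delegated authorization instead of an interactive OAuth login:

    args:
    # delegate bearer-token auth for /federate to a SubjectAccessReview:
    - '-openshift-delegate-urls={"/federate": {"resource": "namespaces", "verb": "get"}}'

The client would then send its mounted service account token in an `Authorization: Bearer` header rather than being skipped past auth entirely.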