Bug 1734390

Summary: oauth-proxy is rejecting a valid service account
Product: OpenShift Container Platform
Component: apiserver-auth
Version: 4.2.0
Target Release: 4.2.0
Status: CLOSED NOTABUG
Type: Bug
Severity: medium
Priority: medium
Reporter: Pawel Krupa <pkrupa>
Assignee: Matt Rogers <mrogers>
QA Contact: Chuan Yu <chuyu>
CC: alegrand, anpicker, aos-bugs, erooth, gblomqui, mfojtik, mloibl, nagrawal, pkrupa, slaznick, sttts, surbania
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-09-02 15:07:22 UTC

Description Pawel Krupa 2019-07-30 12:28:45 UTC
Description of problem:
oauth-proxy is rejecting a valid service account, probably due to problems with certificate rotation.


Version-Release number of selected component (if applicable):
Tested on a cluster running version 4.2.0-0.ci-2019-07-30-062021; it was also noticed on previous versions.

This was noticed in the oauth-proxy container deployed in the Prometheus pod. Logs are available at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/2210/pull-ci-openshift-console-master-e2e-aws/6271/artifacts/e2e-aws/pods/openshift-monitoring_prometheus-k8s-0_prometheus-proxy.log

It happens consistently on new clusters.

Comment 1 Standa Laznicka 2019-08-01 12:13:54 UTC
Pawel, this is indeed weird behavior. Is this happening in all clusters today? Can you get me the config for that proxy? I was not able to get it by modifying the link.

Comment 2 Pawel Krupa 2019-08-07 14:30:56 UTC
This is happening on all clusters and in every e2e CI job. It can be observed, for example, in the logs gathered from the prometheus-proxy container in the prometheus-k8s pod. The configuration for that container is available at https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/prometheus-k8s/prometheus.yaml#L38-L70
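
For reference, the relevant container in that manifest looks roughly like the sketch below. This is paraphrased, not copied, so treat the exact flags and values as assumptions and check the linked file for the authoritative version:

    - name: prometheus-proxy
      image: quay.io/openshift/oauth-proxy:latest   # image reference approximate
      args:
      - -provider=openshift
      - -https-address=:9091                        # TLS listener the clients hit
      - -http-address=                              # plain HTTP disabled
      - -upstream=http://localhost:9090             # proxies to the Prometheus listener in the same pod
      - -openshift-service-account=prometheus-k8s
      - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
      - -tls-cert=/etc/tls/private/tls.crt
      - -tls-key=/etc/tls/private/tls.key
      - -cookie-secret-file=/etc/proxy/secrets/session_secret
      ports:
      - containerPort: 9091
        name: web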

Comment 3 Standa Laznicka 2019-08-09 13:45:22 UTC
Debugging progress: with request logging turned on, the request causing the behavior is:

prometheus-k8s.openshift-monitoring.svc:9091 GET localhost:9090 '/federate?match[]={__name__="up"}&match[]={__name__="cluster_version"}&match[]={__name__="cluster_version_available_updates"}&match[]={__name__="cluster_operator_up"}&match[]={__name__="cluster_operator_conditions"}&match[]={__name__="cluster_version_payload"}&match[]={__name__="cluster_installer"}&match[]={__name__="instance:etcd_object_counts:sum"}&match[]={__name__="ALERTS",alertstate="firing"}&match[]={__name__="code:apiserver_request_count:rate:sum"}&match[]={__name__="cluster:capacity_cpu_cores:sum"}&match[]={__name__="cluster:capacity_memory_bytes:sum"}&match[]={__name__="cluster:cpu_usage_cores:sum"}&match[]={__name__="cluster:memory_usage_bytes:sum"}&match[]={__name__="openshift:cpu_usage_cores:sum"}&match[]={__name__="openshift:memory_usage_bytes:sum"}&match[]={__name__="cluster:node_instance_type_count:sum"}&match[]={__name__="cnv:vmi_status_running:count"}&match[]={__name__="subscription_sync_total"}' HTTP/1.1 "Go-http-client/1.1" 200 5278 0.009

Suspicion falls on the telemeter-client.
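
Aside, for anyone reproducing this: the request log above came from running the proxy with request logging enabled. A minimal sketch, assuming oauth-proxy's request-logging flag behaves as its usage output describes (verify against your version), is to add this to the proxy args:

    args:
    - -request-logging=true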

Comment 4 Standa Laznicka 2019-08-09 13:55:50 UTC
I tried adding `- -skip-auth-regex=^/federate`, which seems to have fixed the problem for me.
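
In the prometheus-k8s manifest that amounts to something like this (a sketch; placement relative to the existing flags shown above):

    args:
    # existing proxy flags unchanged, plus:
    - -skip-auth-regex=^/federate   # bypass auth for any path matching ^/federate

Note this disables authentication entirely for paths matching `^/federate`, which is why the telemeter-client scrapes start succeeding.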

Comment 5 Standa Laznicka 2019-08-12 10:37:35 UTC
The next failing endpoint was `/api`, which we don't want to reveal. It turns out even `/federate` should not be visible, so there has to be another way around this.
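
One possible other way, sketched here as an assumption (flag name from the oauth-proxy README; the SAR resource and verb below are illustrative only): keep `/federate` behind the proxy, but accept the telemeter-client's service account token via delegated authorization instead of an interactive OAuth login:

    args:
    # delegate bearer-token auth for /federate to a SubjectAccessReview:
    - '-openshift-delegate-urls={"/federate": {"resource": "namespaces", "verb": "get"}}'

The client would then send its mounted service account token in an `Authorization: Bearer` header rather than being skipped past auth entirely.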