Description of problem:
oauth-proxy is rejecting a valid service account, probably due to problems with certificate rotation.

Version-Release number of selected component (if applicable):
Tested on a cluster at version 4.2.0-0.ci-2019-07-30-062021. It was also noticed on previous versions.

This was noticed in the oauth-proxy deployed in the prometheus pod. Logs are available at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/2210/pull-ci-openshift-console-master-e2e-aws/6271/artifacts/e2e-aws/pods/openshift-monitoring_prometheus-k8s-0_prometheus-proxy.log

It is happening consistently with new clusters.
Pawel, this is indeed weird behavior. Is this happening in all clusters today? Can you get me the config for that proxy? I was not able to get it by modifying the link.
This is happening on all clusters and in every e2e CI job. It can be observed for example in logs gathered from prometheus-proxy container in prometheus-k8s pod. Configuration for that container is available at https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/prometheus-k8s/prometheus.yaml#L38-L70
Debugging progress: with request logging turned on, the request causing the behavior is:

prometheus-k8s.openshift-monitoring.svc:9091 GET localhost:9090 '/federate?match[]={__name__="up"}&match[]={__name__="cluster_version"}&match[]={__name__="cluster_version_available_updates"}&match[]={__name__="cluster_operator_up"}&match[]={__name__="cluster_operator_conditions"}&match[]={__name__="cluster_version_payload"}&match[]={__name__="cluster_installer"}&match[]={__name__="instance:etcd_object_counts:sum"}&match[]={__name__="ALERTS",alertstate="firing"}&match[]={__name__="code:apiserver_request_count:rate:sum"}&match[]={__name__="cluster:capacity_cpu_cores:sum"}&match[]={__name__="cluster:capacity_memory_bytes:sum"}&match[]={__name__="cluster:cpu_usage_cores:sum"}&match[]={__name__="cluster:memory_usage_bytes:sum"}&match[]={__name__="openshift:cpu_usage_cores:sum"}&match[]={__name__="openshift:memory_usage_bytes:sum"}&match[]={__name__="cluster:node_instance_type_count:sum"}&match[]={__name__="cnv:vmi_status_running:count"}&match[]={__name__="subscription_sync_total"}' HTTP/1.1 "Go-http-client/1.1" 200 5278 0.009

Suspicion falls on telemeter-client.
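For anyone trying to read that request, here is a minimal Python sketch that splits a `/federate` request target into its path and its `match[]` series selectors. The `target` string is a shortened copy of the logged request (three selectors only), and the variable names are made up for illustration; nothing here is part of oauth-proxy or Prometheus.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the request target from the log line (three selectors only).
target = (
    '/federate?match[]={__name__="up"}'
    '&match[]={__name__="cluster_version"}'
    '&match[]={__name__="ALERTS",alertstate="firing"}'
)

# Split the target into path and query, then collect every match[] value.
parts = urlsplit(target)
selectors = parse_qs(parts.query)["match[]"]

print("path:", parts.path)
for selector in selectors:
    print("selector:", selector)
```

Running the same split on the full logged target should list every series the client is federating, which may help confirm whether the set matches what telemeter-client is expected to request.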
I tried adding `- -skip-auth-regex=^/federate`, which seems to have fixed the problem for me.
The next endpoint was `/api`, which we don't want to reveal. It turns out even `/federate` should not be visible, so there has to be another way around this.
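To make the concern concrete, here is a rough Python sketch of what the `^/federate` pattern from the flag tried above would exempt from authentication. It assumes the proxy applies the pattern as an unanchored regular-expression search against the request path only; that is an assumption about the proxy's behavior, not something confirmed in this thread. `bypasses_auth` is a hypothetical helper, and the sample paths other than `/federate` are invented for illustration.

```python
import re

# Pattern taken from the skip-auth flag tried earlier in this thread.
skip_auth = re.compile(r"^/federate")


def bypasses_auth(path: str) -> bool:
    """Return True if a request path would skip authentication.

    Assumption: the proxy runs an unanchored regex search on the path,
    so only the leading '^' constrains where the match starts.
    """
    return skip_auth.search(path) is not None


# Hypothetical sample paths, chosen to show what the pattern does and does not cover.
for path in ["/federate", "/federated-anything", "/api/v1/query", "/graph"]:
    print(f"{path}: {'no auth' if bypasses_auth(path) else 'auth required'}")
```

Under that assumption, every request whose path starts with `/federate` would be served without any token check, regardless of its query string, so all federated series become readable by unauthenticated callers. Anchoring the end of the pattern would narrow it to the exact path, but it would not change the fact that the endpoint itself is exposed.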