Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1953934

Summary: OAuth proxy container for AlertManager and Thanos are flooding the logs
Product: OpenShift Container Platform Reporter: Victor Hernando <vhernand>
Component: Dev ConsoleAssignee: cvogt
Status: CLOSED DUPLICATE QA Contact: Gajanan More <gamore>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.6CC: alegrand, anpicker, aos-bugs, cvogt, dgautam, dgrisonn, dsantra, erooth, jakumar, kakkoyun, mfojtik, nmukherj, pkrupa, slaznick, spadgett, sttts, surbania
Target Milestone: ---Flags: cvogt: needinfo-
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-24 13:53:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Victor Hernando 2021-04-27 08:34:02 UTC
Description of problem:

This is a similar or related issue reported here: 
https://bugzilla.redhat.com/show_bug.cgi?id=1915667

After upgrading to 4.6.18 "authorizer reason:" messages are flooding many of the openshift-monitoring components logs.

~~~
% tail -10 openshift-monitoring/pods/thanos-querier-74bb56c5b-d7ghv/oauth-proxy/oauth-proxy/logs/current.log
2021-04-20T11:10:34.762048785Z 2021/04/20 11:10:34 provider.go:407: authorizer reason: 
2021-04-20T11:10:37.293217146Z 2021/04/20 11:10:37 provider.go:407: authorizer reason: 
2021-04-20T11:10:38.730652636Z 2021/04/20 11:10:38 provider.go:407: authorizer reason: 
2021-04-20T11:10:39.332019672Z 2021/04/20 11:10:39 provider.go:407: authorizer reason: 
2021-04-20T11:10:39.481910711Z 2021/04/20 11:10:39 provider.go:407: authorizer reason: 
2021-04-20T11:10:40.272032967Z 2021/04/20 11:10:40 provider.go:407: authorizer reason: 
2021-04-20T11:10:40.873662747Z 2021/04/20 11:10:40 provider.go:407: authorizer reason: 
2021-04-20T11:10:41.641975445Z 2021/04/20 11:10:41 provider.go:407: authorizer reason: 
2021-04-20T11:10:41.705171971Z 2021/04/20 11:10:41 provider.go:407: authorizer reason: 
2021-04-20T11:10:42.399313711Z 2021/04/20 11:10:42 provider.go:407: authorizer reason: 

% tail -10 pods/alertmanager-main-0/alertmanager-proxy/alertmanager-proxy/logs/current.log
2021-04-20T11:09:50.837136785Z 2021/04/20 11:09:50 provider.go:407: authorizer reason: 
2021-04-20T11:09:50.871266184Z 2021/04/20 11:09:50 provider.go:407: authorizer reason: 
2021-04-20T11:09:52.640532648Z 2021/04/20 11:09:52 provider.go:407: authorizer reason: 
2021-04-20T11:09:52.739566538Z 2021/04/20 11:09:52 provider.go:407: authorizer reason: 
2021-04-20T11:09:52.763426879Z 2021/04/20 11:09:52 provider.go:407: authorizer reason: 
2021-04-20T11:09:52.966189986Z 2021/04/20 11:09:52 provider.go:407: authorizer reason: 
2021-04-20T11:09:53.392129427Z 2021/04/20 11:09:53 provider.go:407: authorizer reason: 
2021-04-20T11:09:53.706211822Z 2021/04/20 11:09:53 provider.go:407: authorizer reason: 
2021-04-20T11:09:54.873927105Z 2021/04/20 11:09:54 provider.go:407: authorizer reason: 
2021-04-20T11:09:55.735697803Z 2021/04/20 11:09:55 provider.go:407: authorizer reason: 

% tail -10 pods/prometheus-k8s-0/prometheus/prometheus/logs/current.log
2021-04-20T11:08:19.640577479Z level=error ts=2021-04-20T11:08:19.640Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:428: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:08:37.319848959Z level=error ts=2021-04-20T11:08:37.319Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:08:53.448878330Z level=error ts=2021-04-20T11:08:53.448Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:427: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:08:58.131243051Z level=error ts=2021-04-20T11:08:58.131Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:428: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:09:25.208319773Z level=error ts=2021-04-20T11:09:25.208Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:09:27.281669576Z level=error ts=2021-04-20T11:09:27.281Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:427: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:09:44.840779771Z level=error ts=2021-04-20T11:09:44.840Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:428: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:09:59.866034134Z level=error ts=2021-04-20T11:09:59.865Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:427: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:10:18.092895390Z level=error ts=2021-04-20T11:10:18.092Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:428: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-operators\""
2021-04-20T11:10:21.095560237Z level=error ts=2021-04-20T11:10:21.095Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operators\""

% tail -20 openshift-console/pods/console-85f5c9cc86-swncp/console/console/logs/current.log 
2021-04-20T11:03:52.423623023Z 2021-04-20T11:03:52Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:03:55.553149818Z 2021-04-20T11:03:55Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:04:16.703003590Z 2021-04-20T11:04:16Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:04:30.197936031Z 2021-04-20T11:04:30Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:04:46.460235087Z 2021-04-20T11:04:46Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:04:48.940441485Z 2021-04-20T11:04:48Z http: proxy error: context canceled
2021-04-20T11:04:58.050038234Z 2021-04-20T11:04:58Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:05:16.703325267Z 2021-04-20T11:05:16Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:05:27.731962054Z 2021-04-20T11:05:27Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:05:28.570577367Z 2021-04-20T11:05:28Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:05:30.206242655Z 2021-04-20T11:05:30Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:05:46.760356760Z 2021-04-20T11:05:46Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:05:52.428707459Z 2021-04-20T11:05:52Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:06:16.713684354Z 2021-04-20T11:06:16Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:06:28.582405339Z 2021-04-20T11:06:28Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:06:30.212481881Z 2021-04-20T11:06:30Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:06:47.022228828Z 2021-04-20T11:06:47Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:06:53.963877457Z 2021-04-20T11:06:53Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:07:16.732356144Z 2021-04-20T11:07:16Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
2021-04-20T11:07:30.265008003Z 2021-04-20T11:07:30Z Failed to dial backend: 'websocket: bad handshake' Status: '403 Forbidden' URL: 'https://kubernetes.default.svc/apis/config.openshift.io/v1/clusterversions?watch=true&fieldSelector=metadata.name%3Dversion'
~~~

This issue is directly related to the openshift-console pods, scaling down the openshift-console pods those messages are not reported anymore. As soon as the openshift-console operator is set as Managed the log flooding starts again.

Version-Release number of selected component (if applicable):
4.6.18

How reproducible:
Upgrade a cluster from 4.6.z to 4.6.18

Steps to Reproduce:
1. Upgrade a cluster from 4.6.z to 4.6.18
2.
3.

Actual results:
Many useless messages flooding logs in openshift-monitoring pods.

Expected results:
Stop this message flooding.

Additional info:

Comment 4 Standa Laznicka 2021-05-05 13:03:44 UTC
Can you please share a must-gather that will include logs of this issue?

Comment 5 Sergiusz Urbaniak 2021-05-06 07:54:19 UTC
The log message:

2021-04-20T11:10:37.293217146Z 2021/04/20 11:10:37 provider.go:407: authorizer reason: 

are unrelated:
1. prometheus does not communicate via oauth-proxy in order to create kubernetes resources.
2. console communicates via its console-proxy with the api, which is also indicated by "https://kubernetes.default.svc..." which is not being handled by oauth-proxy.

Having said that I can reproduce the empty "authorizer reason: " in a fresh cluster. My hypothesis is that this is just too verbose logging of oauth-proxy.

Initially, a request is not authorized, hence oauth-proxy redirects to the login page. This initial state is unnecessarily logged and can potentially flood logs.

What we can do:
1. Omit logging the initial (unauthorized) state.
2. Replace logging of logging failures with metrics. As many users can cause logging failures, especially externall as it oauth-proxy is exposed via an external route, it is quite possible to even DoS the log output.

Comment 7 Standa Laznicka 2021-05-06 13:08:59 UTC
The oauth-proxy behavior is not that unexpected, but from what I understand from Dhruv, their customer is seeing the same behavior on their cluster with the HTTP proxying configured and the console is unaccessible, therefore I'm going to move this BZ to console.

Comment 8 Victor Hernando 2021-05-17 07:52:15 UTC
(In reply to Sergiusz Urbaniak from comment #5)

> Initially, a request is not authorized, hence oauth-proxy redirects to the
> login page. This initial state is unnecessarily logged and can potentially
> flood logs.
> 
> What we can do:
> 1. Omit logging the initial (unauthorized) state.
> 2. Replace logging of logging failures with metrics. As many users can cause
> logging failures, especially externall as it oauth-proxy is exposed via an
> external route, it is quite possible to even DoS the log output.

Hi Sergiusz, thanks for clarification, greatly appreciated.

Taking your explanation and possible fixes into consideration, I'd like to ask, how we can resolve this? Is there any action currently in progress to fix this log flooding?

Thanks in advance

Regards

Comment 10 Damien Grisonnet 2021-05-20 09:27:42 UTC
For the record, we've had bug 1920898 where the Prometheus oauth proxy wasn't accessible with a global proxy configured. It was fixed in OCP 4.6.23, but it shouldn't be the root cause of the issue you are seeing here.

That said, from the Monitoring side, I don't there is anything we can do here since the monitoring stack seems to be working properly. From the previous investigations there seems to be two different problems here: the oauth-proxy logging, and the console not being accessible. Neither of these problems should be resolvable by us.

Since, the console not being accessible seems to be the most severe problem here, I am sending this bug back to dev console for further investigation. In any case, oauth proxy issues from the monitoring stack shouldn't impact the console accessibility.

Comment 11 Samuel Padgett 2021-05-24 13:53:36 UTC
The 403 bad handshake errors were fixed in bug 1848151. While they created a lot of noise in the logs, they were also harmless and wouldn't cause console to become inaccessible. Please open a separate bug with a must gather if the console is inaccessible as that's a different bug.

The topology page not loading is also fixed, tracked by bug 1952293.

Marking this as a duplicate of bug 1958158 for the monitoring log fix as the other console issues are unrelated and already have bugs.

*** This bug has been marked as a duplicate of bug 1958158 ***