Bug 1953934
| Summary: | OAuth proxy container for AlertManager and Thanos are flooding the logs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Victor Hernando <vhernand> |
| Component: | Dev Console | Assignee: | cvogt |
| Status: | CLOSED DUPLICATE | QA Contact: | Gajanan More <gamore> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.6 | CC: | alegrand, anpicker, aos-bugs, cvogt, dgautam, dgrisonn, dsantra, erooth, jakumar, kakkoyun, mfojtik, nmukherj, pkrupa, slaznick, spadgett, sttts, surbania |
| Target Milestone: | --- | Flags: | cvogt: needinfo- |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-24 13:53:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Victor Hernando
2021-04-27 08:34:02 UTC
Can you please share a must-gather that includes logs of this issue?

The log message: 2021-04-20T11:10:37.293217146Z 2021/04/20 11:10:37 provider.go:407: authorizer reason: are unrelated:
1. prometheus does not communicate via oauth-proxy in order to create kubernetes resources.
2. console communicates via its console-proxy with the API, which is also indicated by "https://kubernetes.default.svc...", which is not handled by oauth-proxy.

Having said that, I can reproduce the empty "authorizer reason: " in a fresh cluster. My hypothesis is that this is just overly verbose logging by oauth-proxy. Initially, a request is not authorized, hence oauth-proxy redirects to the login page. This initial state is unnecessarily logged and can potentially flood logs.

What we can do:
1. Omit logging the initial (unauthorized) state.
2. Replace logging of logging failures with metrics. As many users can cause logging failures, especially externally, since oauth-proxy is exposed via an external route, it is quite possible to even DoS the log output.

The oauth-proxy behavior is not that unexpected, but from what I understand from Dhruv, their customer is seeing the same behavior on their cluster with HTTP proxying configured, and the console is inaccessible. Therefore I'm going to move this BZ to console.

(In reply to Sergiusz Urbaniak from comment #5)
> Initially, a request is not authorized, hence oauth-proxy redirects to the
> login page. This initial state is unnecessarily logged and can potentially
> flood logs.
>
> What we can do:
> 1. Omit logging the initial (unauthorized) state.
> 2. Replace logging of logging failures with metrics. As many users can cause
> logging failures, especially externally, since oauth-proxy is exposed via an
> external route, it is quite possible to even DoS the log output.

Hi Sergiusz,

thanks for the clarification, greatly appreciated. Taking your explanation and possible fixes into consideration, I'd like to ask how we can resolve this. Is there any action currently in progress to fix this log flooding?

Thanks in advance
Regards

For the record, we've had bug 1920898, where the Prometheus oauth proxy wasn't accessible with a global proxy configured. It was fixed in OCP 4.6.23, but it shouldn't be the root cause of the issue you are seeing here. That said, from the Monitoring side, I don't think there is anything we can do here, since the monitoring stack seems to be working properly. From the previous investigations there seem to be two different problems here: the oauth-proxy logging, and the console not being accessible. Neither of these problems should be resolvable by us. Since the console not being accessible seems to be the more severe problem, I am sending this bug back to dev console for further investigation. In any case, oauth proxy issues from the monitoring stack shouldn't impact console accessibility.

The 403 bad handshake errors were fixed in bug 1848151. While they created a lot of noise in the logs, they were harmless and wouldn't cause the console to become inaccessible. Please open a separate bug with a must-gather if the console is inaccessible, as that's a different bug. The topology page not loading is also fixed, tracked by bug 1952293.

Marking this as a duplicate of bug 1958158 for the monitoring log fix, as the other console issues are unrelated and already have bugs.

*** This bug has been marked as a duplicate of bug 1958158 ***
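
For illustration only, the two suggestions from comment #5 (skip logging the initial unauthorized state, and count failures instead of logging each one) could be sketched roughly as below. This is not the oauth-proxy code and not the fix tracked in bug 1958158. The names `authOutcome`, `authStats`, `record` and the `logEvery` threshold are all hypothetical, and a real change would probably expose the counters through the proxy's existing metrics rather than plain integers.

```go
package main

import "sync/atomic"

// authOutcome is a hypothetical classification of one request's auth result.
type authOutcome int

const (
	outcomeAuthorized authOutcome = iota // request carried a valid session
	outcomeNoSession                     // first visit, no session yet: redirect to login
	outcomeDenied                        // credentials were presented but rejected
)

// logEvery is an assumed sampling interval for denied requests.
const logEvery = 100

// authStats counts outcomes so that they can be exported as metrics
// instead of being written to the log once per request.
type authStats struct {
	noSession int64
	denied    int64
}

// record updates the counters and reports whether a log line is warranted.
// The initial unauthorized state is never logged; denials are logged on the
// first occurrence and then once every logEvery occurrences.
func (s *authStats) record(o authOutcome) bool {
	switch o {
	case outcomeNoSession:
		atomic.AddInt64(&s.noSession, 1)
		return false
	case outcomeDenied:
		n := atomic.AddInt64(&s.denied, 1)
		return n == 1 || n%logEvery == 0
	}
	return false
}
```

With something like this, a caller would only emit the "authorizer reason" line when `record` returns true, so a burst of unauthenticated requests arriving through the external route increments a counter rather than producing one log line each.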