Bug 1766181

Summary: Authentication "500 Internal Error" when accessing monitoring components
Product: OpenShift Container Platform Reporter: Gabriel Virga <gfelixvirga>
Component: Monitoring Assignee: Christian Heidenreich <cvogel>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: high    
Version: 4.3.0 CC: aabhishe, adeshpan, ajohn, alegrand, anpicker, atripath, clasohm, cvogel, dahernan, dyocum, erooth, gparente, hcisneir, jeff.li, jkaur, jnordell, kakkoyun, lcosic, lstanton, malonso, mharri, mloibl, nchavan, openshift-bugs-escalate, palonsor, pamoedom, pkrupa, rdiazgav, rhowe, rsandu, sgarciam, sreber, surbania
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Previously, the external routes to the monitoring components (Grafana, Alertmanager, Prometheus) were not accessible when the user configured a custom trusted CA bundle. This is now fixed, and these components are accessible when a custom trusted CA bundle is configured.
Story Points: ---
Clone Of:
: 1803957 1807963 (view as bug list) Environment:
Last Closed: 2020-01-23 11:09:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1776085, 1776213    
Bug Blocks: 1803957, 1807963    

Description Gabriel Virga 2019-10-28 14:12:51 UTC
Description of problem:
I installed the latest OpenShift 4.2 version and used the "additionalTrustBundle:" variable to add our internal intermediate and root CA chains.
The proxy sidecars of all the metrics components are not receiving the additionalTrustBundle.

How reproducible:
Every install using additionalTrustBundle

Steps to Reproduce:
1. Install OpenShift 4.2 with additionalTrustBundle for a self-signed certificate
2. Try to authenticate to
- https://grafana-openshift-monitoring.apps.osesbx.mtb.com/
- https://console-openshift-console.apps.osesbx.mtb.com/
- https://prometheus-k8s-openshift-monitoring.apps.osesbx.mtb.com/
- https://alertmanager-main-openshift-monitoring.apps.osesbx.mtb.com/

Actual results:
Browser error "500 Internal Error"

# alertmanager-proxy container
$ oc logs -c alertmanager-proxy alertmanager-main-2 | grep x509
2019/10/28 12:38:13 oauthproxy.go:645: error redeeming code (client:10.128.0.1:39918): Post https://oauth-openshift.apps.ose.company.com/oauth/token: x509: certificate signed by unknown authority

$ oc logs -c prometheus-proxy prometheus-k8s-1 | grep x509
2019/10/28 13:51:10 oauthproxy.go:645: error redeeming code (client:10.128.0.1:48886): Post https://oauth-openshift.apps.ose.company.com/oauth/token: x509: certificate signed by unknown authority

Expected results:
Login

Additional info:
Conversations I started
https://github.com/openshift/cluster-monitoring-operator/pull/448
https://github.com/openshift/cluster-monitoring-operator/issues/526
CASE 02497459




########
# To fix Grafana I set the operator to Unmanaged, then made these changes
########
Under the grafana-proxy container I added:
            - name: trusted-ca-bundle
              readOnly: true
              mountPath: /etc/pki/ca-trust/extracted/pem


Under Volumes I added:
        - name: trusted-ca-bundle
          configMap:
            name: trusted-ca-bundle
            items:
              - key: ca-bundle.crt
                path: tls-ca-bundle.pem
            defaultMode: 420
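
For readers following the same workaround, the two fragments above fit into the pod template roughly as shown here. This is a minimal sketch, assuming Grafana runs as a standard Deployment with the usual spec.template.spec layout. The surrounding keys are assumed context, not part of the original report, and all other containers, mounts, and volumes are omitted.

```yaml
# Sketch only: placement of the two additions inside the pod template.
# Keys outside the trusted-ca-bundle entries are assumed context.
spec:
  template:
    spec:
      containers:
        - name: grafana-proxy
          volumeMounts:
            # Mount the bundle over the directory the proxy reads its
            # system CA certificates from.
            - name: trusted-ca-bundle
              readOnly: true
              mountPath: /etc/pki/ca-trust/extracted/pem
      volumes:
        - name: trusted-ca-bundle
          configMap:
            name: trusted-ca-bundle
            items:
              # Expose the ConfigMap key under the file name the
              # trust store expects.
              - key: ca-bundle.crt
                path: tls-ca-bundle.pem
            defaultMode: 420  # decimal form of octal 0644
```

The mount path and file name matter here: on RHEL-family images, /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem is one of the locations Go programs check for system root certificates, which is presumably why mounting the bundle there lets the proxy verify the OAuth server certificate.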

Comment 1 Lili Cosic 2019-10-28 14:26:11 UTC
Thanks for the bugzilla. Do you mind running `oc version` so I know which 4.2 cluster version it was? Thank you!

Comment 2 Gabriel Virga 2019-10-29 15:57:06 UTC
oc version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v4.2.0-alpha.0-2-g8fdb79e5", GitCommit:"8fdb79e549651c0f3c91d54349715309b5d149d3", GitTreeState:"clean", BuildDate:"2019-08-07T17:48:56Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.6+2e5ed54", GitCommit:"2e5ed54", GitTreeState:"clean", BuildDate:"2019-10-10T22:04:13Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
OpenShift Version: 4.2.0

Comment 3 Pawel Krupa 2019-10-30 12:21:19 UTC
Let's track oauth-proxy problems here and alertmanager CA bundle in https://bugzilla.redhat.com/show_bug.cgi?id=1766984

Comment 4 Pawel Krupa 2019-11-05 17:51:54 UTC
*** Bug 1768977 has been marked as a duplicate of this bug. ***

Comment 6 Niket Chavan 2019-11-13 06:56:41 UTC
Hello Team,

As per the document [1], the customer replaced the default ingress certificate. After the modification, the customer is unable to open the GUI of Grafana, Alertmanager, Prometheus, etc.; the screen shows the error "500 Internal Error".

Grafana pod logs show:

# oc logs -c grafana-proxy grafana-74bdcddbcb-wl947
[...]
[...]
2019/11/13 04:33:39 oauthproxy.go:645: error redeeming code (client:10.247.4.1:50910): Post https://oauth-openshift.apps.hashed-out.example.com/oauth/token: x509: certificate signed by unknown authority
2019/11/13 04:33:39 oauthproxy.go:438: ErrorPage 500 Internal Error Internal Error
2019/11/13 04:33:39 provider.go:373: authorizer reason:

[1] https://docs.openshift.com/container-platform/4.2/authentication/certificates/replacing-default-ingress-certificate.html

The customer is heavily affected by this issue, as it is impacting their business.

-Niket

Comment 7 Pawel Krupa 2019-11-13 16:04:33 UTC
We investigated the issue and we have a potential fix ready. However, we are blocked by an apiserver bug regarding the validation of CRDs (kubernetes/kubernetes#84880).

Comment 8 Niket Chavan 2019-11-14 06:57:51 UTC
(In reply to Pawel Krupa from comment #7)
> We investigated the issue and we have a potential fix ready. However, we are
> blocked by an apiserver bug regarding the validation of CRDs
> (kubernetes/kubernetes#84880).

Hello,

Can we have a tentative timeline for when this can be fixed? This needs to be discussed further with the customer. As mentioned in comment #6, the customer is heavily affected by this issue.

-Niket

Comment 9 Niket Chavan 2019-11-15 07:54:11 UTC
Hello,

Can I please have a response and a further update on this? I need to update the customer accordingly.

-Niket

Comment 18 Jeff Li 2019-11-27 16:21:21 UTC
Hi,
(In reply to Gabriel Virga from comment #0)
> ########
> # To fix Grafana I set the operator to Unmanaged, then made these changes
> ########
> Under the grafana-proxy container I added:
>             - name: trusted-ca-bundle
>               readOnly: true
>               mountPath: /etc/pki/ca-trust/extracted/pem
> 
> 
> Under Volumes I added:
>         - name: trusted-ca-bundle
>           configMap:
>             name: trusted-ca-bundle
>             items:
>               - key: ca-bundle.crt
>                 path: tls-ca-bundle.pem
>             defaultMode: 420


According to the OCP 4.2 release notes:
https://docs.openshift.com/container-platform/4.2/release_notes/ocp-4-2-release-notes.html
and this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1719188

OCP 4.2 ignores "Unmanaged" for "managementState", which means I can't apply the workaround.

Comment 23 Pawel Krupa 2019-12-19 10:39:47 UTC
https://jira.coreos.com/browse/MON-884 is tracking all efforts regarding this issue.

@Christian please evaluate and prioritize possible backporting of this fix.

Comment 29 errata-xmlrpc 2020-01-23 11:09:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062