Bug 1766181 - Authentication "500 Internal Error" when accessing monitoring components
Summary: Authentication "500 Internal Error" when accessing monitoring components
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.0
Assignee: Christian Heidenreich
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1768977
Depends On: 1776085 1776213
Blocks: 1803957 1807963
Reported: 2019-10-28 14:12 UTC by Gabriel Virga
Modified: 2020-05-26 09:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Previously, external routes accessing monitoring components (Grafana, Alertmanager, Prometheus) were not accessible when the user configured a custom trusted CA bundle. This is fixed now and the above mentioned components are now accessible with custom configured trusted CA bundles.
Clone Of:
Clones: 1803957 1807963
Environment:
Last Closed: 2020-01-23 11:09:38 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
GitHub | openshift cluster-monitoring-operator issues 526 | None | closed | 500 Internal Error Additional Trusted CA Bundle missing | 2020-06-24 12:18:35 UTC
GitHub | openshift cluster-monitoring-operator pull 448 | None | closed | Add Alertmanager trusted CA bundle support | 2020-06-24 12:18:34 UTC
GitHub | openshift cluster-monitoring-operator pull 559 | None | closed | Bug 1766181: trustedCA bundle support for prometheus and its sidecars | 2020-06-24 12:18:34 UTC
Red Hat Knowledge Base (Solution) | 4616241 | None | None | None | 2019-11-27 15:50:06 UTC
Red Hat Product Errata | RHBA-2020:0062 | None | None | None | 2020-01-23 11:09:58 UTC

Internal Links: 1766984 1766988

Description Gabriel Virga 2019-10-28 14:12:51 UTC
Description of problem:
I installed the latest OpenShift 4.2 release and used the "additionalTrustBundle:" variable to add our internal intermediate and root CA chains.
The proxy sidecars of the monitoring components are not receiving the additionalTrustBundle.

How reproducible:
Every install using additionalTrustBundle

Steps to Reproduce:
1. Install OpenShift 4.2 with additionalTrustBundle set for a self-signed certificate
2. Try to authenticate to:
- https://grafana-openshift-monitoring.apps.osesbx.mtb.com/
- https://console-openshift-console.apps.osesbx.mtb.com/
- https://prometheus-k8s-openshift-monitoring.apps.osesbx.mtb.com/
- https://alertmanager-main-openshift-monitoring.apps.osesbx.mtb.com/

Actual results:
Browser error "500 Internal Error"

# Alertmanager-proxy container
$ oc logs -c alertmanager-proxy alertmanager-main-2 | grep x509
2019/10/28 12:38:13 oauthproxy.go:645: error redeeming code (client:10.128.0.1:39918): Post https://oauth-openshift.apps.ose.company.com/oauth/token: x509: certificate signed by unknown authority

$ oc logs -c prometheus-proxy prometheus-k8s-1 | grep x509
2019/10/28 13:51:10 oauthproxy.go:645: error redeeming code (client:10.128.0.1:48886): Post https://oauth-openshift.apps.ose.company.com/oauth/token: x509: certificate signed by unknown authority
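
When several proxy containers have to be checked, the relevant lines can be pulled out of saved log output with a small script. The following is only a sketch: the function name `find_x509_failures` is hypothetical, and the pattern is modelled solely on the two log lines quoted above, so other oauth-proxy versions may need a different expression.

```python
import re

# Pattern modelled on the log lines quoted in this report (assumption:
# other oauth-proxy versions may word the message differently).
REDEEM_ERROR = re.compile(
    r"error redeeming code \(client:(?P<client>[^)]+)\): "
    r"Post (?P<url>\S+): x509: (?P<reason>.+)$"
)


def find_x509_failures(log_text):
    """Return one dict per log line reporting an x509 failure during code redemption."""
    failures = []
    for line in log_text.splitlines():
        match = REDEEM_ERROR.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


# Sample input copied from the alertmanager-proxy log above.
sample = (
    "2019/10/28 12:38:13 oauthproxy.go:645: error redeeming code "
    "(client:10.128.0.1:39918): Post "
    "https://oauth-openshift.apps.ose.company.com/oauth/token: "
    "x509: certificate signed by unknown authority\n"
)

for failure in find_x509_failures(sample):
    print(failure["url"], "->", failure["reason"])
```

The output of `oc logs -c <proxy-container> <pod>` can be passed in as `log_text`; every match names the OAuth endpoint whose certificate the proxy could not verify.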

Expected results:
Login succeeds.

Additional info:
Conversations I started
https://github.com/openshift/cluster-monitoring-operator/pull/448
https://github.com/openshift/cluster-monitoring-operator/issues/526
CASE 02497459




########
# To fix Grafana, I set the operator to Unmanaged and then made the changes below
########
Under the grafana-proxy container's volumeMounts I added:
            - name: trusted-ca-bundle
              readOnly: true
              mountPath: /etc/pki/ca-trust/extracted/pem


Under Volumes I added:
        - name: trusted-ca-bundle
          configMap:
            name: trusted-ca-bundle
            items:
              - key: ca-bundle.crt
                path: tls-ca-bundle.pem
            defaultMode: 420
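
The two fragments above can also be expressed as one transformation of a Deployment manifest. This is a minimal sketch, not the operator's fix: the function name `add_trusted_ca_bundle` is hypothetical, the manifest is assumed to be already loaded as a Python dict in the standard Deployment layout, and writing the result back to the cluster is not shown.

```python
import copy

CA_VOLUME_NAME = "trusted-ca-bundle"


def add_trusted_ca_bundle(deployment, container_name="grafana-proxy"):
    """Return a copy of a Deployment manifest (dict) with the CA bundle mounted.

    Hypothetical helper: the values mirror the YAML fragments in this report.
    """
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]

    volume = {
        "name": CA_VOLUME_NAME,
        "configMap": {
            "name": "trusted-ca-bundle",
            "items": [{"key": "ca-bundle.crt", "path": "tls-ca-bundle.pem"}],
            "defaultMode": 420,  # decimal form of octal 0644
        },
    }
    mount = {
        "name": CA_VOLUME_NAME,
        "readOnly": True,
        "mountPath": "/etc/pki/ca-trust/extracted/pem",
    }

    # Add the volume once, even if the function is applied repeatedly.
    volumes = pod_spec.setdefault("volumes", [])
    if not any(v["name"] == CA_VOLUME_NAME for v in volumes):
        volumes.append(volume)

    # Mount the volume into the named proxy container only.
    for container in pod_spec["containers"]:
        if container["name"] == container_name:
            mounts = container.setdefault("volumeMounts", [])
            if not any(m["name"] == CA_VOLUME_NAME for m in mounts):
                mounts.append(mount)
            return patched
    raise ValueError(f"container {container_name!r} not found")


# Stripped-down manifest for illustration only (assumption, not a real Grafana spec).
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "grafana"},
        {"name": "grafana-proxy"},
    ]}}}
}
patched = add_trusted_ca_bundle(deployment)
```

The helper works on a copy and skips entries that already exist, so applying it twice leaves a single volume and a single mount.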

Comment 1 Lili Cosic 2019-10-28 14:26:11 UTC
Thanks for filing the Bugzilla. Do you mind running `oc version` so I know which 4.2 cluster version this was? Thank you!

Comment 2 Gabriel Virga 2019-10-29 15:57:06 UTC
oc version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v4.2.0-alpha.0-2-g8fdb79e5", GitCommit:"8fdb79e549651c0f3c91d54349715309b5d149d3", GitTreeState:"clean", BuildDate:"2019-08-07T17:48:56Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.6+2e5ed54", GitCommit:"2e5ed54", GitTreeState:"clean", BuildDate:"2019-10-10T22:04:13Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
OpenShift Version: 4.2.0

Comment 3 Pawel Krupa 2019-10-30 12:21:19 UTC
Let's track the oauth-proxy problems here and the Alertmanager CA bundle in https://bugzilla.redhat.com/show_bug.cgi?id=1766984

Comment 4 Pawel Krupa 2019-11-05 17:51:54 UTC
*** Bug 1768977 has been marked as a duplicate of this bug. ***

Comment 6 Niket Chavan 2019-11-13 06:56:41 UTC
Hello Team,

The customer replaced the default ingress certificate as described in the document [1]. After this change, the customer is unable to open the GUI of Grafana, Alertmanager, Prometheus, etc.; the browser shows the error "500 Internal Error".

The Grafana pod logs show:

# oc logs -c grafana-proxy grafana-74bdcddbcb-wl947
[...]
[...]
2019/11/13 04:33:39 oauthproxy.go:645: error redeeming code (client:10.247.4.1:50910): Post https://oauth-openshift.apps.hashed-out.example.com/oauth/token: x509: certificate signed by unknown authority
2019/11/13 04:33:39 oauthproxy.go:438: ErrorPage 500 Internal Error Internal Error
2019/11/13 04:33:39 provider.go:373: authorizer reason:

[1] https://docs.openshift.com/container-platform/4.2/authentication/certificates/replacing-default-ingress-certificate.html

The customer is heavily affected by this issue, as it is impacting their business.

-Niket

Comment 7 Pawel Krupa 2019-11-13 16:04:33 UTC
We investigated the issue and have a potential fix ready. However, we are blocked by an apiserver bug regarding the validation of CRDs (kubernetes/kubernetes#84880).

Comment 8 Niket Chavan 2019-11-14 06:57:51 UTC
(In reply to Pawel Krupa from comment #7)
> We investigated the issue and we have a potential fix ready. However, we are
> blocked by apiserver bug regarding the validation of CRDs
> (kubernetes/kubernetes#84880).

Hello,

Can we have a tentative timeline for when this can be fixed? This needs to be discussed further with the customer. As mentioned in comment #6, the customer is heavily affected by this issue.

-Niket

Comment 9 Niket Chavan 2019-11-15 07:54:11 UTC
Hello

Can I please have a response and a further update on this? I need to update the customer accordingly.

-Niket

Comment 18 Jeff Li 2019-11-27 16:21:21 UTC
hi (In reply to Gabriel Virga from comment #0)
> Description of problem:
> I installed the latest Openshift 4.2 version. And I used the variable
> "additionalTrustBundle:" to add our internal intermediate and root chains.
> The proxy sidecar from all metrics are not receiving the
> additionalTrustBundle
> 
> How reproducible:
> Every install using additionalTrustBundle
> 
> Steps to Reproduce:
> 1. Install Openshift 4.2 with additionalTrustBundle for self signed
> certificate
> 2. Try to authenticate to
> - https://grafana-openshift-monitoring.apps.osesbx.mtb.com/
> - https://console-openshift-console.apps.osesbx.mtb.com/
> - https://prometheus-k8s-openshift-monitoring.apps.osesbx.mtb.com/
> - https://alertmanager-main-openshift-monitoring.apps.osesbx.mtb.com/
> 
> Actual results:
> Browser error "500 Internal Error"
> 
> # Alertmanager-proxy container
> $ oc logs -c alertmanager-proxy alertmanager-main-2 | grep x509
> 2019/10/28 12:38:13 oauthproxy.go:645: error redeeming code
> (client:10.128.0.1:39918): Post
> https://oauth-openshift.apps.ose.company.com/oauth/token: x509: certificate
> signed by unknown authority
> 
> $ oc logs -c prometheus-proxy prometheus-k8s-1 | grep x509
> 2019/10/28 13:51:10 oauthproxy.go:645: error redeeming code
> (client:10.128.0.1:48886): Post
> https://oauth-openshift.apps.ose.company.com/oauth/token: x509: certificate
> signed by unknown authority
> 
> Expected results:
> Login
> 
> Additional info:
> Conversations I started
> https://github.com/openshift/cluster-monitoring-operator/pull/448
> https://github.com/openshift/cluster-monitoring-operator/issues/526
> CASE 02497459
> 
> 
> 
> 
> ########
> # To fix Grafana I set the operator to Unmanaged then 
> ########
> Under grafana-proxy container I added:
>             - name: trusted-ca-bundle
>               readOnly: true
>               mountPath: /etc/pki/ca-trust/extracted/pem
> 
> 
> Under Volumes I added:
>         - name: trusted-ca-bundle
>           configMap:
>             name: trusted-ca-bundle
>             items:
>               - key: ca-bundle.crt
>                 path: tls-ca-bundle.pem
>             defaultMode: 420


According to the OCP 4.2 release notes:
https://docs.openshift.com/container-platform/4.2/release_notes/ocp-4-2-release-notes.html
and this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1719188

OCP 4.2 ignores "Unmanaged" for "managementState", which means I can't apply the workaround.

Comment 23 Pawel Krupa 2019-12-19 10:39:47 UTC
https://jira.coreos.com/browse/MON-884 is tracking all efforts regarding this issue.

@Christian please evaluate and prioritize possible backporting of this fix.

Comment 29 errata-xmlrpc 2020-01-23 11:09:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

