Bug 1678645

Summary: 500 Internal Error for grafana/prometheus/alertmanager route
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: apiserver-auth
Assignee: Standa Laznicka <slaznick>
Status: CLOSED ERRATA
QA Contact: Chuan Yu <chuyu>
Severity: high
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, evb, mloibl, sgarciam, slaznick, sponnaga, surbania, vlaad, wkulhane, xtian
Keywords: Regression, TestBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2019-06-04 10:44:14 UTC
Type: Bug
Bug Blocks: 1647492, 1673787

Attachments:
Description: 500 Internal Error for grafana route (Flags: none)
Description: still 500 error with 4.0.0-0.nightly-2019-02-26-125216 (Flags: none)

Description Junqi Zhao 2019-02-19 10:02:02 UTC
Created attachment 1536280
500 Internal Error for grafana route

Description of problem:
Cloned from https://jira.coreos.com/browse/MON-576
The grafana route returns a 500 Internal Error.

The grafana-proxy container logs "Unable to discover default cluster OAuth info: got 404":

# oc -n openshift-monitoring logs grafana-78765ddcc7-wh5cs -c grafana-proxy
2019/02/19 08:35:15 provider.go:102: Defaulting client-id to system:serviceaccount:openshift-monitoring:grafana
2019/02/19 08:35:15 provider.go:107: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
2019/02/19 08:35:15 provider.go:542: Performing OAuth discovery against https://172.30.0.1/.well-known/oauth-authorization-server
2019/02/19 08:35:15 provider.go:588: 404 GET https://172.30.0.1/.well-known/oauth-authorization-server
{ "paths": [ "/apis", "/metrics", "/swagger-ui/", "/version" ] }

2019/02/19 08:35:15 provider.go:113: Unable to discover default cluster OAuth info: got 404
{ "paths": [ "/apis", "/metrics", "/swagger-ui/", "/version" ] }

2019/02/19 08:35:15 provider.go:293: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2019/02/19 08:35:16 oauthproxy.go:201: mapping path "/" => upstream "http://localhost:3001/"
2019/02/19 08:35:16 oauthproxy.go:222: compiled skip-auth-regex => "^/metrics"
2019/02/19 08:35:16 oauthproxy.go:228: OAuthProxy configured for Client ID: system:serviceaccount:openshift-monitoring:grafana
2019/02/19 08:35:16 oauthproxy.go:238: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2019/02/19 08:35:16 http.go:96: HTTPS: listening on [::]:3000
2019/02/19 08:35:21 server.go:2923: http: TLS handshake error from 10.128.2.1:45216: EOF
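
The log above shows oauth-proxy fetching the OAuth discovery document from `/.well-known/oauth-authorization-server` (the well-known path defined by RFC 8414) on the API service address and giving up when the API server answers 404 with its plain path list. The following is a minimal Python sketch of that decision, assuming only the behaviour visible in this log; `parse_discovery` and `DiscoveryError` are hypothetical names, and this is not oauth-proxy's actual (Go) code.

```python
import json


class DiscoveryError(Exception):
    """Raised when the OAuth discovery response cannot be used."""


def parse_discovery(status: int, body: str) -> dict[str, str]:
    """Return the endpoints advertised by an OAuth discovery response.

    Hypothetical helper mirroring the grafana-proxy log: any non-200 answer
    from /.well-known/oauth-authorization-server aborts discovery, and the
    body (here the API server's path list) is not inspected further.
    """
    if status != 200:
        raise DiscoveryError(
            f"Unable to discover default cluster OAuth info: got {status}"
        )
    doc = json.loads(body)
    # Fields this sketch treats as required for the authorization-code login.
    required = ("issuer", "authorization_endpoint", "token_endpoint")
    missing = [key for key in required if key not in doc]
    if missing:
        raise DiscoveryError(f"discovery document is missing {missing}")
    return {key: doc[key] for key in required}


# The 404 body from the log is the API server's path list, not a discovery document.
not_found = '{ "paths": [ "/apis", "/metrics", "/swagger-ui/", "/version" ] }'
try:
    parse_discovery(404, not_found)
except DiscoveryError as err:
    print(err)  # Unable to discover default cluster OAuth info: got 404
```

A 200 response carrying the fields seen later in this report (issuer, authorization_endpoint, token_endpoint) would instead return those three URLs.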



Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-19-024716   True        False         50m     Cluster version is 4.0.0-0.nightly-2019-02-19-024716

 

configmap-reloader: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:037fa98f23ff812b6861675127d52eea43caa44bb138e7fe41c7199cb8d4d634
prometheus-config-reloader: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31905d24b331859b99852c6f4ef916539508bfb61f443c94e0f46a83093f7dc0
kube-state-metrics: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:36f168dc7fc6ada9af0f2eeb88f394f2e7311340acc25f801830fe509fd93911
kube-rbac-proxy: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:451274b24916b97e5ba2116dd0775cdb7e1de98d034ac8874b81c1a3b22cf6b1
cluster-monitoring-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:534a71a355e3b9c79ef5a192a200730b8641f5e266abe290b6f7c6342210d8a0
prometheus-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5b4ba55ab5ec5bb1b4c024a7b99bc67fe108a28e564288734f9884bc1055d4ed
prometheus-alertmanager: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5bc582cfbe8b24935e4f9ee1fe6660e13353377473e09a63b51d4e3d24a7ade3
prometheus-node-exporter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6cb6cd27a308c2ae9e0c714c8633792cc151e17312bd74da45255980eabf5ecf
prom-label-proxy: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8675adb4a2a367c9205e3879b986da69400b9187df7ac3f3fbf9882e6a356252
telemeter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9021d3e9ce028fc72301f8e0a40c37e488db658e1500a790c794bfd38903bef1
prometheus: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba01869048bf44fc5e8c57f0a34369750ce27e3fb0b5eb47c78f42022640154c
k8s-prometheus-adapter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ee79721af3078dfbcfaa75e9a47da1526464cf6685a7f4195ea214c840b59e9f
grafana: quay.io/openshift/origin-grafana:latest
oauth-proxy: quay.io/openshift/origin-oauth-proxy:latest
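
In this listing, every component except grafana and oauth-proxy is pinned to an `@sha256:` digest from the release payload, while those two are still pulled as `quay.io/openshift/origin-*:latest` tags. A quick way to spot such entries in a listing of this shape is sketched below; `unpinned_images` is a hypothetical helper, and the sketch assumes one `<component>: <pullspec>` pair per line, exactly as printed above.

```python
def unpinned_images(listing: str) -> list[str]:
    """Return component names whose image is referenced by tag, not digest.

    Assumes lines of the form "<component>: <pullspec>"; blank lines are skipped.
    """
    unpinned = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        # Split only on the first ": " so the pullspec (which may contain
        # "@sha256:" or ":latest") stays intact.
        name, _, ref = line.partition(": ")
        if "@sha256:" not in ref:
            unpinned.append(name.strip())
    return unpinned
```

Fed the list above, it reports only grafana and oauth-proxy; the same holds for the listing in Comment 6.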

# docker pull quay.io/openshift/origin-oauth-proxy:latest
Trying to pull repository quay.io/openshift/origin-oauth-proxy ...
latest: Pulling from quay.io/openshift/origin-oauth-proxy
ef81238e5ad5: Pull complete
8c526450ed00: Pull complete
de2405bff1b7: Pull complete
dce379f7594a: Pull complete
c8ff0b8a37d2: Pull complete
Digest: sha256:ba736ad6815e617605f300d379ca513011fd0f239549db629cbe8c2de2e483de
Status: Downloaded newer image for quay.io/openshift/origin-oauth-proxy:latest

RHCOS build: 47.318


How reproducible:
Always

Steps to Reproduce:
1. Log in through the grafana route.

Actual results:
500 Internal Error for grafana route

Expected results:
The grafana route can be accessed without a 500 Internal Error.


Additional info:

Comment 3 Junqi Zhao 2019-02-20 04:19:42 UTC
Unfortunately, the prometheus and alertmanager routes hit the same 500 error today:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-19-205011   True        False         52m     Cluster version is 4.0.0-0.nightly-2019-02-19-205011

Comment 6 Junqi Zhao 2019-02-21 03:37:43 UTC
still 500 Internal Error for grafana/prometheus/alertmanager route with 
#oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-20-194410   True        False         58m     Cluster version is 4.0.0-0.nightly-2019-02-20-194410

configmap-reloader: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:037fa98f23ff812b6861675127d52eea43caa44bb138e7fe41c7199cb8d4d634
prometheus-config-reloader: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0b88f4c0bfc31f15d368619b951b9020853686ce46d36692f62ef437d83b1012
kube-state-metrics: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:36f168dc7fc6ada9af0f2eeb88f394f2e7311340acc25f801830fe509fd93911
prometheus-node-exporter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:42be8e58f00a54b4f4cbf849203a139c93bebde8cc40e5be84305246be620350
prometheus-alertmanager: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:455855037348f33f9810f7531d52e86450e5c75d9d06531d144abc5ac53c6786
kube-rbac-proxy: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4d229dee301eb7452227fefc2704b30cf58e7a7f85e0c66dd3798b6b64b79728
prometheus-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:50de7804ddd623f1b4e0f57157ce01102db7e68179c5744bac4e92c81714a881
cluster-monitoring-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:534a71a355e3b9c79ef5a192a200730b8641f5e266abe290b6f7c6342210d8a0
telemeter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9021d3e9ce028fc72301f8e0a40c37e488db658e1500a790c794bfd38903bef1
prom-label-proxy: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:90a29a928beffc938345760f88b6890dccdc6f1a6503f09fea7399469a6ca72a
prometheus: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba51ac66b4c3a46d5445bdfa32f1f04b882498fe5405d88dc78a956742657105
k8s-prometheus-adapter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ee79721af3078dfbcfaa75e9a47da1526464cf6685a7f4195ea214c840b59e9f
grafana: quay.io/openshift/origin-grafana:latest
oauth-proxy: quay.io/openshift/origin-oauth-proxy:latest



# docker pull quay.io/openshift/origin-oauth-proxy:latest
Trying to pull repository quay.io/openshift/origin-oauth-proxy ... 
latest: Pulling from quay.io/openshift/origin-oauth-proxy
68cddb23acfe: Pull complete 
b1ae8487cc2f: Pull complete 
1d3caeca8553: Pull complete 
e965bd784e31: Pull complete 
79225473dea5: Pull complete 
Digest: sha256:6bd8b284e646e100f45a9b51b5ca7ec90c5d3c7d3a5f2262bed8ae997cc0a36f
Status: Downloaded newer image for quay.io/openshift/origin-oauth-proxy:latest

Changing the status back to MODIFIED.

Comment 7 Frederic Branczyk 2019-02-21 13:27:13 UTC
I opened another pull request that I believe should fix the image replacement; hopefully it also fixes these errors: https://bugzilla.redhat.com/show_bug.cgi?id=1678645

Comment 8 Junqi Zhao 2019-02-22 10:01:56 UTC
Tested with the current latest payload and still see the 500 error for the grafana/prometheus/alertmanager routes; the fix is not yet packaged into the OCP images:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-21-215247   True        False         5h12m   Cluster version is 4.0.0-0.nightly-2019-02-21-215247

Comment 9 Frederic Branczyk 2019-02-22 10:06:59 UTC
Yes, that nightly seems to have just missed the patch. (Sorry, I shared the wrong link in my previous comment.) https://github.com/openshift/cluster-monitoring-operator/pull/259

Comment 10 Junqi Zhao 2019-02-26 02:20:51 UTC
Still seeing the error with:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-25-194625   True        False         7m36s   Cluster version is 4.0.0-0.nightly-2019-02-25-194625

RHCOS build: 47.330


 - "-images=grafana=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:51166898d56ed1beacb24bffb4224bd58eedf1d4109fcf74491c2730844724ad"
 - "-images=grafana=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:51166898d56ed1beacb24bffb4224bd58eedf1d4109fcf74491c2730844724ad"

$ oc -n openshift-monitoring logs grafana-687dcdb49f-qwtqx -c grafana-proxy
2019/02/26 01:03:20 provider.go:102: Defaulting client-id to system:serviceaccount:openshift-monitoring:grafana
2019/02/26 01:03:20 provider.go:107: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
2019/02/26 01:03:20 provider.go:530: Performing OAuth discovery against https://172.30.0.1/.well-known/oauth-authorization-server
2019/02/26 01:03:20 provider.go:576: 200 GET https://172.30.0.1/.well-known/oauth-authorization-server {
  "issuer": "https://openshift-authentication-openshift-authentication.apps.qe-juzhao2.qe.devcluster.openshift.com",
  "authorization_endpoint": "https://openshift-authentication-openshift-authentication.apps.qe-juzhao2.qe.devcluster.openshift.com/oauth/authorize",
  "token_endpoint": "https://openshift-authentication-openshift-authentication.apps.qe-juzhao2.qe.devcluster.openshift.com/oauth/token",
  "scopes_supported": [
    "user:check-access",
    "user:full",
    "user:info",
    "user:list-projects",
    "user:list-scoped-projects"
  ],
  "response_types_supported": [
    "code",
    "token"
  ],
  "grant_types_supported": [
    "authorization_code",
    "implicit"
  ],
  "code_challenge_methods_supported": [
    "plain",
    "S256"
  ]
}
2019/02/26 01:03:20 provider.go:304: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2019/02/26 01:03:20 oauthproxy.go:201: mapping path "/" => upstream "http://localhost:3001/"
2019/02/26 01:03:20 oauthproxy.go:222: compiled skip-auth-regex => "^/metrics"
2019/02/26 01:03:20 oauthproxy.go:228: OAuthProxy configured for  Client ID: system:serviceaccount:openshift-monitoring:grafana
2019/02/26 01:03:20 oauthproxy.go:238: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2019/02/26 01:03:20 http.go:96: HTTPS: listening on [::]:3000
2019/02/26 01:31:42 server.go:2923: http: TLS handshake error from 10.128.2.1:55362: EOF
2019/02/26 01:31:45 provider.go:576: 404 GET https://openshift-authentication-openshift-authentication.apps.qe-juzhao2.qe.devcluster.openshift.com/apis/user.openshift.io/v1/users/~ {
  "paths": [
    "/apis",
    "/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/metrics"
  ]
}
2019/02/26 01:31:45 oauthproxy.go:635: error redeeming code (client:10.131.0.5:41936): unable to retrieve email address for user from token: got 404 {
  "paths": [
    "/apis",
    "/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/metrics"
  ]
}
2019/02/26 01:31:45 oauthproxy.go:434: ErrorPage 500 Internal Error Internal Error
2019/02/26 01:31:45 provider.go:386: authorizer reason: 
2019/02/26 01:31:52 server.go:2923: http: TLS handshake error from 10.128.2.1:55400: EOF
2019/02/26 01:31:55 oauthproxy.go:635: error redeeming code (client:10.131.0.5:41936): got 400 from "https://openshift-authentication-openshift-authentication.apps.qe-juzhao2.qe.devcluster.openshift.com/oauth/token" {"error":"unauthorized_client","error_description":"The client is not authorized to request a token using this method."}
2019/02/26 01:31:55 oauthproxy.go:434: ErrorPage 500 Internal Error Internal Error
2019/02/26 01:31:55 provider.go:386: authorizer reason: 
2019/02/26 01:32:02 server.go:2923: http: TLS handshake error from 10.128.2.1:55452: EOF
2019/02/26 01:32:04 provider.go:386: authorizer reason: 
2019/02/26 01:32:04 provider.go:386: authorizer reason: 
2019/02/26 01:32:07 provider.go:576: 404 GET https://openshift-authentication-openshift-authentication.apps.qe-juzhao2.qe.devcluster.openshift.com/apis/user.openshift.io/v1/users/~ {
  "paths": [
    "/apis",
    "/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/metrics"
  ]
}

cluster-monitoring-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ea9b5343f65a91dd31b78a43b931ab776d372da0a7202b0ffd4220ca0e646856
# docker run -u root --rm -it --entrypoint=/bin/sh quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ea9b5343f65a91dd31b78a43b931ab776d372da0a7202b0ffd4220ca0e646856
Unable to find image 'quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ea9b5343f65a91dd31b78a43b931ab776d372da0a7202b0ffd4220ca0e646856' locally
Trying to pull repository quay.io/openshift-release-dev/ocp-v4.0-art-dev ... 
sha256:ea9b5343f65a91dd31b78a43b931ab776d372da0a7202b0ffd4220ca0e646856: Pulling from quay.io/openshift-release-dev/ocp-v4.0-art-dev
2cb1196a3b27: Pull complete 
c9c433594a59: Pull complete 
49fcfa54bf62: Pull complete 
343eae12bc62: Pull complete 
Digest: sha256:ea9b5343f65a91dd31b78a43b931ab776d372da0a7202b0ffd4220ca0e646856
Status: Downloaded newer image for quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ea9b5343f65a91dd31b78a43b931ab776d372da0a7202b0ffd4220ca0e646856
sh-4.2# cat manifests/image-references | grep "name: grafana" -A10
  - name: grafana
    from:
      kind: DockerImage
      name: quay.io/openshift/origin-grafana:latest
  - name: oauth-proxy
    from:
      kind: DockerImage
      name: quay.io/openshift/origin-oauth-proxy:latest
  - name: prometheus-node-exporter
    from:
      kind: DockerImage
*********************************************************************
The PR at https://github.com/openshift/cluster-monitoring-operator/pull/259/files is merged, but the issue is not fixed. Assigning it back.

Comment 11 Sergiusz Urbaniak 2019-02-26 11:01:09 UTC
At this point, there is unfortunately not much we can do just by setting the image in the cluster monitoring operator stack.

@erica: Would you mind advising here? Is a fix needed in oauth-proxy or in our configuration?

Comment 14 Junqi Zhao 2019-02-27 01:45:49 UTC
Created attachment 1539019
still 500 error with 4.0.0-0.nightly-2019-02-26-125216

Comment 21 Junqi Zhao 2019-02-28 01:38:54 UTC
All routes can now be accessed with the 4.0.0-0.nightly-2019-02-27-213933 payload.
Moving to VERIFIED.

Comment 22 Standa Laznicka 2019-03-04 11:22:36 UTC
*** Bug 1685033 has been marked as a duplicate of this bug. ***

Comment 25 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758