Bug 1920901
| Summary: | [4.7]"500 Internal Error" for prometheus route in https_proxy cluster | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Monitoring | Assignee: | Damien Grisonnet <dgrisonn> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.7 | CC: | alegrand, anpicker, aos-bugs, dgrisonn, erooth, hongyli, kakkoyun, lcosic, mfojtik, pkrupa |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1920898 | | |
| : | 1926876 | Environment: | |
| Last Closed: | 2021-07-27 22:36:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1926876 | | |
| Attachments: | | | |
Comment 2
Damien Grisonnet
2021-01-28 09:54:06 UTC
Please don't post comments as private when they don't have to be.
Also, according to the must-gather from 27-01-2021, this is your oauth-proxy container definition:
```
- args:
  - -provider=openshift
  - -https-address=:9091
  - -http-address=
  - -email-domain=*
  - -upstream=http://localhost:9090
  - -htpasswd-file=/etc/proxy/htpasswd/auth
  - -openshift-service-account=prometheus-k8s
  - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
  - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
  - -tls-cert=/etc/tls/private/tls.crt
  - -tls-key=/etc/tls/private/tls.key
  - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
  - -cookie-secret-file=/etc/proxy/secrets/session_secret
  - -openshift-ca=/etc/pki/tls/cert.pem
  - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  - -skip-auth-regex=^/metrics
  env:
  - name: HTTP_PROXY
    value: <redacted>
  - name: HTTPS_PROXY
    value: <redacted>
  - name: NO_PROXY
    value: <redacted>
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5b0e1ed6b126328877f80b2e9f9161c181b713f8aa10a3dc027ebc60745c087
  imagePullPolicy: IfNotPresent
  name: prometheus-proxy
  ports:
  - containerPort: 9091
    name: web
    protocol: TCP
  resources:
    requests:
      cpu: 1m
      memory: 20Mi
  securityContext:
    capabilities:
      drop:
      - KILL
      - MKNOD
      - SETGID
      - SETUID
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: FallbackToLogsOnError
  volumeMounts:
  - mountPath: /etc/tls/private
    name: secret-prometheus-k8s-tls
  - mountPath: /etc/proxy/secrets
    name: secret-prometheus-k8s-proxy
  - mountPath: /etc/proxy/htpasswd
    name: secret-prometheus-k8s-htpasswd
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: prometheus-k8s-token-xqtsr
    readOnly: true
```
Now I'm going to assume the trusted-ca bundle is this volume from the pod definition:
```
- configMap:
    defaultMode: 420
    items:
    - key: ca-bundle.crt
      path: tls-ca-bundle.pem
    name: prometheus-trusted-ca-bundle-9je0v9ldr1cu1
    optional: true
  name: prometheus-trusted-ca-bundle
```
Since, as you can see, the trust bundle is neither mounted, nor used in the proxy pod definition, it does not surprise me that it does not work.
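The manual check above (does a given container mount the bundle volume?) can be sketched in Python. This is only a minimal sketch, assuming the pod manifest has already been loaded into a dictionary, for example from `oc get pod -o json`. The helper name `containers_mounting` and the trimmed `sample_pod` are illustrative and not part of any OpenShift tooling; the mount names are copied from the prometheus-proxy definition quoted above.

```python
def containers_mounting(pod, volume_name):
    """Return the names of containers whose volumeMounts reference volume_name."""
    names = []
    for container in pod.get("spec", {}).get("containers", []):
        mounts = container.get("volumeMounts") or []
        if any(mount.get("name") == volume_name for mount in mounts):
            names.append(container["name"])
    return names


# Illustrative, trimmed pod: only the fields this check reads.
sample_pod = {
    "spec": {
        "containers": [
            {
                "name": "prometheus-proxy",
                "volumeMounts": [
                    {"name": "secret-prometheus-k8s-tls", "mountPath": "/etc/tls/private"},
                    {"name": "secret-prometheus-k8s-proxy", "mountPath": "/etc/proxy/secrets"},
                    {"name": "secret-prometheus-k8s-htpasswd", "mountPath": "/etc/proxy/htpasswd"},
                ],
            }
        ]
    }
}

# An empty list means no container in the pod mounts the bundle volume.
print(containers_mounting(sample_pod, "prometheus-trusted-ca-bundle"))
```

Run against the full pod manifest rather than the trimmed sample, the same call would also show whether any other container in the pod mounts the bundle.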
Following comment 3, prometheus-trusted-ca-bundle is wrongly mounted to the prometheus container, not the prometheus-proxy container:
containers:
.....
  livenessProbe:
    exec:
      command:
      - sh
      - -c
      - if [ -x "$(command -v curl)" ]; then curl http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi
    failureThreshold: 6
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 3
  name: prometheus
.....
  volumeMounts:
  - mountPath: /etc/pki/ca-trust/extracted/pem/
    name: prometheus-trusted-ca-bundle
    readOnly: true

(In reply to Junqi Zhao from comment #4)
> followed Comment 3, prometheus-trusted-ca-bundle is wrongly mounted to
> prometheus container, not prometheus-proxy container
Ignore this; let the dev confirm whether we need to mount prometheus-trusted-ca-bundle to the prometheus container, but it can't be found in the prometheus-proxy container.

Indeed, I forgot to check if the volumeMount was correctly set in the pod definition. Thank you for the help on the investigation. That being said, cluster-monitoring-operator should be reconciling the Proxy and updating the volumeMounts to add the `/etc/pki/ca-trust/extracted/pem/`, but it seems that the changes don't get applied. We still need to investigate why this is happening.

@standa: indeed, super sharp eyes, thank you for the catch!
@junqi: in fact, the trusted CA bundle should be mounted in both prometheus and prometheus-proxy. For prometheus it is necessary because customers could configure remote-write, which has to trust that bundle and needs the proxy configured as well. Agreed with @damien that we should investigate why it is not mounted, as we seem to do so in code inside CMO.

Created attachment 1751689 [details]
prometheus-k8s-0 yaml file in 4.5.0-0.nightly-2021-01-28-055017 https_proxy cluster
yes, prometheus-trusted-ca-bundle is mounted both in prometheus and prometheus-proxy containers
Yes, once the fix is merged in 4.8, I will start the backport processes to 4.6.z and 4.7.z.

The issue is fixed with 4.8.0-0.nightly-2021-02-09-221546; I can log in to the prometheus route correctly, no issue.
# oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml
...
name: prometheus
...
volumeMounts:
- mountPath: /etc/pki/ca-trust/extracted/pem/
name: prometheus-trusted-ca-bundle
readOnly: true
...
name: prometheus-proxy
...
volumeMounts:
...
- mountPath: /etc/pki/ca-trust/extracted/pem/
name: prometheus-trusted-ca-bundle
readOnly: true
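The verification above can also be expressed as a small check over the JSON form of the pod. This is a hedged sketch, assuming the output of `oc -n openshift-monitoring get pod prometheus-k8s-0 -o json` is available as text. The function name `missing_ca_mounts` and the inline sample are hypothetical, and the sample contains only the fields visible in the output above.

```python
import json

CA_MOUNT_PATH = "/etc/pki/ca-trust/extracted/pem/"
EXPECTED_CONTAINERS = ("prometheus", "prometheus-proxy")


def missing_ca_mounts(pod_json, expected=EXPECTED_CONTAINERS, path=CA_MOUNT_PATH):
    """Return the expected container names that mount nothing at path."""
    pod = json.loads(pod_json)
    mounted = set()
    for container in pod["spec"]["containers"]:
        for mount in container.get("volumeMounts") or []:
            if mount.get("mountPath") == path:
                mounted.add(container["name"])
    return [name for name in expected if name not in mounted]


# Illustrative input mirroring the fixed pod shown above.
ca_mount = {
    "mountPath": CA_MOUNT_PATH,
    "name": "prometheus-trusted-ca-bundle",
    "readOnly": True,
}
fixed_pod_json = json.dumps({
    "spec": {
        "containers": [
            {"name": "prometheus", "volumeMounts": [ca_mount]},
            {"name": "prometheus-proxy", "volumeMounts": [ca_mount]},
        ]
    }
})

# An empty list means every expected container has the bundle mounted.
print(missing_ca_mounts(fixed_pod_json))
```

Matching on the mount path rather than the volume name is a design choice made for this sketch: it catches a bundle mounted at an unexpected location as well as one that is missing entirely.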
Created attachment 1756126 [details]
prometheus-k8s-0 yaml file with the fix in https_proxy cluster
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438