Bug 1920901 - [4.7]"500 Internal Error" for prometheus route in https_proxy cluster
Summary: [4.7]"500 Internal Error" for prometheus route in https_proxy cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Damien Grisonnet
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1926876
Reported: 2021-01-27 08:48 UTC by Junqi Zhao
Modified: 2021-07-27 22:37 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1920898
Clones: 1926876
Environment:
Last Closed: 2021-07-27 22:36:45 UTC
Target Upstream Version:
Embargoed:


Attachments
prometheus-k8s-0 yaml file in 4.5.0-0.nightly-2021-01-28-055017 https_proxy cluster (35.29 KB, text/plain)
2021-01-28 13:56 UTC, Junqi Zhao
prometheus-k8s-0 yaml file with the fix in https_proxy cluster (34.88 KB, text/plain)
2021-02-10 06:03 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1047 | 0 | None | closed | Bug 1920901: pkg/manifests: fix prometheus-proxy trustedCA | 2021-02-21 06:00:11 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:37:15 UTC

Comment 2 Damien Grisonnet 2021-01-28 09:54:06 UTC
After some investigation with Sergiusz, we noticed that the `prometheus-trusted-ca-bundle` configmap used by the Prometheus OAuth proxy was correctly updated after CMO reconciled the Proxy: it contains the `user-ca-bundle` custom CA used by the Proxy. This can be verified by running the following commands in the must-gather directory and comparing the certificates they print:

# openssl crl2pkcs7 -nocrl -certfile <(cat namespaces/openshift-config/core/configmaps.yaml | gojsontoyaml -yamltojson  | jq -r '.items[] | select(.metadata.name == "user-ca-bundle") | .data["ca-bundle.crt"]') | openssl pkcs7 -print_certs -noout -text | less
# openssl crl2pkcs7 -nocrl -certfile <(cat namespaces/openshift-monitoring/core/configmaps.yaml | gojsontoyaml -yamltojson  | jq -r '.items[] | select(.metadata.name == "prometheus-trusted-ca-bundle") | .data["ca-bundle.crt"]') | openssl pkcs7 -print_certs -noout -text | less

Based on this observation, either the mount of the `tls-ca-bundle.pem` cert inside the OAuth proxy container failed, or there is a bug in OAuth proxy itself. Since we haven't found any errors reporting a mount failure in the logs, a bug in OAuth proxy seems possible. I am therefore transferring this Bugzilla to the oauth-proxy team for further investigation.
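
The comparison done by the two `openssl` commands above can also be scripted. The following is a minimal sketch and was not part of the original investigation: it assumes the two `ca-bundle.crt` values have already been extracted into strings, and the helper names `pem_blocks` and `missing_certs` are hypothetical. It compares certificates by their base64 PEM body only and does not parse or validate them.

```python
import re

# Matches one PEM certificate block and captures its base64 body.
_PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
    re.DOTALL,
)


def pem_blocks(bundle: str) -> set:
    """Return the certificate bodies found in a PEM bundle, with whitespace removed."""
    return {"".join(body.split()) for body in _PEM_RE.findall(bundle)}


def missing_certs(user_bundle: str, trusted_bundle: str) -> set:
    """Return certificates present in user_bundle but absent from trusted_bundle."""
    return pem_blocks(user_bundle) - pem_blocks(trusted_bundle)
```

An empty result from `missing_certs(user_bundle, trusted_bundle)` would match the observation above that the `prometheus-trusted-ca-bundle` configmap contains the custom CA.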

Comment 3 Standa Laznicka 2021-01-28 10:21:41 UTC
Please don't post comments as private when they don't have to be.

Also, in the must-gather from 2021-01-27, this is your oauth-proxy container definition:
```
  - args:
    - -provider=openshift
    - -https-address=:9091
    - -http-address=
    - -email-domain=*
    - -upstream=http://localhost:9090
    - -htpasswd-file=/etc/proxy/htpasswd/auth
    - -openshift-service-account=prometheus-k8s
    - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
    - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
    - -tls-cert=/etc/tls/private/tls.crt
    - -tls-key=/etc/tls/private/tls.key
    - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
    - -cookie-secret-file=/etc/proxy/secrets/session_secret
    - -openshift-ca=/etc/pki/tls/cert.pem
    - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - -skip-auth-regex=^/metrics
    env:
    - name: HTTP_PROXY
      value: <redacted>
    - name: HTTPS_PROXY
      value: <redacted>
    - name: NO_PROXY
      value: <redacted>
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5b0e1ed6b126328877f80b2e9f9161c181b713f8aa10a3dc027ebc60745c087
    imagePullPolicy: IfNotPresent
    name: prometheus-proxy
    ports:
    - containerPort: 9091
      name: web
      protocol: TCP
    resources:
      requests:
        cpu: 1m
        memory: 20Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-prometheus-k8s-tls
    - mountPath: /etc/proxy/secrets
      name: secret-prometheus-k8s-proxy
    - mountPath: /etc/proxy/htpasswd
      name: secret-prometheus-k8s-htpasswd
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: prometheus-k8s-token-xqtsr
      readOnly: true
```

Now I'm going to assume the trusted-ca bundle is this volume from the pod definition:
```
  - configMap:
      defaultMode: 420
      items:
      - key: ca-bundle.crt
        path: tls-ca-bundle.pem
      name: prometheus-trusted-ca-bundle-9je0v9ldr1cu1
      optional: true
    name: prometheus-trusted-ca-bundle
```

As you can see, the trust bundle is neither mounted nor used in the proxy container definition, so it does not surprise me that it does not work.
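
The check described here (does a given container actually mount the volume?) can be sketched as follows. This is an illustrative snippet, not something that was run in this investigation. It assumes the pod has been loaded as a dict, for example from the JSON output of `oc get pod`, and the function name `containers_mounting` is hypothetical. The sample pod is trimmed down to the mount names quoted in this bug.

```python
def containers_mounting(pod: dict, volume_name: str) -> list:
    """Return the names of the containers in the pod that mount the given volume."""
    names = []
    for container in pod.get("spec", {}).get("containers", []):
        mounts = container.get("volumeMounts", [])
        if any(m.get("name") == volume_name for m in mounts):
            names.append(container["name"])
    return names


# Trimmed-down pod mirroring the state reported in this bug (illustrative only).
pod = {
    "spec": {
        "containers": [
            {
                "name": "prometheus",
                "volumeMounts": [{"name": "prometheus-trusted-ca-bundle"}],
            },
            {
                "name": "prometheus-proxy",
                "volumeMounts": [{"name": "secret-prometheus-k8s-tls"}],
            },
        ]
    }
}

print(containers_mounting(pod, "prometheus-trusted-ca-bundle"))  # ['prometheus']
```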

Comment 4 Junqi Zhao 2021-01-28 11:01:45 UTC
Following Comment 3: prometheus-trusted-ca-bundle is wrongly mounted in the prometheus container, not in the prometheus-proxy container:
  containers:
    .....
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - if [ -x "$(command -v curl)" ]; then curl http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi
      failureThreshold: 6
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    name: prometheus
    .....
    volumeMounts:
    - mountPath: /etc/pki/ca-trust/extracted/pem/
      name: prometheus-trusted-ca-bundle
      readOnly: true

Comment 5 Junqi Zhao 2021-01-28 11:22:23 UTC
(In reply to Junqi Zhao from comment #4)
> followed Comment 3, prometheus-trusted-ca-bundle is wrongly mounted to
> prometheus container, not prometheus-proxy container
Please ignore this. Let the developers confirm whether prometheus-trusted-ca-bundle needs to be mounted in the prometheus container; in any case, the mount cannot be found in the prometheus-proxy container.

Comment 6 Damien Grisonnet 2021-01-28 11:28:09 UTC
Indeed, I forgot to check if the volumeMount was correctly set in the pod definition. Thank you for the help on the investigation.

That being said, cluster-monitoring-operator should be reconciling the Proxy and updating the volumeMounts to add `/etc/pki/ca-trust/extracted/pem/`, but it seems that the changes don't get applied. We still need to investigate why this is happening.

Comment 7 Sergiusz Urbaniak 2021-01-28 12:44:39 UTC
@standa: indeed, super sharp eyes, thank you for the catch!

@junqi: in fact, the trusted CA bundle should be mounted in both prometheus and prometheus-proxy. For prometheus it is necessary because customers could configure remote-write, which has to trust that bundle and needs the proxy configured as well.

Agreed with @damien that we should investigate why it is not mounted, as the code inside CMO seems to do so.
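
As an illustration of the intended end state (the bundle mounted in both containers), here is a minimal sketch that adds the mount to each named container that lacks it. It is written in Python for brevity and is not the actual cluster-monitoring-operator code, which is written in Go; the function name `ensure_mount` and its signature are hypothetical. The volume name and mount path used in the usage note are the ones quoted in this bug.

```python
def ensure_mount(pod_spec: dict, container_names, volume_name: str, mount_path: str) -> bool:
    """Add a read-only mount of volume_name to each named container that lacks it.

    Returns True if the pod spec was modified, False if nothing had to change.
    """
    changed = False
    for container in pod_spec.get("containers", []):
        if container.get("name") not in container_names:
            continue
        mounts = container.setdefault("volumeMounts", [])
        if any(m.get("name") == volume_name for m in mounts):
            continue  # already mounted, so repeated calls stay idempotent
        mounts.append({"name": volume_name, "mountPath": mount_path, "readOnly": True})
        changed = True
    return changed
```

Calling `ensure_mount(spec, {"prometheus", "prometheus-proxy"}, "prometheus-trusted-ca-bundle", "/etc/pki/ca-trust/extracted/pem/")` on a spec where only `prometheus` has the mount would add it to `prometheus-proxy` and return `True`; a second call would return `False`.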

Comment 8 Junqi Zhao 2021-01-28 13:56:58 UTC
Created attachment 1751689
prometheus-k8s-0 yaml file in 4.5.0-0.nightly-2021-01-28-055017 https_proxy cluster

Yes, prometheus-trusted-ca-bundle is mounted in both the prometheus and prometheus-proxy containers.

Comment 11 Damien Grisonnet 2021-02-03 17:36:06 UTC
Yes, once the fix is merged in 4.8, I will start the backport process to 4.6.z and 4.7.z.

Comment 16 Junqi Zhao 2021-02-10 06:03:03 UTC
The issue is fixed in 4.8.0-0.nightly-2021-02-09-221546; I can log in to the prometheus route correctly, with no issue.
# oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml
...
    name: prometheus
    ...
    volumeMounts:
    - mountPath: /etc/pki/ca-trust/extracted/pem/
      name: prometheus-trusted-ca-bundle
      readOnly: true
...
    name: prometheus-proxy
    ...
    volumeMounts:
    ...
    - mountPath: /etc/pki/ca-trust/extracted/pem/
      name: prometheus-trusted-ca-bundle
      readOnly: true

Comment 17 Junqi Zhao 2021-02-10 06:03:46 UTC
Created attachment 1756126
prometheus-k8s-0 yaml file with the fix in https_proxy cluster

Comment 20 errata-xmlrpc 2021-07-27 22:36:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

