After some investigation with Sergiusz, we noticed that the `prometheus-trusted-ca-bundle` configmap used by the Prometheus OAuth proxy was correctly updated after CMO reconciled the Proxy: it contains the `user-ca-bundle` custom CA used by the Proxy. This can be verified by running the following commands in the must-gather directory and comparing their output:

```
# Print the certificates in the user-provided CA bundle (openshift-config/user-ca-bundle)
openssl crl2pkcs7 -nocrl -certfile <(gojsontoyaml -yamltojson < namespaces/openshift-config/core/configmaps.yaml | jq -r '.items[] | select(.metadata.name == "user-ca-bundle") | .data["ca-bundle.crt"]') | openssl pkcs7 -print_certs -noout -text | less

# Print the certificates in the trusted CA bundle used by the monitoring stack
openssl crl2pkcs7 -nocrl -certfile <(gojsontoyaml -yamltojson < namespaces/openshift-monitoring/core/configmaps.yaml | jq -r '.items[] | select(.metadata.name == "prometheus-trusted-ca-bundle") | .data["ca-bundle.crt"]') | openssl pkcs7 -print_certs -noout -text | less
```

Based on this observation, either mounting the `tls-ca-bundle.pem` certificate into the OAuth proxy container failed, or there is a bug in the OAuth proxy itself. Since we haven't found any errors in the logs reporting a mount failure, a bug in the OAuth proxy seems plausible, so I am transferring this Bugzilla to the oauth-proxy team for further investigation.
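Comparing the two `openssl` outputs by eye is error-prone for large bundles. A minimal Python sketch of the same containment check follows; it assumes both `ca-bundle.crt` values have already been extracted to text (for example with the `jq` filters above), and the helper names `pem_bodies` and `missing_certs` are hypothetical. It compares certificates by their base64 PEM body only, so it does not parse them or check their validity.

```python
from __future__ import annotations

import re

# Matches one PEM certificate block and captures its base64 body.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
    re.DOTALL,
)


def pem_bodies(bundle: str) -> set[str]:
    """Return the base64 bodies (whitespace stripped) of every certificate in a PEM bundle."""
    return {"".join(body.split()) for body in _PEM_CERT.findall(bundle)}


def missing_certs(source_bundle: str, target_bundle: str) -> set[str]:
    """Return the bodies of certificates in source_bundle that are absent from target_bundle."""
    return pem_bodies(source_bundle) - pem_bodies(target_bundle)
```

For example, `missing_certs(user_bundle, prometheus_bundle)` returning an empty set would confirm that every certificate from `user-ca-bundle` is present in `prometheus-trusted-ca-bundle`.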
Please don't post comments as private when they don't have to be. Also, from the must-gather from 27-01-2021, this is your oauth-proxy container definition:

```
- args:
  - -provider=openshift
  - -https-address=:9091
  - -http-address=
  - -email-domain=*
  - -upstream=http://localhost:9090
  - -htpasswd-file=/etc/proxy/htpasswd/auth
  - -openshift-service-account=prometheus-k8s
  - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
  - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
  - -tls-cert=/etc/tls/private/tls.crt
  - -tls-key=/etc/tls/private/tls.key
  - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
  - -cookie-secret-file=/etc/proxy/secrets/session_secret
  - -openshift-ca=/etc/pki/tls/cert.pem
  - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  - -skip-auth-regex=^/metrics
  env:
  - name: HTTP_PROXY
    value: <redacted>
  - name: HTTPS_PROXY
    value: <redacted>
  - name: NO_PROXY
    value: <redacted>
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5b0e1ed6b126328877f80b2e9f9161c181b713f8aa10a3dc027ebc60745c087
  imagePullPolicy: IfNotPresent
  name: prometheus-proxy
  ports:
  - containerPort: 9091
    name: web
    protocol: TCP
  resources:
    requests:
      cpu: 1m
      memory: 20Mi
  securityContext:
    capabilities:
      drop:
      - KILL
      - MKNOD
      - SETGID
      - SETUID
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: FallbackToLogsOnError
  volumeMounts:
  - mountPath: /etc/tls/private
    name: secret-prometheus-k8s-tls
  - mountPath: /etc/proxy/secrets
    name: secret-prometheus-k8s-proxy
  - mountPath: /etc/proxy/htpasswd
    name: secret-prometheus-k8s-htpasswd
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: prometheus-k8s-token-xqtsr
    readOnly: true
```

I'm assuming the trusted CA bundle is this volume from the pod definition:

```
- configMap:
    defaultMode: 420
    items:
    - key: ca-bundle.crt
      path: tls-ca-bundle.pem
    name: prometheus-trusted-ca-bundle-9je0v9ldr1cu1
    optional: true
  name: prometheus-trusted-ca-bundle
```
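As you can see, the trust bundle is neither mounted in nor used by the proxy container, so it doesn't surprise me that it does not work: the proxy reads `-openshift-ca=/etc/pki/tls/cert.pem`, which on RHEL-based images is a symlink to `/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`, and nothing mounts the configmap's `tls-ca-bundle.pem` at that location.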
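The same check can be done mechanically instead of by reading the YAML. This is a sketch assuming the pod was exported with `oc -n openshift-monitoring get pod prometheus-k8s-0 -o json`; `load_pod` and `mounts_by_container` are hypothetical helper names, and only `spec.containers` is inspected (init containers are ignored).

```python
from __future__ import annotations

import json


def load_pod(path: str) -> dict:
    """Load a Pod object saved with `oc get pod <name> -o json > <path>`."""
    with open(path) as f:
        return json.load(f)


def mounts_by_container(pod: dict, volume_name: str) -> dict[str, bool]:
    """Map each container in the Pod's spec.containers to whether it mounts volume_name."""
    result: dict[str, bool] = {}
    for container in pod.get("spec", {}).get("containers", []):
        # volumeMounts may be absent for containers that mount nothing.
        mounts = container.get("volumeMounts") or []
        result[container["name"]] = any(m.get("name") == volume_name for m in mounts)
    return result
```

Given the container definition above, `mounts_by_container(pod, "prometheus-trusted-ca-bundle")` would be expected to report `False` for `prometheus-proxy`.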
Following up on comment 3: prometheus-trusted-ca-bundle is wrongly mounted in the prometheus container, not in the prometheus-proxy container:

```
containers:
.....
  livenessProbe:
    exec:
      command:
      - sh
      - -c
      - if [ -x "$(command -v curl)" ]; then curl http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi
    failureThreshold: 6
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 3
  name: prometheus
.....
  volumeMounts:
  - mountPath: /etc/pki/ca-trust/extracted/pem/
    name: prometheus-trusted-ca-bundle
    readOnly: true
```
(In reply to Junqi Zhao from comment #4)
> followed Comment 3, prometheus-trusted-ca-bundle is wrongly mounted to
> prometheus container, not prometheus-proxy container

Please ignore this; I'll let the developers confirm whether prometheus-trusted-ca-bundle needs to be mounted in the prometheus container. Either way, it can't be found in the prometheus-proxy container.
Indeed, I forgot to check whether the volumeMount was correctly set in the pod definition. Thank you for the help with the investigation. That being said, cluster-monitoring-operator should be reconciling the Proxy and adding the `/etc/pki/ca-trust/extracted/pem/` volumeMount, but the change doesn't seem to get applied. We still need to investigate why this is happening.
@standa: indeed, super sharp eyes, thank you for the catch! @junqi: in fact, the trusted CA bundle should be mounted in both the prometheus and prometheus-proxy containers. Prometheus needs it because customers can configure remote write, which has to trust that bundle and also needs the proxy configured. Agreed with @damien that we should investigate why it is not mounted, since the CMO code appears to do so.
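To make the desired state concrete: after reconciliation, both containers should carry the same read-only mount. The sketch below is a hypothetical Python illustration of an idempotent "ensure mount" step, not CMO's implementation (CMO is written in Go); `ensure_mount` and `TRUSTED_CA_MOUNT` are made-up names, the mount values are copied from the pod definitions in this bug, and it only handles `volumeMounts`, not the `prometheus-trusted-ca-bundle` volume itself.

```python
from __future__ import annotations

import copy

# Mount values taken from the prometheus container definition in this bug.
TRUSTED_CA_MOUNT = {
    "name": "prometheus-trusted-ca-bundle",
    "mountPath": "/etc/pki/ca-trust/extracted/pem/",
    "readOnly": True,
}


def ensure_mount(pod_spec: dict, container_names: list[str], mount: dict) -> dict:
    """Return a copy of pod_spec in which each named container has `mount` exactly once.

    Mounts are matched by name, so applying this to its own output changes nothing.
    """
    spec = copy.deepcopy(pod_spec)  # never mutate the caller's object
    for container in spec.get("containers", []):
        if container["name"] not in container_names:
            continue
        mounts = container.get("volumeMounts") or []
        if not any(m.get("name") == mount["name"] for m in mounts):
            mounts.append(dict(mount))
        container["volumeMounts"] = mounts
    return spec
```

Matching on the mount name rather than the full entry keeps the step idempotent, but it also means an existing mount with the right name and a wrong path would be left untouched.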
Created attachment 1751689: prometheus-k8s-0 YAML file from a 4.5.0-0.nightly-2021-01-28-055017 https_proxy cluster. Yes, on that cluster prometheus-trusted-ca-bundle is mounted in both the prometheus and prometheus-proxy containers.
Yes. Once the fix is merged in 4.8, I will start the backport process to 4.6.z and 4.7.z.
The issue is fixed in 4.8.0-0.nightly-2021-02-09-221546; I can log in to the Prometheus route correctly.

```
# oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml
...
  name: prometheus
...
  volumeMounts:
  - mountPath: /etc/pki/ca-trust/extracted/pem/
    name: prometheus-trusted-ca-bundle
    readOnly: true
...
  name: prometheus-proxy
...
  volumeMounts:
  ...
  - mountPath: /etc/pki/ca-trust/extracted/pem/
    name: prometheus-trusted-ca-bundle
    readOnly: true
```
Created attachment 1756126: prometheus-k8s-0 YAML file with the fix, from an https_proxy cluster
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438