Description of problem:
If a cluster uses an upstream load balancer/proxy with a different x509 certificate than the OpenShift cluster, OAuth requests fail in the Prometheus oauth-proxy (possibly in all of the proxies; I have only seen the issue with prom-proxy so far). https://github.com/redhat-cop/openshift-playbooks/blob/master/playbooks/installation/load_balancing.adoc#custom-certificate-ssl-termination-production

The Prometheus playbook needs to use the -upstream-ca flag (https://github.com/openshift/oauth-proxy/blob/master/main.go#L97) so that upstream CAs are accepted by the oauth-proxy. I think a new Secret holding the upstream CA will also be needed.

Actual results:
2018/01/09 11:25:58 oauthproxy.go:582: error redeeming code (client:x.x.x.x:37982): Post https://f5.example.com:8443/oauth/token: x509: certificate signed by unknown authority
2018/01/09 11:25:58 oauthproxy.go:399: ErrorPage 500 Internal Error Internal Error

Additional info:
I'm not sure, but this same issue may apply to both prom-proxy and alerts-proxy. I only see the issue with prom-proxy for now.
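For illustration, a hypothetical oauth-proxy invocation showing the flags under discussion; the addresses, paths, and mount points below are assumptions for the sketch, not taken from the actual prometheus deployment template:

```shell
# Sketch only: flag values are illustrative. -upstream-ca would trust the
# custom CA for the proxied backend; -openshift-ca (which the later comments
# show is what actually mattered here) trusts it for the OAuth token endpoint
# behind the F5.
oauth-proxy \
  -provider=openshift \
  -https-address=:8443 \
  -upstream=http://localhost:9090 \
  -upstream-ca=/etc/proxy/ca/f5-ca.crt \
  -openshift-ca=/etc/proxy/ca/f5-ca.crt \
  -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```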
The customer followed the steps mentioned in the initial comment on this bz and successfully set up Prometheus behind an upstream F5 server. The only additional step they took was adding the same F5 CA certificate to the oauth-proxy -openshift-ca arg. If we introduce a custom variable, I believe it should populate both the -upstream-ca and -openshift-ca args.
Correction, customer did *not* use the -upstream-ca arg; they only used -openshift-ca.
PR for adding the -openshift-ca arg: https://github.com/openshift/openshift-ansible/pull/6795
The requirement is to make the CA trusted at the cluster level, not per Prometheus. We should not add this specifically to Prometheus.
*** Bug 1543308 has been marked as a duplicate of this bug. ***
Had some discussion last week and this week about the right solution to this issue and whether it is possible with the current installer or whether we need installer enhancements. TL;DR: we need an enhancement to the installer.

I think the right solution is to modify the ca-bundle.crt file, normally located at /etc/origin/master/ca-bundle.crt. This file is usually the same as ca.crt, but I don't think it is required to be. ca-bundle.crt is used for the service account token and is automatically mounted into containers at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt. The oauth-proxy container picks up this location by default (https://github.com/openshift/oauth-proxy#--openshift-ca); in the prometheus container, however, it is set manually along with the default container CA file (https://github.com/openshift/openshift-ansible/blob/d0b9fd11d52841571edb15402140884473f360f6/roles/openshift_prometheus/templates/prometheus.j2#L65-L66).

So to resolve this issue, the customer would need to add the F5 CA cert to ca-bundle.crt (then restart the master-controllers service), which would update the ca.crt file mounted into the oauth-proxy container. This should have the same effect as adding a new --openshift-ca flag, with the additional benefit of covering similar use cases in other containers throughout the cluster.

The installer currently supports adding certificates, keys, and CAs for specific hostnames (https://docs.openshift.org/3.9/install_config/certificate_customization.html), but this could have the undesirable side effect of serving the load balancer cert from the API server. So we need a new installer option that adds the load balancer CA to ca-bundle.crt alone, without needing to specify the key and related cert.
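The bundle-update mechanics described above can be illustrated locally; the PEM blocks below are dummies standing in for the real certificates, and in a real cluster the bundle would be /etc/origin/master/ca-bundle.crt (followed by a restart of the master-controllers service so the merged bundle is redistributed):

```shell
# Local sketch: append the load balancer CA onto an existing CA bundle so
# anything trusting the bundle also trusts the F5 certificate.
cd "$(mktemp -d)"
printf -- '-----BEGIN CERTIFICATE-----\ncluster-ca-dummy\n-----END CERTIFICATE-----\n' > ca-bundle.crt
printf -- '-----BEGIN CERTIFICATE-----\nf5-ca-dummy\n-----END CERTIFICATE-----\n' > f5-ca.crt
# The manual workaround: concatenate the extra CA onto the bundle.
cat f5-ca.crt >> ca-bundle.crt
# The bundle now carries both certificates.
grep -c -- '-----BEGIN CERTIFICATE-----' ca-bundle.crt   # prints 2
```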
PR against master: https://github.com/openshift/openshift-ansible/pull/8142
Set "openshift_additional_ca" to point at a file containing the load balancer CA certificate, then run the playbooks/openshift-master/redeploy-certificates.yml playbook; it adds the additional CA file to /etc/origin/master/ca-bundle.crt. After deploying Prometheus, the openshift-ca file mounted in the prometheus pod (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) is identical to /etc/origin/master/ca-bundle.crt.

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
openshift-ansible-playbooks-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
openshift-ansible-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
openshift-ansible-roles-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
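A sketch of those verification steps; the inventory path, CA file location, namespace, pod, and container names below are placeholders, not taken from the verified environment:

```shell
# Inventory (e.g. /etc/ansible/hosts, under [OSEv3:vars]); CA path is a placeholder:
#   openshift_additional_ca=/root/f5-ca.crt

# Merge the additional CA into /etc/origin/master/ca-bundle.crt:
ansible-playbook -i /etc/ansible/hosts \
    playbooks/openshift-master/redeploy-certificates.yml

# After (re)deploying Prometheus, confirm the pod-mounted CA matches the
# master bundle (pod/container names are assumed for illustration):
oc exec prometheus-0 -c prom-proxy -- \
    cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  | diff - /etc/origin/master/ca-bundle.crt
```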
PR to backport this fix to the 3.9 branch https://github.com/openshift/openshift-ansible/pull/9338
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816