Bug 1535585

Summary: Prometheus playbook needs to utilize -openshift-ca flag
Product: OpenShift Container Platform Reporter: Robert Bost <rbost>
Component: Monitoring Assignee: Paul Gier <pgier>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: low Docs Contact:
Priority: medium    
Version: 3.7.0 CC: aivaras.laimikis, andre.rozendaal, aos-bugs, ccoleman, erjones, farandac, fgrosjea, jcantril, mjahangi, per.carlson, pgier, rbost, spasquie
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Feature: New option in the openshift ansible installer to set an additional CA certificate to be distributed to pods in a cluster. Reason: If the cluster is using a load balancer which requires a different CA than the one generated by the installer for the master node, then the user will need to add this additional CA cert to the file /etc/origin/master/ca-bundle.crt. This will make it available to pods in the cluster. Result: Added new option to installer called "openshift_additional_ca" which points to a file containing the load balancer CA certificate.
Story Points: ---
Clone Of:
: 1685188 (view as bug list) Environment:
Last Closed: 2018-07-30 19:09:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1685188    

Description Robert Bost 2018-01-17 16:35:27 UTC
Description of problem: If a cluster utilizes an upstream load-balancer/proxy with a different x509 certificate than the OpenShift cluster, oauth requests fail in the Prometheus oauth-proxy (possibly in all of the proxies; I have only seen the issue with prom-proxy so far).

https://github.com/redhat-cop/openshift-playbooks/blob/master/playbooks/installation/load_balancing.adoc#custom-certificate-ssl-termination-production

The Prometheus playbook needs to utilize the -upstream-ca flag (https://github.com/openshift/oauth-proxy/blob/master/main.go#L97) so upstream CAs are accepted by the oauth-proxy.

I think there will also need to be a new Secret added that holds the upstream CA.

Actual results:
2018/01/09 11:25:58 oauthproxy.go:582: error redeeming code (client:x.x.x.x:37982): Post https://f5.example.com:8443/oauth/token: x509: certificate signed by unknown authority
2018/01/09 11:25:58 oauthproxy.go:399: ErrorPage 500 Internal Error Internal Error

Additional info:
I'm not sure but this same issue may apply to both prom-proxy and alerts-proxy. I only see the issue now with prom-proxy though.

Comment 2 Robert Bost 2018-01-18 15:16:42 UTC
Customer has followed the steps mentioned in the initial comment on this bz and successfully set up Prometheus with an upstream F5 server.

The only additional step they took was adding the same F5 CA Certificate to -openshift-ca oauth-proxy arg. I believe if we introduce a custom variable, it should populate both -upstream-ca and -openshift-ca args.

Comment 3 Robert Bost 2018-01-19 13:58:39 UTC
Correction, customer did *not* use the -upstream-ca arg; they only used -openshift-ca.

Comment 4 Paul Gier 2018-01-19 22:04:20 UTC
PR for adding the -openshift-ca arg: https://github.com/openshift/openshift-ansible/pull/6795

Comment 5 Clayton Coleman 2018-01-22 18:23:45 UTC
They are required to make the CA trusted at the cluster level, not per prometheus level.  We should not add this specifically to prometheus.

Comment 10 Paul Gier 2018-02-22 21:57:12 UTC
*** Bug 1543308 has been marked as a duplicate of this bug. ***

Comment 15 Paul Gier 2018-04-27 19:33:55 UTC
Had some discussion last week and this week about what's the right solution to this issue and whether it's possible with the current installer or if we need installer enhancements. TL;DR we need an enhancement to the installer.

I think the right solution to this is to modify the ca-bundle.crt file, normally located in /etc/origin/master/ca-bundle.crt.  This file is usually the same as ca.crt, but I think it isn't required to be the same.  ca-bundle.crt is used for the service-account-token and is automatically mounted to containers at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.  The oauth proxy container picks up this location by default (https://github.com/openshift/oauth-proxy#--openshift-ca), however in the prometheus container it's set manually along with the default container CA file (https://github.com/openshift/openshift-ansible/blob/d0b9fd11d52841571edb15402140884473f360f6/roles/openshift_prometheus/templates/prometheus.j2#L65-L66).

So to resolve this issue, the customer would need to add the F5 CA cert to ca-bundle.crt (then restart the master-controllers service) and then it would update the ca.crt file mounted to the oauth proxy container.  This should have the same effect as adding a new --openshift-ca flag, with the additional benefit that it would cover similar use cases in other containers throughout the cluster.

The installer currently supports adding certificates, keys, (and CA) for specific hostnames (https://docs.openshift.org/3.9/install_config/certificate_customization.html), but this could have the undesirable side effect of serving the load balancer cert from the API server.

So we need a new installer option to add the load balancer CA just to ca-bundle.crt without needing to specify the key and related cert.
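
A minimal sketch of the manual workaround described in this comment (appending the load balancer CA to ca-bundle.crt), written in Python for illustration only. The function name `append_ca_to_bundle` is hypothetical and is not part of the installer. The sketch only edits the file; restarting the master-controllers service remains a separate step.

```python
from pathlib import Path


def append_ca_to_bundle(bundle_path: str, extra_ca_path: str) -> bool:
    """Append the PEM certificate(s) in extra_ca_path to bundle_path.

    Returns True if the bundle was modified, False if the extra CA
    was already present (so the call is safe to repeat).
    """
    bundle = Path(bundle_path)
    extra = Path(extra_ca_path).read_text().strip()
    if "-----BEGIN CERTIFICATE-----" not in extra:
        raise ValueError(f"{extra_ca_path} does not look like a PEM certificate")

    current = bundle.read_text() if bundle.exists() else ""
    if extra in current:
        return False  # already in the bundle, nothing to do

    # Make sure the existing content ends with a newline before appending.
    if current and not current.endswith("\n"):
        current += "\n"
    bundle.write_text(current + extra + "\n")
    return True
```

On a master this would be called with /etc/origin/master/ca-bundle.crt as the first argument and the path to the F5 CA file (a placeholder here) as the second. The presence check is a plain text match, so a certificate that is already in the bundle with different line wrapping would be appended again.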

Comment 16 Paul Gier 2018-05-01 17:06:07 UTC
PR against master: https://github.com/openshift/openshift-ansible/pull/8142

Comment 19 Junqi Zhao 2018-05-22 10:57:56 UTC
Set "openshift_additional_ca" to point to a file containing the load balancer CA certificate and ran the playbooks/openshift-master/redeploy-certificates.yml playbook. It adds the additional CA file to /etc/origin/master/ca-bundle.crt, and after deploying Prometheus, the Prometheus openshift-ca file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt is the same as /etc/origin/master/ca-bundle.crt.

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
openshift-ansible-playbooks-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
openshift-ansible-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
openshift-ansible-roles-3.10.0-0.50.0.git.0.bd68ade.el7.noarch
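
The two checks described in this comment (the additional CA is present in ca-bundle.crt, and the ca.crt mounted into the pod matches the bundle) could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure that was actually run: the helper names are hypothetical, and it compares PEM blocks as text rather than parsing the certificates.

```python
import re
from pathlib import Path
from typing import List

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_blocks(path: str) -> List[str]:
    """Return every PEM certificate block in the file, with whitespace removed."""
    text = Path(path).read_text()
    return ["".join(block.split()) for block in PEM_BLOCK.findall(text)]


def bundle_contains(bundle_path: str, ca_path: str) -> bool:
    """True if every certificate in ca_path also appears in bundle_path."""
    bundle = set(pem_blocks(bundle_path))
    wanted = pem_blocks(ca_path)
    return bool(wanted) and all(block in bundle for block in wanted)


def same_certificates(path_a: str, path_b: str) -> bool:
    """True if both files hold the same certificates in the same order."""
    return pem_blocks(path_a) == pem_blocks(path_b)
```

`bundle_contains` would be pointed at /etc/origin/master/ca-bundle.crt and the file named by openshift_additional_ca. `same_certificates` needs a copy of the pod's /var/run/secrets/kubernetes.io/serviceaccount/ca.crt alongside the bundle, since the two files live on different filesystems.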

Comment 21 Paul Gier 2018-07-25 19:07:59 UTC
PR to backport this fix to the 3.9 branch: https://github.com/openshift/openshift-ansible/pull/9338

Comment 23 errata-xmlrpc 2018-07-30 19:09:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816