Bug 1712525 - Console cannot use oauth endpoint after configuring ingress (wildcard) certificates from custom PKI - users cannot log in
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1723445 (view as bug list)
Depends On:
Blocks:
 
Reported: 2019-05-21 17:14 UTC by Vadim Zharov
Modified: 2019-09-18 07:02 UTC (History)
CC List: 25 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:




Links
System ID Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 294 None None None 2019-08-23 20:06:07 UTC
Red Hat Knowledge Base (Solution) 4245561 Configure None Replacing the default ingress certificate results in the web console becoming inaccessible 2019-07-27 15:31:50 UTC

Internal Links: 1748378

Description Vadim Zharov 2019-05-21 17:14:54 UTC
Description of problem:
The console cannot connect to the OAuth URL after a wildcard certificate issued by a custom PKI is applied to the ingress router. Users are not able to log in to the web console.


Version-Release number of selected component (if applicable):
oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-rc.4   True        False         19h     Cluster version is 4.1.0-rc.4

$ oc adm release info 
  console                                       sha256:5397a1d5c54fef88344a0ec105d117aae69114bb61e4f3f6e4421c71c1795208
  console-operator                              sha256:81abea24d3bbde997aef6e786c89003b15a27b5d33f6a26a6056a663706b7f7a
  cluster-ingress-operator                      sha256:3156518d2677e69341a0d0dc745c56a2a86c85065ab907d45e796aa020e534af

How reproducible:
Always

Steps to Reproduce:
According to the documentation:
https://docs.openshift.com/container-platform/4.1/networking/ingress/configuring-default-certificate.html
1. Issue a wildcard certificate using your own (custom) PKI infrastructure.
2. Create a secret based on the issued certificate/private key, with the certificate + CA in tls.crt:
oc --namespace openshift-ingress create secret tls custom-ingress-certs --cert=tls.crt --key=tls.key

3. Patch the ingress CR:
 oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-ingress-certs"}}}'

4. Ensure that the certificate is applied:
echo | openssl s_client -showcerts -servername console-openshift-console.apps.vadim-01-ocp4.myinternalsite.com -connect console-openshift-console.apps.vadim-01-ocp4.myinternalsite.com:443 | head -n 5
depth=1 C = US, ST = TX, L = Dallas, O = Myinternal, OU = Site, CN = mypki.myinternalsite.com
verify error:num=20:unable to get local issuer certificate
CONNECTED(00000005)
---
Certificate chain
 0 s:/CN=*.apps.vadim-01-ocp4.myinternalsite.com
   i:/C=US/ST=TX/L=Dallas/O=Myinternal/OU=Site/CN=mypki.myinternalsite.com
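The "certificate + CA" layout of tls.crt in step 2 can be sketched locally with throwaway OpenSSL certificates (all filenames and subjects below are illustrative, not taken from the cluster):

```shell
# Generate a throwaway root CA standing in for the custom PKI:
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -days 1 -subj "/CN=example-root-ca"
# Issue a wildcard-style leaf certificate signed by that CA:
openssl req -newkey rsa:2048 -nodes -keyout tls.key -out leaf.csr \
  -subj "/CN=*.apps.example.com"
openssl x509 -req -in leaf.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out leaf.crt -days 1
# tls.crt carries the server certificate first, then the CA chain:
cat leaf.crt ca.crt > tls.crt
# Sanity check: the leaf verifies against the CA:
openssl verify -CAfile ca.crt leaf.crt
```

The secret is then created from tls.crt and tls.key exactly as in step 2.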

Actual results:
Try to log in to the web console. You are redirected to the OAuth URL (oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com); after entering your credentials there, you see the web console with an error like "Oops, something went wrong" and are redirected back to the OAuth URL.

There are multiple errors in the console pods:
2019/05/21 15:33:31 auth: failed to get latest auth source data: request to OAuth issuer endpoint https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com/oauth/token failed: Head https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com: x509: certificate signed by unknown authority


Expected results:
Users are able to log in to the console.


Additional info:

The console process doesn't trust the custom root CA. Once we apply a custom wildcard certificate issued by our own PKI, we must provide our CA to pods/processes so they can communicate with all exposed routes.
To do this in OCP 4, you need to add the custom root CA to the signing-cabundle ConfigMap in the openshift-service-ca namespace; the service-ca operator will then populate it.
I did this on my cluster:
1. Added my root CA to the signing-cabundle ConfigMap in the openshift-service-ca namespace:
oc get cm signing-cabundle -n openshift-service-ca -o yaml
(verify that my root CA is present here)
2. It was propagated to the service-ca ConfigMap in the openshift-console namespace:
oc get cm service-ca -n openshift-console -o yaml
(verify that this ConfigMap has the same content as signing-cabundle)
3. Log in to the console pod:
oc rsh console-5f87f4cbcf-dr6rd
sh-4.2$ ps ax
   PID TTY      STAT   TIME COMMAND
     1 ?        Ssl    0:08 /opt/bridge/bin/bridge --public-dir=/opt/bridge/static --config=/var/console-config/console-config.yaml --service-ca-file=/var/service-ca/service-ca.crt

4. The pod has the service-ca ConfigMap mounted, including my root CA:
sh-4.2$ cat /var/service-ca/service-ca.crt
(verify that the content of this file matches the ConfigMap)

5. Check the console-config.yaml file: it uses a different CA (serviceaccount-ca) to validate the OAuth route; see the oauthEndpointCAFile parameter:
sh-4.2$ cat /var/console-config/console-config.yaml
apiVersion: console.openshift.io/v1
auth:
  clientID: console
  clientSecretFile: /var/oauth-config/clientSecret
  logoutRedirect: ""
  oauthEndpointCAFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
clusterInfo:
  consoleBaseAddress: https://console-openshift-console.apps.vadim-01-ocp4.myinternalsite.com
  consoleBasePath: ""
  masterPublicURL: https://api.vadim-01-ocp4.myinternalsite.com:6443
customization:
  branding: ocp
  documentationBaseURL: https://docs.openshift.com/container-platform/4.1/
kind: ConsoleConfig
servingInfo:
  bindAddress: https://0.0.0.0:8443
  certFile: /var/serving-cert/tls.crt
  keyFile: /var/serving-cert/tls.key

It looks like the console pod uses the serviceaccount-ca bundle, which does not include the custom root CA.
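One way to confirm that suspicion is to check by fingerprint whether the custom root CA actually appears in a given bundle. A minimal local sketch with throwaway certificates (all names hypothetical; on the cluster, the bundle to inspect would be /var/run/secrets/kubernetes.io/serviceaccount/ca.crt):

```shell
# "custom-root" stands in for our PKI root; "kube-root" for what the
# serviceaccount bundle actually contains.
openssl req -x509 -newkey rsa:2048 -nodes -keyout custom.key -out custom-ca.crt \
  -days 1 -subj "/CN=custom-root"
openssl req -x509 -newkey rsa:2048 -nodes -keyout kube.key -out kube-ca.crt \
  -days 1 -subj "/CN=kube-root"
cat kube-ca.crt > sa-bundle.crt   # a bundle that lacks the custom root

want="$(openssl x509 -in custom-ca.crt -noout -fingerprint -sha256)"
# Split the bundle into individual certificates and compare fingerprints:
awk 'split_next {n++; split_next=0}
     {print > ("cert" n ".pem")}
     /-----END CERTIFICATE-----/ {split_next=1}' sa-bundle.crt
found=no
for c in cert*.pem; do
  [ "$(openssl x509 -in "$c" -noout -fingerprint -sha256)" = "$want" ] && found=yes
done
echo "custom root CA present in bundle: $found"
```

Running the same comparison against /var/service-ca/service-ca.crt versus the serviceaccount ca.crt inside the console pod would show the asymmetry described above.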

6. Verify that we can connect from the console pod using the custom CA bundle configured by the openshift service-ca operator (--service-ca-file=/var/service-ca/service-ca.crt):
sh-4.2$ curl --cacert /var/service-ca/service-ca.crt https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com/oauth/token
{"error":"unsupported_grant_type","error_description":"The authorization grant type is not supported by the authorization server."}

7. Verify that we cannot connect from the console pod using the CA bundle provided by the oauthEndpointCAFile parameter:
sh-4.2$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com/oauth/token
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
sh-4.2$ 

I don't know how to customize the serviceaccount-ca bundle (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt); it is created from the kube-api CA certs and the router-ca certs,
but the openshift-ingress operator creates router-ca only when the default wildcard certificate is used. So I didn't find a way to add a custom root CA to the kube-api CA certs (and I'm not sure it would make sense).
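The asymmetry between steps 6 and 7 reduces to which roots are present in the bundle handed to the client. A local illustration with throwaway certificates (hypothetical names), mimicking the serviceaccount bundle before and after a custom root is appended:

```shell
# "kube-root" stands in for the kube-api CA; "custom-root" for the ingress PKI root.
openssl req -x509 -newkey rsa:2048 -nodes -keyout kube.key -out kube-ca.crt \
  -days 1 -subj "/CN=kube-root"
openssl req -x509 -newkey rsa:2048 -nodes -keyout custom.key -out custom-ca.crt \
  -days 1 -subj "/CN=custom-root"
# Wildcard-style leaf certificate signed by the custom root:
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=*.apps.example.com"
openssl x509 -req -in leaf.csr -CA custom-ca.crt -CAkey custom.key \
  -CAcreateserial -out leaf.crt -days 1
# Bundle without the custom root (like the serviceaccount ca.crt here): fails.
openssl verify -CAfile kube-ca.crt leaf.crt || true
# Bundle with the custom root appended (what populating router-ca would give): succeeds.
cat kube-ca.crt custom-ca.crt > sa-bundle.crt
openssl verify -CAfile sa-bundle.crt leaf.crt
```

This mirrors the curl results above: the same endpoint validates or fails depending solely on the bundle passed via --cacert.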

Comment 1 Samuel Padgett 2019-05-21 18:02:04 UTC
We include system roots in addition to the serviceaccount CA bundle. Is this a valid certificate?

Comment 2 Samuel Padgett 2019-05-21 18:24:26 UTC
Note that this is specifically called out as a prerequisite in the docs:

> You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority and valid for the Ingress domain.

Comment 3 Vadim Zharov 2019-05-21 18:31:52 UTC
Yes, I noted this.
Does it mean we cannot use our own PKI for wildcard certificates?

What do you mean by "system roots"? The CA certs from kube-api? They are not valid if we use certs from our own PKI.

Comment 4 Vadim Zharov 2019-05-21 19:29:24 UTC
Sorry, by system roots you mean the root CAs from the node OS, right?
They are not valid if we use our own PKI. In OCP 3, on RHEL nodes, there is a way to add additional CAs (easy to do in RHEL).
But on OCP 4 with CoreOS, that requires tweaking the machine config, which we should avoid. I think that was the reason the openshift service-ca operator was created (to manage root CAs for pods).

Comment 10 Pablo Alonso Rodriguez 2019-06-19 10:42:15 UTC
The workaround also worked in my tests. One note: I needed to restart all the console pods (oc delete pod --all -n openshift-console), but it is possible I only needed that because I was a bit impatient and did not wait long enough.

Comment 11 Samuel Padgett 2019-06-25 12:56:40 UTC
*** Bug 1723445 has been marked as a duplicate of this bug. ***

Comment 12 Sergio G. 2019-07-03 08:36:55 UTC
I also confirmed the workaround, and I did NOT need to delete any pods.

Comment 14 David Eads 2019-08-20 14:57:11 UTC
The workaround suggested in comment 7 is not suitable.

The sa-token-secret/ca.crt is driven by two CA bundles (https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/pkg/operator/targetconfigcontroller/targetconfigcontroller.go#L301-L315). If you're making a change to the router's CA, you need to make your change in `oc get cm/router-ca -n openshift-config-managed`. The ingress operator should manage this.

Comment 15 Pablo Alonso Rodriguez 2019-08-21 09:51:59 UTC
Hello,

If the ingress operator should manage this, then this bug should be moved to it. However, there is explicit code in the ingress operator that intentionally does not generate that secret, so the ingress operator team should confirm whether this is intentional behavior. If it is intentional, then the error has been to assume that router-ca is always present when it is not, leading to the situation where custom CAs are not properly propagated to the service account token CA.

In that case, we need some alternative, such as changing the router-ca behavior in the ingress operator and/or providing a proper placeholder for custom CAs to be appended to the ones exposed at service account mounts (as we could do in 3.11 with the /etc/origin/master/ca-bundle.crt file). I'm not sure which would be faster/safer.

Comment 18 Ryan Howe 2019-08-22 20:11:31 UTC
//  Summary

To fix this issue, the ingress router operator needs to update the router-ca configmap in the openshift-config-managed project. 
             `oc get cm/router-ca -n openshift-config-managed -o yaml`


We would also need to add to the documentation that the CA must be included with the server cert so that the operator will add it to the router-ca ConfigMap.

  https://docs.openshift.com/container-platform/4.1/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress


Once we fix the ingress operator so that it handles updating the router-ca ConfigMap, the kube-controller-manager-operator will then handle updating the service CA bundle:

    https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/pkg/operator/targetconfigcontroller/targetconfigcontroller.go#L301-L315

Lastly, with the service CA bundle updated, I assume other operators will handle restarting their services so that their trust is updated; for example, the web console pod needs to be restarted.


Currently there is no workaround.

Comment 39 Miciah Dashiel Butler Masters 2019-09-17 18:36:48 UTC
Following is a possible solution using the proxy API that will be introduced in 4.2:

1. Generate a CA and certificate (for testing, if you do not already have a CA and certificate):

    BASE_DOMAIN="$(oc get dns.config/cluster -o 'jsonpath={.spec.baseDomain}')"
    INGRESS_DOMAIN="$(oc get ingress.config/cluster -o 'jsonpath={.spec.domain}')"
    openssl genrsa -out example-ca.key 2048
    openssl req -x509 -new -key example-ca.key -out example-ca.crt -days 1 -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=$BASE_DOMAIN"
    openssl genrsa -out example.key 2048
    openssl req -new -key example.key -out example.csr -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=*.$INGRESS_DOMAIN"
    openssl x509 -req -in example.csr -CA example-ca.crt -CAkey example-ca.key -CAcreateserial -out example.crt -days 1

2. Configure the CA as the cluster proxy CA:

    oc -n openshift-config create configmap custom-ca --from-file=ca-bundle.crt=example-ca.crt
    oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'

3. Configure the certificate as the ingresscontroller's default certificate:

    oc -n openshift-ingress create secret tls custom-default-cert --cert=example.crt --key=example.key
    oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"defaultCertificate":{"name":"custom-default-cert"}}}'

I tested the above procedure on a development cluster, but it needs further testing to validate that it is a working, supportable solution. I'd appreciate feedback from anyone who can test the above steps and verify that they work for their use cases.

Comment 41 Daneyon Hansen 2019-09-18 00:17:18 UTC
Currently, the cluster-network-operator will only publish the combined user-provided trust bundle if it's referenced by proxy/cluster [1]. This is why proxy.spec.trustedCA must be added/modified to reference a ConfigMap containing the custom CA cert(s). This could be expanded to better support ingress by:

1. Adding a similar trustedCA field to the ingress API that references the ConfigMap used to supply the custom CA bundle.
2. Updating [1] to check ingress.trustedCA.

[1] https://github.com/openshift/cluster-network-operator/blob/master/pkg/controller/proxyconfig/controller.go#L215-L220

Preferably, the workaround can be documented and we can take time to design a solution for managing cluster-wide custom certs.

