Bug 1712525

Summary: [DOCS] Console cannot use oauth endpoint after configuring ingress (wildcard) certificates from custom PKI - users cannot log in
Product: OpenShift Container Platform
Reporter: Vadim Zharov <vzharov>
Component: Documentation
Assignee: Cody Hoag <choag>
Status: CLOSED CURRENTRELEASE
QA Contact: Hongan Li <hongli>
Severity: high
Docs Contact: Vikram Goyal <vigoyal>
Priority: high
Version: 4.1.0
CC: aarne, agawand, aos-bugs, aprajapa, balici, ChetRHosey, christoph.obexer, clasohm, deads, dhansen, dkaylor, dmace, dmoessne, dyocum, erich, farandac, gdeprati, hongli, jeff.li, jhadvig, jokerman, knewcome, lmartinh, malonso, mcurry, mharri, misalunk, mmasters, mmccomas, mwoodson, mzali, nbhatt, nstielau, palonsor, pamoedom, pescorza, rdiazgav, rhowe, rvanderp, sburke, scuppett, sgarciam, sjenning, spadgett, tmckay, trankin
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-26 20:41:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vadim Zharov 2019-05-21 17:14:54 UTC
Description of problem:
The console cannot connect to the OAuth URL after a wildcard certificate issued by a custom PKI is applied to the ingress router. Users are not able to log in to the web console.


Version-Release number of selected component (if applicable):
oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-rc.4   True        False         19h     Cluster version is 4.1.0-rc.4

$ oc adm release info 
  console                                       sha256:5397a1d5c54fef88344a0ec105d117aae69114bb61e4f3f6e4421c71c1795208
  console-operator                              sha256:81abea24d3bbde997aef6e786c89003b15a27b5d33f6a26a6056a663706b7f7a
  cluster-ingress-operator                      sha256:3156518d2677e69341a0d0dc745c56a2a86c85065ab907d45e796aa020e534af

How reproducible:
Always

Steps to Reproduce:
According to the documentation:
https://docs.openshift.com/container-platform/4.1/networking/ingress/configuring-default-certificate.html
1. Issue a wildcard certificate using your own (custom) PKI infrastructure.
2. Create a secret based on the issued certificate/private key, putting the certificate plus the CA into tls.crt:
oc --namespace openshift-ingress create secret tls custom-ingress-certs --cert=tls.crt --key=tls.key

3. Patch the ingress controller CR:
 oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-ingress-certs"}}}'

4. Ensure that the certificate is applied:
echo | openssl s_client -showcerts -servername console-openshift-console.apps.vadim-01-ocp4.myinternalsite.com -connect console-openshift-console.apps.vadim-01-ocp4.myinternalsite.com:443 | head -n 5
depth=1 C = US, ST = TX, L = Dallas, O = Myinternal, OU = Site, CN = mypki.myinternalsite.com
verify error:num=20:unable to get local issuer certificate
CONNECTED(00000005)
---
Certificate chain
 0 s:/CN=*.apps.vadim-01-ocp4.myinternalsite.com
   i:/C=US/ST=TX/L=Dallas/O=Myinternal/OU=Site/CN=mypki.myinternalsite.com
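
The tls.crt passed when creating the secret in step 2 is the server certificate followed by the issuing CA. The following is a minimal local sketch with throwaway, self-signed material; every file name and subject below is a placeholder, not a value from this cluster:

```shell
# Generate a throwaway CA (illustration only; replace with your real PKI).
openssl req -x509 -newkey rsa:2048 -nodes -keyout my-ca.key -out my-ca.crt \
    -days 1 -subj "/CN=my-test-ca"

# Generate a wildcard server key/CSR and sign it with the throwaway CA.
openssl req -newkey rsa:2048 -nodes -keyout tls.key -out wildcard.csr \
    -subj "/CN=*.apps.example.test"
openssl x509 -req -in wildcard.csr -CA my-ca.crt -CAkey my-ca.key \
    -CAcreateserial -out wildcard.crt -days 1

# Assemble tls.crt: server certificate first, then the CA certificate.
cat wildcard.crt my-ca.crt > tls.crt

# Sanity check: the bundle should contain two PEM blocks.
grep -c 'BEGIN CERTIFICATE' tls.crt   # prints 2
```

The resulting tls.crt and tls.key are the files passed to the `oc --namespace openshift-ingress create secret tls` command above.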

Actual results:
Try to log in to the web console. You are redirected to the OAuth URL (oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com); after entering your credentials there, you see an error like "Oops, something went wrong" in the web console and are redirected back to the OAuth URL.

There are multiple errors in console pods:
2019/05/21 15:33:31 auth: failed to get latest auth source data: request to OAuth issuer endpoint https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com/oauth/token failed: Head https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com: x509: certificate signed by unknown authority


Expected results:
Users are able to login into console.


Additional info:

The console process doesn't trust the custom root CA. Once we applied a custom wildcard certificate using our own PKI, we should provide our CA to pods/processes so that they can communicate with all exposed routes.
To do this in OCP 4, you need to add the custom root CA to the configmap signing-cabundle in the namespace openshift-service-ca - the service-ca operator will then populate it.
I did it on my cluster:
1. My root CA is added to cm signing-cabundle in namespace openshift-service-ca:
oc get cm signing-cabundle -n openshift-service-ca -o yaml
(ensure that my root CA is here)
2. It was propagated to the openshift-console namespace, cm service-ca:
oc get cm service-ca -n openshift-console -o yaml
(ensure that this CM has the same content as signing-cabundle)
3. Log in to the console pod:
oc rsh console-5f87f4cbcf-dr6rd
sh-4.2$ ps ax
   PID TTY      STAT   TIME COMMAND
     1 ?        Ssl    0:08 /opt/bridge/bin/bridge --public-dir=/opt/bridge/static --config=/var/console-config/console-config.yaml --service-ca-file=/var/service-ca/service-ca.crt

4. The pod has the service-ca configmap mounted, with my root CA:
sh-4.2$ cat /var/service-ca/service-ca.crt
(ensure that the content of this file is the same as the configmap)

5. Check the console-config.yaml file - it uses a different CA (the serviceaccount CA) to validate the OAuth route - see the oauthEndpointCAFile parameter:
sh-4.2$ cat /var/console-config/console-config.yaml
apiVersion: console.openshift.io/v1
auth:
  clientID: console
  clientSecretFile: /var/oauth-config/clientSecret
  logoutRedirect: ""
  oauthEndpointCAFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
clusterInfo:
  consoleBaseAddress: https://console-openshift-console.apps.vadim-01-ocp4.myinternalsite.com
  consoleBasePath: ""
  masterPublicURL: https://api.vadim-01-ocp4.myinternalsite.com:6443
customization:
  branding: ocp
  documentationBaseURL: https://docs.openshift.com/container-platform/4.1/
kind: ConsoleConfig
servingInfo:
  bindAddress: https://0.0.0.0:8443
  certFile: /var/serving-cert/tls.crt
  keyFile: /var/serving-cert/tls.key

Looks like the console pod uses the serviceaccount-ca bundle, which does not include the custom root CA.

6. Ensure we can connect from the console pod using the custom CA bundle configured by the openshift service-ca operator (--service-ca-file=/var/service-ca/service-ca.crt):
sh-4.2$ curl --cacert /var/service-ca/service-ca.crt https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com/oauth/token
{"error":"unsupported_grant_type","error_description":"The authorization grant type is not supported by the authorization server."}

7. Ensure that we cannot connect from console pod using CA bundle provided by oauthEndpointCAFile parameter:
sh-4.2$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://oauth-openshift.apps.vadim-01-ocp4.myinternalsite.com/oauth/token
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
sh-4.2$ 

I don't know how to customize the serviceaccount-ca bundle (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) - it is created from the kube-api CA certs and the router-ca certs,
but the openshift-ingress operator creates router-ca only when the default wildcard certificate is used. So I didn't find a way to add a custom root CA to the kube-api CA certs (and I'm not sure that would make sense).

Comment 1 Samuel Padgett 2019-05-21 18:02:04 UTC
We include system roots in addition to the serviceaccount ca bundle. Is this a valid certificate?

Comment 2 Samuel Padgett 2019-05-21 18:24:26 UTC
Note that this is specifically called out as a prerequisite in the docs:

> You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority and valid for the Ingress domain.

Comment 3 Vadim Zharov 2019-05-21 18:31:52 UTC
Yes, I noted this.
Does it mean we cannot use our own PKI for wildcard certificates?

What do you mean by "system roots"? The CA certs from kube-api? They are not valid if we use certs from our own PKI.

Comment 4 Vadim Zharov 2019-05-21 19:29:24 UTC
Sorry, by system roots you mean the root CAs from the node OS, right?
They are not valid if you use your own PKI. In OCP 3, for RHEL nodes, there is a way to add additional CAs (easy to do in RHEL).
But OCP 4 with CoreOS requires tweaking the machineset config, which we should avoid doing. And I think that was the reason to create the openshift service-ca operator (to manage root CAs for pods).

Comment 10 Pablo Alonso Rodriguez 2019-06-19 10:42:15 UTC
The workaround also worked in my tests. Just a note about it: I needed to restart all the console pods (oc delete pod --all -n openshift-console), but it is possible that I only needed to because I was a bit impatient and did not wait long enough.

Comment 11 Samuel Padgett 2019-06-25 12:56:40 UTC
*** Bug 1723445 has been marked as a duplicate of this bug. ***

Comment 12 Sergio G. 2019-07-03 08:36:55 UTC
Also confirmed the workaround, and I did NOT need to delete any pods.

Comment 14 David Eads 2019-08-20 14:57:11 UTC
The workaround suggested in comment 7 is not suitable.

The sa-token-secret/ca.crt is driven by two CA bundles (https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/pkg/operator/targetconfigcontroller/targetconfigcontroller.go#L301-L315). If you're making a change to the router's CA, you need to make your change to `oc get cm/router-ca -n openshift-config-managed`. The ingress operator should manage this.

Comment 15 Pablo Alonso Rodriguez 2019-08-21 09:51:59 UTC
Hello,

If the ingress operator should manage this, then this bug should be moved to it. However, there is explicit code in the ingress operator that intentionally does not generate that secret. So I think the ingress operator folks should confirm whether this is intentional behavior. If it is intentional, then the error has been to assume that router-ca is always present when it is not, producing the situation where custom CAs are not properly propagated to the service account token CA.

In that case, we need some alternative, like changing the router-ca behavior in the ingress operator and/or providing a proper placeholder for custom CAs to be appended to the ones exposed at service account mounts (as we were able to do in 3.11 with the /etc/origin/master/ca-bundle.crt file). Not sure which one would be faster/safer.

Comment 18 Ryan Howe 2019-08-22 20:11:31 UTC
//  Summary

To fix this issue, the ingress router operator needs to update the router-ca configmap in the openshift-config-managed project. 
             `oc get cm/router-ca -n openshift-config-managed -o yaml`


We would also need to add to the documentation that the CA must be included with the server cert so that the operator will add it to the router-ca configmap.

  https://docs.openshift.com/container-platform/4.1/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress


Once we fix the ingress operator so that it handles updating the router-ca configmap, the kube-controller-manager-operator will then handle updating the service CA bundle:

    https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/pkg/operator/targetconfigcontroller/targetconfigcontroller.go#L301-L315

Lastly, with the service CA bundle updated, I assume that other operators will handle restarting other services so that their trust is updated; for example, the web console pod needs to be restarted.


Currently there is no workaround.

Comment 39 Miciah Dashiel Butler Masters 2019-09-17 18:36:48 UTC
Following is a possible solution using the proxy API that will be introduced in 4.2:

1. Generate a CA and certificate (for testing, if you do not already have a CA and certificate):

    BASE_DOMAIN="$(oc get dns.config/cluster -o 'jsonpath={.spec.baseDomain}')"
    INGRESS_DOMAIN="$(oc get ingress.config/cluster -o 'jsonpath={.spec.domain}')"
    openssl genrsa -out example-ca.key 2048
    openssl req -x509 -new -key example-ca.key -out example-ca.crt -days 1 -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=$BASE_DOMAIN"
    openssl genrsa -out example.key 2048
    openssl req -new -key example.key -out example.csr -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=*.$INGRESS_DOMAIN"
    openssl x509 -req -in example.csr -CA example-ca.crt -CAkey example-ca.key -CAcreateserial -out example.crt -days 1

2. Configure the CA as the cluster proxy CA:

    oc -n openshift-config create configmap custom-ca --from-file=ca-bundle.crt=example-ca.crt
    oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'

3. Configure the certificate as the ingresscontroller's default certificate:

    oc -n openshift-ingress create secret tls custom-default-cert --cert=example.crt --key=example.key
    oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"defaultCertificate":{"name":"custom-default-cert"}}}'
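
Before applying the certificate, the chain generated in step 1 can be sanity-checked locally. The following sketch regenerates the step 1 material with placeholder domains so that it stands alone; a certificate that fails this check would reproduce the "unknown authority" error in the console:

```shell
# Placeholder domains standing in for the values read from the cluster in step 1.
BASE_DOMAIN="example.test"
INGRESS_DOMAIN="apps.example.test"

# Same generation steps as step 1, with test values.
openssl genrsa -out example-ca.key 2048
openssl req -x509 -new -key example-ca.key -out example-ca.crt -days 1 \
    -subj "/CN=$BASE_DOMAIN"
openssl genrsa -out example.key 2048
openssl req -new -key example.key -out example.csr -subj "/CN=*.$INGRESS_DOMAIN"
openssl x509 -req -in example.csr -CA example-ca.crt -CAkey example-ca.key \
    -CAcreateserial -out example.crt -days 1

# The wildcard certificate must verify against the CA that is placed in the
# proxy trustedCA configmap; otherwise the console still sees an unknown authority.
openssl verify -CAfile example-ca.crt example.crt
```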

I tested the above procedure on a development cluster, but the procedure needs further testing to validate that it is a working, supportable solution.  I'd appreciate feedback from anyone who can test the above steps and verify that it works for their use-cases.

Comment 40 Daneyon Hansen 2019-09-17 23:52:49 UTC
I successfully validated the https://bugzilla.redhat.com/show_bug.cgi?id=1712525#c39 workaround with one caveat. My dev cluster has the proxy enabled, so instead of:

$ oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'

I modified the configmap user-ca-bundle, referenced by proxy/cluster trustedCA, to include the example-ca.crt CA cert:

$ oc edit cm/user-ca-bundle -n openshift-config

If this is an acceptable fix, the https://docs.openshift.com/container-platform/4.1/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress doc should be updated to include directions for adding the CA cert.

Comment 41 Daneyon Hansen 2019-09-18 00:17:18 UTC
Currently, the cluster-network-operator will only publish the combined user-provided trust bundle if it's referenced by proxy/cluster [1]. This is why proxy.spec.trustedCA must be added/modified to reference a configmap containing the custom CA cert(s). This could be expanded to better support ingress by:

1. Adding a similar trustedCA field to the ingress API that references the configmap used to supply the custom CA bundle.
2. Updating [1] to check ingress.trustedCA.

[1] https://github.com/openshift/cluster-network-operator/blob/master/pkg/controller/proxyconfig/controller.go#L215-L220

Preferably, the workaround can be documented and we can take time to design a solution for managing cluster-wide custom certs.

Comment 42 Dan Mace 2019-10-11 16:21:26 UTC
Reassigning to docs so that the solutions in #39 and #40 get documented (perhaps as part of the ingress custom certificate docs, cross-referencing the proxy docs for further background).

Comment 44 Dan Yocum 2019-10-22 16:14:16 UTC
I marked comment #40 as public, as there is nothing sensitive in it and it goes a bit further toward a resolution.

Comment 47 Sergio G. 2019-10-31 09:49:00 UTC
If there's no workaround in the documentation, at least a warning indicating that not all custom certificates are valid should be included, to avoid more people breaking their clusters.

The current text in 4.2 may lead users to understand that they can use their corporate CAs, which is wrong:
"Replacing the default wildcard certificate with one that is issued by a public or organizational CA will allow external clients to connect securely to applications running under the .apps sub-domain."

So, either the workaround is documented, or a warning not to use custom CAs to sign custom certificates for ingress is documented. But the documentation as it stands today will lead to lots of users breaking their clusters and opening support cases.

Comment 48 Muhammad Aizuddin Zali 2019-10-31 11:25:26 UTC
(In reply to Sergio G. from comment #47)
> If there's no workaround in the documentation, at least a warning indicating
> that not all custom certificates are valid should be included to avoid more
> people breaking their clusters.
> 
> Current text n 4.2 may lead to users to understand that they can use their
> corporate CAs which is wrong:
> "Replacing the default wildcard certificate with one that is issued by a
> public or organizational CA will allow external clients to connect securely
> to applications running under the .apps sub-domain."
> 
> So, either the workaround is documented or a warning to not use custom CAs
> to sign custom certificates for ingress is documented. But the documentation
> as is today will lead to lots of users breaking the clusters and opening
> support cases.

Not sure which doc you looked at, but our official doc has a "WARNING" stating this[1].


[1]: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html/networking/configuring-ingress#nw-ingress-setting-a-custom-default-certificate_configuring-ingress

Comment 49 Jeff Li 2019-11-01 21:19:27 UTC
I just hit this problem on OCP v4.2, and I was able to resolve it by adding my internal CA to the cm 'trusted-ca-bundle' in the project "openshift-config-managed".

$oc edit cm trusted-ca-bundle -n openshift-config-managed

Comment 51 Pedro Amoedo 2019-11-05 09:58:37 UTC
(In reply to Jeff Li from comment #49)
> I just hit this problem on ocp v4.2, and I am able to resolve it by adding
> my internal CA to cm 'trusted-ca-bundle' in project
> "openshift-config-managed".
> 
> $oc edit cm trusted-ca-bundle -n openshift-config-managed

FYI, quoting Engineering:

~~~
you shouldn't be touching things in "openshift-config-managed", those resources are managed by the platform and your changes will be overwritten.

if you want to add content to that configmap, you need to add your CAs to the user configmap referenced by your proxy configuration object as discussed here: https://docs.openshift.com/container-platform/4.2/networking/enable-cluster-wide-proxy.html

Those CAs will then be added to the openshift-config-managed/trusted-ca-bundle configmap by a controller.
~~~

Regards.

Comment 52 Pablo Alonso Rodriguez 2019-11-05 17:21:02 UTC
Hi,

I have found an issue with the workaround in comment #39. While it works for the console, Grafana and Prometheus don't trust the proxy CA bundle (they still rely on the service account CA), so you cannot access the Grafana dashboard.

I have filed another bugzilla to get this fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1768977

Comment 55 Daneyon Hansen 2019-11-07 17:35:13 UTC
Should this bug be marked a duplicate of bug 1764704?

Comment 56 Pablo Alonso Rodriguez 2019-11-07 17:38:23 UTC
I don't think so. This bug is older. I understand this bug is now scoped to documenting the current workaround in comment #39, and the other one is to provide a more definitive solution. But somebody please correct me if needed.

Thanks and regards.

Comment 57 Samuel Padgett 2019-11-07 18:37:27 UTC
(In reply to Daneyon Hansen from comment #55)
> Should this bug be marked a duplicate of bug 1764704?

Probably bug 1764704 is a duplicate of this bug, but I haven't done that since this was changed to a doc bug.

Comment 59 Cody Hoag 2019-11-07 21:17:52 UTC
Made an initial draft of the proposed changes: https://github.com/openshift/openshift-docs/pull/18004.

Comment 63 Samuel Padgett 2019-11-19 14:35:28 UTC
*** Bug 1764704 has been marked as a duplicate of this bug. ***

Comment 64 Cody Hoag 2019-11-21 14:10:20 UTC
Resent the pull request. Reorganized the content to cover configuring a custom PKI as a separate article, referring to it from the ingress docs: https://github.com/openshift/openshift-docs/pull/18207

Comment 65 Hongan Li 2019-11-25 06:41:15 UTC
The doc PR looks good, thanks.

Comment 69 Daneyon Hansen 2019-12-03 18:13:03 UTC
@Cody, please take a look at https://jira.coreos.com/browse/NE-229. We encourage users to include any intermediate certs in the tls.crt of the secret containing a custom default certificate. We believe ordering matters; putting the server certificate first, followed by any intermediate certs, in tls.crt will suffice.
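
The ordering advice above can be illustrated with a local sketch (every name and subject here is a throwaway placeholder, not a real cluster value): assemble the bundle leaf first, then list each certificate's subject in file order to confirm the server certificate comes first.

```shell
# Throwaway CA standing in for an intermediate/issuing certificate.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
    -days 1 -subj "/CN=intermediate-or-ca"

# Throwaway wildcard (leaf) certificate signed by it.
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
    -subj "/CN=*.apps.example.test"
openssl x509 -req -in leaf.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -out leaf.crt -days 1

# Leaf first, then the issuer, as suggested above.
cat leaf.crt ca.crt > tls.crt

# Split the bundle and print each certificate's subject in order.
awk '/BEGIN CERTIFICATE/{n++} {print > ("cert-" n ".pem")}' tls.crt
for f in cert-1.pem cert-2.pem; do
    openssl x509 -in "$f" -noout -subject
done
```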

Comment 86 gdeprati@santandertecnologia.com.ar 2020-04-14 01:08:33 UTC
(In reply to Pedro Amoedo from comment #51)
> (In reply to Jeff Li from comment #49)
> > I just hit this problem on ocp v4.2, and I am able to resolve it by adding
> > my internal CA to cm 'trusted-ca-bundle' in project
> > "openshift-config-managed".
> > 
> > $oc edit cm trusted-ca-bundle -n openshift-config-managed
> 
> FYI, quoting Engineering:
> 
> ~~~
> you shouldn't be touching things in "openshift-config-managed", those
> resources are managed by the platform and your changes will be overwritten.
> 
> if you want to add content to that configmap, you need to add your CAs to
> the user configmap referenced by your proxy configuration object as
> discussed here:
> https://docs.openshift.com/container-platform/4.2/networking/enable-cluster-
> wide-proxy.html
> 
> Those CAs will then be added to the
> openshift-config-managed/trusted-ca-bundle configmap by a controller.
> ~~~
> 
> Regards.

Hi Pedro, do you know whether the issue and the fix that you propose apply to OpenShift 4.2.19? We tried to apply a private CA-signed certificate, but the console, Prometheus, Grafana, etc. didn't work. However, I have not yet tried to apply the certificate using the procedure that you described.

Regards!

Comment 87 Pedro Amoedo 2020-04-14 10:22:25 UTC
(In reply to gdeprati.ar from comment #86)
> (In reply to Pedro Amoedo from comment #51)
> > (In reply to Jeff Li from comment #49)
> > > I just hit this problem on ocp v4.2, and I am able to resolve it by adding
> > > my internal CA to cm 'trusted-ca-bundle' in project
> > > "openshift-config-managed".
> > > 
> > > $oc edit cm trusted-ca-bundle -n openshift-config-managed
> > 
> > FYI, quoting Engineering:
> > 
> > ~~~
> > you shouldn't be touching things in "openshift-config-managed", those
> > resources are managed by the platform and your changes will be overwritten.
> > 
> > if you want to add content to that configmap, you need to add your CAs to
> > the user configmap referenced by your proxy configuration object as
> > discussed here:
> > https://docs.openshift.com/container-platform/4.2/networking/enable-cluster-
> > wide-proxy.html
> > 
> > Those CAs will then be added to the
> > openshift-config-managed/trusted-ca-bundle configmap by a controller.
> > ~~~
> > 
> > Regards.
> 
> Hi Pedro, are you aware if the issue and the fix that you propose will apply
> in OpenShift 4.2.19? Because we tried to apply a private CA signed
> certificate but the console, prometheus, grafana, etc didnt work. But I no
> tried to apply de Certificate using the procedure that you described. 
> 
> Regards!

Hi, I'm afraid that version 4.2.19 does NOT contain the fix. AFAIK, for 4.2.x this is still in progress; you can check the general status here[1].

My recommendation is to upgrade to 4.3.x[2][3] if possible. That way you can get rid of this issue: all default cluster routes work as expected in 4.3.x when using a custom ingress certificate[4] along with a cluster-wide PKI (custom CA)[5].

[1] - https://issues.redhat.com/browse/MON-884
[2] - https://docs.openshift.com/container-platform/4.3/updating/updating-cluster-between-minor.html
[3] - https://access.redhat.com/solutions/4606811
[4] - https://docs.openshift.com/container-platform/4.3/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress
[5] - https://docs.openshift.com/container-platform/4.3/networking/configuring-a-custom-pki.html#configuring-a-custom-pki

Best Regards.

Comment 88 Red Hat Bugzilla 2024-01-06 04:26:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days