Bug 1712525
Summary: [DOCS] Console cannot use oauth endpoint after configuring ingress (wildcard) certificates from custom PKI - users cannot log in
Product: OpenShift Container Platform
Reporter: Vadim Zharov <vzharov>
Component: Documentation
Assignee: Cody Hoag <choag>
Status: CLOSED CURRENTRELEASE
QA Contact: Hongan Li <hongli>
Severity: high
Docs Contact: Vikram Goyal <vigoyal>
Priority: high
Version: 4.1.0
CC: aarne, agawand, aos-bugs, aprajapa, balici, ChetRHosey, christoph.obexer, clasohm, deads, dhansen, dkaylor, dmace, dmoessne, dyocum, erich, farandac, gdeprati, hongli, jeff.li, jhadvig, jokerman, knewcome, lmartinh, malonso, mcurry, mharri, misalunk, mmasters, mmccomas, mwoodson, mzali, nbhatt, nstielau, palonsor, pamoedom, pescorza, rdiazgav, rhowe, rvanderp, sburke, scuppett, sgarciam, sjenning, spadgett, tmckay, trankin
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2019-11-26 20:41:12 UTC
Type: Bug
Regression: ---
Description
Vadim Zharov
2019-05-21 17:14:54 UTC
We include system roots in addition to the serviceaccount ca bundle. Is this a valid certificate? Note that this is specifically called out as a prerequisite in the docs:
> You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority and valid for the Ingress domain.
Yes, I noted this. Does it mean we cannot use our own PKI for wildcard certificates? What do you mean by "system roots"? CA certs from kube-api? It is not valid if we use certs from our own PKI.

Sorry, by "system roots" you mean the root CAs from the node OS, right? They are not valid if we use our own PKI. In OCP 3 with RHEL nodes there is a way to add an additional CA (easy to do in RHEL), but OCP 4 with CoreOS requires tweaking the machine config, which we should avoid. I think that was the reason the openshift service ca operator was created (to manage a root CA for pods).

The workaround also worked in my tests. Just a note about it: I needed to restart all the console pods (oc delete pod --all -n openshift-console), but it is possible I only needed to because I was a bit impatient and did not wait long enough.

*** Bug 1723445 has been marked as a duplicate of this bug. ***

Also confirmed the workaround, and did NOT need to delete any pod.

The workaround suggested in comment 7 is not suitable. The sa-token-secret/ca.crt is driven by two CA bundles (https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/pkg/operator/targetconfigcontroller/targetconfigcontroller.go#L301-L315). If you're making a change to the router's CA, you need to make it in the configmap shown by `oc get cm/router-ca -n openshift-config-managed`. The ingress operator should manage this.

Hello, if the ingress operator should manage this, then this bug should be moved to it. However, there is explicit code in the ingress operator that intentionally does not generate that secret, so the ingress operator team should confirm whether this is intentional behavior. If it is intentional, then the error has been to assume that router-ca is always present when it is not, creating the situation where custom CAs are not properly propagated to the service account token CA.
In that case, we need some alternative, like changing the router-ca behavior in the ingress operator and/or providing a proper placeholder for custom CAs to be appended to the ones exposed at service account mounts (as we could do in 3.11 with the /etc/origin/master/ca-bundle.crt file). Not sure which one would be faster/safer.

// Summary

To fix this issue, the ingress operator needs to update the router-ca configmap in the openshift-config-managed project:

`oc get cm/router-ca -n openshift-config-managed -o yaml`

We would also need to add to the documentation that the CA must be included with this server cert so that the operator will add it to the router-ca configmap:
https://docs.openshift.com/container-platform/4.1/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress

Once we fix the ingress operator so that it updates the router-ca configmap, the kube-controller-manager-operator will handle updating the service ca bundle:
https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/pkg/operator/targetconfigcontroller/targetconfigcontroller.go#L301-L315

Lastly, with the service ca bundle updated, I assume that other operators will handle restarting their services so that their trust is updated; for example, the web console pod needs to be restarted. Currently there is no workaround.

Following is a possible solution using the proxy API that will be introduced in 4.2:

1.
Generate a CA and certificate (for testing, if you do not already have a CA and certificate):

```
BASE_DOMAIN="$(oc get dns.config/cluster -o 'jsonpath={.spec.baseDomain}')"
INGRESS_DOMAIN="$(oc get ingress.config/cluster -o 'jsonpath={.spec.domain}')"
openssl genrsa -out example-ca.key 2048
openssl req -x509 -new -key example-ca.key -out example-ca.crt -days 1 -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=$BASE_DOMAIN"
openssl genrsa -out example.key 2048
openssl req -new -key example.key -out example.csr -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=*.$INGRESS_DOMAIN"
openssl x509 -req -in example.csr -CA example-ca.crt -CAkey example-ca.key -CAcreateserial -out example.crt -days 1
```

2. Configure the CA as the cluster proxy CA:

```
oc -n openshift-config create configmap custom-ca --from-file=ca-bundle.crt=example-ca.crt
oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'
```

3. Configure the certificate as the ingresscontroller's default certificate:

```
oc -n openshift-ingress create secret tls custom-default-cert --cert=example.crt --key=example.key
oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"defaultCertificate":{"name":"custom-default-cert"}}}'
```

I tested the above procedure on a development cluster, but it needs further testing to validate that it is a working, supportable solution. I'd appreciate feedback from anyone who can test the above steps and verify that they work for their use cases.

I successfully validated the https://bugzilla.redhat.com/show_bug.cgi?id=1712525#c39 workaround with one caveat.
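The certificate-generation step above can be exercised locally as a sanity check before touching a cluster. The domain values below are placeholders standing in for the values read from `dns.config/cluster` and `ingress.config/cluster`:

```shell
# Local sanity check of the CA/wildcard-cert generation (no cluster needed;
# the domains are placeholders for the real cluster values).
set -e
BASE_DOMAIN="example.com"
INGRESS_DOMAIN="apps.example.com"

# Throwaway CA
openssl genrsa -out example-ca.key 2048
openssl req -x509 -new -key example-ca.key -out example-ca.crt -days 1 \
  -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=$BASE_DOMAIN"

# Wildcard server certificate signed by that CA
openssl genrsa -out example.key 2048
openssl req -new -key example.key -out example.csr \
  -subj "/C=US/ST=NC/L=Chocowinity/O=OS3/OU=Eng/CN=*.$INGRESS_DOMAIN"
openssl x509 -req -in example.csr -CA example-ca.crt -CAkey example-ca.key \
  -CAcreateserial -out example.crt -days 1

# The signed cert should verify against the CA before it is handed to the
# ingresscontroller as the default certificate.
openssl verify -CAfile example-ca.crt example.crt
```

If `openssl verify` reports the certificate as OK, the pair is internally consistent; a failure here usually means the wrong CA key was used for signing.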
My dev cluster has proxy enabled, so instead of:

`$ oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'`

I modified the user-ca-bundle configmap referenced by proxy/cluster trustedCA to include the example-ca.crt CA cert:

`$ oc edit cm/user-ca-bundle -n openshift-config`

If this is an acceptable fix, the https://docs.openshift.com/container-platform/4.1/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress doc should be updated to include directions for adding the CA cert.

Currently, the cluster-network-operator will only publish the combined user-provided trust bundle if it is referenced by proxy/cluster [1]. This is why proxy.spec.trustedCA must be added/modified to reference a configmap containing the custom CA cert(s). This could be expanded to better support ingress by:

1. Adding a similar trustedCA field to the ingress API that references the configmap used to supply the custom CA bundle.
2. Updating [1] to check ingress.trustedCA.

[1] https://github.com/openshift/cluster-network-operator/blob/master/pkg/controller/proxyconfig/controller.go#L215-L220

Preferably, the workaround can be documented and we can take time to design a solution for managing cluster-wide custom certs.

Reassigning to docs so that the solutions in #39 and #40 get documented (perhaps as part of the ingress custom certificate docs, cross-referencing the proxy docs for further background). I marked comment #40 as public, as there is nothing sensitive in it and it goes a bit further toward a resolution.

If there's no workaround in the documentation, at least a warning indicating that not all custom certificates are valid should be included to avoid more people breaking their clusters.
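Before pasting a CA into the configmap referenced by proxy/cluster trustedCA, it helps to confirm the PEM bundle is well formed, since a malformed bundle breaks trust distribution cluster-wide. A minimal local check (the file names and the stand-in CA below are illustrative, not from this bug):

```shell
# Assemble and sanity-check a CA bundle locally before adding it to the
# user-ca-bundle configmap. The "corporate" CA here is a throwaway stand-in.
set -e
openssl req -x509 -newkey rsa:2048 -nodes -keyout corp-ca.key -out corp-ca.crt \
  -days 1 -subj "/CN=Example Corp Root CA"

cat corp-ca.crt > ca-bundle.crt   # append further CA certs here as needed

# Count the PEM certificate blocks, and confirm the first one parses cleanly.
grep -c "BEGIN CERTIFICATE" ca-bundle.crt
openssl x509 -in ca-bundle.crt -noout -subject
```

A bundle whose certificates all parse with `openssl x509` is at least syntactically safe to add; the cluster-network-operator then merges it into the managed trust bundle.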
The current text in 4.2 may lead users to understand that they can use their corporate CAs, which is wrong:

"Replacing the default wildcard certificate with one that is issued by a public or organizational CA will allow external clients to connect securely to applications running under the .apps sub-domain."

So either the workaround is documented, or a warning not to use custom CAs to sign custom certificates for ingress is documented. As it stands today, the documentation will lead to lots of users breaking their clusters and opening support cases.

(In reply to Sergio G. from comment #47)
> If there's no workaround in the documentation, at least a warning indicating
> that not all custom certificates are valid should be included to avoid more
> people breaking their clusters.
> [...]

Not sure which doc you looked at, but our official doc has a WARNING saying this[1].

[1]: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html/networking/configuring-ingress#nw-ingress-setting-a-custom-default-certificate_configuring-ingress

I just hit this problem on OCP v4.2, and I was able to resolve it by adding my internal CA to the 'trusted-ca-bundle' configmap in the "openshift-config-managed" project.
$ oc edit cm trusted-ca-bundle -n openshift-config-managed

(In reply to Jeff Li from comment #49)
> I just hit this problem on ocp v4.2, and I am able to resolve it by adding
> my internal CA to cm 'trusted-ca-bundle' in project
> "openshift-config-managed".

FYI, quoting Engineering:

~~~
You shouldn't be touching things in "openshift-config-managed"; those resources are managed by the platform and your changes will be overwritten.

If you want to add content to that configmap, you need to add your CAs to the user configmap referenced by your proxy configuration object, as discussed here: https://docs.openshift.com/container-platform/4.2/networking/enable-cluster-wide-proxy.html

Those CAs will then be added to the openshift-config-managed/trusted-ca-bundle configmap by a controller.
~~~

Regards.

Hi, I have found an issue with the workaround in comment #39. While it works for the console, Grafana and Prometheus do not trust the proxy CA bundle (they still rely on the service account CA), so you cannot access the Grafana dashboard. I have filed another bugzilla to get this fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1768977

Should this bug be marked a duplicate of bug 1764704?

I don't think so; this bug is older. I understand this bug is now scoped to documenting the current workaround in comment #39, and the other one is to provide a more definitive solution. But somebody please correct me if needed. Thanks and regards.

(In reply to Daneyon Hansen from comment #55)
> Should this bug be marked a duplicate of bug 1764704?

Probably bug 1764704 is a duplicate of this bug, but I haven't done that since this was changed to a doc bug.

Made an initial draft of proposed changes: https://github.com/openshift/openshift-docs/pull/18004.

*** Bug 1764704 has been marked as a duplicate of this bug. ***

Resent pull request.
Reorganized the content to cover configuring a custom PKI as a separate article, referring to it from the ingress docs: https://github.com/openshift/openshift-docs/pull/18207

The doc PR looks good, thanks.

Published:
https://docs.openshift.com/container-platform/4.2/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress
https://docs.openshift.com/container-platform/4.2/networking/configuring-a-custom-pki.html#configuring-a-custom-pki
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html/networking/configuring-ingress#nw-ingress-setting-a-custom-default-certificate_configuring-ingress
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html/networking/configuring-a-custom-pki

@Cody please take a look at https://jira.coreos.com/browse/NE-229. We encourage users to include any intermediate certs in the tls.crt of the secret containing a custom default certificate. We believe ordering matters, but putting the server certificate first, followed by any intermediate certs, in tls.crt will suffice.

(In reply to Pedro Amoedo from comment #51)
> [...]
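The ordering guidance above (server certificate first, then intermediates, in tls.crt) can be demonstrated locally with a throwaway three-tier chain; all names below are illustrative:

```shell
# Demonstrate the recommended tls.crt layout: leaf cert first, then the
# intermediate, so clients can build the chain up to the trusted root.
set -e
# Root CA
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.crt \
  -days 1 -subj "/CN=Demo Root CA"
# Intermediate CA signed by the root (CA:TRUE so it may sign the leaf)
printf 'basicConstraints=CA:TRUE\n' > int.ext
openssl req -newkey rsa:2048 -nodes -keyout int.key -out int.csr \
  -subj "/CN=Demo Intermediate CA"
openssl x509 -req -in int.csr -CA root.crt -CAkey root.key -CAcreateserial \
  -out int.crt -days 1 -extfile int.ext
# Leaf (the wildcard server cert) signed by the intermediate
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=*.apps.example.com"
openssl x509 -req -in leaf.csr -CA int.crt -CAkey int.key -CAcreateserial \
  -out leaf.crt -days 1

# tls.crt = server cert followed by the intermediate, as recommended.
cat leaf.crt int.crt > tls.crt

# The leaf verifies against the root only when the intermediate is supplied.
openssl verify -CAfile root.crt -untrusted int.crt leaf.crt
```

Note that omitting the intermediate from tls.crt is exactly the failure mode NE-229 warns about: clients that trust only the root cannot complete the chain.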
Hi Pedro, do you know whether the issue and the fix you propose apply to OpenShift 4.2.19? We tried to apply a private-CA-signed certificate, but the console, Prometheus, Grafana, etc. did not work. I have not yet tried applying the certificate using the procedure you described.

Regards!

(In reply to gdeprati.ar from comment #86)
> Hi Pedro, are you aware if the issue and the fix that you propose will apply
> in OpenShift 4.2.19?
> [...]

Hi, I'm afraid that version 4.2.19 does NOT contain the fix. AFAIK, for 4.2.x this is something still in progress; you can check the general status here[1].
My recommendation is to upgrade to 4.3.x[2][3] if possible; that way you can get rid of this issue. All default cluster routes work as expected in 4.3.x when using a custom ingress certificate[4] along with a cluster-wide PKI (custom CA)[5].

[1] - https://issues.redhat.com/browse/MON-884
[2] - https://docs.openshift.com/container-platform/4.3/updating/updating-cluster-between-minor.html
[3] - https://access.redhat.com/solutions/4606811
[4] - https://docs.openshift.com/container-platform/4.3/networking/ingress-operator.html#nw-ingress-setting-a-custom-default-certificate_configuring-ingress
[5] - https://docs.openshift.com/container-platform/4.3/networking/configuring-a-custom-pki.html#configuring-a-custom-pki

Best Regards.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
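After changing the ingress certificate (or upgrading), it is worth confirming which certificate the endpoint actually serves. `openssl s_client` does this; the sketch below runs against a throwaway local server so it is runnable anywhere, and the hostnames and port are placeholders. On a real cluster you would connect to a route under `*.apps.<cluster-domain>:443` instead:

```shell
# Inspect the certificate a TLS endpoint serves. A throwaway local server
# stands in for the cluster ingress endpoint (names and port are arbitrary).
set -e
openssl req -x509 -newkey rsa:2048 -nodes -keyout srv.key -out srv.crt \
  -days 1 -subj "/CN=demo.apps.example.com"

openssl s_server -accept 18443 -cert srv.crt -key srv.key -quiet &
SRV_PID=$!
trap 'kill $SRV_PID 2>/dev/null' EXIT
sleep 1

# On a real cluster, e.g.:
#   echo | openssl s_client -connect console-openshift-console.apps.<domain>:443
echo | openssl s_client -connect localhost:18443 2>/dev/null \
  | openssl x509 -noout -subject
```

The printed subject should match the certificate you configured; if it shows the old default, the ingresscontroller has not yet rolled out the new secret.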