Bug 1614425
| Summary: | 3.10 Upgrade fails when ca.crt contains more than one certificate (ie: intermediates) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Nicolas Nosenzo <nnosenzo> |
| Component: | Cluster Version Operator | Assignee: | Jeremiah Stuever <jstuever> |
| Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.10.0 | CC: | aos-bugs, dmoessne, erich, jokerman, mfojtik, mgugino, mmccomas, mrogers, nnosenzo, scuppett, ssorce |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-13 17:09:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

Cause: A CA certificate file containing more than one certificate is passed to openshift-ansible during an installation or upgrade.

Consequence: The installation or upgrade is unable to sign the CSR, and atomic-openshift-node fails to start.

Fix: openshift-ansible now fails if a CA that is provided, or that already exists on the cluster, does not contain exactly one certificate.

Result: The installation or upgrade fails, and the user is notified of the reason and given steps to resolve it.
Comment 4
Scott Dodson
2018-08-09 21:18:47 UTC
Nicolas,

There's only a sosreport from one of their two masters. Can you please gather a sosreport from the other master, and/or verify that it also has the proper master configuration, as follows? Also, let's verify that both masters actually have /etc/origin/master/ca.crt and /etc/origin/master/ca.key, and that permissions are correct to allow the controllers access to those files.

This is the bit of config that I'm looking for:

    kubernetesMasterConfig:
      controllerArguments:
        cluster-signing-cert-file:
        - /etc/origin/master/ca.crt
        cluster-signing-key-file:
        - /etc/origin/master/ca.key

It's complaining about their CA having more than one object.

    Aug 08 11:34:18 dsrvlinf0335 atomic-openshift-master-controllers[83884]: E0808 11:34:18.263563 83884 controllermanager.go:456] Error starting "csrsigning"
    Aug 08 11:34:18 dsrvlinf0335 atomic-openshift-master-controllers[83884]: F0808 11:34:18.263607 83884 controllermanager.go:175] error starting controllers: failed to start certificate controller: error parsing CA cert file "/etc/origin/master/ca.crt": {"code":1003,"message":"the PEM file should contain only one object"}

This is likely a cluster that's had its CA recreated at some point in its life. Back up /etc/origin/master/ca.crt and then ensure that only the most recently created certificate is present in that file.

There can only be one cert in the file; how would you choose which to use? Any upgrade/mgmt code must make sure only one cert is stored in that file, by removing the older one if necessary. If the file is shared for other uses where multiple CA certs should be used, the file should be renamed ca-bundle.crt, and a new file with just the current valid CA should be created and configured for the signer service. My 2c.

We've reviewed the playbooks and it looks like prior to 3.2 there was no ca-bundle.crt, only ca.crt, so in that release we migrated[1] people from ca.crt to ca-bundle.crt by moving and symlinking ca.crt to ca-bundle.crt.
This happened only during upgrades, when we encountered a host that did not have ca-bundle.crt. If this falls into that case, we'd need to carefully examine ca-bundle.crt to identify which certificate is used to sign all certificates in the cluster, then break that symlink and populate /etc/origin/master/ca.crt with only the contents of that certificate. The only other situation I can come up with where ca.crt would have multiple certificates would be if they've deployed a custom or chained certificate via other means.

We should add a check to our upgrade playbooks in 3.10 to ensure that ca.crt has only one certificate and, if we find more, abort with a link to a KCS article that describes the cleanup process.

Do we have any feedback from the customer on where they are with the workaround?

[1] https://github.com/openshift/openshift-ansible/commit/b3d04f1a54c0109ce38be103ddc7c83f1992c10e

>> Do we have any feedback from the customer on where they are with the workaround?

Not yet. I shared the workaround with them a few hours ago, so I expect an answer by EOD today.
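
For reference, the single-certificate check and the manual cleanup discussed above could be sketched roughly as follows in Python. This is only an illustration, not the check that shipped in openshift-ansible: the function names (`split_pem_certificates`, `check_single_certificate`, `keep_only_certificate`) and the `.bak` backup suffix are hypothetical, only `CERTIFICATE` PEM blocks are counted, and deciding which certificate actually signs the cluster's certificates still has to be done by inspecting the bundle by hand.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_certificates(pem_text: str) -> List[str]:
    """Return every PEM certificate block found in pem_text, in file order."""
    return PEM_CERT_RE.findall(pem_text)


def check_single_certificate(path: str) -> None:
    """Raise ValueError unless the file at path holds exactly one certificate."""
    count = len(split_pem_certificates(Path(path).read_text()))
    if count != 1:
        raise ValueError(
            f"{path} contains {count} certificates; exactly one is expected"
        )


def keep_only_certificate(path: str, index: int) -> str:
    """Back up path, then rewrite it with only the certificate at index.

    The caller chooses index after examining the bundle; nothing here
    tries to guess which certificate is the signing CA.
    Returns the path of the backup copy.
    """
    source = Path(path)
    blocks = split_pem_certificates(source.read_text())
    chosen = blocks[index]  # raises IndexError for an out-of-range index
    backup = f"{path}.bak"  # hypothetical backup naming
    shutil.copy2(source, backup)  # follows symlinks, so the real content is saved
    if source.is_symlink():
        # Break a ca.crt -> ca-bundle.crt link so the bundle itself is untouched.
        source.unlink()
    source.write_text(chosen + "\n")
    return backup
```

Calling `check_single_certificate("/etc/origin/master/ca.crt")` before an upgrade would surface the problem early, and `keep_only_certificate(path, index)` mirrors the "back up, then keep one certificate" workaround once the right certificate has been identified.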
https://github.com/kubernetes/kubernetes/issues/67436 is the upstream discussion about giving the csrsigner controller the ability to use signing certs with intermediate trust chains included.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750