Bug 1614425

Summary: 3.10 Upgrade fails when ca.crt contains more than one certificate (ie: intermediates)
Product: OpenShift Container Platform
Reporter: Nicolas Nosenzo <nnosenzo>
Component: Cluster Version Operator
Assignee: Jeremiah Stuever <jstuever>
Status: CLOSED ERRATA
QA Contact: Gaoyun Pei <gpei>
Severity: high
Docs Contact:
Priority: medium
Version: 3.10.0
CC: aos-bugs, dmoessne, erich, jokerman, mfojtik, mgugino, mmccomas, mrogers, nnosenzo, scuppett, ssorce
Target Milestone: ---
Keywords: Triaged
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A CA certificate file containing more than one certificate is passed to openshift-ansible during an installation or upgrade.
Consequence: The installation or upgrade is unable to sign the CSR, and atomic-openshift-node fails to start.
Fix: openshift-ansible now fails if a CA that is provided, or that already exists on the cluster, does not contain exactly one certificate.
Result: The installation or upgrade fails, and the user is notified of the reason and given steps to resolve it.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-12-13 17:09:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Comment 4 Scott Dodson 2018-08-09 21:18:47 UTC
Based on the case comments it looks like their node client CSRs are approved but not issued.

# oc get csr 
NAME                                                   AGE   REQUESTOR      CONDITION
node-csr-9bwk5PmCQjvgTOq6gW-Ejr0c0wcc9xoJpmfReedI2D4   1h    system:admin   Approved
node-csr-CVcSSCU83yM79Dt23qaAj7kTYWSURs_5Yc135Wzv0kw   7m    system:admin   Approved
node-csr-E51cDR52oy4rAGNdxVgHKkZf5RkjwS1NxmPUn1p77fY   1d    system:admin   Approved
node-csr-HHA-bpCEypHqT2U1H9Tvr6OvP6ALoW_CiYitiikZ8tw   1h    system:admin   Approved
node-csr-Ie2HSuYIROpu1wyo9v7EX5yF7jEdG0Ag3AvsC3O5wXw   6h    system:admin   Approved
node-csr-RDVs_j29q_giq2Lcw8PIWNcEtAj3OLp8QbEpX77c7TY   5h    system:admin   Approved
node-csr-Rkxsi1KRT8WS5nqFDZccwfI1hnxPnxq1ZxjD1K2dWi0   6h    system:admin   Approved
node-csr-eJfoX3jUXIbqzmQhflgW-MKyaFLyijF02nKRC-LdTLw   4h    system:admin   Approved
node-csr-oZ6p3qqy3wpsHi8edWjtF3TfR1Y6oPI_nYT6lGLapeM   1d    system:admin   Approved

The issuance of the certificates should happen via a controller. Their master seems properly configured to enable the signing controller based on master-config.yaml in the sosreport.

I'm still digging, but I'd try `systemctl restart atomic-openshift-master-controllers` on both of their masters, then get the CSRs again to see if they go to "Approved,Issued".
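
For anyone triaging a similar case, here is a minimal sketch that picks out the CSRs stuck in "Approved" without "Issued" from saved `oc get csr` text output. It assumes the four-column layout shown above (NAME, AGE, REQUESTOR, CONDITION) with the condition as the last whitespace-separated field; the function name `pending_issuance` is made up for illustration and is not part of any OpenShift tooling.

```python
from typing import List


def pending_issuance(oc_output: str) -> List[str]:
    """Return names of CSRs that are Approved but not yet Issued.

    Assumes plain `oc get csr` text output where the last column is the
    comma-separated condition list (e.g. "Approved" or "Approved,Issued").
    """
    names = []
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        conditions = fields[-1].split(",")
        if "Approved" in conditions and "Issued" not in conditions:
            names.append(fields[0])
    return names
```

Running it on the output before and after the controller restart should show whether the list shrinks.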

Comment 5 Scott Dodson 2018-08-09 21:29:40 UTC
Nicolas,

There's only a sosreport from one of their two masters. Can you please gather a sosreport from the other master, and/or verify that it also has the proper master configuration as follows? Also, let's verify that both masters actually have /etc/origin/master/ca.crt and /etc/origin/master/ca.key, and that the permissions allow the controllers to access those files.

This is the bit of config that I'm looking for:

kubernetesMasterConfig:
  controllerArguments:
    cluster-signing-cert-file:
    - /etc/origin/master/ca.crt
    cluster-signing-key-file:
    - /etc/origin/master/ca.key
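
A rough sketch of the file check described above, to be run on each master. It only confirms that both files exist and are readable by whoever runs the script; that is an assumption standing in for "the controllers can access them", since `os.access` tests the calling user (and always succeeds for root). The helper name `check_signing_files` is hypothetical.

```python
import os

# Paths taken from the controllerArguments shown above.
DEFAULT_PATHS = ("/etc/origin/master/ca.crt", "/etc/origin/master/ca.key")


def check_signing_files(paths=DEFAULT_PATHS):
    """Return a list of human-readable problems; empty means both look OK."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: missing or not a regular file")
        elif not os.access(path, os.R_OK):
            # Checked against the user running this script, not the
            # account the controllers run under.
            problems.append(f"{path}: not readable by the current user")
    return problems
```

An empty list from `check_signing_files()` on both masters would rule out the missing-file case.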

Comment 6 Scott Dodson 2018-08-09 21:55:10 UTC
It's complaining about their CA having more than one object.

Aug 08 11:34:18 dsrvlinf0335 atomic-openshift-master-controllers[83884]: E0808 11:34:18.263563   83884 controllermanager.go:456] Error starting "csrsigning"
Aug 08 11:34:18 dsrvlinf0335 atomic-openshift-master-controllers[83884]: F0808 11:34:18.263607   83884 controllermanager.go:175] error starting controllers: failed to start certificate controller: error parsing CA cert file "/etc/origin/master/ca.crt": {"code":1003,"message":"the PEM file should contain only one object"}


This is likely a cluster that's had its CA recreated at some point in its life. Back up /etc/origin/master/ca.crt, then ensure that only the most recently created certificate is present in that file.

Comment 9 Simo Sorce 2018-08-13 12:42:41 UTC
There can only be one cert in the file; how would you choose which to use?
Any upgrade/management code must make sure only one cert is stored in that file, removing the older one if necessary.
If the file is shared for other uses where multiple CA certs should be used, the file should be renamed ca-bundle.crt, and a new file containing just the current valid CA should be created and configured for the signer service.

My 2c.

Comment 10 Scott Dodson 2018-08-13 14:22:10 UTC
We've reviewed the playbooks, and it looks like prior to 3.2 there was no ca-bundle.crt, only ca.crt. In that release we migrated[1] people from ca.crt to ca-bundle.crt by moving ca.crt to ca-bundle.crt and symlinking ca.crt to ca-bundle.crt. This happened only during upgrades, when we encountered a host that did not have ca-bundle.crt.

If this falls into that case, we'd need to carefully examine ca-bundle.crt to identify which certificate is used to sign all certificates in the cluster, then break that symlink and populate /etc/origin/master/ca.crt with only the contents of that certificate.
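
To make that examination easier, here is a minimal sketch that splits a bundle into one file per certificate so each can be inspected on its own. It does not decide which certificate is the signer; that still has to be done by hand. The function names and the `<prefix>-<n>.crt` naming are illustrative assumptions, and it only looks at `CERTIFICATE` PEM blocks.

```python
import re

# Matches one PEM certificate block, non-greedily, across lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?\r?\n-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_bundle(bundle_text):
    """Return each PEM certificate block in the bundle, in file order."""
    return [m.group(0) + "\n" for m in PEM_CERT.finditer(bundle_text)]


def write_candidates(bundle_text, prefix):
    """Write each certificate to <prefix>-<n>.crt and return the paths."""
    paths = []
    for n, block in enumerate(split_bundle(bundle_text)):
        path = f"{prefix}-{n}.crt"
        with open(path, "w") as fh:
            fh.write(block)
        paths.append(path)
    return paths
```

Each resulting file can then be examined with `openssl x509 -in <file> -noout -subject -issuer -dates`, and a known cluster certificate can be tested against a candidate with `openssl verify -CAfile <candidate> <cluster-cert>`.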

The only other situation I can come up with where ca.crt would have multiple certificates would be if they've deployed a custom or chained certificate via other means.

We should add a check to our upgrade playbooks in 3.10 to ensure that ca.crt has only one certificate and, if we find more, abort with a link to a KCS article that describes the cleanup process.
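
As a sketch of what such a check amounts to (this is not the openshift-ansible implementation, just an illustration of the idea), the following counts `BEGIN CERTIFICATE` markers and raises if the count is not exactly one. The function names and the error wording are assumptions.

```python
import re

# A BEGIN marker on its own line, tolerating trailing spaces or CR.
BEGIN_CERT = re.compile(r"^-----BEGIN CERTIFICATE-----[ \t\r]*$", re.MULTILINE)


def count_certificates(pem_text):
    """Count PEM certificate blocks by their BEGIN markers."""
    return len(BEGIN_CERT.findall(pem_text))


def require_single_ca(pem_text, path="/etc/origin/master/ca.crt"):
    """Raise ValueError unless the text holds exactly one certificate."""
    count = count_certificates(pem_text)
    if count != 1:
        raise ValueError(
            f"{path} must contain exactly one certificate, found {count}"
        )
    return count
```

Passing the contents of ca.crt to `require_single_ca` before starting an upgrade would surface the problem before the controllers fail on it.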

Do we have any feedback from the customer on where they are with the workaround? 


[1] https://github.com/openshift/openshift-ansible/commit/b3d04f1a54c0109ce38be103ddc7c83f1992c10e

Comment 11 Nicolas Nosenzo 2018-08-13 14:32:24 UTC
>> Do we have any feedback from the customer on where they are with the workaround? 

Not yet. I shared the workaround with them a few hours ago, so I expect an answer by EOD today.

Comment 19 Scott Dodson 2018-08-23 20:51:45 UTC
https://github.com/kubernetes/kubernetes/issues/67436

is the upstream discussion about giving the csrsigner controller the ability to use signing certs that include intermediate trust chains.

Comment 34 errata-xmlrpc 2018-12-13 17:09:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750