Bug 1530312
| Summary: | Redeploy openshift CA certificate fails via ansible installer | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Neeraj <nbhatt> |
| Component: | Installer | Assignee: | Andrew Butcher <abutcher> |
| Status: | CLOSED NOTABUG | QA Contact: | Johnny Liu <jialiu> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.6.0 | CC: | aos-bugs, dmoessne, erich, jialiu, jkaur, jokerman, mmccomas, nbhatt, sdodson, sgaikwad, xtian |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-01-17 19:18:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Neeraj
2018-01-02 14:30:35 UTC
The error from etcd client is pretty generic and I've seen this happen when etcd runs out of quota space. Can you verify that there are no alarms tripped? On one of their etcd hosts run this:
$ etcdctl3 alarm list
If it shows an alarm they need to increase their quota size. Add this to /etc/etcd/etcd.conf on each host and restart etcd:
ETCD_QUOTA_BACKEND_BYTES=4294967296
Then clear the alarms:
$ etcdctl3 alarm disarm
If no alarms are shown then this isn't the problem. I'm still reviewing the attached logs for other problems.
After reviewing some of the attached sosreports we believe that the CA that was originally used to sign internal components has been removed and replaced, either manually or errantly by the playbooks. Does the customer still have access to the CA certificate that was used to sign /etc/origin/master/master.server.crt and other certs like /etc/origin/node/system:node:ocp-master-3pzj.crt? If so we'd advise them to append that certificate to /etc/origin/master/ca-bundle.crt and /etc/origin/node/ca.crt on all hosts and restart services.
Here we examine the issuer for these certs
$ openssl x509 -text -in /etc/origin/node/system\:node\:ocp-master-3pzj.crt | grep Issuer
Issuer: C=IN, L=Bangalore, O=Wipro Ltd, OU=WBPO, CN=www.deltaverge.com
$ openssl x509 -text -in /etc/origin/master/master.server.crt | grep Issue
Issuer: C=IN, L=Bangalore, O=Wipro Ltd, OU=WBPO, CN=www.deltaverge.com
The old CA may have been preserved in /etc/origin/master/legacy-ca/, where we keep a copy of all CA certificates before replacing them. However, if that were the case, we'd expect it to have been appended to the CA bundle in /etc/origin/master/ca-bundle.crt.
You can verify that the CA matches the certificate like this; the output below shows an error:
$ openssl verify -CAfile /etc/origin/master/ca-bundle.crt /etc/origin/master/master.server.crt
master.server.crt: CN = 10.140.0.19
error 20 at 0 depth lookup:unable to get local issuer certificate
Here's what successful verification looks like; your etcd certs look fine.
$ openssl verify -CAfile /etc/etcd/ca.crt /etc/etcd/server.crt
server.crt: OK
Once you find the cert that verifies /etc/origin/master/master.server.crt append it to /etc/origin/master/ca-bundle.crt and /etc/origin/node/ca.crt and restart services.
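If several candidate CA files are available (for example, the copies preserved under /etc/origin/master/legacy-ca/), the search can be scripted. The following is a minimal sketch, not part of openshift-ansible: find_issuing_ca is a hypothetical helper name, and it treats a candidate as a match only when openssl verify both exits successfully and reports OK.

```shell
# find_issuing_ca CERT CANDIDATE...
# Hypothetical helper: prints the first candidate CA file that verifies CERT.
# Returns 1 if none of the candidates verifies it.
find_issuing_ca() {
  cert="$1"
  shift
  for ca in "$@"; do
    # Skip directories and missing files.
    [ -f "$ca" ] || continue
    # A non-zero exit status means verification failed for this candidate.
    out=$(openssl verify -CAfile "$ca" "$cert" 2>&1) || continue
    # Also require the "<file>: OK" line, as in the etcd example above.
    case "$out" in
      *": OK")
        printf '%s\n' "$ca"
        return 0
        ;;
    esac
  done
  return 1
}
```

Called with /etc/origin/master/master.server.crt as the first argument and the files in /etc/origin/master/legacy-ca/ as candidates, it prints the path of the first CA file that verifies the certificate. That file is the one to append to the bundles named above.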
The CA certificate found in the sosreport does not include certificate signing in its key usage, which means that it cannot be used to sign certificates. Examining the existing CA certificate from the sosreport with openssl, we see that “Certificate Sign” is not present in the key usage and that the basic constraints indicate CA:FALSE, which means that this is not a CA certificate.
$ openssl x509 -in ./sosreport/etc/origin/master/ca.crt -noout -text
...
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Basic Constraints:
CA:FALSE
...
Compare that to an openshift-ansible generated OpenShift CA certificate:
$ openssl x509 -in /etc/origin/master/ca.crt -noout -text
...
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Certificate Sign
X509v3 Basic Constraints: critical
CA:TRUE
...
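The same comparison can be scripted. The sketch below is an illustration only: is_signing_ca_text is a hypothetical helper that reads the output of openssl x509 -noout -text on standard input and looks for the two fields shown above. It assumes the certificate carries a key usage extension; a CA certificate that omits key usage entirely would be reported as unusable by this check.

```shell
# is_signing_ca_text
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin.
# Succeeds only if the text contains both CA:TRUE (basic constraints)
# and "Certificate Sign" (key usage).
is_signing_ca_text() {
  text=$(cat)
  case "$text" in
    *CA:TRUE*) ;;
    *) return 1 ;;
  esac
  case "$text" in
    *"Certificate Sign"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   openssl x509 -in /etc/origin/master/ca.crt -noout -text | is_signing_ca_text \
#     && echo "usable as a signing CA"
```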
Additionally, the current CA certificate is issued by an intermediate Digicert CA which means that our bundle must contain the intermediate and root certificates for the CA we use in order to validate child certificates. openshift-ansible does not currently include a mechanism for specifying additional CA certificates to include in the bundle so using a custom CA certificate with an extended chain may have to be accomplished manually in part. We are working to verify steps for using a custom CA certificate with an extended chain with openshift-ansible.
From this point there are two options: use a different CA certificate that has been verified to support signing to recreate all cluster certificates, while also ensuring that the intermediate and root certificates for that CA are present in the CA bundle; or generate a new CA certificate using openshift-ansible and create internal certificates using the newly generated CA.
To generate a new OpenShift CA with openshift-ansible, we can ensure that openshift_master_ca_certificate is unset in the inventory and then run the redeploy-openshift-ca.yml playbook (/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml). The existing cluster certificates will not allow us to restart services within the redeploy-openshift-ca.yml playbook so we should skip those service restart steps by commenting out service restart tasks within the playbook. These two service restart blocks must be entirely commented out within the linked playbook on disk within /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/redeploy-certificates/openshift-ca.yml.
Master service restart: https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-cluster/redeploy-certificates/openshift-ca.yml#L211-L227
Node service restart: https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-cluster/redeploy-certificates/openshift-ca.yml#L276-L301
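Commenting out the two blocks can be done by hand or with a small helper. The sketch below is an illustration under assumptions: comment_out_range is a hypothetical function, and the line ranges in the usage comment are taken from the release-3.6 links above. They must be checked against the file actually installed on disk before use, since the packaged playbook may differ from the branch.

```shell
# comment_out_range FILE START END
# Hypothetical helper: prefixes lines START..END of FILE with "#".
# The line count is unchanged, so later ranges keep their numbering.
comment_out_range() {
  file="$1"
  start="$2"
  end="$3"
  tmp=$(mktemp) || return 1
  if sed "${start},${end}s/^/#/" "$file" > "$tmp"; then
    # Write back through the original path to keep ownership and mode.
    cat "$tmp" > "$file"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# Example (back up first, and confirm the ranges match the on-disk file):
#   f=/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/redeploy-certificates/openshift-ca.yml
#   cp -p "$f" "$f.bak"
#   comment_out_range "$f" 211 227
#   comment_out_range "$f" 276 301
```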
Once the CA has been re-generated and distributed we can generate new cluster certificates by running the redeploy-certificates.yml playbook (/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml). Services should now be able to restart as this playbook restarts services after all certificates have been replaced.
Also note that the redeploy-certificates.yml playbook creates tar archives of /etc/origin/master within /etc/origin for each master before creating new certificates. The first master host will also archive generated certificates (/etc/origin/generated-configs) which is where certificates are created before being distributed to masters other than the first master as well as all nodes.
These archives can be found within /etc/origin/master-node-cert-config-backup-{{ ansible_date_time.epoch }}.tgz and the oldest archive should contain the original CA certificate as well as all of the original certificates for the cluster within folders named after the hosts. Note that these archives will only exist if the redeploy-certificates.yml playbook has been run. The redeploy-openshift-ca.yml playbook simply copies all previous CA artifacts to the /etc/origin/master/legacy-ca directory so that all previous CA certificates may be included in the CA bundle.
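Because the archive name embeds the epoch timestamp, the oldest archive can be picked out by sorting on that number. A minimal sketch, assuming the naming pattern described above; oldest_backup is a hypothetical helper name.

```shell
# oldest_backup [DIR]
# Hypothetical helper: prints the archive in DIR (default /etc/origin)
# whose name carries the smallest epoch timestamp. Prints nothing if
# no archive matches.
oldest_backup() {
  dir="${1:-/etc/origin}"
  for f in "$dir"/master-node-cert-config-backup-*.tgz; do
    [ -e "$f" ] || continue
    # Extract the epoch between "-backup-" and ".tgz".
    epoch="${f##*-backup-}"
    epoch="${epoch%.tgz}"
    printf '%s %s\n' "$epoch" "$f"
  done | sort -n | head -n 1 | cut -d' ' -f2-
}

# Example: list the contents of the oldest archive.
#   tar -tzf "$(oldest_backup /etc/origin)"
```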
(In reply to Andrew Butcher from comment #8)
> Additionally, the current CA certificate is issued by an intermediate
> Digicert CA which means that our bundle must contain the intermediate and
> root certificates for the CA we use in order to validate child certificates.
> openshift-ansible does not currently include a mechanism for specifying
> additional CA certificates to include in the bundle so using a custom CA
> certificate with an extended chain may have to be accomplished manually in
> part. We are working to verify steps for using a custom CA certificate with
> an extended chain with openshift-ansible.
In order to use an intermediate CA certificate with openshift-ansible, the CA certificate supplied as the openshift_master_ca_certificate must contain the full chain. To test this, I created an intermediate CA using Jamie Nguyen's guide [1] but created the keys without passphrases. I combined the intermediate CA certificate and the root CA certificate into a single file beginning with the intermediate CA certificate and used the full chain as my openshift_master_ca_certificate. For example:
$ cat intermediate/certs/intermediate.cert.pem \
certs/ca.cert.pem > intermediate/certs/ca-chain.cert.pem
openshift_master_ca_certificate={'certfile': '/home/abutcher/ca/intermediate/certs/ca-chain.cert.pem', 'keyfile': '/home/abutcher/ca/intermediate/private/intermediate.key.pem'}
Before running openshift-ansible the CA can be tested with oc by trying to create a certificate. Running the first command here will create a test certificate and key in /tmp/. The second command verifies the testing certificate using the existing CA bundle.
$ oc adm ca create-server-cert \
--signer-cert=/root/ca-chain.cert.pem \
--signer-key=/root/intermediate.key.pem --signer-serial=/root/ca.serial.txt \
--hostnames="testing.example.com" \
--cert=/tmp/testing.crt \
--key=/tmp/testing.key
$ openssl verify -CAfile /root/ca-chain.cert.pem /tmp/testing.crt
/tmp/testing.crt: OK
My resultant cluster certificates are signed by my intermediate CA certificate and can be verified with the CA bundle.
$ openssl x509 -in /etc/origin/master/ca.crt -noout -text
Certificate:
...
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, ST=North Carolina, O=Flibjib, CN=Flibjib Root
Validity
Not Before: Jan 4 20:02:48 2018 GMT
Not After : Jan 2 20:02:48 2028 GMT
Subject: C=US, ST=North Carolina, O=Flibjib, CN=Flibjib Intermediate
...
$ openssl x509 -in /etc/origin/master/master.server.crt -noout -text
Certificate:
...
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, ST=North Carolina, O=Flibjib, CN=Flibjib Intermediate
Validity
Not Before: Jan 4 20:17:52 2018 GMT
Not After : Jan 4 20:17:53 2020 GMT
...
$ openssl verify -CAfile /etc/origin/master/ca-bundle.crt /etc/origin/master/master.server.crt
/etc/origin/master/master.server.crt: OK
[1] https://jamielinux.com/docs/openssl-certificate-authority/index.html
This was the result of using an invalid certificate authority to re-sign the cluster. Closing notabug.
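One more pre-flight check that may be useful alongside the chain test in the comment above: the keyfile given in openshift_master_ca_certificate has to belong to the first certificate in the chain file (the intermediate). The following is a minimal sketch, assuming openssl is installed; key_matches_cert is a hypothetical helper name. It relies on openssl x509 reading only the first certificate in a PEM file.

```shell
# key_matches_cert CERT_OR_CHAIN KEY
# Hypothetical helper: succeeds if KEY is the private key for the first
# certificate found in CERT_OR_CHAIN, by comparing their public keys.
key_matches_cert() {
  cert="$1"
  key="$2"
  # Public key embedded in the first certificate of the file.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
  # Public key derived from the private key.
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example, using the paths from the inventory line above:
#   key_matches_cert /home/abutcher/ca/intermediate/certs/ca-chain.cert.pem \
#     /home/abutcher/ca/intermediate/private/intermediate.key.pem && echo match
```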