Description of problem:
Running /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml without any additional parameters can fail if a certificate in the chain is due to expire within 1 year of the run.

Version-Release number of the following components:
openshift-ansible-3.11.188-1.git.0.accd104.el7.noarch
ansible-2.9.6-1.el7ae.noarch

ansible 2.9.6
  config file = /root/openshift-cluster-311-lab/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, Sep 26 2019, 13:23:47) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook -i hosts.cluster /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml

Actual results:
2020-05-05 16:14:12,451 p=43186 u=root n=ansible | TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *************************************************************
2020-05-05 16:14:12,452 p=43186 u=root n=ansible | fatal: [hostname.example.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 365 days of expiring. You may view the report at /root/cert-expiry-report.20200505T161100.html or /root/cert-expiry-report.20200505T161100.json.\n"}

(Similar errors appear for all members of the cluster)

Expected results:
All CA certs replaced.

Additional info:
/usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml calls the `openshift_certificate_expiry` role as part of its process. Unless `openshift_certificate_expiry_warning_days` (default 365 days) is set to a value lower than the number of days remaining before the CA you are trying to replace expires, the playbook fails and the certs are not replaced.
It would make sense to document that this variable needs to be set, for example with `-e openshift_certificate_expiry_warning_days=7` (or some other appropriately low value), or else to not call the `openshift_certificate_expiry` role at all. I suspect the latter is far more difficult to achieve. Alternatively, add a `set_fact` at the start of this play that sets the variable, to save the user from themselves, and inform the user that this was done.
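The `set_fact` idea could look roughly like the sketch below. It is a hypothetical wrapper playbook, not the actual content of redeploy-openshift-ca.yml: the `hosts: all` pattern, the value 7, and the wording of the message are assumptions for illustration only.

```yaml
---
# Hypothetical wrapper playbook (assumption: not part of openshift-ansible).
- name: Relax the certificate expiry warning window before redeploying the CA
  hosts: all            # assumption: must cover every host the expiry role runs on
  gather_facts: false
  tasks:
    - name: Lower the warning threshold so near-expiry certs do not abort the run
      set_fact:
        openshift_certificate_expiry_warning_days: 7   # illustrative value

    - name: Inform the user that the threshold was overridden
      debug:
        msg: "openshift_certificate_expiry_warning_days set to {{ openshift_certificate_expiry_warning_days }} for this run"

# Run the stock playbook afterwards; facts set above persist for the rest of the run.
- import_playbook: /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml
```

Note that a value set with `set_fact` takes precedence over role defaults and inventory variables, so it would also override a value the user deliberately set in the inventory. A value passed with `-e` would still win.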
I agree this is not a good user experience: if you are intentionally redeploying certificates, you want the playbook to proceed without having to override the check with an arbitrary value for openshift_certificate_expiry_warning_days.

There are several factors at play here, because this role (openshift_certificate_expiry) is used to check expiry during different operations:

- playbooks/openshift-checks/certificate_expiry - For checking/reporting on expiry status
- playbooks/common/openshift-cluster/upgrades - During cluster upgrades
- playbooks/openshift-etcd - Redeploying etcd certificates or CA
- playbooks/openshift-master - Redeploying OpenShift CA

Recently I made a change [1] to remove the hardcoded openshift_certificate_expiry_warning_days that was in place during upgrades. It was preventing upgrades when certificates were less than 6 months from expiry, regardless of what was specified in the inventory. Upgrades are probably now blocked unless openshift_certificate_expiry_warning_days is overridden.

The openshift_certificate_expiry role also has an option to fail on warnings (openshift_certificate_expiry_fail_on_warn), with a default value of true. This option can make sense when the role is used for checking/reporting on certificate status, but may not make sense when redeploying certificates or during upgrades. Warning the user of near certificate expiry during upgrades can make sense if it is done in a way that actually alerts the user, but I'm not sure we should fail an upgrade because certs expire in less than a year.

I will get more input from the team on how we can strike a balance between warning and failing when certificates are within either the specified or the default openshift_certificate_expiry_warning_days.

[1] https://github.com/openshift/openshift-ansible/pull/12154
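For anyone who needs to get past the check in the meantime, the two variables discussed above can be overridden together in an extra-vars file. This is only a sketch: the file name cert-expiry-overrides.yml and the value 30 are assumptions, and only the two variable names come from the role.

```yaml
---
# cert-expiry-overrides.yml (hypothetical file name)
# Shrink the warning window from the 365-day default (30 is an illustrative value).
openshift_certificate_expiry_warning_days: 30
# Report near-expiry certificates without failing the play (default is true).
openshift_certificate_expiry_fail_on_warn: false
```

It would be passed on the command line with `ansible-playbook -i hosts.cluster -e @cert-expiry-overrides.yml <playbook>`. Extra vars take precedence over inventory values and role defaults.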
After discussing with the team, I've opened a PR [1] to remove the expiry checks from the cert redeploy playbooks. Additionally, but not directly related, I opened a follow-up PR [2] to change the default for openshift_certificate_expiry_fail_on_warn to 'false' and to override the default expiry values during upgrades.

[1] https://github.com/openshift/openshift-ansible/pull/12159
[2] https://github.com/openshift/openshift-ansible/pull/12158
Verified this bug with openshift-ansible-3.11.218-1.git.0.6f55149.el7.noarch. The redeploy-openshift-ca.yml playbook no longer checks whether certificates have expired, so certificates that are expired or within 365 days of expiring no longer break the CA redeploy playbook.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2215