Bug 1832379

Summary: redeploy-openshift-ca.yml fails if a certificate expires in less than 1 year
Product: OpenShift Container Platform Reporter: Chuck Douglas <cdouglas>
Component: Installer Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
When redeploying certificates, the cert expiry check provides little value because the expectation is that the certificates will be replaced. Additionally, there are situations where certificates are in an invalid state and redeploy is blocked by the check. Removing the checks will allow certificate redeploy to proceed without requiring additional inventory vars to override expiry days or invalid/missing certificates.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-28 05:44:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chuck Douglas 2020-05-06 15:45:47 UTC
Description of problem:
Running /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml without any additional parameters fails if a certificate in the chain is due to expire within 1 year of the run.

Version-Release number of the following components:

openshift-ansible-3.11.188-1.git.0.accd104.el7.noarch
ansible-2.9.6-1.el7ae.noarch
ansible 2.9.6
  config file = /root/openshift-cluster-311-lab/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, Sep 26 2019, 13:23:47) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]


How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook -i hosts.cluster /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml


Actual results:
2020-05-05 16:14:12,451 p=43186 u=root n=ansible | TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *************************************************************
2020-05-05 16:14:12,452 p=43186 u=root n=ansible | fatal: [hostname.example.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 365 days of expiring. You may view the report at /root/cert-expiry-report.20200505T161100.html or /root/cert-expiry-report.20200505T161100.json.\n"}
(Similar errors appear for all members of the cluster)

Expected results:
All CA certs replaced.

Additional info:
As part of its process, /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml calls the `openshift_certificate_expiry` role.  Unless `openshift_certificate_expiry_warning_days` (default: 365) is set to a value smaller than the number of days remaining before the CA you are trying to replace expires, the playbook fails and the certs are not replaced.

It would make sense to document this: either the variable must be overridden with `-e openshift_certificate_expiry_warning_days=7` (or another value below the remaining lifetime of the certs), or the playbook should not call the `openshift_certificate_expiry` role.  I suspect the latter is far more difficult to achieve.

Alternatively, add a `set_fact` at the start of this play that sets the variable, saving users from themselves, and inform the user that this has been done.
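
To illustrate why the default window trips the check, here is a minimal Python sketch of the comparison. It is not the role's actual code: the function name `classify_cert`, the status strings, and the 200-day CA lifetime are assumptions made for illustration. Only the 365-day default and the override value of 7 come from this report.

```python
from datetime import datetime, timedelta


def classify_cert(not_after, now, warning_days=365):
    """Return 'expired', 'warning', or 'ok' for one certificate (hypothetical helper)."""
    remaining = not_after - now
    if remaining < timedelta(0):
        return "expired"
    # A cert inside the warning window is flagged even though it is still valid.
    if remaining < timedelta(days=warning_days):
        return "warning"
    return "ok"


now = datetime(2020, 5, 5)
ca_not_after = now + timedelta(days=200)  # assumed: CA with 200 days left

print(classify_cert(ca_not_after, now))                  # prints "warning" (default window)
print(classify_cert(ca_not_after, now, warning_days=7))  # prints "ok" (overridden window)
```

Under these assumptions, any CA with less than a year left is flagged by default, which matches the "within 365 days of expiring" message in the actual results.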

Comment 1 Russell Teague 2020-05-08 21:01:56 UTC
I agree this is not a good user experience, in that if you are intentionally redeploying certificates, you want it to proceed without having to override the check with an arbitrary value for openshift_certificate_expiry_warning_days.  There are several factors at play here where this role (openshift_certificate_expiry) is used to check the expiry during different operations.

- playbooks/openshift-checks/certificate_expiry - For checking/reporting on expiry status
- playbooks/common/openshift-cluster/upgrades - During cluster upgrades
- playbooks/openshift-etcd - Redeploying etcd certificates or CA
- playbooks/openshift-master - Redeploying OpenShift CA

Recently I made a change [1] to remove the hardcoded openshift_certificate_expiry_warning_days value that was in place during upgrades.  It was preventing upgrades when certificates had less than 6 months remaining, regardless of what was specified in the inventory.  Upgrades are probably now blocked unless openshift_certificate_expiry_warning_days is overridden.

The openshift_certificate_expiry role also has an option to fail on warnings (openshift_certificate_expiry_fail_on_warn) with a default value of true.  This option can make sense when using the role when checking/reporting on certificate status, but may not make sense when trying to redeploy certificates or during upgrades.  Warning the user of near certificate expiry during upgrades can make sense if the warning is done in a way that alerts the user, but I'm not sure we should fail an upgrade if certs expire in less than a year.
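
The effect of openshift_certificate_expiry_fail_on_warn can be sketched as follows. This is a hypothetical Python model, not the role's implementation: `should_fail` and the `summary` dictionary with `warning` and `expired` counts are assumed names chosen for illustration.

```python
def should_fail(summary, fail_on_warn=True):
    """Decide whether a run aborts, given counts of flagged certificates.

    summary: dict with assumed keys 'warning' and 'expired'.
    fail_on_warn: models openshift_certificate_expiry_fail_on_warn (default true).
    """
    flagged = summary.get("warning", 0) + summary.get("expired", 0)
    return bool(fail_on_warn and flagged > 0)


# One certificate inside the warning window, none expired (assumed example data).
summary = {"warning": 1, "expired": 0}

print(should_fail(summary))                      # prints True: the run is aborted
print(should_fail(summary, fail_on_warn=False))  # prints False: report only
```

In this model, a default of true suits a pure reporting run, while redeploys and upgrades would want the false branch so that a near-expiry certificate does not abort the operation.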


I will get more input from the team on how we can strike a balance between warning and failing when certificates are within either specified or default openshift_certificate_expiry_warning_days.

[1] https://github.com/openshift/openshift-ansible/pull/12154

Comment 2 Russell Teague 2020-05-12 18:37:14 UTC
After discussing with the team, I've opened a PR [1] to remove the expiry checks from the cert redeploy playbooks.  Additionally, but not directly related, I opened a follow-up PR [2] to change the default for openshift_certificate_expiry_fail_on_warn to 'false' and to override the default expiry values during upgrades.

[1] https://github.com/openshift/openshift-ansible/pull/12159
[2] https://github.com/openshift/openshift-ansible/pull/12158

Comment 5 Gaoyun Pei 2020-05-19 08:56:16 UTC
Verified this bug with openshift-ansible-3.11.218-1.git.0.6f55149.el7.noarch.

The redeploy-openshift-ca.yml playbook no longer checks certificate expiry, so certificates that are expired or within 365 days of expiring no longer break the playbook.

Comment 7 errata-xmlrpc 2020-05-28 05:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215