Bug 1832379 - redeploy-openshift-ca.yml will fail if cert has an expiry less than 1 year
Summary: redeploy-openshift-ca.yml will fail if cert has an expiry less than 1 year
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-06 15:45 UTC by Chuck Douglas
Modified: 2020-05-28 05:44 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When redeploying certificates, the cert expiry check provides little value because the expectation is that the certificates will be replaced. Additionally, there are situations where certificates are in an invalid state and redeploy is blocked by the check. Removing the checks will allow certificate redeploy to proceed without requiring additional inventory vars to override expiry days or invalid/missing certificates.
Clone Of:
Environment:
Last Closed: 2020-05-28 05:44:13 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift openshift-ansible pull 12159 | 0 | None | closed | Bug 1832379: Remove cert expiry check during cert redeploy | 2020-06-03 03:59:53 UTC
Red Hat Product Errata | RHBA-2020:2215 | 0 | None | None | None | 2020-05-28 05:44:20 UTC

Description Chuck Douglas 2020-05-06 15:45:47 UTC
Description of problem:
Running /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml without any additional parameters can fail if a cert in the chain is due to expire within 1 year of this being run.

Version-Release number of the following components:

openshift-ansible-3.11.188-1.git.0.accd104.el7.noarch
ansible-2.9.6-1.el7ae.noarch
ansible 2.9.6
  config file = /root/openshift-cluster-311-lab/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, Sep 26 2019, 13:23:47) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]


How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook -i hosts.cluster /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml


Actual results:
2020-05-05 16:14:12,451 p=43186 u=root n=ansible | TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *************************************************************
2020-05-05 16:14:12,452 p=43186 u=root n=ansible | fatal: [hostname.example.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 365 days of expiring. You may view the report at /root/cert-expiry-report.20200505T161100.html or /root/cert-expiry-report.20200505T161100.json.\n"}
(Similar errors appear for all members of the cluster)

Expected results:
All CA certs replaced.

Additional info:
/usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml calls the `openshift_certificate_expiry` role as part of its process.  If `openshift_certificate_expiry_warning_days` (default 365 days) is not set to a value less than the number of days remaining before the CA you are trying to replace expires, the playbook will fail and certs will not be replaced.
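
To make the condition above concrete, here is a minimal Python sketch of the warning-window comparison. It is an illustration only, not the role's actual code: the function name `within_warning_window`, the inclusive boundary, and the 200-day CA lifetime are assumptions made for the example. Only the 365-day default and the override value of 7 come from this report.

```python
from datetime import datetime, timedelta

# Default window quoted in this report; the role's internals are not shown here.
DEFAULT_WARNING_DAYS = 365


def within_warning_window(not_after, now, warning_days=DEFAULT_WARNING_DAYS):
    """Return True if the cert is expired or expires within warning_days of now.

    Hypothetical helper: the inclusive comparison is an assumption.
    """
    return not_after - now <= timedelta(days=warning_days)


# Hypothetical CA that expires 200 days after the playbook run.
now = datetime(2020, 5, 5, 16, 11, 0)
ca_not_after = now + timedelta(days=200)

# With the default window the CA is flagged, which is what blocks the redeploy.
print(within_warning_window(ca_not_after, now))
# With the window overridden to 7 days the same CA is not flagged.
print(within_warning_window(ca_not_after, now, warning_days=7))
```

Under these assumptions, a CA with 200 days left is flagged with the default window and passes once the window is lowered to 7 days.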

It would make sense to document that this variable needs to be set, e.g. with `-e openshift_certificate_expiry_warning_days=7` (or some other appropriately small value), or else to not call the `openshift_certificate_expiry` role from this playbook.  I suspect the latter is far more difficult to achieve.

Alternatively, add a `set_fact` to set this variable at the start of this play to save the user from themselves and inform the user of this fact.

Comment 1 Russell Teague 2020-05-08 21:01:56 UTC
I agree this is not a good user experience, in that if you are intentionally redeploying certificates, you want it to proceed without having to override the check with an arbitrary value for openshift_certificate_expiry_warning_days.  There are several factors at play here where this role (openshift_certificate_expiry) is used to check the expiry during different operations.

- playbooks/openshift-checks/certificate_expiry - For checking/reporting on expiry status
- playbooks/common/openshift-cluster/upgrades - During cluster upgrades
- playbooks/openshift-etcd - Redeploying etcd certificates or CA
- playbooks/openshift-master - Redeploying OpenShift CA

Recently I made a change [1] to remove the hardcoded openshift_certificate_expiry_warning_days in place during upgrades.  It was preventing upgrades when certificates were less than 6 months, regardless of what was specified in the inventory.  Upgrades are probably now blocked unless openshift_certificate_expiry_warning_days is overridden.

The openshift_certificate_expiry role also has an option to fail on warnings (openshift_certificate_expiry_fail_on_warn) with a default value of true.  This option can make sense when using the role when checking/reporting on certificate status, but may not make sense when trying to redeploy certificates or during upgrades.  Warning the user of near certificate expiry during upgrades can make sense if the warning is done in a way that alerts the user, but I'm not sure we should fail an upgrade if certs expire in less than a year.
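
As a rough illustration of how the two settings discussed here interact, the following Python sketch models the outcome of the check. It is a simplified model and not the role's actual implementation: the function name `expiry_check_outcome` and the "ok"/"warn"/"fail" labels are hypothetical, and it assumes that expired and near-expiry certificates are handled the same way.

```python
def expiry_check_outcome(days_remaining, warning_days=365, fail_on_warn=True):
    """Model the check result for one certificate (hypothetical helper).

    days_remaining: days until the cert expires (zero or negative if expired).
    warning_days:   stands in for openshift_certificate_expiry_warning_days.
    fail_on_warn:   stands in for openshift_certificate_expiry_fail_on_warn.
    """
    if days_remaining > warning_days:
        return "ok"
    # Inside the warning window: the option decides between failing and warning.
    return "fail" if fail_on_warn else "warn"


# A cert with 200 days left under the defaults described in this bug.
print(expiry_check_outcome(200))
# The same cert if fail-on-warn were disabled.
print(expiry_check_outcome(200, fail_on_warn=False))
# The same cert with the warning window overridden to 7 days.
print(expiry_check_outcome(200, warning_days=7))
```

In this model, a play can be unblocked either by shrinking the warning window or by turning off fail-on-warn, which are the two knobs weighed in this comment.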


I will get more input from the team on how we can strike a balance between warning and failing when certificates are within either specified or default openshift_certificate_expiry_warning_days.

[1] https://github.com/openshift/openshift-ansible/pull/12154

Comment 2 Russell Teague 2020-05-12 18:37:14 UTC
After discussing with the team, I've opened a PR [1] to remove the expiry checks from the cert redeploy playbooks.  Additionally, but not directly related, I opened a follow-up PR [2] to change the default for openshift_certificate_expiry_fail_on_warn to 'false' and to override the default expiry values during upgrades.

[1] https://github.com/openshift/openshift-ansible/pull/12159
[2] https://github.com/openshift/openshift-ansible/pull/12158

Comment 5 Gaoyun Pei 2020-05-19 08:56:16 UTC
Verified this bug with openshift-ansible-3.11.218-1.git.0.6f55149.el7.noarch.

The redeploy-openshift-ca.yml playbook no longer checks whether certificates have expired, so certificates that are expired or within 365 days of expiring no longer break the redeploy CA playbook.

Comment 7 errata-xmlrpc 2020-05-28 05:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215

