Bug 1829492 - Upgrade playbook fails if certificates are going to expire in less than 183 days and the openshift_certificate_expiry_warning_days has been set in the inventory file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Duplicates: 1829232
Depends On:
Blocks:
 
Reported: 2020-04-29 15:58 UTC by Joel Rosental R.
Modified: 2023-10-06 19:48 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The variable openshift_certificate_expiry_warning_days was hard-coded in one part of the code that calls the openshift_certificate_expiry role during upgrades.
Consequence: This prevented overriding the variable in the inventory.
Fix: Replaced the hard-coded value with a task that sets a value of six months if the variable has not been defined by the user.
Result: The variable can be overridden in the inventory, and upgrades default to six months.
Clone Of:
Environment:
Last Closed: 2020-05-28 05:44:13 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12154 0 None closed Bug 1829492: Remove hard coded openshift_certificate_expiry_warning_days 2020-10-16 17:38:44 UTC
Github openshift openshift-ansible pull 12158 0 None closed Bug 1829492: Add six month expiry check back to upgrades 2020-10-16 17:38:44 UTC
Red Hat Product Errata RHBA-2020:2215 0 None None None 2020-05-28 05:44:20 UTC

Description Joel Rosental R. 2020-04-29 15:58:32 UTC
Description of problem:
The upgrade_control_plane.yml playbook fails with the following error if any of the cluster certificates expire in less than 183 days, even when "openshift_certificate_expiry_warning_days" has been set in the inventory file to a lower value (e.g. 90):

"1. Hosts:    master01.myexample.com
    Play:     Inspect cluster certificates
    Task:     Fail when certs are near or already expired
    Message: Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193315.html or /root/cert-expiry-report.20200416T193315.json."

The cause appears to be that the value is hard-coded as a variable passed to this task [0]. Because that variable has higher precedence when Ansible evaluates variables, it overrides any value set in the inventory.

This was not present on openshift-ansible-3.11.170-2.git.5.8802564.el7.noarch.

[0]: https://github.com/openshift/openshift-ansible/blob/release-3.11/playbooks/common/openshift-cluster/upgrades/init.yml#L20
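The precedence behaviour described above, and the fix summarized in the Doc Text, can be illustrated with a small Python sketch. This is a deliberately simplified model and not Ansible's actual variable resolver: the `resolve` helper and the dictionary names are hypothetical, and the only facts taken from this report are the variable name, the hard-coded value of 183, and the example inventory value of 90.

```python
# Simplified model of variable precedence (NOT Ansible's real resolver).
# Assumption: sources listed later have higher precedence and win.

VAR = "openshift_certificate_expiry_warning_days"


def resolve(*sources):
    """Merge variable sources; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


inventory = {VAR: 90}     # value the user set in the inventory
hard_coded = {VAR: 183}   # value passed where the role is called

# Before the fix: the hard-coded value has higher precedence and wins.
before_fix = resolve(inventory, hard_coded)

# After the fix: 183 is applied only if the user left the variable undefined.
after_fix = dict(inventory)
after_fix.setdefault(VAR, 183)

# A user who sets nothing still gets the six-month default.
no_override = {}
no_override.setdefault(VAR, 183)

print(before_fix[VAR], after_fix[VAR], no_override[VAR])
```

Under these assumptions the inventory value is ignored before the fix and honoured after it, while an inventory that does not set the variable still falls back to 183 days.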

Version-Release number of the following components:
openshift-ansible-3.11.200-1.git.0.3f37acb.el7.noarch
ansible-2.6.20-1.el7ae.noarch


How reproducible:
Always, if the conditions are met, i.e. any cluster certificate expires in less than 183 days and the openshift_certificate_expiry_warning_days variable has been set in the inventory file.

Steps to Reproduce:
1. Set "openshift_certificate_expiry_warning_days" in the inventory to any value
2. Run /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.yml playbook 


Actual results:

2020-04-16 19:59:26,132 p=27757 u=root |  TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***************************************************************************************************
2020-04-16 19:59:26,132 p=27757 u=root |  Thursday 16 April 2020  19:59:26 +0200 (0:00:10.114)       0:26:48.588 ********
2020-04-16 19:59:26,514 p=27757 u=root |  fatal: [master01.myexample.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193315.html or /root/cert-expiry-report.20200416T193315.json.\n"}
2020-04-16 19:59:27,711 p=27757 u=root |  fatal: [master02.myexample.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193242.html or /root/cert-expiry-report.20200416T193242.json.\n"}
2020-04-16 19:59:27,830 p=27757 u=root |  fatal: [master03.myexample.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193242.html or /root/cert-expiry-report.20200416T193242.json.\n"}
2020-04-16 19:59:27,832 p=27757 u=root |  NO MORE HOSTS LEFT ******************************************************************************************************************************************************************


Expected results:

The value of this variable set in the inventory should not be overridden.

Additional info:

Comment 1 Russell Teague 2020-04-30 20:36:20 UTC
*** Bug 1829232 has been marked as a duplicate of this bug. ***

Comment 5 Gaoyun Pei 2020-05-19 10:53:07 UTC
Verified this bug with openshift-ansible-3.11.218-1.git.0.6f55149.el7.noarch.

When certificates are going to expire in less than 183 days:

1) By default, the upgrade playbook fails because certificates are near expiry.

TASK [openshift_certificate_expiry : Check cert expirys on host] ****************************************************
ok: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => 
..."days_remaining": 18, "expiry": "2022-05-19 03:38:23", "health": "warning", "issuer": "CN=openshift-signer@1589859503 ", "path": "/etc/origin/master/master.kubelet-client.crt", "serial": 3, "serial_hex": "0x3"}], "registry": [], "router": []}, "msg": "Checked 16 total certificates. Expired/Warning/OK: 0/7/9. Warning window: 183 days", "rc": 0, "summary": {"etcd_certificates": 3, "expired": 0, "kubeconfig_certificates": 4, "ok": 9, "registry_certs": 0, "router_certs": 0, "system_certificates": 9, "total": 16, "warning": 7}, "warn_certs": true}

...
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***********************************
fatal: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20220501T000040.html or /root/cert-expiry-report.20220501T000040.json.\n"}


2) With openshift_certificate_expiry_warning_days set to a smaller number, the playbook can continue.
openshift_certificate_expiry_warning_days=7

TASK [openshift_certificate_expiry : Check cert expirys on host] ****************************************************
ok: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => {"changed": false, "check_results": {"etcd": [], "kubeconfigs": [], "meta": {"checked_at_time": "2022-05-01 00:02:50.357203", "show_all": "False", "warn_before_date": "2022-05-08 00:02:50.357203", "warning_days": 7}, "ocp_certs": [], "registry": [], "router": []}, "msg": "Checked 16 total certificates. Expired/Warning/OK: 0/0/16. Warning window: 7 days", "rc": 0, "summary": {"etcd_certificates": 3, "expired": 0, "kubeconfig_certificates": 4, "ok": 16, "registry_certs": 0, "router_certs": 0, "system_certificates": 9, "total": 16, "warning": 0}, "warn_certs": false}

...
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***********************************
skipping: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
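The effect of the warning window in cases 1) and 2) can be checked with a little date arithmetic. The sketch below is an illustration only: the `classify` function and its exact comparison operators are assumptions, not the role's actual implementation. The timestamps and the 7 and 183 day windows are taken from the log output above.

```python
from datetime import datetime, timedelta


def classify(expiry, checked_at, warning_days):
    """Return 'expired', 'warning' or 'ok' for one certificate.

    Hypothetical helper: the real check may treat boundaries differently.
    """
    warn_before = checked_at + timedelta(days=warning_days)
    if expiry < checked_at:
        return "expired"
    if expiry < warn_before:
        return "warning"
    return "ok"


# Values copied from the log output in this comment.
checked_at = datetime(2022, 5, 1, 0, 2, 50, 357203)
expiry = datetime(2022, 5, 19, 3, 38, 23)

warn_before_7 = checked_at + timedelta(days=7)
days_remaining = (expiry - checked_at).days

print(warn_before_7)                          # 2022-05-08 00:02:50.357203
print(days_remaining)                         # 18
print(classify(expiry, checked_at, 183))      # warning
print(classify(expiry, checked_at, 7))        # ok
```

With a 7 day window the warn-before date is 2022-05-08, so a certificate expiring on 2022-05-19 is counted as OK, whereas the 183 day default places it in the warning state.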


3) The check failure can be bypassed by setting openshift_certificate_expiry_fail_on_warn=false.

TASK [openshift_certificate_expiry : Check cert expirys on host] ****************************************************
ok: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => 
... "days_remaining": 18, "expiry": "2022-05-19 03:38:23", "health": "warning", "issuer": "CN=openshift-signer@1589859503 ", "path": "/etc/origin/master/master.kubelet-client.crt", "serial": 3, "serial_hex": "0x3"}], "registry": [], "router": []}, "msg": "Checked 16 total certificates. Expired/Warning/OK: 0/7/9. Warning window: 183 days", "rc": 0, "summary": {"etcd_certificates": 3, "expired": 0, "kubeconfig_certificates": 4, "ok": 9, "registry_certs": 0, "router_certs": 0, "system_certificates": 9, "total": 16, "warning": 7}, "warn_certs": true}

...
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***********************************
skipping: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
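The three verification cases can be summarized as a single decision: the fail task runs only when some certificate is in the warning or expired state and failing on warnings is enabled. The sketch below encodes that reading of the logs; `should_fail` is a hypothetical function, and the real conditional in the openshift_certificate_expiry role may be written differently.

```python
def should_fail(warning, expired, fail_on_warn):
    """Decide whether the upgrade should stop (assumed logic, see lead-in)."""
    warn_certs = (warning + expired) > 0
    return warn_certs and fail_on_warn


# Counts taken from the "Expired/Warning/OK" summaries in the three cases.
case_1 = should_fail(warning=7, expired=0, fail_on_warn=True)    # 183-day window
case_2 = should_fail(warning=0, expired=0, fail_on_warn=True)    # 7-day window
case_3 = should_fail(warning=7, expired=0, fail_on_warn=False)   # fail_on_warn=false

print(case_1, case_2, case_3)  # True False False
```

Case 1 matches the FAILED task, while cases 2 and 3 match the skipped task with "Conditional result was False".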

Comment 7 errata-xmlrpc 2020-05-28 05:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215

