Bug 1829492

Summary: Upgrade playbook fails if certificates are going to expire in less than 183 days and the openshift_certificate_expiry_warning_days has been set in the inventory file
Product: OpenShift Container Platform Reporter: Joel Rosental R. <jrosenta>
Component: Installer Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: apjagtap, bleanhar
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The variable openshift_certificate_expiry_warning_days was hard-coded for one part of the code calling the openshift_certificate_expiry role during upgrades. Consequence: This prevented overriding the variable in the inventory. Fix: Replaced the hard-coded value with a task to set a value of six months if the variable has not been defined by the user. Result: Override possible in inventory and upgrades will default to six months.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-28 05:44:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Joel Rosental R. 2020-04-29 15:58:32 UTC
Description of problem:
While running the upgrade_control_plane.yml playbook, it fails with the following error if any of the cluster certificates expire in less than 183 days, even when "openshift_certificate_expiry_warning_days" has previously been set in the inventory file to a lower value (e.g., 90):

"1. Hosts:    master01.myexample.com
    Play:     Inspect cluster certificates
    Task:     Fail when certs are near or already expired
    Message: Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193315.html or /root/cert-expiry-report.20200416T193315.json."

The cause appears to be that this value is hard-coded as a variable passed to this task [0]. Because it has a higher precedence when Ansible evaluates variables, it overrides any value set in the inventory.

This was not present on openshift-ansible-3.11.170-2.git.5.8802564.el7.noarch.

[0]: https://github.com/openshift/openshift-ansible/blob/release-3.11/playbooks/common/openshift-cluster/upgrades/init.yml#L20
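
The precedence behaviour described above can be illustrated with a small Python model. This is only a sketch: it collapses Ansible's variable precedence into two levels (inventory values, then values set where the role is called), and the helper names resolve_vars and apply_default are hypothetical, not part of openshift-ansible. It also models the approach from the Doc Text, where 183 days is applied only when the user has not defined the variable.

```python
# Simplified, illustrative model of two Ansible precedence levels.
# Real Ansible has many more levels; the helper names are hypothetical.
VAR = "openshift_certificate_expiry_warning_days"


def resolve_vars(inventory_vars, play_vars):
    """Merge variables so that play-level values win over inventory values."""
    merged = dict(inventory_vars)
    merged.update(play_vars)  # the higher-precedence source overwrites
    return merged


def apply_default(inventory_vars, default_days=183):
    """Use the default only if the user has not defined the variable."""
    merged = dict(inventory_vars)
    merged.setdefault(VAR, default_days)
    return merged


inventory = {VAR: 90}

# Reported behaviour: the hard-coded 183 overrides the inventory value.
hard_coded = resolve_vars(inventory, {VAR: 183})
print(hard_coded[VAR])  # 183

# Behaviour described in the Doc Text: the inventory value is kept ...
print(apply_default(inventory)[VAR])  # 90
# ... and 183 is used only when nothing was set.
print(apply_default({})[VAR])  # 183
```

In this model the inventory value of 90 can never take effect while the value is hard-coded, which matches the 183-day message in the error above.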

Version-Release number of the following components:
openshift-ansible-3.11.200-1.git.0.3f37acb.el7.noarch
ansible-2.6.20-1.el7ae.noarch


How reproducible:
Always, when both conditions are met: a cluster certificate expires in less than 183 days, and the openshift_certificate_expiry_warning_days variable has been set in the inventory file.

Steps to Reproduce:
1. Set "openshift_certificate_expiry_warning_days" in the inventory to any value
2. Run /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.yml playbook 


Actual results:

2020-04-16 19:59:26,132 p=27757 u=root |  TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***************************************************************************************************
2020-04-16 19:59:26,132 p=27757 u=root |  Thursday 16 April 2020  19:59:26 +0200 (0:00:10.114)       0:26:48.588 ******** 
2020-04-16 19:59:26,514 p=27757 u=root |  fatal: [master01.myexample.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193315.html or /root/cert-expiry-report.20200416T193315.json.\n"}
2020-04-16 19:59:27,711 p=27757 u=root |  fatal: [master02.myexample.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193242.html or /root/cert-expiry-report.20200416T193242.json.\n"}
2020-04-16 19:59:27,830 p=27757 u=root |  fatal: [master03.myexample.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20200416T193242.html or /root/cert-expiry-report.20200416T193242.json.\n"}
2020-04-16 19:59:27,832 p=27757 u=root |  NO MORE HOSTS LEFT ******************************************************************************************************************************************************************


Expected results:

The value of openshift_certificate_expiry_warning_days set in the inventory should not be overridden.

Additional info:

Comment 1 Russell Teague 2020-04-30 20:36:20 UTC
*** Bug 1829232 has been marked as a duplicate of this bug. ***

Comment 5 Gaoyun Pei 2020-05-19 10:53:07 UTC
Verified this bug with openshift-ansible-3.11.218-1.git.0.6f55149.el7.noarch.

When certificates are going to expire in less than 183 days:

1) By default, the upgrade playbook fails because certificates are near expiry

TASK [openshift_certificate_expiry : Check cert expirys on host] ****************************************************
ok: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => 
..."days_remaining": 18, "expiry": "2022-05-19 03:38:23", "health": "warning", "issuer": "CN=openshift-signer@1589859503 ", "path": "/etc/origin/master/master.kubelet-client.crt", "serial": 3, "serial_hex": "0x3"}], "registry": [], "router": []}, "msg": "Checked 16 total certificates. Expired/Warning/OK: 0/7/9. Warning window: 183 days", "rc": 0, "summary": {"etcd_certificates": 3, "expired": 0, "kubeconfig_certificates": 4, "ok": 9, "registry_certs": 0, "router_certs": 0, "system_certificates": 9, "total": 16, "warning": 7}, "warn_certs": true}

...
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***********************************
fatal: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 183 days of expiring. You may view the report at /root/cert-expiry-report.20220501T000040.html or /root/cert-expiry-report.20220501T000040.json.\n"}


2) With openshift_certificate_expiry_warning_days set to a smaller number, the playbook can continue.
openshift_certificate_expiry_warning_days=7

TASK [openshift_certificate_expiry : Check cert expirys on host] ****************************************************
ok: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => {"changed": false, "check_results": {"etcd": [], "kubeconfigs": [], "meta": {"checked_at_time": "2022-05-01 00:02:50.357203", "show_all": "False", "warn_before_date": "2022-05-08 00:02:50.357203", "warning_days": 7}, "ocp_certs": [], "registry": [], "router": []}, "msg": "Checked 16 total certificates. Expired/Warning/OK: 0/0/16. Warning window: 7 days", "rc": 0, "summary": {"etcd_certificates": 3, "expired": 0, "kubeconfig_certificates": 4, "ok": 16, "registry_certs": 0, "router_certs": 0, "system_certificates": 9, "total": 16, "warning": 0}, "warn_certs": false}

...
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***********************************
skipping: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}


3) The check failure can be bypassed by setting openshift_certificate_expiry_fail_on_warn=false

TASK [openshift_certificate_expiry : Check cert expirys on host] ****************************************************
ok: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => 
... "days_remaining": 18, "expiry": "2022-05-19 03:38:23", "health": "warning", "issuer": "CN=openshift-signer@1589859503 ", "path": "/etc/origin/master/master.kubelet-client.crt", "serial": 3, "serial_hex": "0x3"}], "registry": [], "router": []}, "msg": "Checked 16 total certificates. Expired/Warning/OK: 0/7/9. Warning window: 183 days", "rc": 0, "summary": {"etcd_certificates": 3, "expired": 0, "kubeconfig_certificates": 4, "ok": 9, "registry_certs": 0, "router_certs": 0, "system_certificates": 9, "total": 16, "warning": 7}, "warn_certs": true}

...
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ***********************************
skipping: [ci-vm-10-0-149-234.hosted.upshift.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
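
The three cases above can be reproduced with a short Python sketch of the warning-window arithmetic, using the timestamps from the logs. This is an illustration under stated assumptions, not the role's actual code: the functions check_cert and playbook_fails are hypothetical, and the sketch assumes the fail task runs only when a certificate is not "ok" and openshift_certificate_expiry_fail_on_warn is true.

```python
from datetime import datetime, timedelta


def check_cert(expiry, checked_at, warning_days):
    """Classify one certificate against the warning window (hypothetical helper)."""
    warn_before = checked_at + timedelta(days=warning_days)
    if expiry < checked_at:
        health = "expired"
    elif expiry < warn_before:
        health = "warning"
    else:
        health = "ok"
    return {
        "days_remaining": (expiry - checked_at).days,
        "warn_before_date": warn_before,
        "health": health,
    }


def playbook_fails(health, fail_on_warn=True):
    """Assumption: fail only when a cert is not ok AND fail_on_warn is true."""
    return health != "ok" and fail_on_warn


# Timestamps taken from the verification logs above.
checked_at = datetime.fromisoformat("2022-05-01 00:02:50.357203")
expiry = datetime.fromisoformat("2022-05-19 03:38:23")

default_window = check_cert(expiry, checked_at, 183)  # case 1
small_window = check_cert(expiry, checked_at, 7)      # case 2

print(default_window["days_remaining"], default_window["health"])  # 18 warning
print(small_window["warn_before_date"], small_window["health"])
print(playbook_fails(default_window["health"]))                      # case 1: True
print(playbook_fails(small_window["health"]))                        # case 2: False
print(playbook_fails(default_window["health"], fail_on_warn=False))  # case 3: False
```

With a 7-day window the computed warn_before_date is 2022-05-08 00:02:50.357203, matching the value in the second log, so the certificate with 18 days remaining falls outside the window.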

Comment 7 errata-xmlrpc 2020-05-28 05:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215