Bug 1753050

Summary: redeploy cert playbook fails on - TASK [openshift_certificate_expiry : Check cert expirys on host]
Product: OpenShift Container Platform Reporter: Vladislav Walek <vwalek>
Component: Installer Assignee: Joseph Callen <jcallen>
Installer sub component: openshift-ansible QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: a.dekker, jcallen, rhowe
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-03-20 00:12:40 UTC Type: Bug

Description Vladislav Walek 2019-09-18 00:10:28 UTC
Description of problem:

The playbook fails on the following task:

TASK [openshift_certificate_expiry : Check cert expirys on host] ***************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry/tasks/main.yml:8
fatal: [master1]: FAILED! => {"changed": false, "module_stderr": "Shared connection to master1 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_LS1_66/ansible_module_openshift_cert_expiry.py\", line 876, in <module>\r\n    main()\r\n  File \"/tmp/ansible_LS1_66/ansible_module_openshift_cert_expiry.py\", line 661, in main\r\n    c = cfg['users'][0]['user']['client-certificate-data']\r\nKeyError: 'client-certificate-data'\r\n", "msg": "MODULE FAILURE", "rc": 1}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.retry

Problem located here:
https://github.com/openshift/openshift-ansible/blob/b7ad55a99bc13c0affd2f90f22f7b57e93ef25fb/roles/openshift_certificate_expiry/tasks/main.yml#L8-L13
https://github.com/openshift/openshift-ansible/blob/b7ad55a99bc13c0affd2f90f22f7b57e93ef25fb/roles/lib_utils/library/openshift_cert_expiry.py#L876

I also found that "node.kubeconfig" was missing "client-certificate-data". Copying the certificate data over from the other kubeconfig did not fix the failure.
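
To confirm which kubeconfig entries trigger the KeyError, something like the following could be used. This is only a sketch: the helper name users_missing_embedded_cert and the sample node_cfg dict are hypothetical, and the dict stands in for a node.kubeconfig that has already been parsed from YAML.

```python
def users_missing_embedded_cert(cfg):
    """Return the names of kubeconfig users without 'client-certificate-data'."""
    missing = []
    for entry in cfg.get('users') or []:
        user = entry.get('user') or {}
        if 'client-certificate-data' not in user:
            missing.append(entry.get('name', '<unnamed>'))
    return missing


# Hypothetical parsed node.kubeconfig: the user references certificate
# files on disk instead of embedding base64 data.
node_cfg = {
    'users': [
        {
            'name': 'system:node:master1',
            'user': {
                'client-certificate': '/etc/origin/node/certificates/kubelet-client-current.pem',
                'client-key': '/etc/origin/node/certificates/kubelet-client-current.pem',
            },
        }
    ]
}

print(users_missing_embedded_cert(node_cfg))
```

A direct lookup such as node_cfg['users'][0]['user']['client-certificate-data'] on this structure raises the same KeyError seen in the traceback.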


Version-Release number of the following components:
openshift-ansible-playbooks-3.11.135-1.git.0.b7ad55a.el7.noarch
ansible-2.6.18-1.el7ae.noarch

FYI:
atomic-openshift-3.11.88-1.git.0.47f4e98.el7.x86_64
The customer is running the playbook before a minor upgrade.

How reproducible:

Steps to Reproduce:

Actual results:
TASK [openshift_certificate_expiry : Check cert expirys on host] ***************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry/tasks/main.yml:8
<date redacted>
Using module file /usr/share/ansible/openshift-ansible/roles/lib_utils/library/openshift_cert_expiry.py
<master1> ESTABLISH SSH CONNECTION FOR USER: root
<master1> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r master1 '/bin/sh -c '"'"'/usr/bin/python && sleep 0'"'"''
<master1> (1, '', 'Traceback (most recent call last):\n  File "/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py", line 876, in <module>\n    main()\n  File "/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py", line 661, in main\n    c = cfg[\'users\'][0][\'user\'][\'client-certificate-data\']\nKeyError: \'client-certificate-data\'\n')
<master1> Failed to connect to the host via ssh: Traceback (most recent call last):
  File "/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py", line 876, in <module>
    main()
  File "/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py", line 661, in main
    c = cfg['users'][0]['user']['client-certificate-data']
KeyError: 'client-certificate-data'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py", line 876, in <module>
    main()
  File "/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py", line 661, in main
    c = cfg['users'][0]['user']['client-certificate-data']
KeyError: 'client-certificate-data'
fatal: [master1]: FAILED! => {
    "changed": false, 
    "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py\", line 876, in <module>\n    main()\n  File \"/tmp/ansible_UCo68d/ansible_module_openshift_cert_expiry.py\", line 661, in main\n    c = cfg['users'][0]['user']['client-certificate-data']\nKeyError: 'client-certificate-data'\n", 
    "module_stdout": "", 
    "msg": "MODULE FAILURE", 
    "rc": 1
}

Expected results:

Additional info:
maybe related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1520971#c6
https://github.com/openshift/openshift-ansible/pull/8132/files

Comment 20 Gaoyun Pei 2020-03-10 07:10:14 UTC
Verified this bug with openshift-ansible-3.11.187-1.git.0.154c878.el7.noarch.rpm

For a node.kubeconfig that doesn't contain "client-certificate-data", the playbook now loads the file referenced by "client-certificate" and checks that certificate instead.

TASK [openshift_certificate_expiry : Check cert expirys on host] ***************
...
        "kubeconfigs": [
            {
                "cert_cn": "O:system:nodes, CN:system:node:ip-172-18-11-101.ec2.internal", 
                "days_remaining": 159, 
                "expiry": "2021-03-10 01:54:00", 
                "health": "warning", 
                "issuer": "CN=openshift-signer@1583805359 ", 
                "path": "/etc/origin/node/certificates/kubelet-client-current.pem", 
                "serial": 342271291349899671730079880867628947407345218265, 
                "serial_hex": "0x3bf3f9bb937df7abdf6416ad691b1360d9e2f2d9L"
            },
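
The fallback behaviour verified above can be sketched roughly as follows. This is not the actual patch shipped in openshift-ansible; the function name load_client_cert_pem is hypothetical, and the only assumptions are the standard kubeconfig keys "client-certificate-data" (base64-encoded PEM) and "client-certificate" (path to a PEM file).

```python
import base64


def load_client_cert_pem(user):
    """Return the client certificate PEM text for one kubeconfig user entry.

    Prefers the embedded base64 data; falls back to the certificate file
    on disk; returns None if neither key is present.
    """
    if 'client-certificate-data' in user:
        # Embedded form: base64-encoded PEM stored in the kubeconfig itself.
        return base64.b64decode(user['client-certificate-data']).decode('utf-8')
    if 'client-certificate' in user:
        # File form: the kubeconfig only stores a path to the PEM file.
        with open(user['client-certificate'], 'r') as handle:
            return handle.read()
    return None
```

With a lookup like this, a kubeconfig that only references /etc/origin/node/certificates/kubelet-client-current.pem yields a certificate to check rather than a KeyError.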

Comment 22 errata-xmlrpc 2020-03-20 00:12:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0793