Description of problem:
The report created by running the certificate expiration playbook (easy-mode.yaml) doesn't include expiration information for files like:
/etc/origin/master/master.kubelet-client.crt
and
/etc/origin/master/master.proxy-client.crt

Version-Release number of selected component (if applicable):
Customer is running: openshift v3.5.5.31.36

How reproducible:
Always

Steps to Reproduce:
1. Execute the easy-mode.yaml playbook to check certificate expiration information

Actual results:
These certs are not currently checked.

Expected results:
These files should be checked as well.

Additional info:
Having these certs expired caused issues for this customer, such as not being able to run:
$ oc rsh <pod>
which failed with an error like:
error: unable to upgrade connection: Unauthorized

The error went away as soon as these two certs were regenerated.

P.S.: They were not using `oc proxy`.
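For anyone who needs to check these two files by hand in the meantime, here is a rough sketch that shells out to `openssl x509 -noout -enddate`. It assumes the `openssl` CLI is installed on the master and that the output is in the usual English `notAfter=...` form; the helper names `parse_not_after` and `cert_not_after` are made up for illustration and are not part of openshift-ansible.

```python
import subprocess
from datetime import datetime, timezone


def parse_not_after(line):
    """Parse a line like 'notAfter=Apr 23 03:10:03 2023 GMT' into a naive UTC datetime."""
    value = line.strip().split('=', 1)[1]
    # %b is locale dependent; this assumes English month abbreviations.
    return datetime.strptime(value, '%b %d %H:%M:%S %Y %Z')


def cert_not_after(path):
    """Ask the openssl CLI for the expiry date of the PEM certificate at `path`."""
    out = subprocess.check_output(
        ['openssl', 'x509', '-noout', '-enddate', '-in', path])
    return parse_not_after(out.decode('utf-8'))


if __name__ == '__main__':
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    for path in ('/etc/origin/master/master.kubelet-client.crt',
                 '/etc/origin/master/master.proxy-client.crt'):
        expiry = cert_not_after(path)
        print('%s expires %s (%d days remaining)'
              % (path, expiry, (expiry - now).days))
```

Run it as root on a master; a negative day count means the cert has already expired.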
Created PR for 3.10 - https://github.com/openshift/openshift-ansible/pull/7904
Fix for 3.10 is in openshift-ansible-3.10.0-0.21.0

Created PRs for
3.6: https://github.com/openshift/openshift-ansible/pull/7943
3.7: https://github.com/openshift/openshift-ansible/pull/7942
3.9: https://github.com/openshift/openshift-ansible/pull/7941
Tried with openshift-ansible-3.10.0-0.22.0.git.0.b6ec617.el7.noarch. The certificate expiry check playbook failed as below:

[root@gpei-preserved ~]# ansible-playbook -i host/host /usr/share/ansible/openshift-ansible/playbooks/openshift-checks/certificate_expiry/easy-mode.yaml -v
Using /etc/ansible/ansible.cfg as config file

PLAY [Check cert expirys] ****************************************************

TASK [openshift_certificate_expiry : Check cert expirys on host] *************
fatal: [qe-gpei-310test2node-registry-router-1.0418-0gu.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to qe-gpei-310test2node-registry-router-1.0418-0gu.qe.rhcloud.com closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_796x07/ansible_module_openshift_cert_expiry.py\", line 805, in <module>\r\n main()\r\n File \"/tmp/ansible_796x07/ansible_module_openshift_cert_expiry.py\", line 507, in main\r\n cert_meta['certFile'] = os.path.join(cfg_path, cfg['servingInfo']['certFile'])\r\nKeyError: 'certFile'\r\n", "msg": "MODULE FAILURE", "rc": 0}
fatal: [qe-gpei-310test2master-etcd-1.0418-0gu.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to qe-gpei-310test2master-etcd-1.0418-0gu.qe.rhcloud.com closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_jJD8pE/ansible_module_openshift_cert_expiry.py\", line 805, in <module>\r\n main()\r\n File \"/tmp/ansible_jJD8pE/ansible_module_openshift_cert_expiry.py\", line 507, in main\r\n cert_meta['certFile'] = os.path.join(cfg_path, cfg['servingInfo']['certFile'])\r\nKeyError: 'certFile'\r\n", "msg": "MODULE FAILURE", "rc": 0}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-checks/certificate_expiry/easy-mode.retry

PLAY RECAP *******************************************************************
qe-gpei-310test2master-etcd-1.0418-0gu.qe.rhcloud.com : ok=0 changed=0 unreachable=0 failed=1
qe-gpei-310test2node-registry-router-1.0418-0gu.qe.rhcloud.com : ok=0 changed=0 unreachable=0 failed=1
(In reply to Gaoyun Pei from comment #3)
> File
> \"/tmp/ansible_jJD8pE/ansible_module_openshift_cert_expiry.py\", line 507,
> in main\r\n cert_meta['certFile'] = os.path.join(cfg_path,
> cfg['servingInfo']['certFile'])\r\nKeyError: 'certFile'\r\n",

Right, that happens when we're checking a node certificate and there is no such field there.
Created https://github.com/openshift/openshift-ansible/pull/8017 to fix it
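To illustrate the failure mode: the traceback comes from indexing `cfg['servingInfo']['certFile']` directly on a config that has no such key. A minimal sketch of a tolerant lookup follows, assuming `cfg` is the already-parsed config dict and `cfg_path` is its directory, as in the traceback. The helper name `serving_cert_path` is hypothetical, and this is not necessarily how the PR above resolves it.

```python
import os


def serving_cert_path(cfg, cfg_path):
    """Return the full serving cert path, or None when the config has none.

    Hypothetical helper: `cfg` is a parsed config dict, `cfg_path` the
    directory that relative cert file names are resolved against.
    """
    serving_info = cfg.get('servingInfo') or {}
    cert_file = serving_info.get('certFile')
    if not cert_file:
        # No servingInfo.certFile in this config: skip it instead of raising KeyError.
        return None
    return os.path.join(cfg_path, cert_file)
```

A caller would then only add the cert to the list of files to examine when the return value is not None.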
Fix is available in openshift-ansible-3.10.0-0.25.0
Tested with openshift-ansible-3.10.0-0.27.0.git.0.abed3b7.el7.noarch; the cert-expiry check playbook fails on the node.

[root@gpei-preserved ~]# ansible-playbook -i host/host /usr/share/ansible/openshift-ansible/playbooks/openshift-checks/certificate_expiry/easy-mode.yaml -v
Using /etc/ansible/ansible.cfg as config file

PLAY [Check cert expirys] ****************************************************

TASK [openshift_certificate_expiry : Check cert expirys on host] *************
fatal: [qe-gpei-3102node-registry-router-1.0423-2l7.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to qe-gpei-3102node-registry-router-1.0423-2l7.qe.rhcloud.com closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_b4DFlC/ansible_module_openshift_cert_expiry.py\", line 826, in <module>\r\n main()\r\n File \"/tmp/ansible_b4DFlC/ansible_module_openshift_cert_expiry.py\", line 590, in main\r\n c = cfg['users'][0]['user']['client-certificate-data']\r\nKeyError: 'client-certificate-data'\r\n", "msg": "MODULE FAILURE", "rc": 0}
ok: [qe-gpei-3102master-etcd-1.0423-2l7.qe.rhcloud.com] => {"changed": false, "check_results": {"etcd": [{"cert_cn": "CN:etcd-signer@1524539393", "days_remaining": 1825, "expiry": "2023-04-23 03:10:03", "health": "ok", "path": "/etc/etcd/ca.crt", "serial": 15685371651196948480, "serial_hex"...
Good catch, it does fail on dedicated nodes. Created https://github.com/openshift/openshift-ansible/pull/8132 to fix this
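For context, a kubeconfig user entry can carry its client certificate either inline as base64 under `client-certificate-data` or as a file reference under `client-certificate`, so code that only reads the inline form hits the KeyError above. A minimal sketch that handles both follows. The helper name `load_client_cert_pem` is hypothetical, it assumes any `client-certificate` path can be opened as given, and it is not necessarily what the PR above does.

```python
import base64


def load_client_cert_pem(user):
    """Return the client cert PEM text for a kubeconfig `user` dict, or None.

    Hypothetical helper: `user` is the value of cfg['users'][i]['user'].
    """
    inline = user.get('client-certificate-data')
    if inline:
        # Inline form: base64-encoded PEM embedded in the kubeconfig.
        return base64.b64decode(inline).decode('utf-8')
    cert_path = user.get('client-certificate')
    if cert_path:
        # File form: the kubeconfig points at a PEM file on disk.
        with open(cert_path) as cert_file:
            return cert_file.read()
    # Neither form present (e.g. token-based auth): nothing to check.
    return None
```

Returning None rather than raising lets the caller skip kubeconfigs that carry no client certificate at all.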
Fix is available in openshift-ansible-3.10.0-0.30.0
Verified this bug with openshift-ansible-3.10.0-0.30.0.git.0.4f02952.el7.noarch.

Running the easy-mode.yaml playbook generates the detailed cert report as /tmp/cert-expiry-report.html and /tmp/cert-expiry-report.json by default.

List the certs checked by the playbook:
[root@gpei-preserved host]# grep path /tmp/cert-expiry-report.json
"path": "/etc/etcd/ca.crt",
"path": "/etc/etcd/server.crt",
"path": "/etc/etcd/peer.crt",
"path": "/etc/origin/node/node.kubeconfig",
"path": "/etc/origin/node/node.kubeconfig",
"path": "/etc/origin/master/admin.kubeconfig",
"path": "/etc/origin/master/openshift-master.kubeconfig",
"path": "/etc/origin/master/master.server.crt",
"path": "/etc/origin/master/master.proxy-client.crt",
"path": "/etc/origin/master/master.kubelet-client.crt",
"path": "/etc/origin/master/service-signer.crt",
"path": "/etc/origin/master/master.etcd-client.crt",
"path": "/etc/origin/master/master.etcd-ca.crt",
"path": "/etc/origin/master/ca.crt",
"path": "/etc/origin/node/client-ca.crt",
"path": "/etc/origin/node/client-ca.crt",
"path": "/api/v1/namespaces/default/secrets/registry-certificates",
"path": "/api/v1/namespaces/default/secrets/router-certs",
"path": "/etc/origin/node/client-ca.crt",
"path": "/etc/origin/node/client-ca.crt",

master.kubelet-client.crt and master.proxy-client.crt were checked.
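The same verification can be scripted instead of eyeballing the grep output. The sketch below makes no assumption about the report's overall layout beyond it being JSON in which each checked cert has a "path" string, as the grep output suggests; the function names and the `EXPECTED` list are made up for illustration.

```python
import json


def collect_paths(node):
    """Recursively gather every string stored under a "path" key in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'path' and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_paths(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_paths(item))
    return found


def missing_certs(report, expected):
    """Return the expected cert paths that do not appear anywhere in the report."""
    checked = set(collect_paths(report))
    return sorted(path for path in expected if path not in checked)


# The two files this bug is about.
EXPECTED = [
    '/etc/origin/master/master.kubelet-client.crt',
    '/etc/origin/master/master.proxy-client.crt',
]

if __name__ == '__main__':
    with open('/tmp/cert-expiry-report.json') as report_file:
        report = json.load(report_file)
    missing = missing_certs(report, EXPECTED)
    print('missing: %s' % missing if missing else 'all expected certs were checked')
```

Walking the whole structure avoids depending on which top-level keys the report uses for masters, nodes, and etcd.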