Bug 1622099
| Summary: | Client certificates expire in App and Infra nodes and are not rotated | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jon Uriarte <juriarte> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Status: | CLOSED NOTABUG | QA Contact: | DeShuai Ma <dma> |
| Severity: | high | Priority: | unspecified |
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mmccomas, sdodson, sjenning |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2018-08-29 16:14:00 UTC | Type: | Bug |

This appears to be a problem that's occurring after certificate approval and deployment. Moving to Pod team to triage further.

Created attachment 1478526 [details]
Logs and inventory file

Description of problem:

In a successful OCP on OpenStack deployment, and after some hours, the client certificates in the App and Infra nodes expire and are not rotated. This causes the atomic-openshift-node service on those nodes to be restarted, but it cannot start successfully and ends up in a restart loop. App and Infra nodes remain in NotReady status when this happens.

Version-Release number of the following components:

rpm -q openshift-ansible
    openshift-ansible-3.10.34-1.git.0.48df172None.noarch

rpm -q ansible
    ansible-2.4.6.0-1.el7ae.noarch

ansible --version
    ansible 2.4.6.0
      config file = /etc/ansible/ansible.cfg
      configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/lib/python2.7/site-packages/ansible
      executable location = /usr/bin/ansible
      python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

How reproducible:

Always, after some hours, when certificates are not rotated.

Steps to Reproduce:

1. Install OpenStack
2. Install OpenShift via OpenStack playbooks (kuryr enabled)

    ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/install.yml

Deployed 1 master, 1 infra and 2 app nodes.

Actual results:

The ansible playbook ends successfully, with no error. The nodes are deployed correctly (in Ready status) and certificate rotation seems to be working.

But after some time, the nodes go to NotReady status:

    $ oc get nodes
    NAME                                 STATUS     ROLES     AGE       VERSION
    app-node-0.openshift.example.com     NotReady   compute   1d        v1.10.0+b81c8f8
    app-node-1.openshift.example.com     NotReady   compute   1d        v1.10.0+b81c8f8
    infra-node-0.openshift.example.com   NotReady   infra     1d        v1.10.0+b81c8f8
    master-0.openshift.example.com       Ready      master    1d        v1.10.0+b81c8f8

At this point, the atomic-openshift-node service is stopped (see app0-logs.txt) due to client certificate expiration, and started afterwards. The service cannot start successfully (see app0-logs2.txt), so it enters a restart loop.

It seems to be related to certificate rotation, which apparently stops working, so no new certificates are created after the last one expires.

    [root@app-node-0 ~]# ls -ltr /etc/origin/node/certificates/
    total 96
    -rw-------. 1 root root 1171 ago 22 10:29 kubelet-client-2018-08-22-10-29-27.pem
    -rw-------. 1 root root 1293 ago 22 10:29 kubelet-server-2018-08-22-10-29-31.pem
    -rw-------. 1 root root 1171 ago 22 10:38 kubelet-client-2018-08-22-10-38-55.pem
    -rw-------. 1 root root 1293 ago 22 10:40 kubelet-server-2018-08-22-10-40-09.pem
    -rw-------. 1 root root 1293 ago 22 10:50 kubelet-server-2018-08-22-10-50-24.pem
    -rw-------. 1 root root 1171 ago 22 10:51 kubelet-client-2018-08-22-10-51-38.pem
    -rw-------. 1 root root 1293 ago 22 11:01 kubelet-server-2018-08-22-11-01-20.pem
    -rw-------. 1 root root 1171 ago 22 11:04 kubelet-client-2018-08-22-11-04-22.pem
    -rw-------. 1 root root 1293 ago 22 11:11 kubelet-server-2018-08-22-11-11-12.pem
    -rw-------. 1 root root 1171 ago 22 11:14 kubelet-client-2018-08-22-11-14-30.pem
    -rw-------. 1 root root 1293 ago 22 11:23 kubelet-server-2018-08-22-11-23-29.pem
    -rw-------. 1 root root 1171 ago 22 11:26 kubelet-client-2018-08-22-11-26-58.pem
    -rw-------. 1 root root 1293 ago 22 11:33 kubelet-server-2018-08-22-11-33-35.pem
    -rw-------. 1 root root 1171 ago 22 11:36 kubelet-client-2018-08-22-11-36-35.pem
    -rw-------. 1 root root 1293 ago 22 11:45 kubelet-server-2018-08-22-11-45-09.pem
    -rw-------. 1 root root 1171 ago 22 11:46 kubelet-client-2018-08-22-11-46-30.pem
    -rw-------. 1 root root 1293 ago 22 11:56 kubelet-server-2018-08-22-11-56-55.pem
    -rw-------. 1 root root 1171 ago 22 11:57 kubelet-client-2018-08-22-11-57-19.pem
    -rw-------. 1 root root 1293 ago 22 12:08 kubelet-server-2018-08-22-12-08-52.pem
    -rw-------. 1 root root 1171 ago 22 12:09 kubelet-client-2018-08-22-12-09-22.pem
    -rw-------. 1 root root 1293 ago 22 12:18 kubelet-server-2018-08-22-12-18-23.pem
    -rw-------. 1 root root 1171 ago 22 12:19 kubelet-client-2018-08-22-12-19-58.pem
    lrwxrwxrwx. 1 root root   68 ago 22 12:29 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2018-08-22-12-29-40.pem
    -rw-------. 1 root root 1171 ago 22 12:29 kubelet-client-2018-08-22-12-29-40.pem
    lrwxrwxrwx. 1 root root   68 ago 22 12:30 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2018-08-22-12-30-22.pem
    -rw-------. 1 root root 1293 ago 22 12:30 kubelet-server-2018-08-22-12-30-22.pem

When the last valid certificate (kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2018-08-22-12-29-40.pem) expires:

    Not Before: Aug 22 16:25:00 2018 GMT
    Not After : Aug 22 16:45:00 2018 GMT

the atomic-openshift-node service restart loop starts. Note that certificate validity is 20 minutes.

This happens to the App and Infra nodes in the same way.

All the CSRs are in Pending status (see master-csr.txt).

Find the inventory in the attached file (see OSEv3.yml).

Expected results:

All the nodes in Ready status, and client certificates rotated if necessary.
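
For anyone triaging a similar report, the 20-minute validity and the rotation cadence noted above can be double-checked with a small script. The following is only a sketch: the helper names `validity_window` and `rotation_gaps` are hypothetical, the date format is assumed to match the `Not Before` / `Not After` lines quoted in this report, and the timestamp embedded in each certificate file name is assumed to be the time that certificate was written (which agrees with the `ls -ltr` modification times above).

```python
from datetime import datetime, timedelta

# Assumed format of the validity lines quoted in the report,
# e.g. "Aug 22 16:25:00 2018 GMT". %b relies on the default C locale.
VALIDITY_FMT = "%b %d %H:%M:%S %Y GMT"

# Assumed file name layout: kubelet-client-YYYY-MM-DD-HH-MM-SS.pem
CLIENT_PREFIX = "kubelet-client-"
PEM_SUFFIX = ".pem"


def validity_window(not_before: str, not_after: str) -> timedelta:
    """Return the certificate lifetime from its two validity strings."""
    start = datetime.strptime(not_before, VALIDITY_FMT)
    end = datetime.strptime(not_after, VALIDITY_FMT)
    return end - start


def rotation_gaps(filenames):
    """Return the time between consecutive client certificates,
    using the timestamp embedded in each file name."""
    stamps = sorted(
        datetime.strptime(
            name[len(CLIENT_PREFIX):-len(PEM_SUFFIX)], "%Y-%m-%d-%H-%M-%S"
        )
        for name in filenames
    )
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    # Validity lines of the last client certificate in the report
    print(validity_window("Aug 22 16:25:00 2018 GMT",
                          "Aug 22 16:45:00 2018 GMT"))

    # Last three client certificates from the directory listing
    for gap in rotation_gaps([
        "kubelet-client-2018-08-22-12-09-22.pem",
        "kubelet-client-2018-08-22-12-19-58.pem",
        "kubelet-client-2018-08-22-12-29-40.pem",
    ]):
        print(gap)
```

With the values from this report, the script reports a 20-minute lifetime and gaps of roughly ten minutes between the last client certificates, so a client certificate that is not replaced leaves the node without valid credentials within minutes of the last successful rotation.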