Description of problem:

When running the redeploy-node-certificates.yml playbook, every node in the cluster gets its certificates regenerated and a new kubeconfig. Once that task is complete, the playbook restarts both docker and atomic-openshift-node. The docker restart causes unnecessary downtime on the nodes, since the only component that loads the kubeconfig is atomic-openshift-node. The docker restart happens here:

https://github.com/openshift/openshift-ansible/blob/9a405010c5a656f89866906d29866ba98493e91b/playbooks/openshift-node/private/restart.yml#L10

Since this task is called from several playbooks, it would be good to add some kind of check or flag: if the task is invoked from redeploy_certs, do not trigger the docker restart.

Actual results:

The docker daemon is restarted by the redeploy-node-certificates.yml playbook.

Expected results:

The docker daemon is not restarted. Only an atomic-openshift-node restart should be needed to load the new kubeconfig and certificates. With that change, the playbook can run without downtime.
Andrew, Do you agree there's no need to restart docker when re-deploying node certificates? Seems like this would only be necessary when a new CA is generated so that docker picks up that change.
Scott, I agree. Restarting docker should be skipped when we have not replaced the CA certificate.
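A minimal sketch of the kind of guard being discussed, assuming a hypothetical openshift_certificates_redeploy_ca flag that only the CA redeployment playbook would set to true (the variable name is illustrative, not necessarily what the actual fix uses):

```yaml
# Sketch of playbooks/openshift-node/private/restart.yml with a guard.
# Hypothetical flag: openshift_certificates_redeploy_ca defaults to false,
# so node-only certificate redeployment skips the docker restart; a CA
# redeployment playbook would set it to true so docker picks up the new CA.
- name: Restart docker
  service:
    name: docker
    state: restarted
  when: openshift_certificates_redeploy_ca | default(false) | bool

# The node service always restarts, since it must reload the new
# kubeconfig and certificates in every case.
- name: Restart atomic-openshift-node
  service:
    name: atomic-openshift-node
    state: restarted
```

With this guard in place, the node certificate redeployment path restarts only atomic-openshift-node, which is the behavior verified below.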
Master PR https://github.com/openshift/openshift-ansible/pull/6855
Verified this bug with openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch. Ran the node certificate redeployment playbook; docker was not restarted during redeployment.

ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/openshift-node/redeploy-certificates.yml

PLAY [Restart nodes] ********************************************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************************************************
ok: [ec2-34-207-99-213.compute-1.amazonaws.com]

TASK [Restart docker] *******************************************************************************************************************************************************
skipping: [ec2-34-207-99-213.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional result was False"}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489