Description of problem: Run redeploy-certificates.yml playbook against an ocp-3.10 cluster, playbook may fail as below: PLAY [Restart nodes] ******************************************************************************* ... TASK [Restart docker] ******************************************************************************************************************************************************* changed: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"attempts": 1, "changed": true, "failed": false, "name": "docker", "state": "started", ... TASK [Wait for master API to come back online] ****************************************************************************************************************************** ok: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"changed": false, "elapsed": 25, "failed": false, "path": null, "port": 8443, "search_regex": null, "state": "started"} TASK [restart node] ********************************************************************************************************************************************************* changed: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"changed": true, "failed": false, "name": "atomic-openshift-node", "state": "started", "status": ... TASK [Wait for node to be ready] ******************************************************************************************************************************************** fatal: [ec2-52-90-247-129.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "The conditional check 'node_output.results.returncode == 0 and node_output.results.results[0].status.conditions | selectattr('type', 'match', '^Ready$') | map(attribute='status') | join | bool == True' failed. The error was: error while evaluating conditional (node_output.results.returncode == 0 and node_output.results.results[0].status.conditions | selectattr('type', 'match', '^Ready$') | map(attribute='status') | join | bool == True): 'dict object' has no attribute 'results'"} to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.retry Sometimes master-api service still not available when running "TASK [Wait for node to be ready] ", since docker restart will also trigger master-api/controllers service restart, so maybe we need "verify API server" task before restart node. Version-Release number of the following components: openshift-ansible-3.10.0-0.58.0.git.0.d8f6377.el7.noarch How reproducible: 50% (3 failures in 6 attempts) Steps to Reproduce: 1.Run openshift cert redeploy playbook ansible-playbook -i host/310 -vvv /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml Actual results: Playbook fails as in Description, then log into master, run "/usr/bin/oc get node ip-172-18-11-10.ec2.internal -o json -n default", it could work. Expected results: Additional info: Ansible inventory file and full log with "-vvv" could be found in attachment
The cert re-deploy playbooks are likely broken pretty badly. We'll try to address these in a 0-day.
There appear to be no active cases related to this bug. As such we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.