Description of problem:
Upgrading from 3.9 to 3.10 can fail because node-config.yml does not land on a node before the checks finish. This patch, written by Ben Draper at Experian, fixes that:
https://github.com/openshift/openshift-ansible/pull/11576

Also, here is a patch for 3.11 that may or may not be relevant; feel free to suggest changes to this PR or close it:
https://github.com/openshift/openshift-ansible/pull/11574/files

Version-Release number of the following components:
rpm -q openshift-ansible
3.10.x
rpm -q ansible
ansible-2.6.x

How reproducible:
On clusters with a large number of nodes (more than 20), the failure occurs almost 100% of the time. It can also happen on small clusters.
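The failure described above is a timing race: a check runs before the config file has been written to the node. As a rough illustration of the general remedy (poll with a deadline instead of checking once), here is a minimal Python sketch. It is not the code from the linked PRs; the function name `wait_for_file` and its `timeout` and `interval` parameters are hypothetical and chosen only for this example.

```python
import os
import time


def _is_ready(path):
    """Return True if `path` is a regular, non-empty file."""
    try:
        return os.path.isfile(path) and os.path.getsize(path) > 0
    except OSError:
        # The file may vanish between the two calls; treat that as "not ready".
        return False


def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until `path` is ready or `timeout` seconds have elapsed.

    Hypothetical helper for illustration only. Returns True if the file
    showed up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if _is_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would pass the location of the node's node-config.yml and fail the upgrade step only if the function returns False, so that a slow node gets the full timeout rather than a single check.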
Verified.

openshift-ansible-3.10.153-1.git.0.2363fa8.el7

Upgraded an OCP v3.9 cluster of 1 lb + 3 masters + 2 infra nodes + 21 compute nodes: success.

PLAY RECAP *********************************************************************
localhost                                             : ok=38   changed=0    unreachable=0  failed=0
qe-wmeng1ug39-lb-1.0715-j7k.qe.rhcloud.com            : ok=71   changed=6    unreachable=0  failed=0
qe-wmeng1ug39-master-etcd-1.0715-j7k.qe.rhcloud.com   : ok=870  changed=237  unreachable=0  failed=0
qe-wmeng1ug39-master-etcd-2.0715-j7k.qe.rhcloud.com   : ok=394  changed=112  unreachable=0  failed=0
qe-wmeng1ug39-master-etcd-3.0715-j7k.qe.rhcloud.com   : ok=394  changed=112  unreachable=0  failed=0
qe-wmeng1ug39-node-primary-1.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-10.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-11.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-12.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-13.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-14.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-15.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-16.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-17.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-18.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-19.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-2.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-20.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-21.0715-j7k.qe.rhcloud.com : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-3.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-4.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-5.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-6.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-7.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-8.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-node-primary-9.0715-j7k.qe.rhcloud.com  : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-nrri-1.0715-j7k.qe.rhcloud.com          : ok=187  changed=60   unreachable=0  failed=0
qe-wmeng1ug39-nrri-2.0715-j7k.qe.rhcloud.com          : ok=187  changed=60   unreachable=0  failed=0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1755