Description of problem:
The upgrade failed because node_config_upgrade.yml was run on the standalone etcd host.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.3.13

How reproducible:
Always

Steps to Reproduce:
1. Install an HA environment with standalone etcd.
2. Run the upgrade playbook:
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml

Actual results:
TASK [Restart containerized services] ******************************************
skipping: [openshift-223.lab.eng.nay.redhat.com] => (item=etcd_container)
skipping: [openshift-223.lab.eng.nay.redhat.com] => (item=openvswitch)
skipping: [openshift-223.lab.eng.nay.redhat.com] => (item=atomic-openshift-master)
skipping: [openshift-223.lab.eng.nay.redhat.com] => (item=atomic-openshift-master-api)
skipping: [openshift-223.lab.eng.nay.redhat.com] => (item=atomic-openshift-master-controllers)
skipping: [openshift-223.lab.eng.nay.redhat.com] => (item=atomic-openshift-node)

TASK [Wait for master API to come back online] *********************************
skipping: [openshift-223.lab.eng.nay.redhat.com]

TASK [include] *****************************************************************
included: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/v3_3/node_config_upgrade.yml for openshift-223.lab.eng.nay.redhat.com

TASK [modify_yaml] *************************************************************
fatal: [openshift-223.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "[Errno 2] No such file or directory: '/etc/origin/node/node-config.yaml'"}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.retry

PLAY RECAP *********************************************************************
localhost                            : ok=22   changed=11   unreachable=0   failed=0
openshift-202.lab.eng.nay.redhat.com : ok=195  changed=25   unreachable=0   failed=0
openshift-210.lab.eng.nay.redhat.com : ok=78   changed=1    unreachable=0   failed=0
openshift-218.lab.eng.nay.redhat.com : ok=199  changed=28   unreachable=0   failed=0
openshift-220.lab.eng.nay.redhat.com : ok=78   changed=1    unreachable=0   failed=0
openshift-223.lab.eng.nay.redhat.com : ok=86   changed=2    unreachable=0   failed=1

Expected results:
The upgrade skips node configuration upgrade tasks on hosts that are not nodes (such as standalone etcd hosts) and completes successfully.

Additional info:
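For reference, the failure can be confirmed by checking that the standalone etcd host simply has no node-config.yaml for modify_yaml to edit. The following is a minimal, hypothetical check (not part of openshift-ansible); the "etcd" group name assumes the standard BYO inventory layout.

# check_node_config.yml -- confirm node-config.yaml is absent on etcd-only hosts
- hosts: etcd
  tasks:
    - name: Check for node-config.yaml on etcd hosts
      stat:
        path: /etc/origin/node/node-config.yaml
      register: node_config

    - name: Report whether the file exists
      debug:
        msg: "node-config.yaml present: {{ node_config.stat.exists }}"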
Technically this can be reproduced a little more easily just by having a dedicated etcd node; full HA is not required.

Fixed in: https://github.com/openshift/openshift-ansible/pull/2348

I missed a conditional on this hook to check that the host is actually a node; the block runs on several systems to help batch things amidst docker restarts. A sketch of the kind of guard involved is shown below.
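A minimal sketch of that guard, assuming the "oo_nodes_to_config" evaluated group used by openshift-ansible; this is illustrative, not the literal diff in the linked PR.

# Only run the node config upgrade hook on hosts that are actually nodes,
# so standalone etcd hosts (which have no /etc/origin/node/node-config.yaml)
# are skipped.
- include: node_config_upgrade.yml
  when: inventory_hostname in groups.oo_nodes_to_config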
Verified and passed on atomic-openshift-utils-3.3.14-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933