Description of problem:
When upgrade_control_plane.yml is run to upgrade HA masters, the playbook fails and exits at TASK [Wait for master to restart] after the system of one master is restarted.

TASK [Wait for master to restart] **********************************************
fatal: [openshift-119.x.x.x]: FAILED! => {
    "failed": true
}

MSG:

the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'ansible_ssh_port' is undefined

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_hosts.yml': line 10, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: Wait for master to restart
  ^ here

Version-Release number of selected component (if applicable):
ansible-2.2.1.0-2.el7.noarch
openshift-ansible-playbooks-3.5.6-1.git.0.5e6099d.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.4 in an HA environment.
2. Edit the inventory file to add the following variables:
   openshift_rolling_restart_mode=system
   openshift_master_upgrade_hook=/root/work/playbooks/master_hook.yml
   openshift_master_upgrade_post_hook=/root/work/playbooks/post_master_hook.yml
3. Run the master upgrade playbook:
   # ansible-playbook -i /tmp/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade_control_plane.yml

Actual results:
The playbook exits when the master system is restarted.

Expected results:
The masters are upgraded successfully.

Additional info:
# ansible -i /tmp/hosts masters -m shell -a "openshift version"
openshift-151.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift-149.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift-119.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# ansible -i /tmp/hosts masters -m shell -a "docker ps"
openshift-119.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID  IMAGE                                  COMMAND                 CREATED            STATUS            PORTS  NAMES
6954c4ec9e96  openshift3/node:v3.4.1.5               "/usr/local/bin/origi"  About an hour ago  Up About an hour         atomic-openshift-node
2bfaafa944bc  openshift3/ose:v3.5.0.18               "/usr/bin/openshift s"  About an hour ago  Up About an hour         atomic-openshift-master-controllers
b10e9c1c15f4  openshift3/ose:v3.5.0.18               "/usr/bin/openshift s"  About an hour ago  Up About an hour         atomic-openshift-master-api
2b09c829d847  registry.access.redhat.com/rhel7/etcd  "/usr/bin/etcd"         About an hour ago  Up About an hour         etcd_container
2b5d6b52f5b9  openshift3/openvswitch:v3.4.1.5        "/usr/local/bin/ovs-r"  About an hour ago  Up About an hour         openvswitch

openshift-149.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID  IMAGE                                  COMMAND                 CREATED            STATUS            PORTS  NAMES
fb8aa3156801  openshift3/ose:v3.4.1.5                "/usr/bin/openshift s"  2 hours ago        Up 2 hours               atomic-openshift-master-controllers
418c95fefa20  openshift3/node:v3.4.1.5               "/usr/local/bin/origi"  16 hours ago       Up 16 hours              atomic-openshift-node
fd7a0741b717  openshift3/openvswitch:v3.4.1.5        "/usr/local/bin/ovs-r"  16 hours ago       Up 16 hours              openvswitch
f5cf6e162fc5  openshift3/ose:v3.4.1.5                "/usr/bin/openshift s"  16 hours ago       Up 16 hours              atomic-openshift-master-api
3a728ebe0576  registry.access.redhat.com/rhel7/etcd  "/usr/bin/etcd"         16 hours ago       Up 16 hours              etcd_container

openshift-151.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID  IMAGE                                  COMMAND                 CREATED            STATUS            PORTS  NAMES
8c37f717ae5f  openshift3/node:v3.4.1.5               "/usr/local/bin/origi"  16 hours ago       Up 16 hours              atomic-openshift-node
6417f46dae4c  openshift3/openvswitch:v3.4.1.5        "/usr/local/bin/ovs-r"  16 hours ago       Up 16 hours              openvswitch
3db2d2fdb35f  openshift3/ose:v3.4.1.5                "/usr/bin/openshift s"  16 hours ago       Up 16 hours              atomic-openshift-master-controllers
79601b2d8106  openshift3/ose:v3.4.1.5                "/usr/bin/openshift s"  16 hours ago       Up 16 hours              atomic-openshift-master-api
3769517cadf1  registry.access.redhat.com/rhel7/etcd  "/usr/bin/etcd"         16 hours ago       Up 16 hours              etcd_container

We've changed the logic for how we test whether a host has been restarted. Please test with the latest build.
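For reference, the reported failure occurs because a task argument references 'ansible_ssh_port' while the inventory does not define it. The task below is only a minimal sketch of one way to guard against that, not the actual contents of restart_hosts.yml and not necessarily the change that shipped. It assumes the wait is done with the standard wait_for module from the control host, and it falls back to port 22 when neither connection-port variable is set.

```yaml
# Hypothetical sketch: not the real restart_hosts.yml task or the shipped fix.
- name: Wait for master to restart
  wait_for:
    host: "{{ inventory_hostname }}"
    # Prefer ansible_port, then the older ansible_ssh_port, then 22,
    # so an inventory that sets neither variable does not abort the play.
    port: "{{ ansible_port | default(ansible_ssh_port | default(22)) }}"
    state: started
    delay: 10
  delegate_to: localhost
  become: no
```

With a guard like this, an inventory that sets no explicit SSH port would reach the wait step instead of failing on the undefined variable.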
Verification is blocked by bug 1423425.
Verification is blocked by bug 1425688.
With openshift-ansible-3.5.13, the masters go through a rolling restart and the OCP upgrade no longer aborts.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0903