
Bug 1421002

Summary: Fail to upgrade masters when set openshift_rolling_restart_mode=system
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.5.0
CC: anli, aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, an error in the upgrade playbooks prevented Ansible from detecting when a host had been successfully rebooted. This error has been corrected, and upgrades that use openshift_rolling_restart_mode=system now work properly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 19:01:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description liujia 2017-02-10 05:31:56 UTC
Description of problem:
When running upgrade_control_plane.yml to upgrade HA masters, the playbook fails and exits at TASK [Wait for master to restart] after restarting the system of one of the masters.
TASK [Wait for master to restart] **********************************************
fatal: [openshift-119.x.x.x]: FAILED! => {
    "failed": true
}
MSG:

the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'ansible_ssh_port' is undefined

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_hosts.yml': line 10, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:
- name: Wait for master to restart
  ^ here
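
The error above says that the task's arguments reference 'ansible_ssh_port', which is not defined for this host. As an illustration only, and not the fix that was shipped for this bug, a wait task could tolerate the missing variable by falling back to 'ansible_port' (the name that replaced 'ansible_ssh_port' in Ansible 2.0) and then to port 22. The task layout below is an assumption; the actual contents of restart_hosts.yml are not shown in this report.

```yaml
# Hypothetical sketch, not the contents of restart_hosts.yml and not the
# shipped fix. It waits for SSH on the rebooted master to come back, using
# ansible_port if set, else ansible_ssh_port if set, else 22.
- name: Wait for master to restart
  local_action:
    module: wait_for
    host: "{{ inventory_hostname }}"
    state: started
    delay: 10
    port: "{{ ansible_port | default(ansible_ssh_port | default(22)) }}"
  become: no
```

Because the default() filter replaces an undefined value, the template no longer fails when neither variable is set in the inventory.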

Version-Release number of selected component (if applicable):
ansible-2.2.1.0-2.el7.noarch
openshift-ansible-playbooks-3.5.6-1.git.0.5e6099d.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.4 in an HA environment.
2. Edit the inventory file to add the following variables:
openshift_rolling_restart_mode=system
openshift_master_upgrade_hook=/root/work/playbooks/master_hook.yml
openshift_master_upgrade_post_hook=/root/work/playbooks/post_master_hook.yml
3. Run the master upgrade playbook:
# ansible-playbook -i /tmp/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade_control_plane.yml


Actual results:
The playbook exits while restarting the master system.

Expected results:
The masters are upgraded successfully.

Additional info:

# ansible -i /tmp/hosts masters -m shell -a "openshift version"
openshift-151.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift-149.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift-119.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


# ansible -i /tmp/hosts masters -m shell -a "docker ps"
openshift-119.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
6954c4ec9e96        openshift3/node:v3.4.1.5                "/usr/local/bin/origi"   About an hour ago   Up About an hour                        atomic-openshift-node
2bfaafa944bc        openshift3/ose:v3.5.0.18                "/usr/bin/openshift s"   About an hour ago   Up About an hour                        atomic-openshift-master-controllers
b10e9c1c15f4        openshift3/ose:v3.5.0.18                "/usr/bin/openshift s"   About an hour ago   Up About an hour                        atomic-openshift-master-api
2b09c829d847        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          About an hour ago   Up About an hour                        etcd_container
2b5d6b52f5b9        openshift3/openvswitch:v3.4.1.5         "/usr/local/bin/ovs-r"   About an hour ago   Up About an hour                        openvswitch

openshift-149.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
fb8aa3156801        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   2 hours ago         Up 2 hours                              atomic-openshift-master-controllers
418c95fefa20        openshift3/node:v3.4.1.5                "/usr/local/bin/origi"   16 hours ago        Up 16 hours                             atomic-openshift-node
fd7a0741b717        openshift3/openvswitch:v3.4.1.5         "/usr/local/bin/ovs-r"   16 hours ago        Up 16 hours                             openvswitch
f5cf6e162fc5        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   16 hours ago        Up 16 hours                             atomic-openshift-master-api
3a728ebe0576        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          16 hours ago        Up 16 hours                             etcd_container

openshift-151.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
8c37f717ae5f        openshift3/node:v3.4.1.5                "/usr/local/bin/origi"   16 hours ago        Up 16 hours                             atomic-openshift-node
6417f46dae4c        openshift3/openvswitch:v3.4.1.5         "/usr/local/bin/ovs-r"   16 hours ago        Up 16 hours                             openvswitch
3db2d2fdb35f        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   16 hours ago        Up 16 hours                             atomic-openshift-master-controllers
79601b2d8106        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   16 hours ago        Up 16 hours                             atomic-openshift-master-api
3769517cadf1        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          16 hours ago        Up 16 hours                             etcd_container

Comment 1 Scott Dodson 2017-02-16 15:36:17 UTC
We've changed the logic we use to test whether a host has been restarted. Please test with the latest build.

Comment 2 liujia 2017-02-17 09:57:01 UTC
Verification is blocked by bug 1423425.

Comment 3 liujia 2017-02-22 07:40:36 UTC
Verification is blocked by bug 1425688.

Comment 4 Anping Li 2017-02-27 10:07:10 UTC
With openshift-ansible-3.5.13, the masters are restarted one at a time (rolling restart) and the OCP upgrade no longer aborts.

Comment 6 errata-xmlrpc 2017-04-12 19:01:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903