1421002 – Fail to upgrade masters when set openshift_rolling_restart_mode=system

Summary: Fail to upgrade masters when set openshift_rolling_restart_mode=system

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Scott Dodson
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-02-10 05:31 UTC by liujia
Modified:	2017-07-24 14:11 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, an error in the upgrade playbooks prevented ansible from detecting when a host had successfully been rebooted. This error has been corrected and upgrades that use openshift_rolling_restart_mode=system now work properly.
Clone Of:
Environment:
Last Closed:	2017-04-12 19:01:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:0903	0	normal	SHIPPED_LIVE	OpenShift Container Platform atomic-openshift-utils bug fix and enhancement	2017-04-12 22:45:42 UTC

Description liujia 2017-02-10 05:31:56 UTC

Description of problem:
Running upgrade_control_plane.yml to upgrade HA masters fails: after the system restart of one of the masters, the playbook fails and exits at TASK [Wait for master to restart].
TASK [Wait for master to restart] **********************************************
fatal: [openshift-119.x.x.x]: FAILED! => {
    "failed": true
}
MSG:

the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'ansible_ssh_port' is undefined

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_hosts.yml': line 10, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:
- name: Wait for master to restart
  ^ here
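
For illustration only: the message above says that the task's arguments reference 'ansible_ssh_port', which is not defined for this host. A minimal sketch of a wait task that tolerates the missing variable is shown below. It is not taken from restart_hosts.yml and is not necessarily how the playbook was fixed; the fallback values (port 22, inventory_hostname) and the 10-second delay are assumptions.

```yaml
# Hypothetical sketch, not the shipped fix: wait for SSH on a rebooted master
# without assuming that ansible_ssh_port is set in the inventory.
- name: Wait for master to restart
  # Run the check from the control host, since the target itself is rebooting.
  delegate_to: localhost
  become: no
  wait_for:
    # Fall back to the inventory name when no explicit SSH host is defined.
    host: "{{ ansible_ssh_host | default(inventory_hostname) }}"
    # Fall back to the standard SSH port when ansible_ssh_port is undefined.
    port: "{{ ansible_ssh_port | default(22) }}"
    state: started
    # Assumed pause before the first probe, so the host has time to go down.
    delay: 10
```

The Jinja2 default filter supplies a value only when the variable is undefined, so inventories that do set ansible_ssh_port keep their configured port.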

Version-Release number of selected component (if applicable):
ansible-2.2.1.0-2.el7.noarch
openshift-ansible-playbooks-3.5.6-1.git.0.5e6099d.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.4 in an HA environment.
2. Edit the inventory file to add the following variables:
openshift_rolling_restart_mode=system
openshift_master_upgrade_hook=/root/work/playbooks/master_hook.yml
openshift_master_upgrade_post_hook=/root/work/playbooks/post_master_hook.yml
3. Run the master upgrade playbook:
# ansible-playbook -i /tmp/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade_control_plane.yml


Actual results:
The playbook exited while restarting the master system.

Expected results:
The masters are upgraded successfully.

Additional info:

# ansible -i /tmp/hosts masters -m shell -a "openshift version"
openshift-151.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift-149.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift-119.x.x.x | SUCCESS | rc=0 >>
openshift v3.4.1.5
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


# ansible -i /tmp/hosts masters -m shell -a "docker ps"
openshift-119.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
6954c4ec9e96        openshift3/node:v3.4.1.5                "/usr/local/bin/origi"   About an hour ago   Up About an hour                        atomic-openshift-node
2bfaafa944bc        openshift3/ose:v3.5.0.18                "/usr/bin/openshift s"   About an hour ago   Up About an hour                        atomic-openshift-master-controllers
b10e9c1c15f4        openshift3/ose:v3.5.0.18                "/usr/bin/openshift s"   About an hour ago   Up About an hour                        atomic-openshift-master-api
2b09c829d847        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          About an hour ago   Up About an hour                        etcd_container
2b5d6b52f5b9        openshift3/openvswitch:v3.4.1.5         "/usr/local/bin/ovs-r"   About an hour ago   Up About an hour                        openvswitch

openshift-149.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
fb8aa3156801        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   2 hours ago         Up 2 hours                              atomic-openshift-master-controllers
418c95fefa20        openshift3/node:v3.4.1.5                "/usr/local/bin/origi"   16 hours ago        Up 16 hours                             atomic-openshift-node
fd7a0741b717        openshift3/openvswitch:v3.4.1.5         "/usr/local/bin/ovs-r"   16 hours ago        Up 16 hours                             openvswitch
f5cf6e162fc5        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   16 hours ago        Up 16 hours                             atomic-openshift-master-api
3a728ebe0576        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          16 hours ago        Up 16 hours                             etcd_container

openshift-151.x.x.x | SUCCESS | rc=0 >>
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
8c37f717ae5f        openshift3/node:v3.4.1.5                "/usr/local/bin/origi"   16 hours ago        Up 16 hours                             atomic-openshift-node
6417f46dae4c        openshift3/openvswitch:v3.4.1.5         "/usr/local/bin/ovs-r"   16 hours ago        Up 16 hours                             openvswitch
3db2d2fdb35f        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   16 hours ago        Up 16 hours                             atomic-openshift-master-controllers
79601b2d8106        openshift3/ose:v3.4.1.5                 "/usr/bin/openshift s"   16 hours ago        Up 16 hours                             atomic-openshift-master-api
3769517cadf1        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"          16 hours ago        Up 16 hours                             etcd_container

Comment 1 Scott Dodson 2017-02-16 15:36:17 UTC

We've changed the logic for how we test whether a host has been restarted. Please test with the latest build.

Comment 2 liujia 2017-02-17 09:57:01 UTC

Verification is blocked by bug 1423425.

Comment 3 liujia 2017-02-22 07:40:36 UTC

Verification is blocked by bug 1425688.

Comment 4 Anping Li 2017-02-27 10:07:10 UTC

With openshift-ansible-3.5.13, the masters are restarted one at a time and the OCP upgrade no longer aborts.

Comment 6 errata-xmlrpc 2017-04-12 19:01:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903