1557492 – Task "Wait for master to restart" will break upgrade/install if working through bastion

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.7.z
Assignee:	Fabian von Feilitzsch
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:	1541946
Blocks:
Reported:	2018-03-16 17:01 UTC by Fabian von Feilitzsch
Modified:	2018-04-29 14:37 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1541946
Environment:
Last Closed:	2018-04-29 14:36:36 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:1231	0	None	None	None	2018-04-29 14:37:19 UTC

Comment 1 Fabian von Feilitzsch 2018-03-16 17:10:18 UTC

https://github.com/openshift/openshift-ansible/pull/7557

Comment 2 Vladislav Walek 2018-03-17 09:27:06 UTC

Thank you Fabian.

Comment 4 Johnny Liu 2018-04-20 11:38:37 UTC

The test environment needed to reproduce this bug is not easy to create, but QE built a dummy environment, reproduced the bug successfully, and tested the PR against a host behind a bastion.

1. Configure a target host that is reached through the bastion in .ssh/config.
Host 35.192.5.114
    User root
    HostName 35.192.5.114
    IdentityFile ~/libra-new.pem
    VerifyHostKeyDNS yes
    StrictHostKeyChecking no
    PasswordAuthentication no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh root@jialiu-pc2 -W %h:%p

2. Use iptables on the Ansible host to reject direct connections to the target host.
# iptables -A OUTPUT -p tcp -m tcp -d 35.192.5.114 -j REJECT

3. Create a dummy playbook that calls the tasks changed by the PR.
$ cat test-playbook.yaml
- name: Restart masters
  hosts: testhost
  serial: 1
  tasks:
  - include: playbooks/common/openshift-master/restart_hosts.yml

4. Add the following lines to the inventory file.
[testhost]
35.192.5.114

5. Run the test playbook.
$ pwd
/usr/share/ansible/openshift-ansible
$ ansible-playbook -i /tmp/qe-inventory-host-file test-playbook.yaml -v

Reproduced with openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch.
After 600s the playbook failed as follows, even though the host had already come back.
PLAY [Restart masters] *********************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [35.192.5.114]

TASK [Restart master system] ***************************************************************************************************************************************************
changed: [35.192.5.114] => {"ansible_job_id": "885248510524.11362", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/885248510524.11362", "started": 1}

TASK [set_fact] ****************************************************************************************************************************************************************
ok: [35.192.5.114] => {"ansible_facts": {"wait_for_host": "35.192.5.114"}, "changed": false}

TASK [Wait for master to restart] **********************************************************************************************************************************************


fatal: [35.192.5.114 -> localhost]: FAILED! => {"changed": false, "elapsed": 601, "msg": "Timeout when waiting for 35.192.5.114:22"}
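
The "-> localhost" in the failure indicates that the wait task was delegated to the Ansible host, and the message shows it was probing 35.192.5.114:22. A direct TCP probe like this does not use the ProxyCommand from .ssh/config, so the iptables rule from step 2 rejects it and the task runs until it times out. A minimal sketch of a task with this behavior, inferred from the output above rather than copied from restart_hosts.yml (the delay value is an assumption):

```yaml
# Sketch only: reconstructed from the run output, not the actual playbook source.
- name: Wait for master to restart
  wait_for:
    host: "{{ wait_for_host }}"   # set by the preceding set_fact task
    port: 22
    state: started
    delay: 10                      # assumed value, not shown in the output
    timeout: 600                   # consistent with "elapsed": 601 in the failure
  delegate_to: localhost           # the probe is a direct TCP connect from the Ansible host
  become: no
```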


Verified this bug with openshift-ansible-3.7.44-1.git.0.dbb912c.el7.noarch: PASS.

PLAY [Restart masters] *********************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [35.192.5.114]

TASK [Restart master system] ***************************************************************************************************************************************************
changed: [35.192.5.114] => {"ansible_job_id": "624323622282.5673", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/624323622282.5673", "started": 1}

TASK [Wait for master to restart] **********************************************************************************************************************************************
ok: [35.192.5.114] => {"changed": false, "elapsed": 259}
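
In the passing run the set_fact task is gone and the wait task reports against 35.192.5.114 itself, without "-> localhost". That is consistent with a wait that uses Ansible's own SSH connection to the host, which goes through the bastion. A sketch using the wait_for_connection module, assuming this is the approach the fix takes; the actual change is in the PR linked in comment 1 and is not reproduced here, and the delay value is an assumption:

```yaml
# Sketch only: assumes the fix relies on wait_for_connection; see the PR for the real task.
- name: Wait for master to restart
  wait_for_connection:
    delay: 10      # assumed: give the reboot time to start before the first attempt
    timeout: 600   # assumed to match the 600s limit seen in the failing run
```

Because wait_for_connection retries over the configured connection plugin, it uses the same .ssh/config settings (including ProxyCommand) as every other task in the play.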

Comment 6 Johnny Liu 2018-04-23 02:58:39 UTC

Per comment 4, moving this bug to VERIFIED again.

Comment 10 errata-xmlrpc 2018-04-29 14:36:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1231