Bug 1557492 - Task "Wait for master to restart" will break upgrade/install if working through bastion
Summary: Task "Wait for master to restart" will break upgrade/install if working through bastion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assignee: Fabian von Feilitzsch
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On: 1541946
Blocks:
 
Reported: 2018-03-16 17:01 UTC by Fabian von Feilitzsch
Modified: 2018-04-29 14:37 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1541946
Environment:
Last Closed: 2018-04-29 14:36:36 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2018:1231  0        None      None    None     2018-04-29 14:37:19 UTC

Comment 1 Fabian von Feilitzsch 2018-03-16 17:10:18 UTC
https://github.com/openshift/openshift-ansible/pull/7557

Comment 2 Vladislav Walek 2018-03-17 09:27:06 UTC
Thank you, Fabian.

Comment 4 Johnny Liu 2018-04-20 11:38:37 UTC
The test environment needed to reproduce this bug is not easy to create, but QE created a dummy environment, reproduced the bug successfully, and tested the PR against a host behind a bastion.

1. Configure the target host in ~/.ssh/config so that it is reached through the bastion:
Host 35.192.5.114
    User root
    HostName 35.192.5.114
    IdentityFile ~/libra-new.pem
    VerifyHostKeyDNS yes
    StrictHostKeyChecking no
    PasswordAuthentication no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh root@jialiu-pc2 -W %h:%p

2. On the Ansible control host, use iptables to reject direct connections to the target host:
# iptables -A OUTPUT -p tcp -m tcp -d 35.192.5.114 -j REJECT

3. Create a dummy playbook that exercises the code changed in the PR:
$ cat test-playbook.yaml 
- name: Restart masters
  hosts: testhost
  serial: 1
  tasks:
  - include: playbooks/common/openshift-master/restart_hosts.yml

4. Add the following lines to the inventory file:
[testhost]
35.192.5.114

5. Run the test playbook:
$ pwd
/usr/share/ansible/openshift-ansible
$ ansible-playbook -i /tmp/qe-inventory-host-file test-playbook.yaml -v

Reproduced with openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch.
After 600 s the playbook failed as follows, even though the host had already come back up:
PLAY [Restart masters] *********************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [35.192.5.114]

TASK [Restart master system] ***************************************************************************************************************************************************
changed: [35.192.5.114] => {"ansible_job_id": "885248510524.11362", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/885248510524.11362", "started": 1}

TASK [set_fact] ****************************************************************************************************************************************************************
ok: [35.192.5.114] => {"ansible_facts": {"wait_for_host": "35.192.5.114"}, "changed": false}

TASK [Wait for master to restart] **********************************************************************************************************************************************


fatal: [35.192.5.114 -> localhost]: FAILED! => {"changed": false, "elapsed": 601, "msg": "Timeout when waiting for 35.192.5.114:22"}
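For context: the `35.192.5.114 -> localhost` marker and the preceding `set_fact` of `wait_for_host` suggest that the pre-fix task ran `wait_for` on the control node. The fragment below is a hypothetical reconstruction inferred from this log, not the actual contents of restart_hosts.yml, and its exact parameters are assumptions. Because the TCP probe of port 22 originates on the Ansible host, it never goes through the bastion, so with the iptables rule from step 2 in place it can only time out, even though SSH via the ProxyCommand works.

```yaml
# Hypothetical reconstruction of the pre-fix pattern, inferred from the log above.
- set_fact:
    wait_for_host: "{{ ansible_host | default(inventory_hostname) }}"

- name: Wait for master to restart
  # Delegated to the control node: a raw TCP probe of wait_for_host:22
  # that bypasses the ProxyCommand configured in ~/.ssh/config.
  wait_for:
    host: "{{ wait_for_host }}"
    port: "{{ ansible_port | default(22) }}"
    timeout: 600
  delegate_to: localhost
  become: no
```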


Verified this bug with openshift-ansible-3.7.44-1.git.0.dbb912c.el7.noarch: PASS.

PLAY [Restart masters] *********************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [35.192.5.114]

TASK [Restart master system] ***************************************************************************************************************************************************
changed: [35.192.5.114] => {"ansible_job_id": "624323622282.5673", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/624323622282.5673", "started": 1}

TASK [Wait for master to restart] **********************************************************************************************************************************************
ok: [35.192.5.114] => {"changed": false, "elapsed": 259}
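In the verified run, the `set_fact` step and the `-> localhost` delegation are gone, and the task result carries only `changed` and `elapsed`. That is consistent with Ansible's `wait_for_connection` module (added in Ansible 2.3), which retries through the host's regular connection plugin and therefore honors the bastion ProxyCommand. Whether the PR linked in comment 1 uses exactly this module and these values is an assumption; the sketch below only illustrates the pattern.

```yaml
- name: Wait for master to restart
  # Assumed fix pattern: poll over the normal Ansible connection
  # (SSH via the bastion here) instead of a raw TCP probe from the control node.
  wait_for_connection:
    delay: 10      # assumed value: give the host time to actually go down first
    timeout: 600   # same 600 s budget as the old task
```

Since `wait_for_connection` tests reachability through the same connection Ansible uses for every other task, no separate `wait_for_host` fact or localhost delegation is needed.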

Comment 6 Johnny Liu 2018-04-23 02:58:39 UTC
Per comment 4, moving this bug to VERIFIED again.

Comment 10 errata-xmlrpc 2018-04-29 14:36:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1231

