Bug 1557492
| Summary: | Task "Wait for master to restart" will break upgrade/install if working through bastion | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Fabian von Feilitzsch <fabian> | 
| Component: | Installer | Assignee: | Fabian von Feilitzsch <fabian> | 
| Status: | CLOSED ERRATA | QA Contact: | Johnny Liu <jialiu> | 
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.7.0 | CC: | aos-bugs, fabian, jiajliu, jialiu, jokerman, mmccomas, sdodson, vwalek, wmeng | 
| Target Milestone: | --- | ||
| Target Release: | 3.7.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1541946 | Environment: | |
| Last Closed: | 2018-04-29 14:36:36 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1541946 | ||
| Bug Blocks: | |||
| 
 
        
Comment 1   Fabian von Feilitzsch   2018-03-16 17:10:18 UTC
    Thank you, Fabian. The test environment needed to reproduce this bug is not easy to create, but QE built a dummy environment, reproduced the bug successfully, and tested the PR against a host behind a bastion.
1. Configure a target host that is reached through the bastion in .ssh/config.
Host 35.192.5.114
    User root
    HostName 35.192.5.114
    IdentityFile ~/libra-new.pem
    VerifyHostKeyDNS yes
    StrictHostKeyChecking no
    PasswordAuthentication no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh root@jialiu-pc2 -W %h:%p
2. Use iptables on the Ansible host to reject direct connections to the target host.
# iptables -A OUTPUT -p tcp -m tcp -d 35.192.5.114 -j REJECT
3. Create a dummy playbook that includes the task file changed by the PR.
$ cat test-playbook.yaml 
- name: Restart masters
  hosts: testhost
  serial: 1
  tasks:
  - include: playbooks/common/openshift-master/restart_hosts.yml
4. Add the following lines to the inventory file.
[testhost]
35.192.5.114
5. Run the test playbook.
$ pwd
/usr/share/ansible/openshift-ansible
$ ansible-playbook -i /tmp/qe-inventory-host-file test-playbook.yaml -v
Reproduced with openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch.
After 600s the playbook failed as follows, even though the host had already come back.
PLAY [Restart masters] *********************************************************************************************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [35.192.5.114]
TASK [Restart master system] ***************************************************************************************************************************************************
changed: [35.192.5.114] => {"ansible_job_id": "885248510524.11362", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/885248510524.11362", "started": 1}
TASK [set_fact] ****************************************************************************************************************************************************************
ok: [35.192.5.114] => {"ansible_facts": {"wait_for_host": "35.192.5.114"}, "changed": false}
TASK [Wait for master to restart] **********************************************************************************************************************************************
fatal: [35.192.5.114 -> localhost]: FAILED! => {"changed": false, "elapsed": 601, "msg": "Timeout when waiting for 35.192.5.114:22"}
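The `-> localhost` delegation and the `35.192.5.114:22` in the timeout message suggest that the task probes the SSH port directly from the Ansible host. Such a probe is a plain TCP connection, so it ignores the ProxyCommand in .ssh/config and is rejected by the iptables rule from step 2. A minimal sketch of that task shape follows. It is reconstructed from the log output above, not copied from restart_hosts.yml, and the `set_fact` expression and the `delay` value are assumptions:

```yaml
# Sketch only: reconstructed from the failing log, not the shipped task file.
- set_fact:
    # Assumption: how the value "35.192.5.114" seen in the log is derived.
    wait_for_host: "{{ inventory_hostname }}"

- name: Wait for master to restart
  wait_for:
    host: "{{ wait_for_host }}"
    port: 22          # direct TCP probe of the SSH port
    state: started
    delay: 10         # assumed value
    timeout: 600      # consistent with the ~600s timeout in the log
  delegate_to: localhost   # runs on the Ansible host, bypassing the bastion
```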
Verified this bug with openshift-ansible-3.7.44-1.git.0.dbb912c.el7.noarch: PASS.
PLAY [Restart masters] *********************************************************************************************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [35.192.5.114]
TASK [Restart master system] ***************************************************************************************************************************************************
changed: [35.192.5.114] => {"ansible_job_id": "624323622282.5673", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/624323622282.5673", "started": 1}
TASK [Wait for master to restart] **********************************************************************************************************************************************
ok: [35.192.5.114] => {"changed": false, "elapsed": 259}
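In the passing run the wait task reports against the target host itself rather than `-> localhost`, and the `set_fact` task is gone. That is consistent with a wait that goes through Ansible's normal SSH connection, and therefore through the ProxyCommand. One way to get this behavior is Ansible's `wait_for_connection` module. The sketch below is an assumption about the shape of such a task, not the content of the PR, and both parameter values are illustrative:

```yaml
# Sketch only: one possible shape of a bastion-friendly wait task.
- name: Wait for master to restart
  wait_for_connection:
    delay: 10       # assumed: give the reboot time to start
    timeout: 600    # assumed: same upper bound as the failing run
```

Because the check reuses the connection settings Ansible already has for the host, no direct route from the Ansible host to port 22 is needed.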
    Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1231  |