Bug 1541946
| Summary: | Task "Wait for master to restart" will break upgrade/install if working through bastion | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vladislav Walek <vwalek> | |
| Component: | Installer | Assignee: | Fabian von Feilitzsch <fabian> | |
| Status: | CLOSED ERRATA | QA Contact: | Johnny Liu <jialiu> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 3.7.0 | CC: | aos-bugs, fabian, jiajliu, jokerman, mmccomas, sdodson, wmeng | |
| Target Milestone: | --- | |||
| Target Release: | 3.9.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause: When waiting for the master to restart, the ssh proxy would not be respected
Consequence: When provisioning hosts with an ssh proxy configured, the masters would never appear up
Fix: Changed the task to use an Ansible module that respects ssh proxy configuration
Result: Ansible is able to connect to the hosts and they are marked as up.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1557492 (view as bug list) | Environment: | ||
| Last Closed: | 2018-03-28 14:25:43 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1557492 | |||
Adding an update: the customer confirmed that the SSH port is correct. The task checks the connection directly from the Ansible installation host to the master and ignores the bastion host.

Yeah, the problem makes sense. There is a suggested solution [1] for this problem, so I wonder whether the following would work. We would need them to tell us about their bastion host.
- name: Wait for master to restart
  wait_for:
    host: "{{ wait_for_host }}"
    state: started
    delay: 10
    timeout: 600
    port: "{{ ansible_port | default(ansible_ssh_port | default(22,boolean=True),boolean=True) }}"
  become: no
  delegate_to: "{{ openshift_bastion_host if openshift_bastion_host is defined else 'localhost' }}"
[1] https://groups.google.com/d/msg/ansible-project/BLgN_mAWh3E/8n6JCqo_AQAJ
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/0c6a4b400fb560515b4dcfc7ea764572b1e2dbd1
Bug 1541946- waiting for master reboot now works behind bastion

https://github.com/openshift/openshift-ansible/commit/f4293d34754468ca85c80177d2566c7b45afceb4
Merge pull request #7080 from fabianvf/1541946

Automatic merge from submit-queue.

Bug 1541946- waiting for master reboot now works behind bastion

https://bugzilla.redhat.com/show_bug.cgi?id=1541946

I made this change because [this ansible PR](https://github.com/ansible/ansible/pull/28450) makes it seem like if we switch to the `wait_for_connection` module we can avoid a lot of the jankiness referenced in the removed code. If I'm interpreting the ansible change properly, this should make it use the full ssh config, proxy jumps and all, without any workarounds.

I've marked it WIP because I'm still trying to test and make sure that this works.

Verified this bug with openshift-ansible-playbooks-3.9.0-0.51.0.git.0.e26400f.el7.noarch, and PASS.
The test environment needed to reproduce this bug is not easy to create, but QE created a dummy environment, reproduced the bug successfully, and tested the PR against a host behind a bastion.
1. Configure a target host that is reached through the bastion in .ssh/config.
Host 10.8.244.223
    User root
    HostName 10.8.244.223
    IdentityFile ~/libra-new.pem
    VerifyHostKeyDNS yes
    StrictHostKeyChecking no
    PasswordAuthentication no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh root@jialiu-pc2 -W %h:%p
2. Use iptables on the Ansible host to block direct connections to the target host.
# iptables -A OUTPUT -p tcp -m tcp -d 10.8.244.223 -j REJECT
3. Create a dummy playbook that calls the tasks changed by the PR.
$ cat test-playbook.yaml
- name: Restart masters
  hosts: testhost
  serial: 1
  post_tasks:
  - include_tasks: tasks/restart_hosts.yml
Ran the test, and it passed.
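For orientation, the task exercised by this test could look roughly like the sketch below once it uses `wait_for_connection`, as the PR description says. This is an assumption reconstructed from that description, not the merged contents of tasks/restart_hosts.yml. The `delay` and `timeout` values are simply carried over from the old `wait_for` task.

```yaml
# Hypothetical sketch, not the merged code.
# wait_for_connection retries through the host's normal connection plugin,
# so an ssh ProxyCommand (as in the .ssh/config above) is honored.
# No local_action or delegate_to is needed.
- name: Wait for master to restart
  wait_for_connection:
    delay: 10      # seconds to wait before the first attempt
    timeout: 600   # give up after this many seconds
```

With the iptables rule from step 2 in place, a task of this shape should succeed only if the connection really goes through the bastion, which is what the test checks.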
I have submitted another bug for 3.7.z here: https://bugzilla.redhat.com/show_bug.cgi?id=1557492 and submitted a PR to backport the change: https://github.com/openshift/openshift-ansible/pull/7557

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
Description of problem:
When running the installation to AWS through a bastion, the playbook fails on:

- name: Wait for master to restart
  local_action:
    module: wait_for
      host="{{ wait_for_host }}"
      state=started
      delay=10
      timeout=600
      port="{{ ansible_port | default(ansible_ssh_port | default(22,boolean=True),boolean=True) }}"
  become: no

This happens because the playbook does not expect a bastion to be in place. Instead, it explicitly requires direct SSH connectivity to the target hosts.
The playbook logs are not available (the customer already fixed it with a workaround).

Version-Release number of the following components:
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch
ansible-2.4.1.0-1.el7.noarch
ansible 2.4.1.0

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
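For context, one common way to route Ansible's SSH connections through a bastion is the `ansible_ssh_common_args` variable. The report does not say how the customer configured their bastion, so the snippet below is only an assumed setup. The user and host name are placeholders.

```yaml
# Hypothetical group_vars entry; user and bastion host name are placeholders.
# Every ssh connection Ansible opens to hosts in this group is tunnelled
# through the bastion.
ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q ec2-user@bastion.example.com"'
```

A `wait_for` task run via `local_action` does not use this setting: it opens a plain TCP connection from the control host straight to `wait_for_host` on the SSH port, which matches the failure described above.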