Bug 1541946

Summary: Task "Wait for master to restart" will break upgrade/install if working through bastion
Product: OpenShift Container Platform
Component: Installer
Version: 3.7.0
Target Release: 3.9.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Reporter: Vladislav Walek <vwalek>
Assignee: Fabian von Feilitzsch <fabian>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, fabian, jiajliu, jokerman, mmccomas, sdodson, wmeng
Doc Type: Bug Fix
Doc Text:
Cause: When waiting for the master to restart, the ssh proxy would not be respected.
Consequence: When provisioning hosts with an ssh proxy configured, the masters would never appear up.
Fix: Changed the task to use an Ansible module that respects ssh proxy configuration.
Result: Ansible is able to connect to the hosts and they are marked as up.
Last Closed: 2018-03-28 14:25:43 UTC
Type: Bug
Bug Blocks: 1557492

Description Vladislav Walek 2018-02-05 09:46:10 UTC
Description of problem:

When running the installation to AWS through bastion, the playbook will fail on:

- name: Wait for master to restart
  local_action:
    module: wait_for
      host="{{ wait_for_host }}"
      state=started
      delay=10
      timeout=600
      port="{{ ansible_port | default(ansible_ssh_port | default(22,boolean=True),boolean=True) }}"
  become: no

The task fails because the playbook does not expect a bastion to be in place. Instead, it requires direct SSH connectivity from the Ansible host to the target hosts.
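
For context, a bastion is commonly wired into Ansible through SSH arguments, which the `wait_for` module never sees. A minimal sketch of such a setup, assuming a YAML group_vars file and a hypothetical bastion at bastion.example.com with a hypothetical ec2-user login (none of these are taken from the customer's environment):

```yaml
# group_vars/all.yml (hypothetical file; the customer's actual setup is not known)
# Every SSH connection Ansible opens to a managed host is tunnelled
# through the bastion by this ProxyCommand.
ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q ec2-user@bastion.example.com"'
```

Because `wait_for` under `local_action` opens a plain TCP connection from the Ansible host to the target port, a setting like this one is never consulted by that task.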

The playbook logs are not available, because the customer has already applied a workaround.

Version-Release number of the following components:
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch
ansible-2.4.1.0-1.el7.noarch
ansible 2.4.1.0

Comment 1 Vladislav Walek 2018-02-05 09:52:29 UTC
Update: the customer confirmed that the SSH port is correct. The task checks the connection directly from the Ansible installation host to the master and ignores the bastion host.

Comment 2 Scott Dodson 2018-02-05 21:33:24 UTC
Yeah, the problem makes sense. There's a suggested solution [1] for this kind of setup, and I wonder whether it would work here; we'd need them to tell us about their bastion host.


- name: Wait for master to restart
  wait_for:
    host: "{{ wait_for_host }}"
    state: started
    delay: 10
    timeout: 600
    port: "{{ ansible_port | default(ansible_ssh_port | default(22,boolean=True),boolean=True) }}"
  become: no
  delegate_to: "{{ openshift_bastion_host if openshift_bastion_host is defined else 'localhost' }}"



1 - https://groups.google.com/d/msg/ansible-project/BLgN_mAWh3E/8n6JCqo_AQAJ

Comment 3 Fabian von Feilitzsch 2018-02-08 22:55:28 UTC
https://github.com/openshift/openshift-ansible/pull/7080

Comment 4 openshift-github-bot 2018-02-19 17:41:57 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/0c6a4b400fb560515b4dcfc7ea764572b1e2dbd1
Bug 1541946- waiting for master reboot now works behind bastion

https://github.com/openshift/openshift-ansible/commit/f4293d34754468ca85c80177d2566c7b45afceb4
Merge pull request #7080 from fabianvf/1541946

Automatic merge from submit-queue.

Bug 1541946- waiting for master reboot now works behind bastion 

https://bugzilla.redhat.com/show_bug.cgi?id=1541946

I made this change because [this ansible PR](https://github.com/ansible/ansible/pull/28450) makes it seem like if we switch to the `wait_for_connection` module we can avoid a lot of the jankiness referenced in the removed code. If I'm interpreting the ansible change properly, this should make it use the full ssh config, proxy jumps and all, without any workarounds. 

I've marked it WIP because I'm still trying to test and make sure that this works.
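
A minimal sketch of what a `wait_for_connection`-based task could look like, reusing the delay and timeout values from the original task. This illustrates the approach described above and is not a copy of the merged change; see the PR for the actual diff.

```yaml
# Sketch only: waits until Ansible can reach the host through its normal
# connection plugin, so the ssh config (ProxyCommand, ProxyJump) is used.
- name: Wait for master to restart
  wait_for_connection:
    delay: 10      # seconds to wait before the first attempt
    timeout: 600   # give up after this many seconds
```

Unlike `wait_for` under `local_action`, this task runs against the target host itself, so no `host` or `port` argument is needed.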

Comment 6 Johnny Liu 2018-02-24 08:54:21 UTC
Verified this bug with openshift-ansible-playbooks-3.9.0-0.51.0.git.0.e26400f.el7.noarch, and PASS.

The environment needed to reproduce this bug is not easy to create, so QE built a dummy environment, reproduced the bug successfully, and tested the PR against a host behind a bastion.

1. Configure access to the target host through the bastion in .ssh/config.
Host 10.8.244.223
    User root
    HostName 10.8.244.223
    IdentityFile ~/libra-new.pem
    VerifyHostKeyDNS yes
    StrictHostKeyChecking no
    PasswordAuthentication no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh root@jialiu-pc2 -W %h:%p

2. Use iptables on the Ansible host to reject direct connections to the target host.
# iptables -A OUTPUT -p tcp -m tcp -d 10.8.244.223 -j REJECT

3. Create a dummy playbook that calls the task changed by the PR.
$ cat test-playbook.yaml 
- name: Restart masters
  hosts: testhost
  serial: 1
  post_tasks:
  - include_tasks: tasks/restart_hosts.yml

Ran the test, and it PASSED.

Comment 8 Fabian von Feilitzsch 2018-03-16 17:12:19 UTC
I have submitted another bug for 3.7.z here: https://bugzilla.redhat.com/show_bug.cgi?id=1557492

and submitted a PR to backport the change: https://github.com/openshift/openshift-ansible/pull/7557

Comment 11 errata-xmlrpc 2018-03-28 14:25:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489