Bug 1541946 - Task "Wait for master to restart" will break upgrade/install if working through bastion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Fabian von Feilitzsch
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks: 1557492
Reported: 2018-02-05 09:46 UTC by Vladislav Walek
Modified: 2018-03-28 14:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When waiting for the master to restart, the ssh proxy would not be respected.
Consequence: When provisioning hosts with an ssh proxy configured, the masters would never appear up.
Fix: Changed the task to use an Ansible module that respects ssh proxy configuration.
Result: Ansible is able to connect to the hosts and they are marked as up.
Clone Of:
: 1557492 (view as bug list)
Environment:
Last Closed: 2018-03-28 14:25:43 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:26:08 UTC

Description Vladislav Walek 2018-02-05 09:46:10 UTC
Description of problem:

When running the installation on AWS through a bastion host, the playbook fails on this task:

- name: Wait for master to restart
  local_action:
    module: wait_for
      host="{{ wait_for_host }}"
      state=started
      delay=10
      timeout=600
      port="{{ ansible_port | default(ansible_ssh_port | default(22,boolean=True),boolean=True) }}"
  become: no

The task fails because the playbook does not expect a bastion to be in place. Instead, it explicitly requires direct SSH connectivity from the Ansible host to the target hosts.

The playbook logs are not available, as the customer has already fixed the problem with a workaround.

Version-Release number of the following components:
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch
ansible-2.4.1.0-1.el7.noarch
ansible 2.4.1.0



Comment 1 Vladislav Walek 2018-02-05 09:52:29 UTC
Just adding an update: the customer confirmed that the SSH port is correct, but the task checks the connection from the Ansible installation host directly to the master, ignoring the bastion host.
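
As a rough illustration of the kind of setup being described: a bastion is commonly configured at the SSH connection level, for example through `ansible_ssh_common_args` in the inventory. The sketch below is an assumption, not the customer's inventory; the host names and the bastion user are hypothetical. Because the failing task runs `wait_for` as a local action, it opens a plain TCP connection from the Ansible host and never consults these SSH settings.

```yaml
# Hypothetical YAML inventory; host names and the bastion user are placeholders.
all:
  children:
    masters:
      hosts:
        master1.example.com:
      vars:
        # Route every SSH connection to the masters through the bastion.
        ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q ec2-user@bastion.example.com"'
```

With an inventory like this, regular tasks reach the masters through the bastion, while a local TCP port check against the same hosts would still be attempted directly.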

Comment 2 Scott Dodson 2018-02-05 21:33:24 UTC
Yeah, the problem makes sense. There's a suggested solution [1] for this, so I wonder whether the following would work; we'd need them to tell us about their bastion host.


- name: Wait for master to restart
  wait_for:
    host: "{{ wait_for_host }}"
    state: started
    delay: 10
    timeout: 600
    port: "{{ ansible_port | default(ansible_ssh_port | default(22,boolean=True),boolean=True) }}"
  become: no
  delegate_to: "{{ openshift_bastion_host if openshift_bastion_host is defined else 'localhost' }}"



1 - https://groups.google.com/d/msg/ansible-project/BLgN_mAWh3E/8n6JCqo_AQAJ

Comment 3 Fabian von Feilitzsch 2018-02-08 22:55:28 UTC
https://github.com/openshift/openshift-ansible/pull/7080

Comment 4 openshift-github-bot 2018-02-19 17:41:57 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/0c6a4b400fb560515b4dcfc7ea764572b1e2dbd1
Bug 1541946- waiting for master reboot now works behind bastion

https://github.com/openshift/openshift-ansible/commit/f4293d34754468ca85c80177d2566c7b45afceb4
Merge pull request #7080 from fabianvf/1541946

Automatic merge from submit-queue.

Bug 1541946- waiting for master reboot now works behind bastion 

https://bugzilla.redhat.com/show_bug.cgi?id=1541946

I made this change because [this ansible PR](https://github.com/ansible/ansible/pull/28450) makes it seem like if we switch to the `wait_for_connection` module we can avoid a lot of the jankiness referenced in the removed code. If I'm interpreting the ansible change properly, this should make it use the full ssh config, proxy jumps and all, without any workarounds. 

I've marked it WIP because I'm still trying to test and make sure that this works.
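
For reference, a minimal sketch of what such a task could look like with `wait_for_connection`. This is an assumption for illustration, not the merged change: the `delay` and `timeout` values are simply carried over from the original task. The module connects through the host's normal connection plugin, so ProxyCommand or ProxyJump settings in the SSH configuration should apply.

```yaml
# Sketch only; not copied from the PR.
- name: Wait for master to restart
  wait_for_connection:
    delay: 10      # seconds to wait before the first connection attempt
    timeout: 600   # give up after this many seconds
```

Unlike the `wait_for` version, this needs no `host`, `port`, or `local_action`, since it targets the inventory host the play is already running against.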

Comment 6 Johnny Liu 2018-02-24 08:54:21 UTC
Verified this bug with openshift-ansible-playbooks-3.9.0-0.51.0.git.0.e26400f.el7.noarch, and PASS.

A test environment that reproduces this bug is not easy to create, so QE created a dummy environment, reproduced the bug successfully, and tested the PR against a host behind a bastion.

1. Configure a target host through the bastion in .ssh/config:
Host 10.8.244.223
    User root
    HostName 10.8.244.223
    IdentityFile ~/libra-new.pem
    VerifyHostKeyDNS yes
    StrictHostKeyChecking no
    PasswordAuthentication no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh root@jialiu-pc2 -W %h:%p

2. Use iptables on the Ansible host to block direct connections to the target host:
# iptables -A OUTPUT -p tcp -m tcp -d 10.8.244.223 -j REJECT

3. Create a dummy playbook that exercises the PR:
$ cat test-playbook.yaml 
- name: Restart masters
  hosts: testhost
  serial: 1
  post_tasks:
  - include_tasks: tasks/restart_hosts.yml

Ran the test, and it PASSED.

Comment 8 Fabian von Feilitzsch 2018-03-16 17:12:19 UTC
I have submitted another bug for 3.7.z here: https://bugzilla.redhat.com/show_bug.cgi?id=1557492

and submitted a PR to backport the change: https://github.com/openshift/openshift-ansible/pull/7557

Comment 11 errata-xmlrpc 2018-03-28 14:25:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

