Bug 1379189 - [3.2] ansible sometimes gets UNREACHABLE error after iptables restarted
Summary: [3.2] ansible sometimes gets UNREACHABLE error after iptables restarted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: 3.2.1
Assignee: Samuel Munilla
QA Contact: Wenkai Shi
URL:
Whiteboard:
Duplicates: 1394966
Depends On:
Blocks: qci-ocp 1416926 1416927
Reported: 2016-09-26 01:15 UTC by Kenjiro Nakayama
Modified: 2017-03-06 16:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When executing the installer on a remote host that is also included in the inventory, the firewall configuration could cause the installer to hang. We have added a 10-second delay after resetting the firewall, which should prevent this problem from occurring.
Clone Of:
Clones: 1416926 1416927
Environment:
Last Closed: 2017-03-06 16:36:49 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1312203 | 0 | medium | CLOSED | openshift-ansibel get stuck when running on the host to be deployed | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2017:0448 | 0 | normal | SHIPPED_LIVE | Important: ansible and openshift-ansible security and bug fix update | 2017-03-06 21:36:25 UTC

Internal Links: 1312203

Description Kenjiro Nakayama 2016-09-26 01:15:44 UTC
Description of problem:
====

  The OpenShift Ansible installer sometimes hits the following errors, and the affected hosts become unreachable.

  ~~~
  2016-09-24 04:09:30,135 p=18851 u=root |  changed: [xx.xx.xx.xx]
  2016-09-24 06:09:42,930 p=18851 u=root |  fatal: [yy.yy.yy.yy]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
  2016-09-24 06:09:42,931 p=18851 u=root |  fatal: [zz.zz.zz.zz]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}

    ... snip ...

  2016-09-24 06:11:04,531 p=18851 u=root |  yy.yy.yy.yy                 : ok=56   changed=8    unreachable=1    failed=0
  2016-09-24 06:11:04,531 p=18851 u=root |  zz.zz.zz.zz                 : ok=56   changed=8    unreachable=1    failed=0
  ~~~

Version-Release number of selected component (if applicable):
===

  ansible-2.2.0-0.5.prerelease.el7.noarch
  openshift-ansible-3.2.28-1.git.0.5a85fc5.el7.noarch

How reproducible:
====

  Steps to Reproduce:
  1. Run the Ansible installer with multiple masters

Actual results:
===

  The Ansible installer hit the UNREACHABLE errors shown above after iptables was restarted

Expected results:
===

  The Ansible installer completes without these errors

Additional info:
===

  Attached in private:
  - Ansible inventory file
  - sosreport on xx.xx.xx.xx and yy.yy.yy.yy hosts
  - ansible log

Comment 5 Kenjiro Nakayama 2016-09-27 11:51:49 UTC
NOTE: Although the local-master case can be solved with ansible_connection=local, as in https://bugzilla.redhat.com/show_bug.cgi?id=1312203, the issue in this ticket occurs on the remote masters.
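
For the local-master case mentioned above, one way to apply that setting is a host_vars file placed next to the inventory. This is only a sketch: the path and host name `master1.example.com` are hypothetical, and it assumes the installer is launched from that master itself. It does not address the remote masters that this bug is about.

```yaml
# host_vars/master1.example.com.yml  (hypothetical path and host name;
# the file name must match the host's name in the inventory)
# Applies only to the master that ansible-playbook is being run on.
# Ansible then runs tasks for this host directly instead of over SSH,
# so a firewall restart cannot drop the installer's own connection to it.
ansible_connection: local
```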

Comment 6 Scott Dodson 2016-09-27 15:05:12 UTC
Need to document comment 5

Comment 14 Scott Dodson 2016-11-15 16:06:13 UTC
*** Bug 1394966 has been marked as a duplicate of this bug. ***

Comment 16 Eric Jones 2016-11-17 19:06:24 UTC
Hi,

The customer I attached to this case on 2016-10-27 is seeing this problem and needs a resolution as soon as we can work toward one.

Are there any other ideas for things we can try?

Comment 19 Samuel Munilla 2016-12-08 20:20:47 UTC
After some discussion, I came up with a possible solution, seen here: https://github.com/openshift/openshift-ansible/pull/2956. If we could have the customer test with this, it would be helpful.

Comment 20 Samuel Munilla 2016-12-08 21:41:36 UTC
For more context: the working theory is that firewalld is enabled on the hosts before installation and that disabling it causes the SSH disconnect. If the above patch fails, having the customer manually disable firewalld before installation (and possibly enable iptables afterward) would confirm or dent this theory.
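
The manual check described above could be sketched as a small pre-install playbook. This is an illustration under stated assumptions, not part of the patch: the file name `disable_firewalld.yml` is hypothetical, `hosts: all` assumes every inventory host should be covered, and re-enabling iptables afterward is left out.

```yaml
---
# disable_firewalld.yml (hypothetical file name)
# Run against the same inventory before the installer to test the theory.
- hosts: all
  become: yes
  tasks:
    - name: Check whether firewalld is installed
      command: rpm -q firewalld
      register: pkg_check
      # rpm -q exits 1 when the package is absent; only higher codes are errors
      failed_when: pkg_check.rc > 1
      changed_when: no

    - name: Stop and disable firewalld ahead of the installer run
      service:
        name: firewalld
        state: stopped
        enabled: no
      when: pkg_check.rc == 0
```

It could be run with something like `ansible-playbook -i hosts disable_firewalld.yml`. If the UNREACHABLE errors no longer appear on the installer run that follows, that would support the theory.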

Comment 21 Jason Meyer 2016-12-28 13:49:21 UTC
The customer tested adding a pause after the task that disables firewalld, and it fixed their issue.  This is from the customer:

--------Marriott--------

We tested this and it worked. All we did was copy your pause further down in the file and add it below the second task. Might be something good to incorporate into the base install.

[root@master01-devtest-vxby ~]# head iptables_hanging_fix.yml
---
- name: Check if firewalld is installed
  command: rpm -q firewalld
  args:
    # Disables the following warning:
    # Consider using yum, dnf or zypper module rather than running rpm
    warn: no
  register: pkg_check
  # rpm -q exits 1 when the package is not installed; only higher codes are failures
  failed_when: pkg_check.rc > 1
  changed_when: no

- name: Ensure firewalld service is not enabled
  service:
    name: firewalld
    state: stopped
    enabled: no
  when: pkg_check.rc == 0
  # Record the outcome so the pause below can test whether firewalld was changed
  register: result

- name: Red Hat Support 01727898 Pause
  pause: seconds=10
  when: result | changed

------------------------------------------

Comment 24 Wenkai Shi 2017-02-07 06:06:42 UTC
Verified with version atomic-openshift-utils-3.2.47-1.git.0.34a924d; the fix takes effect and the installation succeeds.

[root@ansible ~]# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config
...
TASK [os_firewall : Wait 10 seconds after disabling firewalld] *****************
Tuesday 07 February 2017  03:25:19 +0000 (0:00:02.785)       0:03:40.765 ****** 
Pausing for 10 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
...

Comment 26 errata-xmlrpc 2017-03-06 16:36:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0448

