Bug 1274201 - oo-install gets stuck when run on the host to be deployed
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Priority: medium
Severity: medium
Assigned To: Samuel Munilla
QA Contact: Ma xiaoqiang
Reported: 2015-10-22 04:59 EDT by Gaoyun Pei
Modified: 2016-07-03 20:45 EDT

Fixed In Version: atomic-openshift-utils-3.0.7-1.git.76.c73ec7b.el7aos.noarch.rpm
Doc Type: Bug Fix
Last Closed: 2015-11-20 10:42:18 EST
Type: Bug

Attachments: None
Description Gaoyun Pei 2015-10-22 04:59:20 EDT
Description of problem:
oo-install gets stuck on the "Start and enable iptables service" task.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. sh <(curl -s https://install.openshift.com/ose)

Actual results:
TASK: [os_firewall | Ensure firewalld service is not enabled] ***************** 
changed: [xxx.redhat.com]

TASK: [os_firewall | Reload systemd units] ************************************ 
changed: [xxx.redhat.com]

TASK: [os_firewall | Start and enable iptables service] *********************** 

There is no further output, even after waiting a long time.

Expected results:

Additional info:
Starting and enabling the iptables service manually on the host both succeed.
Also tried oo-install from https://lachesis-smunilla.rhcloud.com/, which works well.
Comment 3 Jason DeTiberus 2015-10-27 11:55:37 EDT
Is this an issue that you are hitting reliably, or is it more intermittent?

Are you specifically running all services on a single host? If not, I would expect that any bug that would cause you to hit that error on the master would also affect any nodes being deployed as well.

Does the same error occur when running an advanced install or is it limited to running the installer wrapper?

The problem could also be with the base image you are using for deploying. Do you run into the same issue using a different RHEL7 base image?
Comment 7 Jason DeTiberus 2015-10-28 17:08:08 EDT
Related to this is the following github issue: https://github.com/openshift/openshift-ansible/issues/747

The workaround I suggested there applies in this case as well.
Comment 8 Gaoyun Pei 2015-10-29 03:30:17 EDT
I tried the first workaround in https://github.com/openshift/openshift-ansible/issues/747 today: running oo-install from a separate host works.
Changing the title to be more accurate.
Comment 9 Brenton Leanhardt 2015-10-29 08:04:56 EDT

I think what we need to do here is add a check to see if the host where the installer is running is one of the hosts that will become part of the environment.  If that is the case we need to add ansible_connection=local to the inventory.
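The check described above could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual installer code: the function names and the matching rules (loopback names plus the local hostname) are assumptions for the sketch.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether a deployment host is the machine
# the installer itself is running on, and if so tag its inventory line
# with ansible_connection=local.

host_is_local() {
    # Obvious loopback names match immediately.
    case "$1" in
        localhost|127.0.0.1) return 0 ;;
    esac
    # Otherwise compare against this machine's hostname.
    [ "$1" = "$(hostname -f 2>/dev/null || hostname)" ]
}

emit_inventory_line() {
    if host_is_local "$1"; then
        printf '%s ansible_connection=local\n' "$1"
    else
        printf '%s\n' "$1"
    fi
}
```

For example, `emit_inventory_line localhost` prints `localhost ansible_connection=local`, while a remote hostname passes through unchanged.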

I'm a little confused about ansible_sudo=no.  I understand if the installer is being run as root that sudo shouldn't be needed.  However, I would expect ansible to do the right thing with a local connection and correctly sudo if needed.  Jason, do you have any more detail on this?
Comment 10 Jason DeTiberus 2015-10-29 11:04:03 EDT

It is a bit more complicated than that; it really comes down to whether the user is running ansible remotely or not. I believe we explicitly set sudo to false everywhere we execute against localhost for those reasons, but I'm not completely sure.

Either way, our goal is to use sudo anywhere the ansible user is not root for operations on a host in the deployment, while avoiding sudo for the local actions used for setting groups, variables, transferring files, etc., so that root is not required on the host running ansible if it differs from the hosts in the deployment.
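A minimal inventory fragment matching that arrangement might look like the following. Hostnames and the ssh user are placeholders; only the `ansible_connection` and `ansible_sudo` variables come from the discussion above.

```ini
# Hypothetical inventory fragment. The installer runs on
# master.example.com, so it connects to itself locally and skips sudo;
# the remote node is reached over ssh as a non-root user with sudo.
[masters]
master.example.com ansible_connection=local ansible_sudo=no

[nodes]
node1.example.com ansible_ssh_user=cloud-user ansible_sudo=yes
```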
Comment 13 Gaoyun Pei 2015-11-05 00:06:06 EST
Tested this bug with atomic-openshift-utils-3.0.7-1.git.48.75d357c.el7aos.noarch, running the installer on the master host. After entering the host information, the installer quit as below:

Gathering information from hosts...
sudo: invalid option -- '-'
usage: sudo [-D level] -h | -K | -k | -V
usage: sudo -v [-AknS] [-D level] [-g groupname|#gid] [-p prompt] [-u user
usage: sudo -l[l] [-AknS] [-D level] [-g groupname|#gid] [-p prompt] [-U user
            name] [-u user name|#uid] [-g groupname|#gid] [command]
usage: sudo [-AbEHknPS] [-r role] [-t type] [-C fd] [-D level] [-g
            groupname|#gid] [-p prompt] [-u user name|#uid] [-g groupname|#gid]
            [VAR=value] [-i|-s] [<command>]
usage: sudo -e [-AknS] [-r role] [-t type] [-C fd] [-D level] [-g
            groupname|#gid] [-p prompt] [-u user name|#uid] file ...
The atomic-openshift-installer requires sudo access without a password.
Comment 14 Brenton Leanhardt 2015-11-05 16:34:30 EST
A lot of bugs were fixed today. This should be fixed in the latest puddle. I tried several multi-host installs as non-root where the system running the installer was one of the to-be-deployed hosts.
Comment 16 Gaoyun Pei 2015-11-06 02:17:25 EST
Verified this bug with atomic-openshift-utils-3.0.7-1.git.76.c73ec7b.el7aos.noarch.

Ran 'atomic-openshift-installer install' on the master host; the environment was installed successfully.
