Bug 1274201

Summary: oo-install gets stuck when running on the host to be deployed

Product: OpenShift Container Platform
Component: Installer
Version: 3.0.0
Reporter: Gaoyun Pei <gpei>
Assignee: Samuel Munilla <smunilla>
QA Contact: Ma xiaoqiang <xiama>
CC: aos-bugs, bleanhar, gpei, jdetiber, jokerman, mmccomas, xtian
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: atomic-openshift-utils-3.0.7-1.git.76.c73ec7b.el7aos.noarch.rpm
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-11-20 15:42:18 UTC

Description Gaoyun Pei 2015-10-22 08:59:20 UTC
Description of problem:
oo-install gets stuck on the "Start and enable iptables service" task.

Version-Release number of selected component (if applicable):
https://install.openshift.com/ose
oo-install-ose-20151021-1917

How reproducible:
Always

Steps to Reproduce:
1. sh <(curl -s https://install.openshift.com/ose)

Actual results:
...
TASK: [os_firewall | Ensure firewalld service is not enabled] ***************** 
changed: [xxx.redhat.com]

TASK: [os_firewall | Reload systemd units] ************************************ 
changed: [xxx.redhat.com]

TASK: [os_firewall | Start and enable iptables service] *********************** 

No further output, even after waiting a long time.

Expected results:


Additional info:
Starting and enabling the iptables service manually on the host both succeed.
Also tried oo-install from https://lachesis-smunilla.rhcloud.com/, which works well.

Comment 3 Jason DeTiberus 2015-10-27 15:55:37 UTC
Is this an issue that you are hitting reliably, or is it more intermittent?

Are you specifically running all services on a single host? If not, I would expect that any bug causing you to hit that error on the master would also affect any nodes being deployed.

Does the same error occur when running an advanced install or is it limited to running the installer wrapper?

The problem could also be with the base image you are using for deploying. Do you run into the same issue using a different RHEL7 base image?

Comment 7 Jason DeTiberus 2015-10-28 21:08:08 UTC
Related to this is the following github issue: https://github.com/openshift/openshift-ansible/issues/747

The workaround I suggested there applies in this case as well.

Comment 8 Gaoyun Pei 2015-10-29 07:30:17 UTC
I tried the first workaround in https://github.com/openshift/openshift-ansible/issues/747 today: running oo-install from a separate host works.
Changing the title to be more accurate.

Comment 9 Brenton Leanhardt 2015-10-29 12:04:56 UTC
Sam,

I think what we need to do here is add a check to see whether the host where the installer is running is one of the hosts that will become part of the environment. If that is the case, we need to add ansible_connection=local to the inventory.

I'm a little confused about ansible_sudo=no. I understand that if the installer is being run as root, sudo shouldn't be needed. However, I would expect ansible to do the right thing with a local connection and correctly sudo if needed. Jason, do you have any more detail on this?
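
A minimal sketch of the check described above, assuming the installer builds its inventory one host entry at a time (the helper names here are hypothetical, not the actual atomic-openshift-installer code):

import socket

def local_names():
    # Names/addresses that identify the host running the installer.
    hostname = socket.gethostname()
    names = {hostname, socket.getfqdn()}
    try:
        names.add(socket.gethostbyname(hostname))
    except socket.gaierror:
        pass
    return names

def inventory_line(host, ansible_ssh_user='root'):
    # Build one inventory entry, forcing a local connection when the
    # target host is the host the installer is running on.
    line = host
    if host in local_names():
        line += ' ansible_connection=local'
        if ansible_ssh_user == 'root':
            # Already root locally, so sudo is unnecessary.
            line += ' ansible_sudo=no'
    return line

for h in ['master.example.com', socket.getfqdn()]:
    print(inventory_line(h))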

Comment 10 Jason DeTiberus 2015-10-29 15:04:03 UTC
Brenton,

It is a bit more complicated than that; it really comes down to whether the user is actually running ansible remotely or not. I believe we explicitly set sudo to false everywhere we execute against localhost for those reasons, but I'm not completely sure.

Either way, our goal is to use sudo for operations on deployment hosts whenever the ansible user is not root, while avoiding sudo for the local actions used for setting groups and variables, transferring files, etc., so that the host running ansible does not need root access if it is not itself part of the deployment.
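
As a rough illustration of that policy (a sketch under the assumptions stated in the comment, not code from openshift-ansible), the decision could be expressed as:

def wants_sudo(ansible_user, is_local_action):
    # Local actions (building groups, setting variables, transferring
    # files) never need sudo, so the host running ansible does not have
    # to grant root when it is not part of the deployment.
    if is_local_action:
        return False
    # Remote deployment hosts need sudo whenever the ansible user is
    # not already root.
    return ansible_user != 'root'

print(wants_sudo('cloud-user', is_local_action=False))  # True: sudo on the remote host
print(wants_sudo('root', is_local_action=True))         # False: no sudo for local actions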

Comment 13 Gaoyun Pei 2015-11-05 05:06:06 UTC
Tested this bug with atomic-openshift-utils-3.0.7-1.git.48.75d357c.el7aos.noarch, running the installer on the master host. After entering the host, the installer quit as below:

Gathering information from hosts...
sudo: invalid option -- '-'
usage: sudo [-D level] -h | -K | -k | -V
usage: sudo -v [-AknS] [-D level] [-g groupname|#gid] [-p prompt] [-u user
            name|#uid]
usage: sudo -l[l] [-AknS] [-D level] [-g groupname|#gid] [-p prompt] [-U user
            name] [-u user name|#uid] [-g groupname|#gid] [command]
usage: sudo [-AbEHknPS] [-r role] [-t type] [-C fd] [-D level] [-g
            groupname|#gid] [-p prompt] [-u user name|#uid] [-g groupname|#gid]
            [VAR=value] [-i|-s] [<command>]
usage: sudo -e [-AknS] [-r role] [-t type] [-C fd] [-D level] [-g
            groupname|#gid] [-p prompt] [-u user name|#uid] file ...
The atomic-openshift-installer requires sudo access without a password.
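
The final message suggests the installer probes for passwordless sudo before gathering facts and aborts when the probe fails (here apparently because an invalid option was passed to sudo). A minimal sketch of such a precheck, purely as an assumption about how it could be implemented and not the installer's actual code:

import subprocess
import sys

def have_passwordless_sudo():
    # 'sudo -n' makes sudo fail immediately instead of prompting when a
    # password would be required.
    try:
        return subprocess.call(['sudo', '-n', 'true']) == 0
    except OSError:
        # sudo is not installed or not on PATH.
        return False

if not have_passwordless_sudo():
    sys.stderr.write('The atomic-openshift-installer requires sudo access '
                     'without a password.\n')
    sys.exit(1)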

Comment 14 Brenton Leanhardt 2015-11-05 21:34:30 UTC
A lot of bugs were fixed today.  This should be fixed in the latest puddle.  I tried several multi-host installs as non-root where one of the systems was one of the to-be-deployed hosts.

Comment 16 Gaoyun Pei 2015-11-06 07:17:25 UTC
Verified this bug with atomic-openshift-utils-3.0.7-1.git.76.c73ec7b.el7aos.noarch.

Ran 'atomic-openshift-installer install' on the master host; the environment could be installed.