Bug 1312768

Summary: Failed to deploy containerized openshift on Atomic Hosts
Product: OpenShift Container Platform
Reporter: Avi Tal <atal>
Component: Installer
Assignee: Jason DeTiberus <jdetiber>
Status: CLOSED DUPLICATE
QA Contact: Ma xiaoqiang <xiama>
Severity: medium
Priority: medium
Version: 3.1.0
CC: aos-bugs, bazulay, bleanhar, gpei, jokerman, mmccomas, mwysocki, srevivo, xtian
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-03-04 03:23:25 UTC
Type: Bug

Description Avi Tal 2016-02-29 09:17:58 UTC
Description of problem:
I used rhel-atomic-cloud 7.2-11 for both the master and the node (1 master and 1 node).

I ran the installer from a separate, external host using the latest atomic-openshift-utils (3.0.35).
I also fixed hostname resolution in /etc/hosts (on both the master and the installer host).

My Inventory:
# cat ansible-ose-31-ah
# Create an OSEv3 group that contains the master, nodes, etcd, and lb groups.
# The lb group lets Ansible configure HAProxy as the load balancing solution.
# Comment lb out if your load balancer is pre-configured.
[OSEv3:children]
masters
nodes

[OSEv3:vars]
deployment_type=openshift-enterprise
#deployment_type=atomic-enterprise

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_ssh_user=cloud-user
ansible_sudo=true

# In case OpenShift is planned to attach to CFME, use this flag to configure the service account
openshift_use_manageiq=True

containerized=True

# host group for masters
[masters]
cm-ah-ose-master.novalocal

[nodes]
cm-ah-ose-master.novalocal openshift_node_labels="{'region':'infra','zone':'default'}" openshift_schedulable=true
cm-ah-ose-node01.novalocal openshift_node_labels="{'region':'primary','zone':'east'}"


Deployment Error :
TASK: [openshift_master_ca | Create the master certificates if they do not already exist] ***
failed: [cm-ah-ose-master.novalocal] => {"cmd": "oadm create-master-certs --hostnames=kubernetes.default,10.8.58.186,kubernetes,192.168.100.7,openshift.default,openshift.default.svc,172.30.0.1,openshift.default.svc.cluster.local,kubernetes.default.svc,kubernetes.default.svc.cluster.local,openshift,cm-ah-ose-master.novalocal --master=https://cm-ah-ose-master.novalocal:8443 --public-master=https://cm-ah-ose-master.novalocal:8443 --cert-dir=/etc/origin/master --overwrite=false", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/config.retry

cm-ah-ose-master.novalocal : ok=83   changed=2    unreachable=0    failed=1
cm-ah-ose-node01.novalocal : ok=34   changed=0    unreachable=0    failed=0
localhost                  : ok=8    changed=0    unreachable=0    failed=0

Version-Release number of selected component (if applicable):
atomic-openshift-utils 3.0.35

How reproducible:
Always

Steps to Reproduce:
1. Build two Atomic Hosts: 1 master and 1 node.
2. Install atomic-openshift-utils 3.0.35 on a third host.
3. Fix hostname resolution in /etc/hosts on the third host as well as on the master.
4. Run the OpenShift playbook from the third host using the inventory above.

Actual results:
TASK: [openshift_master_ca | Create the master certificates if they do not already exist] ***
failed: [cm-ah-ose-master.novalocal] => {"cmd": "oadm create-master-certs --hostnames=kubernetes.default,10.8.58.186,kubernetes,192.168.100.7,openshift.default,openshift.default.svc,172.30.0.1,openshift.default.svc.cluster.local,kubernetes.default.svc,kubernetes.default.svc.cluster.local,openshift,cm-ah-ose-master.novalocal --master=https://cm-ah-ose-master.novalocal:8443 --public-master=https://cm-ah-ose-master.novalocal:8443 --cert-dir=/etc/origin/master --overwrite=false", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

Expected results:
OSE is deployed as containers on both hosts.

Comment 1 Gaoyun Pei 2016-03-03 12:28:34 UTC
QE couldn't reproduce this with the same Ansible inventory using openshift-ansible-3.0.35-1.git.0.6a386dd.el7aos.noarch.rpm.

A containerized master and node could be installed on AH-7.2 without error.
By the way, the latest atomic-openshift-utils is now atomic-openshift-utils-3.0.47.

Comment 2 Marcel Wysocki 2016-03-03 12:48:00 UTC
I have the same issue. It only works when running as the root user.
As cloud-user, oadm cannot be found.
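
For illustration only, here is a minimal Python sketch of why a command that is not on the search path would surface as "[Errno 2] No such file or directory" with rc 2. It assumes the failing task reports the OS errno as its return code, which matches the output in the description but is not confirmed here. The function name run_like_task and the restricted PATH value are hypothetical.

```python
import errno
import subprocess


def run_like_task(argv, search_path):
    """Run argv with PATH limited to search_path and report the outcome.

    Hypothetical helper: it mimics a task runner that turns an OSError
    from process creation into a failed result carrying the errno.
    """
    try:
        proc = subprocess.run(
            argv,
            env={"PATH": search_path},  # executable lookup uses this PATH on POSIX
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:
        # ENOENT (2) means the executable was not found in any PATH directory.
        return {"failed": True, "rc": exc.errno, "msg": str(exc)}
    return {"failed": proc.returncode != 0, "rc": proc.returncode, "msg": ""}


if __name__ == "__main__":
    # Assumption: /usr/local/bin is missing from the non-root user's PATH.
    result = run_like_task(["oadm", "version"], "/usr/bin:/bin")
    print(result)
    if result["rc"] == errno.ENOENT:
        print("oadm is not on the search path used for this user")
```

Running this as the user the installer connects as, with that user's effective PATH, should show whether the lookup itself is what fails.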

Comment 3 Marcel Wysocki 2016-03-03 12:56:04 UTC
Perhaps the Ansible installer should use the full path to /usr/local/bin/oadm and the other binaries?
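
A rough Python sketch of that idea: resolve the binary to an absolute path, checking a few extra directories in addition to the caller's PATH. This is only an illustration of the suggestion, not the change made in the installer. The helper name resolve_binary and the extra_dirs default are assumptions.

```python
import os
import shutil


def resolve_binary(name, path=None, extra_dirs=("/usr/local/bin",)):
    """Return an absolute path for name, or None if it is not found.

    Hypothetical helper: searches the given PATH string (or the current
    environment's PATH) first, then the extra directories.
    """
    search = path if path is not None else os.environ.get("PATH", os.defpath)
    # Drop empty entries so the current directory is not searched implicitly.
    dirs = [d for d in search.split(os.pathsep) if d] + list(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(dirs))


if __name__ == "__main__":
    oadm = resolve_binary("oadm")
    if oadm is None:
        print("oadm not found on PATH or in the extra directories")
    else:
        print("would invoke:", oadm)
```

Using the resolved path makes the command independent of the PATH that sudo hands to a non-root user.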

Comment 4 Gaoyun Pei 2016-03-04 03:23:25 UTC
Please ignore my previous comment; I didn't notice that the installer was being run as a non-root user.

This should be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1299742, which has been fixed by PR https://github.com/openshift/openshift-ansible/pull/1240.

I suggest updating your openshift-ansible package to openshift-ansible-3.0.47. Thanks.

*** This bug has been marked as a duplicate of bug 1299742 ***