Bug 1371066

Summary: Create Admin Service Account fails with "unable to connect to a server to handle"
Product: OpenShift Container Platform
Reporter: Bhaskarakiran <byarlaga>
Component: Installer
Assignee: Jason DeTiberus <jdetiber>
Status: CLOSED WORKSFORME
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: aos-bugs, bleanhar, byarlaga, jokerman, mmccomas, mzywusko
Target Milestone: ---
Keywords: UpcomingRelease
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-31 09:12:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description        Flags
ansible log file   none

Description Bhaskarakiran 2016-08-29 09:31:04 UTC
Created attachment 1195283
ansible log file

Description of problem:
======================

Setting up an OpenShift cluster with the "atomic-openshift-installer install" command fails at the task shown below. Uninstalling and retrying did not help. Running the same command manually hangs.

TASK [openshift_manageiq : Create Admin Service Account] ***********************
fatal: [10.70.41.151]: FAILED! => {"changed": false, "cmd": "echo '{\"kind\": \"ServiceAccount\", \"apiVersion\": \"v1\", \"metadata\": {\"name\": \"management-admin\"}}' | /usr/local/bin/oc create -n management-infra --config=/tmp/manageiq_admin.kubeconfig -f -", "delta": "0:00:02.567008", "end": "2016-08-29 13:31:25.158109", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-08-29 13:31:22.591101", "stderr": "unable to connect to a server to handle \"serviceaccounts\": Get https://dhcp41-151.lab.eng.blr.redhat.com:8443/api: dial tcp 10.70.41.151:8443: connection refused", "stdout": "", "stdout_lines": [], "warnings": []}

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp43-179 ~]# openshift version
openshift v3.3.0.24-dirty
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
=================
100%

Steps to Reproduce:
1. Run "atomic-openshift-installer install" to set up the nodes.

Actual results:


Expected results:


Additional info:
================
Attaching ansible.log

Comment 1 Brenton Leanhardt 2016-08-29 12:14:47 UTC
For some reason your Master service isn't running.  Can you check the logs?  On all your masters run:

journalctl -u atomic-openshift-master

There are almost certainly error messages explaining the situation that will help us determine the root cause.
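
The check above can be sketched as a small bash helper. This is only an illustration, not part of the installer: the function names extract_endpoint, port_open and check_master are hypothetical, and it assumes bash (for /dev/tcp) and the coreutils timeout command are available on the master.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of openshift-ansible.

# Pull the "host:port" that oc could not reach out of an error string such as
# "... dial tcp 10.70.41.151:8443: connection refused".
extract_endpoint() {
    printf '%s\n' "$1" |
        sed -n 's/.*dial tcp \([^ ]*\): connection refused.*/\1/p'
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Probe the endpoint and, if it is not reachable, print the commands to
# inspect the master unit named in this bug.
check_master() {
    local endpoint=$1
    local host=${endpoint%:*}
    local port=${endpoint##*:}
    if port_open "$host" "$port"; then
        echo "API port $endpoint is accepting connections"
    else
        echo "API port $endpoint refused or timed out; inspect the master unit:"
        echo "  systemctl status atomic-openshift-master"
        echo "  journalctl -u atomic-openshift-master --no-pager -n 100"
        return 1
    fi
}
```

For example, check_master "$(extract_endpoint "$stderr_text")" would probe 10.70.41.151:8443 when stderr_text holds the stderr from the failed task in the description.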

Comment 2 Brenton Leanhardt 2016-08-29 12:21:33 UTC
Looking in your ansible logs there are quite a few errors:

2016-08-25 12:41:37,031 p=16123 u=root |  TASK [openshift_facts : Ensure PyYaml is installed] ****************************
2016-08-25 12:41:44,959 p=16123 u=root |  fatal: [10.70.43.103]: FAILED! => {"changed": false, "failed": true, "msg": "No Package matching 'PyYAML' found available, installed or updated", "rc": 0, "results": []}

2016-08-25 14:13:18,011 p=19929 u=root |  fatal: [10.70.43.103]: FAILED! => {"changed": false, "failed": true, "msg": "No Package matching 'docker' found available, installed or updated", "rc": 0, "results": []}

2016-08-25 15:01:11,287 p=4192 u=root |  fatal: [10.70.43.103]: FAILED! => {"changed": false, "failed": true, "msg": "Could not find the requested service \"'docker'\": "}

2016-08-25 14:32:18,224 p=19929 u=root |  fatal: [10.70.41.151]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n"}

We can start with the system error first.  Is docker installed?  It seems like you are performing a containerized install of dhcp41-151.lab.eng.blr.redhat.com based on the output of line 40895:

                    "is_containerized": true,

Without Docker installed it will not work.  What is the output of 'yum repolist' in your environment?
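
As a rough aid for the package errors quoted above, the following sketch lists every package that the log reports as "No Package matching". The function name missing_packages is hypothetical, and the log path in the usage note is a placeholder for wherever the attached ansible.log was saved.

```shell
#!/usr/bin/env bash
# Sketch only: missing_packages is a hypothetical helper name.

# Read an ansible log on stdin and print each package name that yum could not
# find, one per line, de-duplicated. LC_ALL=C keeps the sort order stable.
missing_packages() {
    sed -n "s/.*No Package matching '\([^']*\)' found available.*/\1/p" |
        LC_ALL=C sort -u
}
```

A possible use, assuming the log is at ./ansible.log: run missing_packages < ./ansible.log, then for each name printed check rpm -q NAME on the affected host and compare against the output of yum repolist to see whether an enabled repository provides it.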

Comment 3 Bhaskarakiran 2016-08-30 06:24:41 UTC
Docker is installed on all the nodes. These are RHEL 7.2 hosts, and I selected the containerized option during installation using atomic-openshift-installer. I am redoing the setup and will update the status.

Comment 4 Brenton Leanhardt 2016-08-30 13:12:48 UTC
We'll continue helping you debug this as you provide more information.  We'd need to see the logs on any services that are failing to start.  For now I'm marking this bug as not a 3.3 blocker.

Comment 5 Bhaskarakiran 2016-08-31 09:12:59 UTC
I was able to set up the OpenShift cluster successfully with the 3.3.0.27 build. No issues were seen during the process. Closing this bug for now.

Thanks Brenton.