Bug 1570918 - OpenShift 3.9 quick installation failing due to web-console problem - master scheduling not getting enabled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Vadim Rutkovsky
QA Contact: Wenkai Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-23 17:43 UTC by Luke Stanton
Modified: 2018-06-18 18:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-18 17:37:18 UTC
Target Upstream Version:



Description Luke Stanton 2018-04-23 17:43:20 UTC
Description of problem:

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch

rpm -q ansible
ansible-2.4.3.0-1.el7ae.noarch

ansible --version
ansible 2.4.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/mark/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

How reproducible:
Always

Steps to Reproduce:
1. Run OpenShift 3.9 Quick Installer to deploy a new cluster

Actual results:
Installation fails because the web-console health check does not pass. Inspecting the nodes after the failed install shows that the master has scheduling disabled, which, as I understand it, is incorrect and can cause issues such as web-console failures:

oc get nodes -o wide
NAME                 STATUS                     ROLES     AGE       VERSION             EXTERNAL-IP   OS-IMAGE               KERNEL-VERSION               CONTAINER-RUNTIME
master-1   Ready,SchedulingDisabled   master    5d        v1.9.1+a0ce1bc657   <none>        OpenShift Enterprise   3.10.0-693.21.1.el7.x86_64   docker://1.13.1
node1-1    Ready                      <none>    5d        v1.9.1+a0ce1bc657   <none>        OpenShift Enterprise   3.10.0-693.21.1.el7.x86_64   docker://1.13.1
node2-1    Ready                      <none>    5d        v1.9.1+a0ce1bc657   <none>        OpenShift Enterprise   3.10.0-693.21.1.el7.x86_64   docker://1.13.1
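
The check described above can be automated. The following is a minimal sketch, assuming the output of `oc get nodes` has been captured as text; the function name `unschedulable_nodes` and the trimmed sample table are illustrative and not part of any OpenShift tooling.

```python
from typing import List


def unschedulable_nodes(oc_output: str) -> List[str]:
    """Return the names of nodes whose STATUS column includes SchedulingDisabled."""
    flagged = []
    # Skip the header row, then read NAME and STATUS from the first two columns.
    for line in oc_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue
        name, status = fields[0], fields[1]
        # STATUS is a comma-separated list such as "Ready,SchedulingDisabled".
        if "SchedulingDisabled" in status.split(","):
            flagged.append(name)
    return flagged


# Trimmed copy of the table above (first five columns only).
sample = """\
NAME       STATUS                     ROLES     AGE       VERSION
master-1   Ready,SchedulingDisabled   master    5d        v1.9.1+a0ce1bc657
node1-1    Ready                      <none>    5d        v1.9.1+a0ce1bc657
node2-1    Ready                      <none>    5d        v1.9.1+a0ce1bc657
"""

print(unschedulable_nodes(sample))  # ['master-1']
```

For the cluster in this report, the sketch flags only master-1.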

============================================================================
Play 77/96 (Web Console)
..................Pausing for 30 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
.fatal: [10.240.0.66]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:01.014408", "end": "2018-04-11 09:03:35.725946", "msg": "non-zero return code", "rc": 7, "start": "2018-04-11 09:03:34.711538", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused"], "stdout": "", "stdout_lines": []}
...ignoring
..........fatal: [xx.xxx.0.66]: FAILED! => {"changed": false, "msg": "Console install failed."}

xx.xxx.0.62                : ok=120  changed=15   unreachable=0    failed=0   
xx.xxx.0.64                : ok=120  changed=15   unreachable=0    failed=0   
xx.xxx.0.66                : ok=504  changed=81   unreachable=0    failed=1   
localhost                  : ok=13   changed=0    unreachable=0    failed=0   


Installation Complete: Note: Play count is only an estimate, some plays may have been skipped or dynamically added


INSTALLER STATUS **********************************************************************************************************************************************************************************************************************
Initialization             : Complete (0:00:34)
Health Check               : Complete (0:02:05)
etcd Install               : Complete (0:01:16)
NFS Install                : Complete (0:00:16)
Master Install             : Complete (0:02:50)
Master Additional Install  : Complete (0:01:17)
Node Install               : Complete (0:03:50)
Hosted Install             : Complete (0:01:10)
Web Console Install        : In Progress (0:11:58)
	This phase can be restarted by running: playbooks/openshift-web-console/config.yml



Failure summary:


  1. Hosts:    10.240.0.66
     Play:     Web Console
     Task:     Report console errors
     Message:  Console install failed.

An error was detected. After resolving the problem please relaunch the
installation process.
========================================================================
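
For readers unfamiliar with this kind of health check, the retry pattern visible in the log (a probe of the /healthz URL that gives up after 60 attempts) can be sketched roughly as follows. This is not the playbook's implementation: the function names, the delay between attempts, and the timeout are assumptions, and only the URL and the attempt count come from the log above.

```python
import ssl
import time
import urllib.request
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 60,
                       delay: float = 10.0) -> int:
    """Call probe() until it returns True; return the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # hypothetical delay; the real value is not in the log
    raise RuntimeError("health check failed after %d attempts" % attempts)


def healthz_probe(url: str = "https://webconsole.openshift-web-console.svc/healthz") -> bool:
    """Rough equivalent of `curl -k <url>`: skip certificate verification."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, as in the log above, ends up here.
        return False


# Demonstration with a fake probe that succeeds on the third call.
responses = iter([False, False, True])
print(wait_until_healthy(lambda: next(responses), attempts=60, delay=0))  # 3
```

With the master unschedulable, a probe like `healthz_probe` would keep returning False until the attempts run out, which matches the "Connection refused" failure reported here.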

Expected results:
Installation should succeed with scheduling enabled on the master and the web-console up and running.

Comment 4 Vadim Rutkovsky 2018-04-24 13:36:55 UTC
Why is the quick installer installing 3.10 packages? I don't think we support 3.10 installs using the quick installer.

Comment 5 Vadim Rutkovsky 2018-05-14 15:27:04 UTC
My fault, the packages and image tags look fine. The quick installer was still assuming masters should not be schedulable and kept rewriting the hosts file.

Created https://github.com/openshift/openshift-ansible/pull/8367 to fix this

Comment 6 Vadim Rutkovsky 2018-05-28 08:16:31 UTC
Fix is available in openshift-ansible-3.9.30-1

Comment 7 Wenkai Shi 2018-05-29 07:49:37 UTC
Verified with atomic-openshift-utils-3.9.30-1.git.0.a91a657.el7; the code has merged.
The quick installer's translation from installer.cfg.yml to the hosts file now marks the node as schedulable.

Comment 8 Wenkai Shi 2018-05-29 07:50:44 UTC
# cat .config/openshift/hosts
...
[nodes]
host-8-250-2.example.com ... openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
host-8-249-8.example.com ... openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
...
[masters]
host-8-250-2.example.com ...
...
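
The same verification can be scripted against a generated hosts file. The sketch below assumes the INI-style layout shown above; the function name `schedulable_by_host`, the host names, and the sample inventory are hypothetical. A host with no `openshift_schedulable` variable is reported as False, meaning "not explicitly set", which is a simplification and not a statement about the installer's defaults.

```python
import shlex
from typing import Dict


def schedulable_by_host(inventory_text: str, group: str = "nodes") -> Dict[str, bool]:
    """Map each host in `group` to whether openshift_schedulable=True is set on its line."""
    result = {}
    current = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and elision markers such as "...".
        if not line or line.startswith("#") or line == "...":
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current != group:
            continue
        # shlex keeps quoted values like "{'region': 'infra'}" in one token.
        host, *tokens = shlex.split(line)
        variables = dict(t.split("=", 1) for t in tokens if "=" in t)
        result[host] = variables.get("openshift_schedulable", "").lower() == "true"
    return result


# Hypothetical inventory for illustration only.
sample_inventory = """\
[masters]
master.example.com

[nodes]
master.example.com openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
node1.example.com openshift_node_labels="{'region': 'primary'}"
"""

print(schedulable_by_host(sample_inventory))
# {'master.example.com': True, 'node1.example.com': False}
```

Checking that every host under [masters] also appears under [nodes] with a True value is then a simple dictionary lookup.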

