Bug 1530403

Summary:	Installer fails noting no etcd group despite etcd hosts group that IS defined
Product:	OpenShift Container Platform	Reporter:	Eric Jones <erjones>
Component:	Installer	Assignee:	Russell Teague <rteague>
Status:	CLOSED ERRATA	QA Contact:	Gaoyun Pei <gpei>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.9.0	CC:	aos-bugs, dmoessne, erjones, jokerman, mmccomas, rteague, sdodson, wmeng, xtian
Target Milestone:	---
Target Release:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Error message on etcd group validation updated to reflect the required configurations to better inform the user of the failure state.	Story Points:	---
Clone Of:
Clones:	1538795		Environment:
Last Closed:	2018-03-28 14:17:25 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Eric Jones 2018-01-02 23:02:59 UTC

Description of problem:
Running [0] to install a new 3.7 cluster fails with the error message "Running etcd as an embedded service is no longer supported." even though the hosts file includes an etcd section.

[0] /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

Version-Release number of the following components:
ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

# rpm -qa | grep -ie ansible -ie openshift -ie ose -ie ocp
atomic-openshift-clients-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-sdn-ovs-3.7.14-1.git.0.593a50e.el7.x86_64
openshift-ansible-lookup-plugins-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-callback-plugins-3.7.14-1.git.0.4b35b2d.el7.noarch
procps-ng-3.3.10-16.el7.x86_64
ansible-2.4.1.0-1.el7.noarch
atomic-openshift-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-node-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-docker-excluder-3.7.14-1.git.0.593a50e.el7.noarch
openshift-ansible-filter-plugins-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-playbooks-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch
tuned-profiles-atomic-openshift-node-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-excluder-3.7.14-1.git.0.593a50e.el7.noarch
openshift-ansible-roles-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-docs-3.7.14-1.git.0.4b35b2d.el7.noarch


Attaching hosts file and ansible playbook output shortly

Comment 3 Scott Dodson 2018-01-04 20:11:10 UTC

It's complaining because there are two etcd hosts, which is not a valid number of etcd hosts. To provide an HA etcd environment you need three etcd hosts; as it is now, if a single etcd host were to fail, the entire cluster would fail. If you don't care about HA, you can specify one. If this is a new environment, let's leave it at that.

If this is an existing environment, we should try to scale them up to three etcd hosts so they have a proper HA environment. To do that, add a host to [new_etcd] and run playbooks/byo/openshift-etcd/scaleup.yml.
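
The HA reasoning in this comment comes down to majority-quorum arithmetic: etcd needs more than half of its members available to keep accepting writes. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of openshift-ansible or etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Number of members that can be lost while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare the cluster sizes discussed in this bug.
    for n in (1, 2, 3, 5):
        print(f"{n} member(s): quorum={quorum(n)}, "
              f"failures tolerated={failures_tolerated(n)}")
```

Two members tolerate no failures, the same as one member, which is why a two-host etcd group gives no HA benefit while three hosts tolerate the loss of one.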

Comment 4 Eric Jones 2018-01-04 21:28:16 UTC

I understand that one could work that out from the hosts file, given the number of etcd hosts required for HA, but did you see anything in the ansible output that would indicate that?

If not, then I think that is the bug here, as we should note that in the error instead of "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one or three hosts. These hosts may be the same hosts as your masters. If this is an upgrade you may set openshift_master_unsupported_embedded_etcd=true until a migration playbook becomes available.\n"

Comment 5 Scott Dodson 2018-01-04 21:29:38 UTC

Yes, the error message you copied and pasted says as much.

"If this is a new install please define an 'etcd' group with either one or three hosts."

Comment 6 Scott Dodson 2018-01-04 21:37:45 UTC

I agree the error should be updated so we'll use this to track that.

Comment 7 Eric Jones 2018-01-04 21:48:13 UTC

Thanks Scott!

Comment 8 Russell Teague 2018-01-24 20:49:47 UTC

Proposed: https://github.com/openshift/openshift-ansible/pull/6858

Comment 9 Russell Teague 2018-01-25 15:03:31 UTC

Merged

Comment 11 Gaoyun Pei 2018-01-26 09:56:33 UTC

The proposed PR is not merged in openshift-ansible-3.9.0-0.24.0.git.0.735690f.el7.noarch yet; waiting for the next build to verify the bug.

Comment 12 Gaoyun Pei 2018-01-29 07:41:17 UTC

Verified this bug with openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch

Prepared an ansible inventory file with two etcd hosts and ran playbooks/prerequisites.yml.

# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
...
TASK [Evaluate groups - Fail if no etcd hosts group is defined] *************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one, three or five hosts. These hosts may be the same hosts as your masters. If this is an upgrade please see https://docs.openshift.com/container-platform/latest/install_config/upgrading/migrating_embedded_etcd.html for documentation on how to migrate from embedded to external etcd.\n"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/prerequisites.retry

PLAY RECAP ******************************************************************************************************************************************************************
ec2-52-200-181-35.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0   
localhost                  : ok=1    changed=0    unreachable=0    failed=1
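
The behaviour verified above (a two-host etcd group is rejected, and the message lists one, three or five hosts) can be modelled with a short sketch. This is not the installer's code: the allowed set is inferred from the message text in this comment, and the function name, return convention, wording of the returned string, and hostnames are all hypothetical.

```python
# Accepted group sizes, inferred from the updated error message
# ("either one, three or five hosts"); an assumption, not installer code.
ALLOWED_ETCD_COUNTS = {1, 3, 5}


def check_etcd_group(etcd_hosts):
    """Return None if the group size is accepted, else an explanatory string."""
    count = len(etcd_hosts)
    if count in ALLOWED_ETCD_COUNTS:
        return None
    return (
        f"The 'etcd' group has {count} host(s); "
        "define it with one, three or five hosts."
    )


if __name__ == "__main__":
    # Hypothetical hostnames, mirroring the two-host inventory used for verification.
    two_hosts = ["etcd1.example.com", "etcd2.example.com"]
    print(check_etcd_group(two_hosts))
```

Reporting the actual host count, as this sketch does, is one way a check like this could point directly at the cause of the failure.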

Comment 15 errata-xmlrpc 2018-03-28 14:17:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489