Bug 1530403 - Installer fails noting no etcd group despite etcd hosts group that IS defined
Summary: Installer fails noting no etcd group despite etcd hosts group that IS defined
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-02 23:02 UTC by Eric Jones
Modified: 2018-03-28 14:17 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Error message on etcd group validation updated to reflect the required configurations to better inform the user of the failure state.
Clone Of:
Clones: 1538795
Environment:
Last Closed: 2018-03-28 14:17:25 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:17:57 UTC

Description Eric Jones 2018-01-02 23:02:59 UTC
Description of problem:
Running [0] to install a new 3.7 cluster fails with the error message "Running etcd as an embedded service is no longer supported." even though the hosts file includes an etcd section.

[0] /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

Version-Release number of the following components:
ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

# rpm -qa | grep -ie ansible -ie openshift -ie ose -ie ocp
atomic-openshift-clients-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-sdn-ovs-3.7.14-1.git.0.593a50e.el7.x86_64
openshift-ansible-lookup-plugins-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-callback-plugins-3.7.14-1.git.0.4b35b2d.el7.noarch
procps-ng-3.3.10-16.el7.x86_64
ansible-2.4.1.0-1.el7.noarch
atomic-openshift-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-node-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-docker-excluder-3.7.14-1.git.0.593a50e.el7.noarch
openshift-ansible-filter-plugins-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-playbooks-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch
tuned-profiles-atomic-openshift-node-3.7.14-1.git.0.593a50e.el7.x86_64
atomic-openshift-excluder-3.7.14-1.git.0.593a50e.el7.noarch
openshift-ansible-roles-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-docs-3.7.14-1.git.0.4b35b2d.el7.noarch


Attaching hosts file and ansible playbook output shortly

Comment 3 Scott Dodson 2018-01-04 20:11:10 UTC
It's complaining because there are two etcd hosts, which is not a valid number of etcd hosts. To provide an HA etcd environment you need three etcd hosts; as it is now, if a single etcd host were to fail, the entire cluster would fail. If you don't care about HA then you can specify one. If this is a new environment, let's leave it at that.

If this is an existing environment we should try to scale them up to three etcd hosts so they have a proper HA environment. To do that add a host to [new_etcd] and run playbooks/byo/openshift-etcd/scaleup.yml.
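
For readers wondering why two hosts is worse than it looks, here is a minimal sketch of the majority arithmetic behind that advice. The function names are made up for illustration and are not part of openshift-ansible or etcd; the only assumption is that the cluster needs a strict majority of its members to keep working.

```python
def etcd_quorum(members: int) -> int:
    """Smallest strict majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    # Compare the group sizes discussed in this bug.
    for n in (1, 2, 3, 5):
        print(f"{n} host(s): quorum={etcd_quorum(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

With two hosts the quorum is two, so losing either host stops the cluster; two hosts tolerate no more failures than one.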

Comment 4 Eric Jones 2018-01-04 21:28:16 UTC
I understand that one could work that out from the hosts file, given the number of etcd hosts required for HA, but did you see anything in the ansible output that would indicate it?

If not, then I think that is the bug here: we should report that as the error instead of "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one or three hosts. These hosts may be the same hosts as your masters. If this is an upgrade you may set openshift_master_unsupported_embedded_etcd=true until a migration playbook becomes available.\n"

Comment 5 Scott Dodson 2018-01-04 21:29:38 UTC
Yes, the error message you copied and pasted says as much.

"If this is a new install please define an 'etcd' group with either one or three hosts."

Comment 6 Scott Dodson 2018-01-04 21:37:45 UTC
I agree the error should be updated so we'll use this to track that.

Comment 7 Eric Jones 2018-01-04 21:48:13 UTC
Thanks Scott!

Comment 8 Russell Teague 2018-01-24 20:49:47 UTC
Proposed: https://github.com/openshift/openshift-ansible/pull/6858

Comment 9 Russell Teague 2018-01-25 15:03:31 UTC
Merged

Comment 11 Gaoyun Pei 2018-01-26 09:56:33 UTC
The proposed PR is not merged in openshift-ansible-3.9.0-0.24.0.git.0.735690f.el7.noarch yet; waiting for the next build to verify the bug.

Comment 12 Gaoyun Pei 2018-01-29 07:41:17 UTC
Verified this bug with openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch

Prepared an ansible inventory file with two etcd hosts and ran playbooks/prerequisites.yml.

#ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
...
TASK [Evaluate groups - Fail if no etcd hosts group is defined] *************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one, three or five hosts. These hosts may be the same hosts as your masters. If this is an upgrade please see https://docs.openshift.com/container-platform/latest/install_config/upgrading/migrating_embedded_etcd.html for documentation on how to migrate from embedded to external etcd.\n"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/prerequisites.retry

PLAY RECAP ******************************************************************************************************************************************************************
ec2-52-200-181-35.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0   
localhost                  : ok=1    changed=0    unreachable=0    failed=1
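
The behaviour seen in this run (two etcd hosts rejected; one, three or five accepted according to the message) could be sketched roughly as below. This is not the installer's code: `check_etcd_group` and `VALID_ETCD_COUNTS` are hypothetical names, the accepted counts are taken only from the error text above, and the returned message is an illustration of reporting the actual host count, which is what the original report asked for.

```python
# Group sizes named in the updated error message (assumption: these are
# the only sizes the check accepts).
VALID_ETCD_COUNTS = (1, 3, 5)


def check_etcd_group(etcd_hosts):
    """Return None if the group size is accepted, else an explanatory message."""
    count = len(etcd_hosts)
    if count in VALID_ETCD_COUNTS:
        return None
    return (
        f"The 'etcd' group has {count} host(s); "
        "please define an 'etcd' group with either one, three or five hosts."
    )


if __name__ == "__main__":
    # Hypothetical host names, two hosts as in the inventory used above.
    print(check_etcd_group(["etcd1.example.com", "etcd2.example.com"]))
```

A message that includes the observed count covers both the empty-group case and the wrong-size case without implying that embedded etcd is in use.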

Comment 15 errata-xmlrpc 2018-03-28 14:17:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

