Bug 1506177

Summary: Upgrade will fail if the number of etcd hosts is more than 3
Product: OpenShift Container Platform Reporter: liujia <jiajliu>
Component: Cluster Version Operator Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA QA Contact: liujia <jiajliu>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.7.0 CC: aos-bugs, hgomes, jchaloup, jokerman, mmccomas
Target Milestone: ---   
Target Release: 3.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
The etcd host validation now accepts 1 or more etcd hosts, allowing greater flexibility in the number of etcd hosts configured. The recommended number of etcd hosts is still 3.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-28 14:08:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description liujia 2017-10-25 10:31:49 UTC
Description of problem:
Running the upgrade against a cluster with 4 etcd hosts fails at the task [Evaluate groups - Fail if no etcd hosts group is defined].

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one or three hosts. These hosts may be the same hosts as your masters. If this is an upgrade you may set openshift_master_unsupported_embedded_etcd=true until a migration playbook becomes available.\n"}

===============
The g_etcd_hosts length check should not be limited to [3,1].

# vim playbooks/common/openshift-cluster/evaluate_groups.yml
  - name: Evaluate groups - Fail if no etcd hosts group is defined
    fail:
      msg: >
        Running etcd as an embedded service is no longer supported. If this is a
        new install please define an 'etcd' group with either one or three
        hosts. These hosts may be the same hosts as your masters. If this is an
        upgrade you may set openshift_master_unsupported_embedded_etcd=true
        until a migration playbook becomes available.
    when:
    - g_etcd_hosts | default([]) | length not in [3,1]
    - not openshift_master_unsupported_embedded_etcd | default(False)
    - not (openshift_node_bootstrap | default(False))


Version-Release number of the following components:
openshift-ansible-docs-3.7.0-0.178.0.git.0.27a1039.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run the upgrade against an OCP cluster with more than 3 etcd hosts (for example, 4).

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Jan Chaloupka 2017-10-25 12:05:48 UTC
The number of etcd members determines the failure tolerance of the cluster [1], so a cluster of size 4 is not a significant improvement over size 3: both tolerate one failed member. I believe the size of the etcd cluster has been kept within bounds because the 1-member and 3-member deployments are known and thoroughly tested.

If I'm not mistaken, it is preferable to deploy a cluster with 3 etcd members and then scale etcd up with playbooks/common/openshift-etcd/scaleup.yml. One can deploy a basic cluster, see how it behaves, and then scale etcd up if the number of etcd CRUD requests exceeds a reasonable limit.

[1] https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size
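To make the failure-tolerance point concrete, here is a minimal sketch of the majority-quorum arithmetic that etcd cluster sizing is based on. The function names are hypothetical and exist only for this illustration; nothing here comes from openshift-ansible.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can fail while a majority is still available."""
    return members - quorum(members)


# Compare the cluster sizes discussed in this bug.
for size in (1, 3, 4, 5):
    print(f"size={size} quorum={quorum(size)} tolerates={failure_tolerance(size)}")
```

Under this arithmetic, sizes 3 and 4 both tolerate a single failed member, while size 5 tolerates two, which is why an even-sized cluster adds little.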

Comment 2 Scott Dodson 2017-10-25 13:05:45 UTC
Discussed with the master team (Michal Fojtik and Stefan Schimanski): we should accept 1, 3, or 5 nodes as a valid cluster size, and we should recommend 3 nodes. Let's update the error message to make that clearer.

Comment 4 Scott Dodson 2018-01-24 15:49:26 UTC
https://github.com/openshift/openshift-ansible/pull/6749 updates the rules to accept 1, 3, or 5 etcd hosts. We're not going to support any other configurations.

Comment 6 liujia 2018-01-30 08:03:47 UTC
Verified on openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.
The failure message no longer blocks the upgrade playbook when upgrading with 5 etcd hosts, but the check still fails when the number of etcd hosts is not in [1,3,5].
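As a rough illustration of the verified behavior, the sketch below mirrors the three conditions of the "Evaluate groups" task in plain Python, with the accepted sizes taken from comment 4. It is not the code from the pull request; the function and constant names are hypothetical, and the parameters only stand in for g_etcd_hosts, openshift_master_unsupported_embedded_etcd, and openshift_node_bootstrap.

```python
# Accepted sizes as stated in comment 4; the constant name is hypothetical.
ACCEPTED_ETCD_SIZES = (1, 3, 5)


def etcd_group_check_fails(etcd_hosts=None, embedded_etcd_allowed=False,
                           node_bootstrap=False):
    """Return True when the playbook would stop with the etcd group error."""
    hosts = etcd_hosts or []  # same effect as `default([])` in the task
    return (
        len(hosts) not in ACCEPTED_ETCD_SIZES
        and not embedded_etcd_allowed
        and not node_bootstrap
    )


five = ["etcd%d" % i for i in range(1, 6)]
four = five[:4]
print(etcd_group_check_fails(five))  # five hosts pass the check
print(etcd_group_check_fails(four))  # four hosts are rejected
```

All three conditions must hold for the failure to trigger, so setting the embedded-etcd override or running a node bootstrap skips the size check entirely, just as in the original task.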

Comment 10 errata-xmlrpc 2018-03-28 14:08:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489