Bug 1506177 - Upgrade will fail if the number of etcd hosts is more than 3
Summary: Upgrade will fail if the number of etcd hosts is more than 3
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Scott Dodson
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-25 10:31 UTC by liujia
Modified: 2021-03-11 16:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The etcd host validation now accepts 1 or more etcd hosts, allowing greater flexibility in the number of etcd hosts configured. The recommended number of etcd hosts is still 3.
Clone Of:
Environment:
Last Closed: 2018-03-28 14:08:09 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0489 | 0 | None | None | None | 2018-03-28 14:08:33 UTC

Description liujia 2017-10-25 10:31:49 UTC
Description of problem:
Running the upgrade against a cluster with 4 etcd hosts fails at task [Evaluate groups - Fail if no etcd hosts group is defined].

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one or three hosts. These hosts may be the same hosts as your masters. If this is an upgrade you may set openshift_master_unsupported_embedded_etcd=true until a migration playbook becomes available.\n"}

===============
The g_etcd_hosts length check should not restrict the number of etcd hosts to [3,1].

# vim playbooks/common/openshift-cluster/evaluate_groups.yml
  - name: Evaluate groups - Fail if no etcd hosts group is defined
    fail:
      msg: >
        Running etcd as an embedded service is no longer supported. If this is a
        new install please define an 'etcd' group with either one or three
        hosts. These hosts may be the same hosts as your masters. If this is an
        upgrade you may set openshift_master_unsupported_embedded_etcd=true
        until a migration playbook becomes available.
    when:
    - g_etcd_hosts | default([]) | length not in [3,1]
    - not openshift_master_unsupported_embedded_etcd | default(False)
    - not (openshift_node_bootstrap | default(False))


Version-Release number of the following components:
openshift-ansible-docs-3.7.0-0.178.0.git.0.27a1039.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run the upgrade against an OCP cluster with more than 3 etcd hosts.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Jan Chaloupka 2017-10-25 12:05:48 UTC
The number of etcd members determines the failure tolerance of the cluster [1], so a cluster of size 4 is not a significant improvement over size 3. I believe the etcd cluster size has been kept within bounds because 1-member and 3-member etcd deployments are well known and thoroughly tested.

If I'm not mistaken, it is preferable to deploy a cluster with 3 etcd members and then scale etcd up with playbooks/common/openshift-etcd/scaleup.yml. One can deploy a basic cluster, see how it behaves, and then scale etcd up if the number of etcd CRUD requests exceeds a reasonable limit.

[1] https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size
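
For illustration, the failure-tolerance argument above can be checked with a few lines of Python. This is a minimal sketch that assumes the usual majority-quorum rule for etcd (quorum = floor(n/2) + 1); the function names are hypothetical and are not part of openshift-ansible.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can fail while a majority remains available."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print the quorum and failure tolerance for cluster sizes 1 through 5.
    for n in range(1, 6):
        print(f"{n} members: quorum {quorum(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

Under this rule, clusters of size 3 and 4 both tolerate a single member failure, while size 5 tolerates two, which is why an even-sized cluster adds little over the next smaller odd size.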

Comment 2 Scott Dodson 2017-10-25 13:05:45 UTC
Discussed with the master team (Michal Fojtik and Stefan Schimanski): we should accept 1, 3, or 5 nodes as a valid cluster size, and we should recommend 3 nodes. Let's update the error message to make that clearer.

Comment 4 Scott Dodson 2018-01-24 15:49:26 UTC
https://github.com/openshift/openshift-ansible/pull/6749 updates the rules to accept 1, 3, or 5 etcd hosts. We're not going to support any other configurations.

Comment 6 liujia 2018-01-30 08:03:47 UTC
Verified on openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.
The failure message no longer blocks the upgrade playbook when upgrading with 5 etcd hosts, but the playbook still fails when the number of etcd hosts is not in [1,3,5].
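
The behaviour verified here can be modelled with a short Python sketch. It is only an illustration: the real check is the Ansible `when` condition in evaluate_groups.yml, the function and constant names are hypothetical, and the new set of allowed sizes is taken from Comment 4 rather than from the merged change itself.

```python
OLD_ALLOWED = (3, 1)     # sizes accepted by the 3.7 check quoted in the description
NEW_ALLOWED = (1, 3, 5)  # sizes accepted after the fix, per Comment 4 (assumption)


def check_fails(etcd_hosts, allowed, embedded_etcd=False, node_bootstrap=False):
    """Return True when the 'Fail if no etcd hosts group is defined' task would fire.

    Mirrors the three conditions of the `when` clause: the host count is not an
    allowed size, embedded etcd is not explicitly permitted, and the run is not
    a node bootstrap.
    """
    return (
        len(etcd_hosts or []) not in allowed  # `or []` plays the role of default([])
        and not embedded_etcd
        and not node_bootstrap
    )
```

With five hosts, `check_fails(hosts, OLD_ALLOWED)` is true and `check_fails(hosts, NEW_ALLOWED)` is false, while four hosts still trip the check under either set of sizes.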

Comment 10 errata-xmlrpc 2018-03-28 14:08:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

