Description of problem:
Running an upgrade against a cluster with 4 etcd hosts fails at task [Evaluate groups - Fail if no etcd hosts group is defined].

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Running etcd as an embedded service is no longer supported. If this is a new install please define an 'etcd' group with either one or three hosts. These hosts may be the same hosts as your masters. If this is an upgrade you may set openshift_master_unsupported_embedded_etcd=true until a migration playbook becomes available.\n"}

===============
The g_etcd_hosts length check should not be limited to [3,1].

# vim playbooks/common/openshift-cluster/evaluate_groups.yml
  - name: Evaluate groups - Fail if no etcd hosts group is defined
    fail:
      msg: >
        Running etcd as an embedded service is no longer supported. If this is a
        new install please define an 'etcd' group with either one or three
        hosts. These hosts may be the same hosts as your masters. If this is an
        upgrade you may set openshift_master_unsupported_embedded_etcd=true
        until a migration playbook becomes available.
    when:
    - g_etcd_hosts | default([]) | length not in [3,1]
    - not openshift_master_unsupported_embedded_etcd | default(False)
    - not (openshift_node_bootstrap | default(False))

Version-Release number of the following components:
openshift-ansible-docs-3.7.0-0.178.0.git.0.27a1039.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Upgrade an OCP cluster with more than 3 etcd hosts

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
The number of etcd members determines the failure tolerance of the cluster [1]. A 4-member cluster tolerates only one member failure, the same as a 3-member cluster, so size 4 is not a real improvement over size 3. I believe the size of the etcd cluster has been kept within bounds because 1-member and 3-member etcd deployments are known and thoroughly tested. IINM, it is preferable to deploy a cluster with 3 etcd members and then scale etcd up with playbooks/common/openshift-etcd/scaleup.yml. One can deploy a basic cluster, see how it behaves, and then scale etcd up if the number of etcd CRUD requests goes over a reasonable limit.

[1] https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size
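
To make the failure-tolerance point concrete, here is a minimal sketch of the majority arithmetic that the linked admin guide describes. The function names quorum and failure_tolerance are hypothetical helpers for illustration, not part of etcd or openshift-ansible.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can fail while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare cluster sizes 1 through 5.
    for n in range(1, 6):
        print(f"{n} members: quorum {quorum(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

Sizes 3 and 4 both tolerate a single failure, which is why an even-sized cluster adds a member without adding resilience.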
Discussed with the master team (Michal Fojtik and Stefan Schimanski): we should accept 1, 3, or 5 nodes as acceptable cluster sizes, and we should recommend 3 nodes. Let's update the error message to make that clearer.
https://github.com/openshift/openshift-ansible/pull/6749 updates the rules to accept 1, 3, or 5 etcd hosts. We're not going to support any other configurations.
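
As a rough illustration of the rule described here, the sketch below restates the three `when` conditions quoted in the original report in Python, with the allowed sizes extended to 1, 3, and 5. It is an assumption-based restatement, not the actual content of the pull request; the function name etcd_group_check_fails and its parameters are hypothetical.

```python
# Sizes named in this thread as supported (assumed to match the updated rule).
SUPPORTED_ETCD_SIZES = (1, 3, 5)


def etcd_group_check_fails(etcd_hosts, embedded_etcd=False, node_bootstrap=False):
    """Return True when the 'Fail if no etcd hosts group is defined' task would trigger.

    Mirrors the three conditions in the quoted task: the etcd group has an
    unsupported size, embedded etcd is not explicitly allowed, and the run
    is not a node bootstrap.
    """
    host_count = len(etcd_hosts or [])  # an undefined group counts as empty
    return (
        host_count not in SUPPORTED_ETCD_SIZES
        and not embedded_etcd
        and not node_bootstrap
    )


if __name__ == "__main__":
    four_hosts = ["etcd1", "etcd2", "etcd3", "etcd4"]
    five_hosts = four_hosts + ["etcd5"]
    print(etcd_group_check_fails(four_hosts))  # 4 hosts is not a supported size
    print(etcd_group_check_fails(five_hosts))  # 5 hosts is a supported size
```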
Verified on openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch. The fail task no longer blocks the upgrade playbook when upgrading with 5 etcd hosts, but it still fails when the number of etcd hosts is not in [1,3,5].
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489