Scaling up works if you specify only one new etcd host in the inventory file. The problem occurs when scaling up from a 1-node etcd cluster to a 2-node etcd cluster, because the new cluster needs to elect its leader.

If the inventory file has two or more hosts under the new_etcd group (two in this case), the scale-up playbook generates the following environment for the second member (the first new_etcd host to be scaled up):

ETCD_INITIAL_CLUSTER=<etcd1_ip>=https://<etcd1_ip>:2380,<etcd2_ip>=https://<etcd2_ip>:2380,<etcd3_ip>=https://<etcd3_ip>:2380

When the etcd service on the second member starts, it waits for the <etcd3_ip> member, which never starts. As a result, the leader is never elected and the cluster becomes unhealthy. After removing `,<etcd3_ip>=https://<etcd3_ip>:2380` from that value, I am able to add the new member.
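To make the failure easier to spot, here is a minimal sketch that parses an ETCD_INITIAL_CLUSTER value and reports entries that are neither running nor about to start. The helper names (`parse_initial_cluster`, `members_not_starting`) and the 10.0.0.x addresses are made up for illustration; this is not code from openshift-ansible.

```python
def parse_initial_cluster(value):
    """Split an ETCD_INITIAL_CLUSTER string into a {name: peer_url} dict."""
    members = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry has the form <name>=<peer_url>; split on the first "=" only.
        name, _, peer_url = entry.partition("=")
        members[name] = peer_url
    return members


def members_not_starting(value, expected_names):
    """Return names listed in the value that are not in expected_names.

    expected_names is assumed to hold the members that are already running
    plus the single member being started right now.
    """
    return sorted(set(parse_initial_cluster(value)) - set(expected_names))


# Placeholder addresses standing in for <etcd1_ip>, <etcd2_ip>, <etcd3_ip>.
generated = (
    "10.0.0.1=https://10.0.0.1:2380,"
    "10.0.0.2=https://10.0.0.2:2380,"
    "10.0.0.3=https://10.0.0.3:2380"
)
# Only etcd1 (running) and etcd2 (being started) are expected at this step.
print(members_not_starting(generated, ["10.0.0.1", "10.0.0.2"]))
```

With the value from this report, the check flags the third entry, which is the one I had to remove by hand.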
Setting UpcomingRelease, as system containers are tech preview.
Upstream PR: https://github.com/openshift/openshift-ansible/pull/5747
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/1720af442f0b02359ce4cc70d32adca15d9d26ab
Merge pull request #5747 from ingvagabund/set-initial-etcd-cluster-properly-system-container-scale-up

Automatic merge from submit-queue.

Set initial etcd cluster properly during system container scale up

When a cluster is scaled up, the ETCD_INITIAL_CLUSTER must not contain etcd members that are not about to start or are not part of the etcd cluster.

Consolidating `initial_etcd_cluster` and `etcd_initial_cluster` as they do exactly the same.

Bug: 1500667
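The rule from the commit message can be sketched as follows: at each scale-up step, the value lists only the members already in the cluster plus the one member being started. `build_initial_cluster` is a hypothetical helper, the 10.0.0.x addresses are placeholders, and the `<ip>=https://<ip>:2380` entry format is copied from the value quoted in this bug; the actual playbook logic in PR #5747 may differ.

```python
def build_initial_cluster(existing_ips, joining_ip, peer_port=2380):
    """Build an ETCD_INITIAL_CLUSTER value for one scale-up step.

    existing_ips: members that are already part of the etcd cluster.
    joining_ip:   the single member about to be started.
    Hosts still waiting in new_etcd are deliberately left out.
    """
    ips = list(existing_ips) + [joining_ip]
    return ",".join(
        "{0}=https://{0}:{1}".format(ip, peer_port) for ip in ips
    )


# Step 1: a 1-node cluster gains its second member.
print(build_initial_cluster(["10.0.0.1"], "10.0.0.2"))

# Step 2: the now 2-node cluster gains its third member.
print(build_initial_cluster(["10.0.0.1", "10.0.0.2"], "10.0.0.3"))
```

Under these assumptions, the third host only appears in the value once it is the member being started.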
Verified this bug with openshift-ansible-3.7.0-0.161.0.git.0.2ca2c69.el7.noarch.rpm. The etcd scale-up playbook successfully added two new etcd members to a cluster that initially had only a single external etcd. All etcd members were running as system containers.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188