Bug 1500667

Summary: Fail to scale-up etcd when running as system container
Product: OpenShift Container Platform Reporter: Gaoyun Pei <gpei>
Component: Installer Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA QA Contact: Gaoyun Pei <gpei>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.7.0 CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:16:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 4 Jan Chaloupka 2017-10-13 12:03:01 UTC
If you specify only one new etcd host in the inventory file, the scale-up works. The problem is scaling up from a 1-node etcd cluster to a 2-node etcd cluster: the new cluster needs to elect its leader.

If the inventory file has 2 or more hosts under the new_etcd group (two in this case), the scale-up playbook generates the following env for the second member (the first new_etcd host to scale up):

ETCD_INITIAL_CLUSTER=<etcd1_ip>=https://<etcd1_ip>:2380,<etcd2_ip>=https://<etcd2_ip>:2380,<etcd3_ip>=https://<etcd3_ip>:2380

When the etcd service of the second member is started, it waits for the <etcd3_ip> member, which never starts. Thus, the leader is never elected and the cluster becomes unhealthy. After removing `,<etcd3_ip>=https://<etcd3_ip>:2380` from that value, I am able to add the new member.
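
To illustrate the intended behaviour, here is a minimal Python sketch (not the playbook's actual code) that builds the ETCD_INITIAL_CLUSTER value for each new host when hosts are added one at a time. The function names and the 10.0.0.x addresses are hypothetical; as in the env above, each member is assumed to be named after its IP and to use peer port 2380.

```python
def peer(ip):
    """Return a (name, peer_url) pair; the member is named after its IP (assumption)."""
    return (ip, f"https://{ip}:2380")


def initial_cluster(existing, starting):
    """Build an ETCD_INITIAL_CLUSTER value from the existing members
    plus the single member that is about to start."""
    members = list(existing) + [starting]
    return ",".join(f"{name}={url}" for name, url in members)


def scale_up_steps(existing, new_hosts):
    """Yield (host_name, env_value) for each new host, added one at a time.

    A host only appears in the value once it is the one being started
    or has already joined in an earlier step.
    """
    current = list(existing)
    for host in new_hosts:
        yield host[0], initial_cluster(current, host)
        current.append(host)


# Hypothetical 1-node cluster scaled up by two hosts.
steps = list(scale_up_steps([peer("10.0.0.1")],
                            [peer("10.0.0.2"), peer("10.0.0.3")]))
for name, value in steps:
    print(f"{name}: ETCD_INITIAL_CLUSTER={value}")
```

In this sketch the value generated for the second member lists only the first and second members; the third host shows up only in its own step.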

Comment 5 Scott Dodson 2017-10-13 13:07:19 UTC
UpcomingRelease as system containers are tech preview.

Comment 6 Jan Chaloupka 2017-10-13 13:39:22 UTC
Upstream PR: https://github.com/openshift/openshift-ansible/pull/5747

Comment 7 openshift-github-bot 2017-10-15 13:28:12 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/1720af442f0b02359ce4cc70d32adca15d9d26ab
Merge pull request #5747 from ingvagabund/set-initial-etcd-cluster-properly-system-container-scale-up

Automatic merge from submit-queue.

Set initial etcd cluster properly during system container scale up

When a cluster is scaled up, the ETCD_INITIAL_CLUSTER must not contain etcd members that are not about to start or are not part of the etcd cluster.
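
The rule above can be expressed as a small check. The following Python sketch is illustrative only and is not taken from the PR: it parses an ETCD_INITIAL_CLUSTER string and reports entries that are neither running members nor the member about to start. The helper names, member names, and addresses are hypothetical.

```python
def parse_initial_cluster(value):
    """Parse 'name=url,name=url,...' into a {name: url} dict."""
    members = {}
    for entry in value.split(","):
        name, _, url = entry.partition("=")
        members[name] = url
    return members


def unexpected_members(value, running, starting):
    """Return the names listed in the value that are neither already
    running members nor the member that is about to start."""
    allowed = set(running) | {starting}
    return sorted(name for name in parse_initial_cluster(value)
                  if name not in allowed)


# Hypothetical value: etcd3 is listed although it is not about to start.
value = ("etcd1=https://10.0.0.1:2380,"
         "etcd2=https://10.0.0.2:2380,"
         "etcd3=https://10.0.0.3:2380")
extra = unexpected_members(value, running=["etcd1"], starting="etcd2")
print(extra)
```

An empty result would mean the value satisfies the rule for that step.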

Consolidating `initial_etcd_cluster` and `etcd_initial_cluster` as they do exactly the same thing.

Bug: 1500667

Comment 9 Gaoyun Pei 2017-10-19 07:19:11 UTC
Verified this bug with openshift-ansible-3.7.0-0.161.0.git.0.2ca2c69.el7.noarch.rpm.
The etcd scale-up playbook successfully added two new etcd members to a cluster that initially had only a single external etcd member; all etcd members were running as system containers.

Comment 12 errata-xmlrpc 2017-11-28 22:16:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188