Bug 1500667

Summary:	Fail to scale-up etcd when running as system container
Product:	OpenShift Container Platform
Reporter:	Gaoyun Pei <gpei>
Component:	Installer
Assignee:	Jan Chaloupka <jchaloup>
Status:	CLOSED ERRATA
QA Contact:	Gaoyun Pei <gpei>
Severity:	medium
Docs Contact:
Priority:	medium
Version:	3.7.0
CC:	aos-bugs, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Story Points:	---
Clone Of:
Environment:
Last Closed:	2017-11-28 22:16:24 UTC
Type:	Bug
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Category:	---
oVirt Team:	---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Comment 4 Jan Chaloupka 2017-10-13 12:03:01 UTC

If you specify only one new etcd host in the inventory file, the scale-up works. The problem is scaling up from a 1-node etcd cluster to a 2-node etcd cluster, because the new cluster needs to elect its leader.

If the inventory file has 2 or more hosts under the new_etcd group (two in this case), the scale-up playbook generates the following env for the second member (the first new_etcd host to scale up):

ETCD_INITIAL_CLUSTER=<etcd1_ip>=https://<etcd1_ip>:2380,<etcd2_ip>=https://<etcd2_ip>:2380,<etcd3_ip>=https://<etcd3_ip>:2380

When the etcd service of the second member is started, it waits for the <etcd3_ip> member, which never starts. Thus, a leader is never elected and the cluster becomes unhealthy. After removing `,<etcd3_ip>=https://<etcd3_ip>:2380` from ETCD_INITIAL_CLUSTER, I am able to add the new member.
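
The manual workaround above can be sketched in Python. This is only an illustration of the string edit, not code from openshift-ansible: the helper name `drop_members` and the 192.0.2.x addresses are placeholders standing in for <etcd1_ip>, <etcd2_ip>, and <etcd3_ip>.

```python
def drop_members(initial_cluster, absent):
    """Return an ETCD_INITIAL_CLUSTER value without the members named in `absent`.

    Hypothetical helper for illustration; entries have the form name=peer_url.
    """
    kept = []
    for entry in initial_cluster.split(","):
        # The member name is everything before the first "=".
        name, _, peer_url = entry.partition("=")
        if name not in absent:
            kept.append(f"{name}={peer_url}")
    return ",".join(kept)


# Placeholder addresses; the third member has not been started yet.
generated = (
    "192.0.2.1=https://192.0.2.1:2380,"
    "192.0.2.2=https://192.0.2.2:2380,"
    "192.0.2.3=https://192.0.2.3:2380"
)
fixed = drop_members(generated, {"192.0.2.3"})
print(fixed)
```

With the third entry removed, the second member only expects peers that are actually running or about to start.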

Comment 5 Scott Dodson 2017-10-13 13:07:19 UTC

UpcomingRelease as system containers are tech preview.

Comment 6 Jan Chaloupka 2017-10-13 13:39:22 UTC

Upstream PR: https://github.com/openshift/openshift-ansible/pull/5747

Comment 7 openshift-github-bot 2017-10-15 13:28:12 UTC

Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/1720af442f0b02359ce4cc70d32adca15d9d26ab
Merge pull request #5747 from ingvagabund/set-initial-etcd-cluster-properly-system-container-scale-up

Automatic merge from submit-queue.

Set initial etcd cluster properly during system container scale up

When a cluster is scaled up, the ETCD_INITIAL_CLUSTER must not contain etcd members that are not about to start or are not part of the etcd cluster.

Consolidating `initial_etcd_cluster` and `etcd_initial_cluster`, as they do exactly the same thing.

Bug: 1500667
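
The rule stated in the commit message can be sketched as follows, assuming new hosts are added one at a time. This is not the implementation from the PR: `peer_entry`, `initial_cluster_per_step`, and the 192.0.2.x addresses are hypothetical, chosen only to show which members each new host's ETCD_INITIAL_CLUSTER would list.

```python
def peer_entry(ip):
    # Same name=peer_url form as in the bug report; port 2380 is the etcd peer port.
    return f"{ip}=https://{ip}:2380"


def initial_cluster_per_step(running, new_hosts):
    """Map each new host to the ETCD_INITIAL_CLUSTER value it should start with.

    Each value lists only the members already in the cluster plus the host
    being started, never hosts that are still waiting to be added.
    """
    members = list(running)
    steps = {}
    for host in new_hosts:
        members.append(host)
        steps[host] = ",".join(peer_entry(ip) for ip in members)
    return steps


# Placeholder addresses: one existing member, two hosts in new_etcd.
steps = initial_cluster_per_step(["192.0.2.1"], ["192.0.2.2", "192.0.2.3"])
for host, value in steps.items():
    print(host, "->", "ETCD_INITIAL_CLUSTER=" + value)
```

Under this assumption, the second member starts with a two-entry list and the third member with a three-entry list, so no member waits for a peer that has not been started.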

Comment 9 Gaoyun Pei 2017-10-19 07:19:11 UTC

Verified this bug with openshift-ansible-3.7.0-0.161.0.git.0.2ca2c69.el7.noarch.rpm.
The etcd scale-up playbook successfully added two new etcd members to a cluster that initially had only a single external etcd member. All etcd members were running as system containers.

Comment 12 errata-xmlrpc 2017-11-28 22:16:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188