Bug 1500667 - Fail to scale-up etcd when running as system container
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Jan Chaloupka
QA Contact: Gaoyun Pei
Depends On:
Blocks:
Reported: 2017-10-11 05:59 EDT by Gaoyun Pei
Modified: 2017-11-28 17:16 EST

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 17:16:24 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:3188
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-28 21:34:54 EST

Comment 4 Jan Chaloupka 2017-10-13 08:03:01 EDT
If you specify only one new etcd host in the inventory file, the scale-up works. The problem is scaling up from a 1-node etcd cluster to a 2-node etcd cluster: the new cluster needs to elect its leader.

If the inventory file has 2 or more hosts under the new_etcd group (two in this case), the scale-up playbook generates the following env for the second member (the first new_etcd host to scale up):

ETCD_INITIAL_CLUSTER=<etcd1_ip>=https://<etcd1_ip>:2380,<etcd2_ip>=https://<etcd2_ip>:2380,<etcd3_ip>=https://<etcd3_ip>:2380

When the etcd service of the second member is started, it waits for the <etcd3_ip> member, which never starts. Thus, the leader is never elected and the cluster becomes unhealthy. After removing `,<etcd3_ip>=https://<etcd3_ip>:2380` from ETCD_INITIAL_CLUSTER, I am able to add a new member.
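
The manual workaround described above can be sketched in a few lines of Python. This is only an illustration, not code from openshift-ansible: the helper name `drop_members` and the 10.0.0.x addresses are hypothetical stand-ins for the <etcdN_ip> placeholders.

```python
def drop_members(initial_cluster, names_to_drop):
    """Return an ETCD_INITIAL_CLUSTER value without the named members.

    initial_cluster is a comma-separated list of name=peer_url entries.
    """
    kept = []
    for entry in initial_cluster.split(","):
        # Split on the first "=" only: the left side is the member name.
        name, _, peer_url = entry.partition("=")
        if name not in names_to_drop:
            kept.append(f"{name}={peer_url}")
    return ",".join(kept)


# Hypothetical value as generated for the second member (addresses are placeholders).
generated = (
    "10.0.0.1=https://10.0.0.1:2380,"
    "10.0.0.2=https://10.0.0.2:2380,"
    "10.0.0.3=https://10.0.0.3:2380"
)

# Drop the third member, which has not been started yet.
fixed = drop_members(generated, {"10.0.0.3"})
print(fixed)
```

With the third entry removed, the value lists only the existing member and the member being started.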
Comment 5 Scott Dodson 2017-10-13 09:07:19 EDT
UpcomingRelease as system containers are tech preview.
Comment 6 Jan Chaloupka 2017-10-13 09:39:22 EDT
Upstream PR: https://github.com/openshift/openshift-ansible/pull/5747
Comment 7 openshift-github-bot 2017-10-15 09:28:12 EDT
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/1720af442f0b02359ce4cc70d32adca15d9d26ab
Merge pull request #5747 from ingvagabund/set-initial-etcd-cluster-properly-system-container-scale-up

Automatic merge from submit-queue.

Set initial etcd cluster properly during system container scale up

When a cluster is scaled up, the ETCD_INITIAL_CLUSTER must not contain etcd members that are not about to start or are not part of the etcd cluster.

Consolidating `initial_etcd_cluster` and `etcd_initial_cluster` as they do exactly the same thing.

Bug: 1500667
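
As a rough illustration of the rule in the commit message above, the following Python sketch computes an ETCD_INITIAL_CLUSTER value for each new host, assuming hosts are added one at a time. It is not the playbook's actual implementation; the function name `initial_cluster_per_host` and the addresses are hypothetical.

```python
def initial_cluster_per_host(existing, new_hosts):
    """Compute the ETCD_INITIAL_CLUSTER value for each host being added.

    existing:  dict of member name -> peer URL for members already in the cluster
    new_hosts: list of (name, peer_url) tuples in the order they are scaled up

    Each new host sees only the current members plus itself, never hosts
    that are still waiting to be added.
    """
    members = dict(existing)  # copy so the caller's dict is not modified
    values = {}
    for name, peer_url in new_hosts:
        members[name] = peer_url
        values[name] = ",".join(f"{n}={u}" for n, u in members.items())
    return values


# Placeholder addresses: one existing member and two new hosts.
existing = {"10.0.0.1": "https://10.0.0.1:2380"}
new_hosts = [
    ("10.0.0.2", "https://10.0.0.2:2380"),
    ("10.0.0.3", "https://10.0.0.3:2380"),
]

values = initial_cluster_per_host(existing, new_hosts)
for host, value in values.items():
    print(host, "->", value)
```

In this sketch the first new host gets a two-entry value and the second gets all three entries, so no host is told to wait for a member that has not been started.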
Comment 9 Gaoyun Pei 2017-10-19 03:19:11 EDT
Verified this bug with openshift-ansible-3.7.0-0.161.0.git.0.2ca2c69.el7.noarch.rpm.
The etcd scale-up playbook successfully added two new etcd members to a cluster that initially had only a single external etcd. All etcd members were running as system containers.
Comment 12 errata-xmlrpc 2017-11-28 17:16:24 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
