Bug 1389804

Summary: etcd3 cluster keeps electing new leaders during OpenShift cluster load to 1K namespaces
Product: Red Hat Enterprise Linux 7
Reporter: Mike Fiedler <mifiedle>
Component: etcd3
Assignee: Timothy St. Clair <tstclair>
Status: CLOSED NOTABUG
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 7.3
CC: jeder, mifiedle, sjr, tstclair, vlaad
Target Milestone: rc
Keywords: Extras
Hardware: x86_64
OS: Linux
Whiteboard: aos-scalability-34
Last Closed: 2017-01-25 13:56:43 UTC
Type: Bug

Description Mike Fiedler 2016-10-28 17:07:13 UTC
Description of problem:

During our standard OpenShift cluster-load horizontal scale test to 1K projects (2K deployments, 4K running pods across 300 nodes), the etcd3 3.0.12-3 cluster frequently called for new elections and changed leaders. The same workload on etcd 2.3.7 normally results in no leader changes. During this scale-up the leader changed 28 times.


Version-Release number of selected component (if applicable):  OpenShift 3.4.0.16 with etcd3 3.0.12-3


How reproducible: Unknown; it happened frequently on this run.


Steps to Reproduce:
1.  Build an HA cluster with 3 masters, 3 etcd nodes, 2 infra nodes, and 300 application nodes
2.  Run the https://github.com/openshift/svt/blob/master/openshift_scalability/config/pyconfigMasterVirtScalePause.yaml workload configured for 1000 projects
3.  Watch the etcd3 logs for leader changes (one way to count them is sketched below)
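
For step 3, a minimal sketch of one way to count leader changes is shown here. It assumes the etcd members log through journald under a unit named "etcd" and that leadership transitions show up as raft lines containing "elected leader", "lost leader", or "became leader"; the exact wording varies between etcd versions, so adjust the unit name and patterns to whatever your logs actually contain.

```python
#!/usr/bin/env python
"""Rough sketch: count etcd leader-change events in the journal.

Assumes the etcd members log through journald under a unit named 'etcd'
and that leadership transitions appear as raft lines containing
'elected leader', 'lost leader', or 'became leader'; adjust the unit
name and the patterns to match what your logs actually contain.
"""
import re
import subprocess

LEADER_EVENT = re.compile(r"elected leader|lost leader|became leader")


def leader_change_lines(unit="etcd", since="1 hour ago"):
    # Pull the journal for the etcd unit and keep only leadership lines.
    log = subprocess.check_output(
        ["journalctl", "-u", unit, "--since", since, "--no-pager"],
        universal_newlines=True,
    )
    return [line for line in log.splitlines() if LEADER_EVENT.search(line)]


if __name__ == "__main__":
    events = leader_change_lines()
    print("leader-change events: %d" % len(events))
    for line in events:
        print(line)
```

Running it on each etcd member during the cluster-load run gives a rough per-member count to compare against the burst pattern described below.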

Actual results:

Frequent leader changes that seem to come in bursts. Occasional oc command failures from the cluster-loader script because the etcd cluster is temporarily leaderless.

Expected results:

No unnecessary etcd3 cluster churn.

Comment 3 Timothy St. Clair 2016-10-28 18:28:38 UTC
Issue logged upstream: https://github.com/coreos/etcd/issues/6753

Comment 4 Timothy St. Clair 2016-11-03 20:32:53 UTC
The only thing I can think of is write contention on the VMs. Could you check to make certain that the etcd nodes are landing on different hypervisors? An easy way to do this in our environment is to make the instance sizes so large that they each eat a whole host.
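
To check the placement automatically rather than by eye, something along the lines of the sketch below could work. It is only a rough outline: it assumes an OpenStack environment, admin access to the compute API (the hypervisor hostname is an admin-only attribute), the openstacksdk client, and that the etcd instances are named with an "etcd" prefix; the cloud name "mycloud" is hypothetical.

```python
#!/usr/bin/env python
"""Rough sketch: check whether the etcd VMs share a hypervisor.

Assumes an OpenStack cloud, admin access to the compute API, and the
openstacksdk client. The instance-name prefix 'etcd' and cloud name
'mycloud' are hypothetical; substitute your own.
"""
from collections import Counter

import openstack

conn = openstack.connect(cloud="mycloud")

# Map each etcd instance to the compute host it landed on.
placement = {
    server.name: server.hypervisor_hostname
    for server in conn.compute.servers(details=True)
    if server.name.startswith("etcd")
}

for name, host in sorted(placement.items()):
    print("%s -> %s" % (name, host))

# Any hypervisor hosting more than one etcd member defeats anti-affinity.
shared = [h for h, n in Counter(placement.values()).items() if n > 1]
if shared:
    print("WARNING: multiple etcd members on: %s" % ", ".join(shared))
```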

Comment 9 Vikas Laad 2016-11-23 01:00:06 UTC
I am running into this problem with a 1000-node cluster when trying to run conformance tests.

Comment 10 Vikas Laad 2016-11-28 18:07:03 UTC
(In reply to Vikas Laad from comment #9)
> I am running into this problem with a 1000-node cluster when trying to run
> conformance tests.

There are other issues (networking, etc.) in this environment; please ignore this comment.

Comment 13 Timothy St. Clair 2017-01-25 13:56:43 UTC
Closing this issue; we root-caused it to a couple of conditions involving storage-subsystem write latency in OpenStack environments:

1. Host anti-affinity is needed when using local storage.
2. Write-latency spikes occur during fsyncs on a shared Ceph cluster.

Once etcd was moved to dedicated storage, the issues were resolved.
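
If fsync latency is suspected, etcd's own Prometheus metrics give a quick read before reaching for dedicated hardware. The sketch below is one rough way to check it; it assumes a member serving metrics at http://127.0.0.1:2379/metrics without TLS client auth (a secured cluster needs the proper URL and certificates) and reports the average WAL fsync duration from the etcd_disk_wal_fsync_duration_seconds histogram.

```python
#!/usr/bin/env python
"""Rough sketch: report etcd's average WAL fsync latency.

Assumes a member exposing Prometheus metrics at METRICS_URL without TLS
client auth; for a secured cluster, point this at the right URL and
supply certificates instead.
"""
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

METRICS_URL = "http://127.0.0.1:2379/metrics"

fsync_sum = fsync_count = None
for line in urlopen(METRICS_URL).read().decode().splitlines():
    if line.startswith("etcd_disk_wal_fsync_duration_seconds_sum"):
        fsync_sum = float(line.split()[-1])
    elif line.startswith("etcd_disk_wal_fsync_duration_seconds_count"):
        fsync_count = float(line.split()[-1])

if fsync_count and fsync_sum is not None:
    # Multi-millisecond averages point at a backend that is too slow for etcd.
    print("average WAL fsync: %.2f ms" % (1000.0 * fsync_sum / fsync_count))
else:
    print("fsync metrics not found at %s" % METRICS_URL)
```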

Please reference upstream guidelines on deployment: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/hardware.md#hardware-recommendations