Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1497326

Summary: [RFE] Alert on >25% etcd memory usage on a master, take action
Product: OpenShift Container Platform Reporter: Max Whittingham <mwhittin>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED WONTFIX QA Contact: ge liu <geliu>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.5.1 CC: aos-bugs, ccoleman, gblomqui, jgoulding, jokerman, mfojtik, mmccomas, mwhittin, nagrawal, scuppett, tkatarki
Target Milestone: --- Keywords: OpsBlocker
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
We need to add a general tuning guide recommendation that etcd memory usage on a master should never be allowed to grow past 25%. This bug tracks adding that recommendation to the docs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-06 13:24:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Max Whittingham 2017-09-29 19:23:31 UTC
Description of problem:
etcd timed out, lost connectivity to one of the members briefly, then reconnected.

Version-Release number of selected component (if applicable):
3.5.5.31-1

How reproducible:
I have not been able to reproduce this.

Actual results:
Sep 29 19:14:04 $node.ec2.internal etcd[74733]: etcdserver: request timed out, possibly due to previous leader failure
Sep 29 19:14:06 $node.ec2.internal etcd[74733]: etcdserver: request timed out, possibly due to previous leader failure
Sep 29 19:14:11 $node.ec2.internal etcd[74733]: lost the TCP streaming connection with peer 3727844635f090bc (stream MsgApp v2 reader)

Expected results:
etcdserver requests should not time out, and the streaming connection to the peer should not be lost.

Comment 5 Clayton Coleman 2017-10-03 18:44:54 UTC
We need to alert on >25% etcd memory usage on a master and force an upgrade or prune.
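
A minimal sketch of the threshold check such an alert could use. It assumes "memory usage" means the etcd process's resident set size (VmRSS in /proc/<pid>/status) compared against the host's MemTotal in /proc/meminfo; this bug does not say how the 25% should be measured. The function names are hypothetical and not part of any OpenShift or etcd tooling.

```python
# Hypothetical helpers; the 0.25 default follows the ">25%" figure in this bug.

def parse_kib_field(text, field):
    """Return, in bytes, the value of a 'Field:   12345 kB' line.

    This is the line layout used by /proc/meminfo (e.g. MemTotal) and
    /proc/<pid>/status (e.g. VmRSS) on Linux.
    """
    for line in text.splitlines():
        if line.startswith(field + ":"):
            # Line looks like "VmRSS:    5000000 kB"; the second token is KiB.
            return int(line.split()[1]) * 1024
    raise KeyError(field)


def etcd_memory_alert(status_text, meminfo_text, threshold=0.25):
    """Return (alert, fraction) for etcd RSS relative to total host memory.

    alert is True only when the fraction is strictly above the threshold.
    """
    rss = parse_kib_field(status_text, "VmRSS")
    total = parse_kib_field(meminfo_text, "MemTotal")
    fraction = rss / total
    return fraction > threshold, fraction
```

On a master, the two text arguments would come from reading /proc/<etcd pid>/status and /proc/meminfo. What to do once the alert fires (the "upgrade or prune" above) is outside the scope of this sketch.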

Comment 9 Red Hat Bugzilla 2023-09-14 04:09:14 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days