Bug 1381716

Summary: etcd has increased amount of ESTABLISHED connections on 2379
Product: OpenShift Container Platform Reporter: ihorvath
Component: Master Assignee: Timothy St. Clair <tstclair>
Status: CLOSED NOTABUG QA Contact: weiwei jiang <wjiang>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.3.0 CC: aos-bugs, jliggitt, jokerman, mmccomas, sdodson, tstclair, wsun
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-07 19:11:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description                                Flags
log from one of the masters                none
after upgrade connection numbers tripled   none

Description ihorvath 2016-10-04 19:45:00 UTC
Created attachment 1207333 [details]
log from one of the masters

Description of problem:
After upgrading a large cluster from 3.2 to 3.3.0.33, alerts started coming in that etcd has more than 1200 ESTABLISHED connections. As the attached log shows, the log also frequently mentions that the server is likely overloaded.

Version-Release number of selected component (if applicable):
Openshift Master 3.3.0.33
etcd 2.3.7

How reproducible:
We have a cluster with 100 compute nodes and this is constant.

Steps to Reproduce:
1. Install OpenShift with 100 nodes
2. Check how many connections in ESTABLISHED state the etcd process has
3.
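
Step 2 can be scripted. The following is a minimal sketch, not the check the reporter actually ran: it assumes the count is taken on the etcd host from `ss -tan` output, where established sockets appear with state `ESTAB`, and it only counts sockets whose local port is 2379. The function names and the sample data are hypothetical.

```python
import subprocess

ETCD_CLIENT_PORT = 2379  # etcd client port named in the bug summary


def count_established(ss_output: str, port: int = ETCD_CLIENT_PORT) -> int:
    """Count ESTAB rows in `ss -tan` output whose local port is `port`."""
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            count += 1
    return count


def count_live(port: int = ETCD_CLIENT_PORT) -> int:
    """Run `ss -tan` on this host and count matching connections."""
    result = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    )
    return count_established(result.stdout, port)


# Made-up sample output, for illustration only.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN 0      128    *:2379              *:*
ESTAB  0      0      10.0.0.1:2379       10.0.0.2:51234
ESTAB  0      0      10.0.0.1:2379       10.0.0.3:40112
ESTAB  0      0      10.0.0.1:22         10.0.0.9:50000
ESTAB  0      0      10.0.0.2:51234      10.0.0.1:2379
"""
print(count_established(SAMPLE))  # prints 2
```

Calling `count_live()` on an etcd host would give the live figure to compare against the alert threshold.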

Actual results:
1200-1600 ESTABLISHED connections

Expected results:
I was told by Clayton Coleman that this number should stay under 500.

Additional info:

Comment 1 ihorvath 2016-10-04 19:45:56 UTC
Created attachment 1207335 [details]
after upgrade connection numbers tripled

Comment 2 Jordan Liggitt 2016-10-04 20:11:56 UTC
In 3.2, we had two etcd clients, one for origin and one for kubernetes.

In 3.3, we switched to using kubernetes' etcd client initialization for kubernetes resources, which means we have many more etcd clients, each with a 500-connection limit.

That would explain the much larger numbers of idle connections from 3.2 to 3.3.
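
A rough back-of-the-envelope sketch of that reasoning follows. It treats the 500-connection limit as a hard cap per client and ignores how many master processes are connecting; both are simplifying assumptions, and the function names are hypothetical. The exact number of clients in 3.3 is not stated in this report.

```python
PER_CLIENT_LIMIT = 500  # per-client connection limit stated in this comment


def max_connections(clients: int, limit: int = PER_CLIENT_LIMIT) -> int:
    """Upper bound on connections that `clients` etcd clients may hold open."""
    return clients * limit


def min_clients_for(observed: int, limit: int = PER_CLIENT_LIMIT) -> int:
    """Fewest clients that could account for `observed` connections."""
    return -(-observed // limit)  # ceiling division


print(max_connections(2))     # two clients, as in 3.2: prints 1000
print(min_clients_for(1200))  # bottom of the reported range: prints 3
print(min_clients_for(1600))  # top of the reported range: prints 4
```

Under these assumptions, the reported 1200-1600 connections cannot come from only two capped clients, which is consistent with more clients being created in 3.3.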

Comment 3 Timothy St. Clair 2016-10-07 18:39:11 UTC
I'm not concerned at those connection counts, even though they appear slightly higher than the numbers from our 300-node run. Also, apiserver and http2 fixes have gone in upstream for 1.5 and are enabled.

Moving to CLOSED -> NOTABUG.

Comment 7 Timothy St. Clair 2016-10-07 19:11:26 UTC
Closing.  

If there is a Bug associated with this, please feel free to reopen.