Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1404959

Summary: pods creation slows down over time
Product: OpenShift Container Platform
Reporter: Elvir Kuric <ekuric>
Component: Node
Assignee: Jordan Liggitt <jliggitt>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: aos-bugs, eparis, jeder, jmencak, jokerman, mifiedle, mmccomas, smunilla, tstclair
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: aos-scalability-34
Fixed In Version:
Doc Type: No Doc Update
Doc Text: undefined
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 05:17:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Elvir Kuric 2016-12-15 08:46:32 UTC
Description of problem:

When creating 15k pods across 100 projects, 5 pods at a time, pod creation slows down over time.
I submit 5 pods for creation, wait until all pods sent in the previous batch are in the "Running" state, then send the next batch of pods.

I have noticed that when there are 500-1000 or 2000 pods present on the cluster, or fewer, I am able to create and start approximately 120 pods per 2 minutes. As time goes on and more pods are added to the cluster, the number of pods created per 2 minutes drops to between 10 and 20, which is considerably slower.
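The submit-and-wait pacing described above can be sketched as a small polling helper. This is a minimal sketch; get_statuses is a hypothetical callable that in practice would shell out to "oc get pods" for the current batch and return the STATUS column values:

```python
import time

def wait_all_running(get_statuses, poll_interval=5.0, timeout=600.0):
    """Poll until every pod in the batch reports status 'Running'.

    get_statuses() is a hypothetical callable returning a list of status
    strings, one per pod in the batch (e.g. parsed from the STATUS column
    of `oc get pods`). Returns True once all are Running, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        statuses = get_statuses()
        if statuses and all(s == "Running" for s in statuses):
            return True
        time.sleep(poll_interval)
    return False
```

Once wait_all_running returns True, the next batch of 5 pods is submitted, which is exactly the trickled-input pattern discussed in comment 4 below.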

This is an environment with 3 masters, 3 etcd servers (separate, on bare metal, with 32 CPUs and 132 GB of memory per etcd machine), and 982 application nodes.


Version-Release number of selected component (if applicable):

The following atomic-openshift packages are installed:
atomic-openshift-node-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-master-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-tests-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
tuned-profiles-atomic-2.7.1-3.el7.noarch
atomic-openshift-clients-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-dockerregistry-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-clients-redistributable-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
tuned-profiles-atomic-openshift-node-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-sdn-ovs-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-pod-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64


How reproducible:

I can reproduce this almost every time I run the test.

Steps to Reproduce:

1. Delete all pods except infra pods on the cluster.
2. Start creating new pods.
3. Watch the number of pods added over time (I was monitoring "oc get pods --all-namespaces | wc -l" every 2 minutes).
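The count in step 3 can also be narrowed to Running pods by parsing the STATUS column of the oc output; a minimal sketch (the sample output below is fabricated for illustration):

```python
def count_running(oc_output):
    """Count Running pods in `oc get pods --all-namespaces` output.

    Columns are NAMESPACE NAME READY STATUS RESTARTS AGE, so the STATUS
    field is the fourth whitespace-separated column on each data row.
    """
    lines = oc_output.strip().splitlines()[1:]  # drop the header row
    return sum(1 for line in lines if line.split()[3] == "Running")

# Fabricated sample output for illustration:
sample = """\
NAMESPACE   NAME    READY   STATUS    RESTARTS   AGE
proj1       pod-1   1/1     Running   0          2m
proj1       pod-2   0/1     Pending   0          10s
proj2       pod-3   1/1     Running   0          1m
"""
print(count_running(sample))  # -> 2
```

Sampling this count every 2 minutes and diffing consecutive values gives the pods-per-2-minutes rate the report tracks.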

Actual results:

Pod creation slows down as the number of pods on the cluster increases.

Expected results:

Pod creation rate remains approximately constant over time.

Additional info:
Logs will be sent to a file server (several GB; cannot attach to BZ directly).

Comment 4 Timothy St. Clair 2016-12-15 16:26:11 UTC
So I have more comments on the experiment, because slower throughput on trickled input with a large cluster is expected. Also, this behavior is not seen in cluster horizontal tests.

Let me elaborate: 

Scheduling throughput is a function of f(running pods) + new_pods_per_cycle. So if you are consistently adding small batches of pods, the overall throughput of the scheduler goes way down, because the scheduler will only take the new batch and will pay the penalty of f(running pods) many times over, plus there is a delay before the next evaluation. It's better to batch the submissions, either concurrently or in bulk, if the goal is to load faster.
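The effect can be illustrated with a toy cost model. Only the shape comes from the comment above (a per-cycle cost that grows with the number of running pods, plus a per-cycle delay); all constants and function names below are invented for illustration:

```python
def cycle_cost(running_pods, batch_size,
               per_running=0.001, per_new=0.01, cycle_delay=1.0):
    """Illustrative time to schedule one batch: a fixed per-cycle delay,
    a term growing with already-running pods (the f(running pods) penalty),
    and a term for the new pods in the batch. Constants are made up."""
    return cycle_delay + per_running * running_pods + per_new * batch_size

def total_time(total_pods, batch_size):
    """Total modeled time to load total_pods pods in batches of batch_size."""
    t, running = 0.0, 0
    while running < total_pods:
        batch = min(batch_size, total_pods - running)
        t += cycle_cost(running, batch)
        running += batch
    return t

# Trickling 5 pods at a time pays the f(running pods) penalty 3000 times;
# large batches pay it only a handful of times for the same 15k pods.
print(f"trickled: {total_time(15000, 5):.0f}  bulk: {total_time(15000, 1500):.0f}")
```

Under any positive per_running cost the trickled total dominates, which is why bulk or concurrent submission loads the cluster faster in this model.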

Comment 7 Timothy St. Clair 2017-01-31 16:38:03 UTC
Cancel needinfo and switching to: https://bugzilla.redhat.com/show_bug.cgi?id=1343196

Comment 9 Eric Paris 2017-02-14 18:30:51 UTC
Since this is blocked on the switch to etcd 3.1 and the v3 schema and we aren't doing that until 3.6, moving to UpcomingRelease.

Comment 12 Andy Goldstein 2017-03-22 19:42:31 UTC
Still waiting on 1.6 rebase

Comment 13 Andy Goldstein 2017-04-17 14:17:44 UTC
Still waiting on 1.6 rebase

Comment 14 Andy Goldstein 2017-05-02 18:33:44 UTC
1.6.1 is in, moving to MODIFIED

Comment 16 Mike Fiedler 2017-07-06 13:57:46 UTC
Verified on 3.6.122

Comment 18 errata-xmlrpc 2017-08-10 05:17:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716