Bug 1404959 - pods creation slows down over time
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.4.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Jordan Liggitt
QA Contact: Mike Fiedler
Whiteboard: aos-scalability-34
Reported: 2016-12-15 03:46 EST by Elvir Kuric
Modified: 2017-08-16 EDT
CC: 9 users

Doc Type: No Doc Update
Last Closed: 2017-08-10 01:17:28 EDT
Type: Bug
External Trackers
Tracker ID: Red Hat Product Errata RHEA-2017:1716
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory
Last Updated: 2017-08-10 05:02:50 EDT

Description Elvir Kuric 2016-12-15 03:46:32 EST
Description of problem:

When creating 15k pods across 100 projects, 5 pods at a time, there is a slowdown in pod creation over time.
I send 5 pods to be created, wait until all pods sent in the previous step are in the "Running" state, then send the next batch of pods.

I have noticed that with 500-1000 or 2000 pods present on the cluster (or fewer), I am able to create and start approximately 120 pods per 2 minutes. As time goes on and more pods are added to the cluster, the number of pods created in 2 minutes drops to between 10 and 20, which is considerably slower.

This is an environment with 3 masters, 3 etcd servers (separate, on bare metal with 32 CPUs and 132 GB of memory per etcd machine), and 982 application nodes.


Version-Release number of selected component (if applicable):

The following atomic-openshift packages are installed:
atomic-openshift-node-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-master-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-tests-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
tuned-profiles-atomic-2.7.1-3.el7.noarch
atomic-openshift-clients-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-dockerregistry-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-clients-redistributable-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
tuned-profiles-atomic-openshift-node-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-sdn-ovs-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64
atomic-openshift-pod-3.4.0.32-1.git.0.d349492.gh37721.el7.x86_64


How reproducible:

I can reproduce this almost every time I run the test.

Steps to Reproduce:

1. Delete all pods except infra pods on the cluster.
2. Start creation of new pods.
3. Watch the number of pods added over time (I was monitoring oc get pods --all-namespaces | wc -l every 2 minutes).
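Step 3 above can be sketched as a small polling helper. This is a hypothetical script, not part of any OpenShift tooling, assuming an `oc` client that is already logged in; the `counter` parameter exists only so the loop can be exercised without a live cluster:

```python
import subprocess
import time

def pod_count():
    """Count pods in all namespaces via `oc` (requires a live cluster)."""
    out = subprocess.run(
        ["oc", "get", "pods", "--all-namespaces", "--no-headers"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def watch_pod_counts(samples, interval=120, counter=pod_count):
    """Record (timestamp, pod count) `samples` times, `interval` seconds apart."""
    history = []
    for _ in range(samples):
        history.append((time.time(), counter()))
        time.sleep(interval)
    return history
```

Diffing consecutive counts in the recorded history gives the pods-created-per-interval figure quoted in the description.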

Actual results:

Pod creation slows down as the number of pods on the cluster increases.

Expected results:

Pod creation rate should stay approximately constant over time.

Additional info:
Logs will be sent to a file server (GBs of data - cannot attach to BZ directly).
Comment 4 Timothy St. Clair 2016-12-15 11:26:11 EST
So I have more comments on the experiment, because slower throughput on trickled input with a large cluster is expected. Also, this behavior is not seen on cluster horizontal tests.

Let me elaborate: 

Scheduling throughput is a function of f(running pods) + new_pods_per_cycle. So if you are consistently adding small batches of pods, the overall throughput of the scheduler goes way down, because the scheduler will only take the new batch and will pay the f(running pods) penalty once per batch, plus there is a delay before the next evaluation. It is better to batch the submissions, either concurrently or in bulk, if the goal is to load faster.
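The argument above can be illustrated with a toy simulation; this is not OpenShift code, and the cost model (f(running_pods) approximated as base + per_pod * running, plus a fixed per-cycle delay) and all constants are illustrative assumptions:

```python
def time_to_schedule(total_pods, batch_size,
                     base=1.0, per_pod=0.01, cycle_delay=2.0):
    """Simulated seconds to schedule total_pods, batch_size pods per cycle."""
    running, elapsed = 0, 0.0
    while running < total_pods:
        batch = min(batch_size, total_pods - running)
        # Every cycle re-pays the f(running pods) cost plus the delay.
        elapsed += base + per_pod * running + cycle_delay
        running += batch
    return elapsed

# Trickling 5 pods per cycle pays the penalty 3000 times for 15k pods;
# batches of 500 pay it only 30 times, so bulk loading finishes far faster.
print(time_to_schedule(15000, 5) > time_to_schedule(15000, 500))  # True
```

Under this model the trickled load also gets slower per cycle as `running` grows, matching the 120-pods-per-2-minutes dropping to 10-20 reported in the description.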
Comment 7 Timothy St. Clair 2017-01-31 11:38:03 EST
Cancel needinfo and switching to: https://bugzilla.redhat.com/show_bug.cgi?id=1343196
Comment 9 Eric Paris 2017-02-14 13:30:51 EST
Since this is blocked on the switch to etcd 3.1 and the v3 schema and we aren't doing that until 3.6, moving to UpcomingRelease.
Comment 12 Andy Goldstein 2017-03-22 15:42:31 EDT
Still waiting on 1.6 rebase
Comment 13 Andy Goldstein 2017-04-17 10:17:44 EDT
Still waiting on 1.6 rebase
Comment 14 Andy Goldstein 2017-05-02 14:33:44 EDT
1.6.1 is in, moving to MODIFIED
Comment 16 Mike Fiedler 2017-07-06 09:57:46 EDT
Verified on 3.6.122
Comment 18 errata-xmlrpc 2017-08-10 01:17:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
