Bug 1393391 - rhel-push-plugin has unexpectedly high cpu usage when starting 250 OCP pods. Pushes not part of workload.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Antonio Murdaca
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-09 12:44 UTC by Mike Fiedler
Modified: 2020-06-09 21:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-09 21:01:54 UTC
Target Upstream Version:



Description Mike Fiedler 2016-11-09 12:44:36 UTC
Description of problem:

This has been happening for a while but I keep forgetting to write it up.

Workload is starting 250 pause pods, which boils down to 250 ose-pod containers and 250 gcr.io/google_containers/pause-amd64:3.0 containers.   The workload is strictly starting containers.

40 pods are started at a time with a break between each batch.
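For reference, the batching arithmetic for this workload can be sketched as follows. This is only an illustration of the numbers above: `batch_sizes` is a hypothetical helper, not part of cluster-loader, and it assumes the final batch simply holds whatever remains after the full batches of 40.

```python
from typing import List


def batch_sizes(total_pods: int, batch_size: int) -> List[int]:
    """Split total_pods into consecutive batches of at most batch_size pods.

    Assumption: the last batch holds the remainder (not stated in the report).
    """
    full, remainder = divmod(total_pods, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(250, 40)
# Each pause pod is backed by two containers: one ose-pod and one pause-amd64.
containers = 2 * sum(sizes)
print(sizes)
print(len(sizes), "batches,", containers, "containers in total")
```

Under that assumption the run consists of six full batches of 40 pods plus one batch of 10, for 500 container starts overall.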

A pidstat profile of the run shows rhel-push-plugin as the #3 CPU user at ~5%.   This seems high for a plugin which should be getting out of the way as quickly as it can, especially when no push is involved.

See an example here:  http://perf-infra.ec2.breakage.org/pbench/results/ip-172-31-12-67/pbench-user-benchmark_nodeVert314_2016-11-08_19:08:38/1/reference-result/tools-default/ip-172-31-48-59.us-west-2.compute.internal/pidstat/cpu_usage.html

Click on each of the top 3 lines of the table to see the top users (openshift node, docker daemon and rhel-push-plugin).

This pattern is consistent across runs.

Version-Release number of selected component (if applicable):  1.12.3-3 from Extras.


How reproducible: Always

Comment 1 Antonio Murdaca 2016-11-09 13:06:10 UTC
So is this just a matter of how many resources the plugin consumes, or is it blocked at some point, preventing pushes? (The title may refer to the former, but the description to the latter.)

Comment 2 Mike Fiedler 2016-11-09 13:11:21 UTC
runcom, sorry for the title switch :-).   This refers to the cpu resources consumed by the plugin while starting 250 OCP pods.   There is no blockage and there are no pushes in this workload.

Comment 3 Antonio Murdaca 2016-11-09 13:34:46 UTC
I'll try to reproduce this somehow. I suspect this comes from how Docker manages plugins and has nothing to do with our plugin (the plugin is just an "if not pushing, then exit" check; it's really simple and basic).
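To make the check described above concrete, here is a minimal Python sketch of an authorization decision that returns immediately for anything other than a push. It is not the plugin's actual source. The `RequestMethod`, `RequestUri`, `Allow` and `Msg` field names follow Docker's authorization plugin JSON messages; `PUSH_URI`, `authz_req` and the `is_restricted` callback are hypothetical names, and the real deny condition is not described in this report.

```python
import re
from typing import Callable, Dict

# Assumed shape of the Engine API push endpoint, with or without a version
# prefix, e.g. /v1.24/images/registry.example.com/foo/push?tag=latest
PUSH_URI = re.compile(r"^(/v[0-9.]+)?/images/(?P<name>.+)/push(\?.*)?$")


def authz_req(request: Dict[str, str],
              is_restricted: Callable[[str], bool]) -> Dict[str, object]:
    """Decide on one API request; everything that is not a push is allowed."""
    match = None
    if request.get("RequestMethod") == "POST":
        match = PUSH_URI.match(request.get("RequestUri", ""))
    if match is None:
        # Not a push: get out of the way immediately.
        return {"Allow": True}
    name = match.group("name")
    if is_restricted(name):
        return {"Allow": False, "Msg": "push of %s is not allowed" % name}
    return {"Allow": True}


# Hypothetical policy used only for this demonstration.
def deny_docker_io(name: str) -> bool:
    return name.startswith("docker.io/")


create = {"RequestMethod": "POST", "RequestUri": "/v1.24/containers/create"}
push = {"RequestMethod": "POST",
        "RequestUri": "/v1.24/images/docker.io/rhel7/push?tag=latest"}
print(authz_req(create, deny_docker_io))
print(authz_req(push, deny_docker_io))
```

Even with a check this cheap, the daemon still consults the plugin for each API request it handles, so a workload that only starts containers still exercises this path.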

Comment 4 Mike Fiedler 2016-11-09 13:39:36 UTC
Contact me offline if you want help reproducing.   The cluster-loader script described at https://github.com/openshift/svt/tree/master/openshift_scalability can help when used with this input file:  https://github.com/openshift/svt/blob/master/openshift_scalability/config/nodeVertical.yaml

Comment 5 Mike Fiedler 2016-11-09 13:40:07 UTC
The config file in comment 4 presumes 2 application nodes loaded to 250 pods each.

Comment 7 Antonio Murdaca 2016-11-10 16:35:25 UTC
Found nothing so far; started a conversation with upstream: https://github.com/docker/docker/issues/28244

Comment 8 Daniel Walsh 2017-06-30 15:24:43 UTC
Antonio, did anything ever come of this?

Comment 9 Tom Sweeney 2020-06-09 21:01:54 UTC
We have no plans to ship another version of Docker at this time. RHEL7 is in final support stages where only security fixes will get released.  Customers should move to use Podman which is available starting in RHEL 7.6.

