Bug 1392980 - Limit the number of pods with the starting state on a node
Summary: Limit the number of pods with the starting state on a node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Derek Carr
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-08 15:27 UTC by Frederic Giloux
Modified: 2019-04-18 19:55 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-18 19:55:23 UTC
Target Upstream Version:



Description Frederic Giloux 2016-11-08 15:27:19 UTC
Description of problem:

Tomcat and other Java applications are CPU-intensive at startup compared to normal operation. When a node dies or is evacuated, many such applications may be restarted at once on the other nodes. In that case startup takes significantly longer, 2-3 minutes instead of 30 seconds, with the consequence that the readiness probes fail and the pods get restarted. The expiration period of the readiness check could be changed manually when a node needs to be evacuated, but that is a manual step we would like to avoid, and it is not possible at all when a node dies. The issue may disappear once additional nodes are provisioned (better distribution of the load), but it would be nice to have a way to limit the number of "starting" pods on a node.
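
For illustration, the manual workaround mentioned above amounts to loosening the readiness probe timings, roughly as in the sketch below (assuming a recent k8s.io/api where corev1.Probe embeds ProbeHandler; the path, port and numbers are made up for this example):

// Sketch only: a readiness probe loosened enough to survive a 2-3 minute
// startup. Path, port and the exact numbers are illustrative, not a
// recommendation.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	probe := &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/health", Port: intstr.FromInt(8080)},
		},
		InitialDelaySeconds: 30, // normal startup is around 30s
		PeriodSeconds:       10,
		FailureThreshold:    18, // tolerate roughly 3 extra minutes when many pods start at once
	}
	fmt.Printf("%+v\n", probe)
}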

Version-Release number of selected component (if applicable):

3.2.1.13 and 3.2.1.15 

How reproducible:
Have a small cluster with 2 nodes and several pods with Tomcat running on them. Evacuate the node.

Steps to Reproduce:
1. Have a small cluster with 2 nodes
2. Start a Tomcat application (quickstart)
3. Scale to a single Tomcat instance and take note of the time it needs to start
4. Scale up so that several instances are running
5. Evacuate one of the nodes

Actual results:

The readiness probes fail and the pods get restarted.

Expected results:

The ability to limit the number of pods starting at the same time on the remaining node, so that additional pods are only started once the first ones are running successfully.

Comment 2 Jeremy Eder 2016-11-14 15:43:11 UTC
We've built something called a tuningset into our cluster-loader utility; it is a way of enforcing some "pacing" on clients.

Tuningsets let us set intervals and rates so that we can load at maximum speed while keeping the system stable.
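
As a rough illustration of that kind of pacing, not the actual tuningset implementation (the batch size and interval here are invented):

// Rough sketch of interval/rate pacing: submit work in fixed-size batches
// with a pause between batches. This is not the cluster-loader tuningset
// itself, just the idea behind it.
package main

import (
	"fmt"
	"time"
)

func paced(items []string, batchSize int, interval time.Duration, submit func(string)) {
	for i, it := range items {
		submit(it)
		// After each full batch, wait before letting the next batch go out.
		if (i+1)%batchSize == 0 && i+1 < len(items) {
			time.Sleep(interval)
		}
	}
}

func main() {
	pods := []string{"pod-1", "pod-2", "pod-3", "pod-4", "pod-5"}
	paced(pods, 2, 5*time.Second, func(name string) {
		fmt.Println("creating", name)
	})
}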

We had to do this in OpenShift v2 as well, but v3 is even worse in terms of parallelism. In the case of container creation, many of the failures and much of the fragility can be pinned on docker.

We're prototyping a way to measure the current "busy-ness" of docker by reading its API, and using that as auto-tuning backpressure in our client. In this way we can load as fast as docker can safely go. I don't yet know whether docker has the features we need, and it might also not be the only source of information we need.

It might be beneficial to look not only at docker but at the system resource profile as well, potentially detecting storage I/O saturation and pacing (queuing) client requests.  Essentially we need a way to "protect" docker (and any other runtime) from Kubernetes.  Amazon does this by rate-limiting their API to protect their control plane.
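
A minimal sketch of that backpressure idea, polling the Docker Engine API over the unix socket and pausing the client while a load signal stays above a threshold (the /info field name, the threshold and the socket path are assumptions, not the actual prototype):

// Sketch: read a "busy-ness" signal from the Docker Engine API and use it as
// backpressure before sending more work. Assumes GET /info exposes a running
// container count; threshold and socket path are illustrative.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"time"
)

type dockerInfo struct {
	ContainersRunning int // assumed field name in the /info payload
}

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			// Talk to the local docker daemon over its unix socket.
			DialContext: func(_ context.Context, _, _ string) (net.Conn, error) {
				return net.Dial("unix", "/var/run/docker.sock")
			},
		},
	}
	for {
		resp, err := client.Get("http://docker/info")
		if err != nil {
			fmt.Println("docker not reachable, backing off:", err)
			time.Sleep(5 * time.Second)
			continue
		}
		var info dockerInfo
		_ = json.NewDecoder(resp.Body).Decode(&info)
		resp.Body.Close()
		if info.ContainersRunning > 50 { // illustrative threshold
			time.Sleep(2 * time.Second) // docker looks busy: hold back the client
			continue
		}
		break // docker looks idle enough: let the client proceed
	}
	fmt.Println("ok to send more work")
}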

Comment 3 Derek Carr 2016-12-12 21:16:40 UTC
This is an RFE to rate limit, via QPS, the number of container start operations made from the kubelet to the container runtime.
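
For illustration only, the kind of QPS gating being requested could look like the token-bucket sketch below, using golang.org/x/time/rate (the limiter values and the startContainer hook are hypothetical; this is not kubelet code):

// Sketch of QPS-limiting container start operations with a token bucket.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// Allow on average 1 container start per second, with a burst of 3.
var startLimiter = rate.NewLimiter(rate.Limit(1), 3)

func startContainer(ctx context.Context, name string) error {
	// Block until the limiter hands out a token, then call the runtime.
	if err := startLimiter.Wait(ctx); err != nil {
		return err
	}
	fmt.Println(time.Now().Format("15:04:05"), "starting", name)
	return nil
}

func main() {
	ctx := context.Background()
	for i := 0; i < 5; i++ {
		_ = startContainer(ctx, fmt.Sprintf("tomcat-%d", i))
	}
}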

Comment 5 Tushar Katarki 2019-04-18 19:55:23 UTC
I think this request has been discussed thoroughly upstream. See https://github.com/kubernetes/kubernetes/issues/3312

It points to best practices and to other features and issues that can address this problem.

I don't think there is any other upstream work in this problem space.

