Description of problem:
My customer has the following issue: on two nodes of their OpenShift cluster, when the number of running pods exceeds roughly 100, new pods stay in Pending or ContainerCreating status. There is sufficient memory and CPU available to schedule the pods. This does not happen on the other nodes in the cluster.

OpenShift details:
OpenShift Version: 3.11.317
Number of masters: 3
Number of infra nodes: 2
Number of worker nodes: 18

Host details:
Hostname: XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX
Physical server Model: PowerEdge R740xd
Atomic Version: 7.9.1

This affects the customer's users, who are unable to schedule production workloads. As a workaround we set max-pods to 100 in /etc/origin/node/node-config.yaml on those two nodes.

Version-Release number of selected component (if applicable):
OCP 3.11

How reproducible:
100%, but only in the customer's environment

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
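For reference, the max-pods workaround mentioned above is set via kubeletArguments in the node config. A minimal sketch of the relevant fragment of /etc/origin/node/node-config.yaml, assuming the standard OCP 3.11 kubeletArguments syntax (values as described in this report):

```yaml
# /etc/origin/node/node-config.yaml (fragment)
# Workaround: cap the node at 100 pods so scheduling stays below the
# point where new pods hang in Pending/ContainerCreating.
kubeletArguments:
  max-pods:
  - "100"
```

The node service has to be restarted for the change to take effect (on 3.11: systemctl restart atomic-openshift-node).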
You can't directly change the size of the host portion of the IP range assigned to the SDN. However, you can add a new pod IP range with a larger host allocation, then delete and re-add a node to move it to the new range. Take a look at: https://docs.openshift.com/container-platform/3.11/install_config/configuring_sdn.html#configuring-the-pod-network-on-masters
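A sketch of what adding a second pod network with a larger host allocation could look like under networkConfig in /etc/origin/master/master-config.yaml. The CIDR values here are illustrative assumptions (the first entry matches the 3.11 defaults), not taken from this report:

```yaml
# /etc/origin/master/master-config.yaml (fragment)
networkConfig:
  clusterNetworks:
  # Existing range: hostSubnetLength 9 gives each node 2^9 = 512 addresses.
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  # New range with a larger per-node allocation: 2^10 = 1024 addresses.
  - cidr: 10.132.0.0/14
    hostSubnetLength: 10
```

After restarting the master services, a node can be moved to the new range by deleting it (oc delete node <name>) and restarting the node service on that host so it re-registers.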
Verified with version v3.11.487: with more than 100 running pods per node, newly created pods also reach Running status.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.487 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2928