1509289 – [3.5] eviction manager sometimes evicts all pods

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.5.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.6.z
Assignee:	Seth Jennings
QA Contact:	DeShuai Ma
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-11-03 13:32 UTC by Carsten Lichy-Bittendorf
Modified:	2018-04-12 05:59 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Fixes an issue where slow pod deletion on a node under eviction pressure could result in the eviction of all pods.
Clone Of:
Environment:
Last Closed:	2018-04-12 05:59:07 UTC
Target Upstream Version:
Embargoed:

Attachments
logs where eviction behaves badly (110.10 KB, application/x-gzip) 2017-11-03 13:32 UTC, Carsten Lichy-Bittendorf

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:1106	0	None	None	None	2018-04-12 05:59:32 UTC

Description Carsten Lichy-Bittendorf 2017-11-03 13:32:20 UTC

Created attachment 1347364
logs where eviction behaves badly

Description of problem:
Since the upgrade to OCP 3.5, the eviction manager sometimes deletes all pods from a node under memory pressure.

Version-Release number of selected component (if applicable):


How reproducible:

Nodes have this eviction setting in /etc/origin/node/node-config.yaml:

kubeletArguments:
  eviction-hard:
  - memory.available<20%
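As a rough illustration of what this setting means, the sketch below parses a percentage threshold of the form shown above and checks whether a node would count as under memory pressure. The function names and the 64 GiB capacity are assumptions made for the example; this is not the kubelet's own parsing code.

```python
from typing import Tuple


def parse_percent_threshold(expr: str) -> Tuple[str, float]:
    """Split an expression like 'memory.available<20%' into (signal, fraction).

    Hypothetical helper: only the '<' operator with a percentage value
    is handled, which is the form used in the config above.
    """
    signal, sep, value = expr.partition("<")
    if not sep or not value.endswith("%"):
        raise ValueError(f"unsupported threshold expression: {expr!r}")
    return signal.strip(), float(value.rstrip("%")) / 100.0


def under_pressure(available_bytes: int, capacity_bytes: int, fraction: float) -> bool:
    """Return True when available memory is below the given fraction of capacity."""
    return available_bytes < capacity_bytes * fraction


if __name__ == "__main__":
    GIB = 2 ** 30
    signal, fraction = parse_percent_threshold("memory.available<20%")
    capacity = 64 * GIB  # assumed node size, for illustration only
    for available in (13 * GIB, 12 * GIB):
        print(signal, available // GIB, "GiB available ->",
              under_pressure(available, capacity, fraction))
```

With a 64 GiB node the threshold sits at 12.8 GiB of available memory, so the node crosses into pressure somewhere between 13 GiB and 12 GiB available.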

Before the eviction:
see files describe-node-before.txt and pods-before.txt

Launch a memory hog on the node to push memory usage slightly past the threshold:
see the file Screenshot-2017-11-3 Grafana - PaaS cluster view(1).png at 11:13


Actual results:
- On some occasions all pods get killed, see files describe-node-after.txt, pods-after.txt and cdyi0544-journal.log.gz

Expected results:
- Eviction should select a pod to kill in order to free memory, kill that pod, and stop processing once memory usage falls back below 80%.
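The expected behaviour can be sketched as a small simulation: evict one pod, re-check usage, and stop as soon as the node is back under 80%. Everything here is an assumption for illustration. The largest-first ordering is not the kubelet's actual ranking, and the `memory_reclaimed` flag is a hypothetical knob that models the case where an evicted pod's memory has not been freed by the next check, which is one way every pod could end up evicted.

```python
def evict_until_relieved(pods, used, capacity,
                         max_used_fraction=0.80, memory_reclaimed=True):
    """Simulate eviction on one node.

    pods: list of (name, memory) tuples, memory in the same unit as `used`.
    Returns (evicted_names, final_used).
    """
    evicted = []
    # Assumed ordering: largest consumer first (not the kubelet's real ranking).
    for name, mem in sorted(pods, key=lambda p: p[1], reverse=True):
        if used < capacity * max_used_fraction:
            break  # pressure relieved, stop evicting
        evicted.append(name)
        if memory_reclaimed:
            used -= mem  # the evicted pod's memory is visible as free again
    return evicted, used


if __name__ == "__main__":
    pods = [("a", 10), ("b", 30), ("c", 5)]
    print(evict_until_relieved(pods, used=85, capacity=100))
    print(evict_until_relieved(pods, used=85, capacity=100, memory_reclaimed=False))
```

In the first call a single eviction is enough. In the second, usage never appears to drop, so the loop runs through every pod.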

Additional info:
- Looks like a regression: it worked as expected on OCP 3.4 when last tested.
- The log cdyi0544-journal.log.gz contains many lines like the following, which are not seen in a good run:
W1103 11:18:20.779379   75247 eviction_manager.go:117] Failed to admit pod dc-springboot-1-ngvlm_springboot-sdev(5120a265-c080-11e7-8374-005056bf9134) - node has conditions: %v%!(EXTRA []api.NodeConditionType=[MemoryPressure])
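To gauge how often this happens in a journal such as the attached one, lines of this shape could be counted per pod with a short script. This is a minimal sketch: the regular expression is inferred only from the single sample line above (`name_namespace(uid)`), and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the sample log line; other kubelet versions may differ.
ADMIT_FAILURE = re.compile(
    r"Failed to admit pod (?P<name>[^_\s]+)_(?P<namespace>[^(\s]+)"
    r"\((?P<uid>[0-9a-f-]+)\)"
)


def count_admit_failures(lines):
    """Count 'Failed to admit pod' warnings per (namespace, pod name)."""
    counts = Counter()
    for line in lines:
        match = ADMIT_FAILURE.search(line)
        if match:
            counts[(match["namespace"], match["name"])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: zcat cdyi0544-journal.log.gz | python count_admit_failures.py
    for (namespace, name), n in count_admit_failures(sys.stdin).most_common():
        print(f"{n:6d}  {namespace}/{name}")
```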

Comment 10 Seth Jennings 2017-11-06 14:42:45 UTC

OCP 3.6 PR:
https://github.com/openshift/ose/pull/920

Backporting to 3.5 is problematic since the code has changed quite a bit.

Comment 16 weiwei jiang 2018-01-25 08:50:31 UTC

Checked with:
# openshift version
openshift v3.6.173.0.96
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

Eviction no longer evicts all pods at the same time, so marking this issue as verified.

Comment 19 errata-xmlrpc 2018-04-12 05:59:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106
