Bug 1509289 - [3.5] eviction manager sometimes evicts all pods
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.6.z
Assigned To: Seth Jennings
QA Contact: DeShuai Ma
Docs Contact:
Depends On:
Blocks:
Reported: 2017-11-03 09:32 EDT by Carsten Lichy-Bittendorf
Modified: 2018-04-12 01:59 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fixes an issue where slow pod deletion on a node under eviction pressure could result in the eviction of all pods.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-12 01:59:07 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs where eviction behaves badly (110.10 KB, application/x-gzip)
2017-11-03 09:32 EDT, Carsten Lichy-Bittendorf


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:1106 | None | None | None | 2018-04-12 01:59 EDT

Description Carsten Lichy-Bittendorf 2017-11-03 09:32:20 EDT
Created attachment 1347364
logs where eviction behaves badly

Description of problem:
Since the upgrade to OCP 3.5, the eviction manager sometimes deletes all pods from a node that is under memory pressure.

Version-Release number of selected component (if applicable):


How reproducible:

The nodes have this eviction setting in /etc/origin/node/node-config.yaml:

kubeletArguments:
  eviction-hard:
  - memory.available<20%
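
As a rough illustration of what this setting means, the following Python sketch parses a threshold expression and checks whether a node would count as being under memory pressure. It is not the kubelet's implementation: the helper names `parse_threshold` and `under_memory_pressure` are hypothetical, and the parser only accepts a percentage or a plain byte count, whereas the real kubelet also accepts quantities such as 500Mi.

```python
import re


def parse_threshold(expr):
    """Split 'memory.available<20%' into (signal, value, is_percent).

    Hypothetical helper: only handles '<' with a percentage or a
    plain number of bytes.
    """
    match = re.fullmatch(r"([a-z.]+)<(\d+(?:\.\d+)?)(%?)", expr)
    if match is None:
        raise ValueError(f"unsupported threshold: {expr!r}")
    signal, number, percent = match.groups()
    return signal, float(number), percent == "%"


def under_memory_pressure(expr, available_bytes, capacity_bytes):
    """Return True when available memory is below the configured threshold."""
    signal, value, is_percent = parse_threshold(expr)
    if signal != "memory.available":
        raise ValueError(f"unsupported signal: {signal!r}")
    # A percentage is taken relative to the node's memory capacity.
    limit = capacity_bytes * value / 100 if is_percent else value
    return available_bytes < limit


GIB = 1024 ** 3
# 12 GiB free out of 64 GiB is 18.75 % available, below the 20 % threshold.
print(under_memory_pressure("memory.available<20%", 12 * GIB, 64 * GIB))
```

With the setting above, eviction is expected to start once less than 20% of the node's memory is available, that is, once usage exceeds 80%.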

Before the eviction:
see the files describe-node-before.txt and pods-before.txt

Launch a memory hog on the node to push memory usage slightly over the threshold:
see the file Screenshot-2017-11-3 Grafana - PaaS cluster view(1).png at 11:13


Actual results:
- On some occasions all pods get killed; see the files describe-node-after.txt, pods-after.txt and cdyi0544-journal.log.gz

Expected results:
- Eviction should select a pod to kill in order to free memory, kill that pod, and stop processing once memory usage falls back below 80%.
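
The expected behavior can be sketched as a loop that evicts one pod at a time and re-checks the threshold after each eviction. This is a simplified model, not the kubelet's algorithm: the function name `evict_until_relieved` is hypothetical, picking the largest memory consumer first is an assumption made for illustration, and the sketch assumes a pod's memory is released as soon as the pod is evicted.

```python
def evict_until_relieved(pods, available, capacity, min_available_fraction=0.20):
    """Evict pods one by one until available memory reaches the threshold.

    pods maps a pod name to the memory it uses (same unit as `available`
    and `capacity`). Returns the names of the evicted pods in order.
    """
    remaining = dict(pods)
    evicted = []
    while remaining and available < capacity * min_available_fraction:
        # Assumption: the largest consumer is evicted first.
        victim = max(remaining, key=remaining.get)
        # Assumption: the memory is freed immediately on eviction.
        available += remaining.pop(victim)
        evicted.append(victim)
    return evicted


# A node with 100 units of memory and 15 available is below the 20 % threshold.
print(evict_until_relieved({"a": 10, "b": 30, "c": 5}, available=15, capacity=100))
```

In this model a single eviction is enough to relieve the pressure, so the other pods stay on the node. The sketch does not model slow pod deletion, which the Doc Text names as the condition under which all pods could be evicted.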

Additional info:
- Looks like a regression: it worked as expected on OCP 3.4 when last tested.
- The log cdyi0544-journal.log.gz contains many lines like the following, which do not appear in a good run:
W1103 11:18:20.779379   75247 eviction_manager.go:117] Failed to admit pod dc-springboot-1-ngvlm_springboot-sdev(5120a265-c080-11e7-8374-005056bf9134) - node has conditions: %v%!(EXTRA []api.NodeConditionType=[MemoryPressure])
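
To count or inspect these rejections in a journal dump, something like the following Python sketch could be used. The regular expression is derived only from the single sample line quoted above, so it is an assumption that other lines in the log follow the same shape; `parse_admit_failure` is a hypothetical helper name.

```python
import re

# Pattern based on the one sample line from this report.
ADMIT_FAILURE = re.compile(
    r"Failed to admit pod (?P<pod>[^_\s]+)_(?P<namespace>[^(\s]+)"
    r"\((?P<uid>[0-9a-f-]+)\) - node has conditions: "
    r".*?=\[(?P<conditions>[^\]]*)\]\)"
)


def parse_admit_failure(line):
    """Return pod, namespace, uid and node conditions, or None if no match."""
    match = ADMIT_FAILURE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Several conditions are assumed to be separated by spaces.
    info["conditions"] = info["conditions"].split()
    return info


sample = (
    "W1103 11:18:20.779379   75247 eviction_manager.go:117] Failed to admit pod "
    "dc-springboot-1-ngvlm_springboot-sdev(5120a265-c080-11e7-8374-005056bf9134) "
    "- node has conditions: %v%!(EXTRA []api.NodeConditionType=[MemoryPressure])"
)
print(parse_admit_failure(sample))
```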
Comment 10 Seth Jennings 2017-11-06 09:42:45 EST
OCP 3.6 PR:
https://github.com/openshift/ose/pull/920

Backporting to 3.5 is problematic since the code has changed quite a bit.
Comment 16 weiwei jiang 2018-01-25 03:50:31 EST
Checked with:
# openshift version
openshift v3.6.173.0.96
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

Eviction no longer evicts all pods at the same time, so this issue is verified.
Comment 19 errata-xmlrpc 2018-04-12 01:59:07 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106
