Created attachment 1347364 [details]
logs where eviction behaves badly

Description of problem:
Since the upgrade to OCP 3.5, the eviction manager sometimes deletes all pods from a node under memory pressure.

Version-Release number of selected component (if applicable):

How reproducible:
The nodes have this eviction setting in /etc/origin/node/node-config.yaml:

  kubeletArguments:
    eviction-hard:
    - memory.available<20%

Before the eviction: see the files describe-node-before.txt and pods-before.txt.

Then launch a memory hog on the node to push memory usage just past the eviction threshold: see the file "Screenshot-2017-11-3 Grafana - PaaS cluster view(1).png" at 11:13.

Actual results:
- On some occasions all pods get killed; see the files describe-node-after.txt, pods-after.txt and cdyi0544-journal.log.gz.

Expected results:
- Eviction should select a pod to kill to free memory, kill that pod, and stop processing once memory usage falls back below 80% (i.e. memory.available rises back above the 20% threshold).

Additional info:
- This looks like a regression; it worked as expected on OCP 3.4 when last tested.
- In the log cdyi0544-journal.log.gz there are many lines like the following, which are not seen in a good run:

  W1103 11:18:20.779379 75247 eviction_manager.go:117] Failed to admit pod dc-springboot-1-ngvlm_springboot-sdev(5120a265-c080-11e7-8374-005056bf9134) - node has conditions: %v%!(EXTRA []api.NodeConditionType=[MemoryPressure])

  (The "%!(EXTRA ...)" suffix is Go's fmt marker for an argument the format string never consumed, so the garbling comes from a format-string mismatch in the kubelet's logging; the condition actually being reported is MemoryPressure, meaning the node is rejecting new pods while under memory pressure.)
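For reference, a minimal sketch of the related kubelet eviction knobs in node-config.yaml, assuming the standard kubelet eviction flags; the eviction-minimum-reclaim and eviction-pressure-transition-period values below are illustrative, not taken from this report. eviction-minimum-reclaim sets a floor on how much memory each eviction pass reclaims (to avoid thrashing right at the threshold), and eviction-pressure-transition-period controls how long the kubelet waits before clearing the MemoryPressure condition after memory.available recovers:

  kubeletArguments:
    eviction-hard:
    - memory.available<20%
    eviction-minimum-reclaim:
    - memory.available=500Mi
    eviction-pressure-transition-period:
    - 5m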
OCP 3.6 PR: https://github.com/openshift/ose/pull/920 Backporting to 3.5 is problematic since the code has changed quite a bit.
Checked with:

  # openshift version
  openshift v3.6.173.0.96
  kubernetes v1.6.1+5115d708d7
  etcd 3.2.1

Eviction no longer evicts all pods at the same time, so this issue is verified.
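For anyone re-verifying: a quick way to watch the behavior from the CLI (the node name is a placeholder; the journal unit name assumes a default OCP 3.x install):

  # confirm whether the node still reports MemoryPressure
  oc describe node <node-name> | grep -A 5 Conditions

  # list evicted pods across the cluster
  oc get pods --all-namespaces | grep Evicted

  # follow the eviction manager's decisions in the node journal
  journalctl -u atomic-openshift-node -f | grep eviction_manager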
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106