| Summary: | Support for a pod rescheduler to keep the cluster balanced and report accurate pod status. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> |
| Component: | RFE | Assignee: | Derek Carr <decarr> |
| Status: | CLOSED DEFERRED | QA Contact: | MinLi <minmli> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.1.0 | CC: | aos-bugs, byount, decarr, erich, fcami, fshaikh, hgomes, jialiu, jkaur, jokerman, jswensso, jtudelag, knakayam, mbarrett, michael.voegele, mmccomas, myllynen, pep, sdehn, simon.gunzenreiner |
| Target Milestone: | --- | ||
| Target Release: | 3.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-11 21:17:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 1267746 | ||
|
Description
Jaspreet Kaur
2016-03-21 11:06:49 UTC
*** Bug 1314624 has been marked as a duplicate of this bug. ***

A pod with best-effort quality of service can consume as much memory as is available on the node. Running large numbers of best-effort pods on a node increases the risk of inducing a system OOM, because the scheduler places those pods with no understanding of their potential resource requirements.

Each container in a pod is given an OOM_SCORE_ADJ value that is evaluated in response to an OOM event on the node to determine which containers to kill to reclaim memory. The value ranges from -1000 to 1000; the higher the number, the more likely the container is to be targeted by the oom_killer. Best-effort pods are given an OOM_SCORE_ADJ of 1000, so they are targeted first in response to OOM events. Guaranteed processes are given a score of -998. Burstable containers (those that make a memory request and an optional limit) are scored in the range 2-999 based on how much memory the container is consuming relative to its request: a container under its request gets a lower value, and a container over its request gets a higher value. This means the oom_killer targets best-effort, burstable, and guaranteed containers in that order. System daemons (docker, openshift-node) should have an OOM_SCORE_ADJ of -999 so they are not targeted.

An OOM event can make the node unstable for an extended period of time, as reclaiming memory starves the CPU. When a container is killed by the oom_killer, it may be restarted based on the restart policy in the pod definition. If the restart policy is Always, the container simply restarts, and the pod continues to report Running status.
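The scoring policy described above can be sketched as follows. This is an illustrative approximation, not the kubelet's actual code; the clamped linear formula for burstable containers (score falls as the memory request grows relative to node capacity) and the constants are assumptions based on the ranges quoted in the comment.

```go
package main

import "fmt"

// Fixed scores for the two extreme QoS classes, per the comment above.
const (
	guaranteedOOMScoreAdj = -998
	bestEffortOOMScoreAdj = 1000
)

// burstableOOMScoreAdj is a hypothetical sketch: the score scales inversely
// with the container's memory request relative to node capacity, and is
// clamped to [2, 999] so burstable containers always sit between
// guaranteed (-998) and best-effort (1000).
func burstableOOMScoreAdj(memRequestBytes, nodeCapacityBytes int64) int64 {
	score := 1000 - (1000*memRequestBytes)/nodeCapacityBytes
	if score < 2 {
		return 2
	}
	if score > 999 {
		return 999
	}
	return score
}

func main() {
	nodeCap := int64(16 << 30) // assume a 16 GiB node
	// A small request yields a high score (killed sooner under OOM):
	fmt.Println(burstableOOMScoreAdj(1<<30, nodeCap)) // prints 938
	// A large request yields a low score (killed later):
	fmt.Println(burstableOOMScoreAdj(12<<30, nodeCap)) // prints 250
}
```

Under this sketch, a burstable container that requests most of the node's memory is nearly as protected as a guaranteed one, while one with a tiny request behaves almost like best-effort.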
There is work planned for OpenShift 3.3 and Kubernetes 1.3 to support evictions when the node reaches resource pressure conditions, as documented here: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/kubelet-eviction.md

With the above feature, the node will monitor available memory against an admin-defined threshold and attempt to evict pods (i.e. fail a pod) from the node when memory is under pressure, before inducing a system OOM.

Should this not be solved by https://docs.openshift.com/container-platform/3.4/admin_guide/out_of_resource_handling.html ?

The descheduler is Tech Preview in OpenShift 3.10: https://docs.openshift.org/latest/admin_guide/scheduling/descheduler.html

Red Hat is moving OpenShift feature requests to a new JIRA RFE system. This bz (RFE) has been identified as a feature request which is still being evaluated and has been moved. As the new Jira RFE system is not yet public, Red Hat Support can help answer your questions about your RFEs via the same support case system.

https://jira.coreos.com/browse/RFE-168
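The hard-eviction trigger described in the kubelet-eviction proposal above can be sketched as a simple threshold check. This is a minimal illustration assuming a single signal (available memory) and an admin-defined cutoff; the function name and structure are hypothetical, not the kubelet's actual API.

```go
package main

import "fmt"

// shouldEvict is a hypothetical sketch of the hard-eviction check: evict pods
// when the node's available memory drops below the admin-defined threshold
// (e.g. the equivalent of --eviction-hard=memory.available<100Mi), so the node
// reclaims memory by failing pods before the system OOM killer fires.
func shouldEvict(memoryAvailableBytes, evictionThresholdBytes int64) bool {
	return memoryAvailableBytes < evictionThresholdBytes
}

func main() {
	threshold := int64(100 << 20) // 100Mi, illustrative value

	fmt.Println(shouldEvict(512<<20, threshold)) // prints false: no pressure
	fmt.Println(shouldEvict(64<<20, threshold))  // prints true: evict before OOM
}
```

The point of the feature is that this check runs proactively on the node, so pods are failed (and visibly reported as evicted) rather than being silently OOM-killed and restarted in place.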