Description of problem: The RHEL5 kernel contains a tunable out-of-memory (OOM) killer algorithm, but the mechanism used to tune it is undocumented. See /proc/<PID>/(oom_adj|oom_score). Please document this so we know how to tune the OOM killer.
I discovered this tunable a couple of months ago and finally got around to filing a bz request to add the tunable to the sshd initscript for protection from oom-kill: https://bugzilla.redhat.com/show_bug.cgi?id=360341 What I discovered is that it is only adjustable at /proc/pid/task/pid/oom_adj. Thanks for adding this knob!
Documentation of oom_adj and oom_score:
The RHEL5 kernel includes two files for each process that control when that process is considered for termination once the system must start OOM killing.
/proc/<pid>/oom_adj - Adjust the oom-killer score.
This file can be used to adjust the score used to select which processes are killed in an out-of-memory situation. Giving a process a high score increases the likelihood of that process being killed by the oom-killer. Valid values are in the range [-16:15], plus the special value '-17', which disables oom-killing of that process altogether.
Example: "echo 15 > /proc/<pid>/oom_adj" significantly increases the likelihood that process <pid> will be OOM killed.
Example: "echo -16 > /proc/<pid>/oom_adj" significantly decreases the likelihood that process <pid> will be OOM killed.
Example: "echo -17 > /proc/<pid>/oom_adj" disables OOM killing for process <pid> entirely.
Note: the oom_adj value is passed from parent process to child process during fork() operations.
/proc/<pid>/oom_score - Display the current oom-killer score.
This file can be used to check the score the oom-killer currently uses for any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which process will be killed in an out-of-memory situation.
Example: "cat /proc/<pid>/oom_score" displays the current OOM score for process <pid>.
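The range rule described above can be checked before anything is written to /proc. What follows is only a minimal sketch: the function names is_valid_oom_adj and set_oom_adj are hypothetical helpers invented for illustration, not part of any RHEL tool, and the sketch assumes the value is written to /proc/<pid>/oom_adj as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is an integer in the range
# -16..15 or the special value -17 (that is, -17..15 inclusive).
is_valid_oom_adj() {
    case $1 in
        # Reject empty input, non-numeric characters, a lone "-",
        # and a "-" anywhere other than the first position.
        ''|*[!0-9-]*|-|?*-*) return 1 ;;
    esac
    [ "$1" -ge -17 ] && [ "$1" -le 15 ]
}

# Hypothetical helper: set_oom_adj <pid> <value>
# Validates the value, then writes it to the per-process file.
# Assumption: the path is /proc/<pid>/oom_adj; writing needs sufficient privileges.
set_oom_adj() {
    if ! is_valid_oom_adj "$2"; then
        echo "oom_adj must be between -17 and 15" >&2
        return 1
    fi
    echo "$2" > "/proc/$1/oom_adj"
}
```

With these definitions, "set_oom_adj <pid> -17" would attempt to disable OOM killing for <pid>, while "set_oom_adj <pid> 16" is rejected without touching /proc.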
Hi Linda, Larry, thanks for the note. As for which book to document this in, I'm thinking the Deployment Guide for now. The Performance Tuning Guides are coming along quite slowly at the moment, seeing as I'm prioritizing release notes over them (we're currently working on migrating relnotes to publican). Anyhow, just to verify: are these tunables present as of RHEL5.0 (RHEL5 GA)? I'll need to know so I can figure out which branches to apply these changes to. [reassigning bug to me and updating component] Thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
Just for your info, kbase has an entry "How do I determine and configure the likelihood that a process will be killed in a out-of-memory situation?" http://kbase.redhat.com/faq/docs/DOC-5359
Larry, here is a proposed solution for this bug, a new section under the "3.3 Directories within /proc" section. Please let me know if this text will suffice and is correct. Thank you!
<SNIP>
3.3.12. /proc/<PID>/
Out of Memory (OOM) refers to a computing state where all available memory, including swap space, has been allocated. Normally this will cause the system to panic and stop functioning as expected. There is a switch that controls OOM behavior in /proc/sys/vm/panic_on_oom. When set to 1, the kernel will panic on OOM. A setting of 0 instructs the kernel to call a function named oom_killer on an OOM. Usually, oom_killer can kill rogue processes and the system will survive.
The easiest way to change this is to echo the new value to /proc/sys/vm/panic_on_oom.
  # cat /proc/sys/vm/panic_on_oom
  1
  # echo 0 > /proc/sys/vm/panic_on_oom
  # cat /proc/sys/vm/panic_on_oom
  0
It is also possible to prioritize which processes get killed by adjusting the oom_killer score. In /proc/<PID>/ there are two files named oom_adj and oom_score. Valid values for oom_adj are in the range -16 to +15. To see the current oom_killer score, view the oom_score for the process. oom_killer will kill processes with the highest scores first.
This example adjusts the oom_score of a process with a PID of 12465 to make it less likely that oom_killer will kill it.
  # cat /proc/12465/oom_score
  79872
  # echo -5 > /proc/12465/oom_adj
  # cat /proc/12465/oom_score
  78
There is also a special value of -17, which disables oom_killer for that process. In the example below, oom_score returns a value of 0, indicating that this process would not be killed.
  # cat /proc/12465/oom_score
  78
  # echo -17 > /proc/12465/oom_adj
  # cat /proc/12465/oom_score
  0
A function called badness() is used to determine the actual score for each process. This is done by adding up 'points' for each examined process. The process scoring is done in the following way:
1. The basis of each process's score is its memory size.
2. The memory size of any of the process's children (not including a kernel thread) is also added to the score.
3. The process's score is increased for 'niced' processes and decreased for long-running processes.
4. Processes with the CAP_SYS_ADMIN and CAP_SYS_RAWIO capabilities have their scores reduced.
5. The final score is then bitshifted by the value saved in the oom_adj file.
Thus, a process with the highest oom_score value will most probably be a non-privileged, recently started process that, along with its children, uses a large amount of memory, has been 'niced', and handles no raw I/O.
</SNIP>
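Step 5 of the scoring list can be illustrated with plain shell arithmetic. This is a rough sketch only, not the kernel's badness() code: the function name apply_oom_adj and the sample point values are hypothetical, and the sketch assumes that a positive oom_adj shifts the score left, a negative one shifts it right, and -17 forces the score to 0 as described in the proposed text.

```shell
#!/bin/sh
# Hypothetical helper: apply_oom_adj <points> <oom_adj>
# Prints the score after the final bitshift step.
apply_oom_adj() {
    oa_points=$1
    oa_adj=$2
    if [ "$oa_adj" -eq -17 ]; then
        # Special value: the process is exempt, so its score is 0.
        echo 0
    elif [ "$oa_adj" -gt 0 ]; then
        # Positive adjustment: each step doubles the score.
        echo $(( oa_points << oa_adj ))
    elif [ "$oa_adj" -lt 0 ]; then
        # Negative adjustment: each step halves the score.
        echo $(( oa_points >> (0 - oa_adj) ))
    else
        # No adjustment: the score is unchanged.
        echo "$oa_points"
    fi
}

apply_oom_adj 1024 2     # prints 4096
apply_oom_adj 1024 -3    # prints 128
apply_oom_adj 1024 -17   # prints 0
```

Because each step doubles or halves the score, small oom_adj changes move a process a long way up or down the kill order.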
Fixed in RHEL5 trunk, -r31751