I am working on some enhancements to the OOM killer logic in RHEL 5.3 (valid for 2.6.18-138 and later branches) which solve several problems with OOM killing behavior:

(a) Even with the bits added in various parts of alloc_pages() and the try-to-free-pages paths to detect TIF_MEMDIE, it is fairly easy out of the box to create a situation, especially on a swapless system, where OOM is never declared by the return value of try_to_free_pages(). Some of this is specifically due to a Red Hat patch (as of 5.2) that just about guarantees that OOM will never be returned by try_to_free_pages(). In playing with it, I was able to get it to detect OOM and at least sometimes return the OOM indication to alloc_pages() [at which point the final checks for available free pages are done and out_of_memory() is called]. This is a precondition for the OOM killing logic to trigger in the first place.

(b) When badness() is used to select the target of the OOM kill and it picks a process, then in my experience with large multi-threaded servers only the thread group leader is killed. This leads to latencies, because the leader has to land in the group exit function before it kicks the other threads. It is not clear that this makes any sense. The goal is to get all the threads on t->mm to land in exit_mm(), since the physical pages have no prayer of being released before then.

(c) Meanwhile, other threads are down in the reclaim code causing scheduler havoc. If they are unable to reap many new pages AND they don't return an OOM indication, then the call to the page allocator is tried, and if it fails they end up (subject to gfp mask bits) backing off for HZ/10 and retrying the whole thing. If no processes exit on their own and the process which is expected to die (because OOM kill => SIGKILL to it) does not die, then we have an OOM deadlock, big time.
Before pushing any of the patches which actually seem to solve these problems, I have some pre-patches to mm/oom_kill.c that I'd like to submit for consideration in RHEL 5.4 or some later version of RHEL 5. These are pretty obvious patches:

- avoiding kernel threads which have borrowed user pages (e.g. AIO worker threads);
- a missing task_unlock() for the swapoff process;
- a minor typo in the search for any thread on the same mm that checks for OOM disable (which aborts the whole thing);
- finally, in the oomkilladj > 0 case, if points is 0 the logic done by the left shift is fouled up, because shifting 0 left still gives 0. I found this one in some upstream kernel.org release (can't remember which).

None of this is very testable, because the OOM killer seems pretty hard to hit in my experience. Whoever knows the mm code well would recognize the correctness of these patches.

Also, there is this OOM_DISABLED property which a task with the right privileges can use to exempt itself from being OOM killed. This is a very worthwhile notion, but an added enhancement which I've found to be worthwhile is a global variant of this which is not wired to -17. If it is set to, say, 1 instead of -17, it exempts all processes which have not 'volunteered' for OOM killing by raising their OOM adjustment value above whatever this value is. We call this limited OOM killing, and by putting a wrapper called setoom around the process startup, we have predictable candidates for the OOM killing. It works particularly well if the candidate is restartable by its parent. With such a scheme we can do massive leak injections (to mimic a memory leak), watch the OOM killer take down the largest non-exempt process it can find, and the system recovers the leaked memory pretty gracefully. For this to work, out_of_memory() has to be called.
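As a rough illustration of the oomkilladj > 0 shift problem and of the global exemption threshold described above, here is a small userspace sketch in C. It is not the code from the attachment: adjust_points(), is_candidate(), struct task_model and oom_exempt_threshold are hypothetical names made up for this example, and the strict ">" comparison against the threshold is an assumption about how the limit would be applied.

```c
#include <stdio.h>

#define OOM_DISABLE (-17)   /* per-task "never kill me" value mentioned above */

/* Hypothetical global knob for "limited OOM killing" (assumed name). */
static int oom_exempt_threshold = 1;

/* Minimal stand-in for the per-task fields this sketch needs. */
struct task_model {
    const char   *name;
    unsigned long base_points;  /* badness score before adjustment */
    int           oomkilladj;   /* per-task OOM adjustment value */
};

/*
 * Apply the OOM adjustment to a badness score.
 * Without the "points == 0" check, 0 << oomkilladj stays 0, so a task
 * that volunteered with a positive adjustment could never be selected.
 */
static unsigned long adjust_points(unsigned long points, int oomkilladj)
{
    if (oomkilladj > 0) {
        if (points == 0)
            points = 1;             /* give the shift something to scale */
        points <<= oomkilladj;
    } else if (oomkilladj < 0) {
        points >>= -oomkilladj;
    }
    return points;
}

/*
 * Limited OOM killing (assumption): a task is a candidate only if it is
 * not OOM-disabled and its adjustment is above the global threshold.
 */
static int is_candidate(const struct task_model *t)
{
    if (t->oomkilladj == OOM_DISABLE)
        return 0;
    return t->oomkilladj > oom_exempt_threshold;
}

/* Print the adjusted score and candidacy of one modelled task. */
static void report(const struct task_model *t)
{
    printf("%s: points=%lu candidate=%d\n", t->name,
           adjust_points(t->base_points, t->oomkilladj), is_candidate(t));
}
```

With the threshold at 1, a task started with an adjustment of 2 or more is a candidate and everything else is exempt; calling report() on a few struct task_model values shows which ones would be considered.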
Also, in the interest of not thrashing while the target of OOM killing exits and releases its memory, I have found it very worthwhile to hook the final drop of the mm (in exit_mm()) with a wakeup of the processes that are blocked waiting for that event. Coupled with some robust timers, this avoids the thrashing in the reclaim code, which spikes the load average and in general does nothing useful if no other processes are exiting and releasing physical pages.

Reproducible: Always
Created attachment 341271: Fixes miscellaneous bugs in mm/oom_kill.c
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).
Not sure why info is needed if this bug is WONTFIX.