Bug 85397 - System hang with heavy memory using apps
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Brian Brock
Reported: 2003-02-28 21:18 EST by Venkatesh Pallipadi
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2003-04-14 09:27:11 EDT

Attachments
oom_patch1.patch (428 bytes, patch)
2003-03-28 13:51 EST, Venkatesh Pallipadi
oom_patch2.patch (1.08 KB, patch)
2003-03-28 13:51 EST, Venkatesh Pallipadi
oom_patch3.patch (600 bytes, patch)
2003-03-28 13:52 EST, Venkatesh Pallipadi

Description Venkatesh Pallipadi 2003-02-28 21:18:32 EST
Description of problem:

The AS 2.1 system (with the latest e.12 kernel), running gpg-encryption tests, 
hangs within a couple of hours of starting the test. The system has 4G of memory 
and 2G of swap. The tests use a lot of memory and swap space.

The same tests run fine on an RH 8.0 based system. 

At the point of the hang, the system has zero free swap space and very little 
memory (4M) available, so little that the OS has to kill some process(es) to make 
progress. On the RH 8.0 based system some processes do get killed along the way. I 
don't think that is actually happening with the AS 2.1 kernel.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Configure the gpg-encryption test program
2. Start the tests, with the number of threads depending on the number of processors 
in the system
3. The system will hang in 1-2 hours

Additional info:
Comment 1 Venkatesh Pallipadi 2003-03-28 13:51:21 EST
Created attachment 90767
Comment 2 Venkatesh Pallipadi 2003-03-28 13:51:54 EST
Created attachment 90768
Comment 3 Venkatesh Pallipadi 2003-03-28 13:52:20 EST
Created attachment 90769
Comment 4 Venkatesh Pallipadi 2003-03-28 13:54:07 EST
Did more analysis on this one. At the point of the hang, oom_kill has in fact 
sent a kill signal to one of the processes, but that process never gets to 
handle the signal and exit, because it is waiting on memory and never gets 
woken up while the memory shortage persists.

The following issues, when fixed, will resolve this hang:
1) wakeup_memwaiters() in kswapd should wake up all processes waiting on 
memory every time, not only when the (!VM_SHOULD_SLEEP) condition holds.
This will prevent the situation where oom_kill sends a kill signal to a 
process, and that process never gets to handle the signal as it is waiting on 
memory. 2.4.18+ kernels use this sort of mechanism to wake up all waiting 
processes, irrespective of available memory, in kswapd.
Attached oom_patch1.patch does this.

2) The second issue is what should happen if the process that got the first kill 
signal is sleeping. The current code keeps sending the kill signal to the 
same process, and so never makes any forward progress.
A better approach, as suggested by Rik on lkml, is to mark processes as 
oom_kill is sent to them, and try killing some other process the next time we 
reach the low memory condition. The attached oom_patch2.patch is the patch 
originally posted by Rik, rebased to AS 2.1.

3) Looks like a real bug in page_alloc.c. In _wrapped_alloc_pages(), in the 
check that decides whether to retry alloc_pages, there is a condition that tests 
(free_shortage()) where it should test (!free_shortage()). We should keep 
retrying only when there is no shortage. If there is a shortage and we are 
looking for order > 0, then we should return failure rather than waiting.
At the hang we also noticed that the process getting the kill signal can 
be in this state, wherein it keeps retrying indefinitely to get 
memory, doing non-interruptible sleeps in between. This again can result in a 
hang, as no process actually gets killed even after oom_kill sends a kill 
signal.
Another change that can help in the low memory condition is to retry indefinitely 
only on zero order pages, and not for page order <= 3.
Attached oom_patch3.patch makes the above changes.
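
The approach in item 2 (mark a process once oom_kill has signalled it, then pick a different one next time) can be sketched roughly as below. This is a standalone model, not the kernel code and not the contents of oom_patch2.patch: struct task, its badness and oom_marked fields, and select_oom_victim() are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task record; this is NOT the kernel's task_struct. */
struct task {
    int pid;
    unsigned long badness;   /* assumed score: higher means a better victim */
    int oom_marked;          /* set once a kill signal has been sent */
};

/*
 * Pick the highest-badness task that has not already been signalled,
 * mark it, and return it. Returns NULL when every task is already
 * marked, in which case the caller has nothing left to try.
 */
struct task *select_oom_victim(struct task *tasks, size_t n)
{
    struct task *victim = NULL;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].oom_marked)
            continue;        /* already signalled earlier: skip it */
        if (victim == NULL || tasks[i].badness > victim->badness)
            victim = &tasks[i];
    }
    if (victim != NULL)
        victim->oom_marked = 1;
    return victim;
}
```

Calling select_oom_victim() repeatedly on the same array walks down the candidates in badness order instead of returning the same sleeping process each time, which is the forward progress item 2 is after.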

Comment 5 Susan Denham 2003-03-28 17:47:27 EST
Per Larry Woodman:

Problem 1) above:  fixed and in latest kernel build for AS 2.1 x86 (sent to Intel)
Problem 2) above:  under review, but patch would require a great deal of testing.
Problem 3) above:  fixed and in latest kernel build for AS 2.1 x86 (sent to Intel)

Please test the latest AS 2.1 x86 kernel build (pointer sent to
Paul.Gutierrez@intel.com in email) and report back.
Comment 6 Tim Burke 2003-03-31 08:18:32 EST
Adding in more detail, also based on input from Larry

Overall, we do not consider this problem to be of highest criticality, because
it refers to an edge condition in which all of memory and swap have
been consumed.  We cannot compromise the integrity of the normal operating
condition for an edge condition.

Problem 1) above:  fixed and in latest kernel build for AS 2.1 x86 (sent to Intel)
Problem 2) above:  we do not agree with the proposed patch.  
Problem 3) above:  Improved in the prior Q1 errata. If we blindly accepted this
patch, the end result would be substantially more process killing than
necessary.  Definitely too heavy handed, as our primary goal under these
circumstances is to keep the system alive.

We feel that the majority of the problems highlighted here have been
addressed under the best tradeoff policies.  There may still be cases in which
processes don't get killed during complete depletion, but that balance is deemed
Comment 7 Venkatesh Pallipadi 2003-03-31 12:10:57 EST
This was a part of the changes in patch 3 above. I still believe that 
this is a bug in the code. mm/page_alloc.c has the condition
if (!order || free_shortage()) {
and I feel it should be
if (!order || !free_shortage()) {

With the existing check, we retry the page request when we are low 
on memory, and we _do_not_ retry a page request (with order > 0) when 
we have no shortage of free memory. 

Am I missing something here?
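
To make the difference between the two conditions concrete, here is a small standalone model of both predicates. It is only a sketch: retry_current() and retry_proposed() are hypothetical names, and free_shortage is reduced to a 0/1 flag rather than the kernel's free_shortage() call.

```c
#include <assert.h>

/* Retry predicate as quoted above: if (!order || free_shortage()) */
int retry_current(unsigned int order, int free_shortage)
{
    assert(free_shortage == 0 || free_shortage == 1);  /* modelled as a flag */
    return !order || free_shortage;
}

/* Retry predicate as proposed above: if (!order || !free_shortage()) */
int retry_proposed(unsigned int order, int free_shortage)
{
    assert(free_shortage == 0 || free_shortage == 1);  /* modelled as a flag */
    return !order || !free_shortage;
}
```

Both versions always retry order 0 requests. They differ only for order > 0: with a shortage the existing check retries and the proposed one fails the request, and with no shortage it is the other way round.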
Comment 8 Venkatesh Pallipadi 2003-03-31 12:46:44 EST
Status update with kernel-2.4.9-e.16.3:
We ran the tests on two systems that used to display this problem before.
The issue still persists with the new kernel.
Dell 870 based platform:
            3/28/03 16:35 - gpgstress test started
            3/31/03 03:30 - system hung
            swap space left: 2007840K
Another 4-way system:
            3/28/03 22:23 - gpgstress test started        
            3/28/03 23:13 - system hung
            swap space left: 4K

I will try to do some analysis on these hanging systems and will update the 
details later.
Comment 9 Venkatesh Pallipadi 2003-03-31 12:55:37 EST
We tend to agree that this test case is an edge condition wherein all of 
memory and swap have been consumed. 
But we don't feel comfortable with the system going into a hang state. Even 
going into a panic() state with this kind of workload would be much better: 
then at least the system can gracefully report the error and restart. If 
the system hangs, the system administrator has very few options for 
identifying the failure, especially on a remote system.
Another point of concern is that this can be reproduced, on some systems, 
within a couple of hours of a gpgstress test run.

Another fact observed from our tests: RH 8.0 survives this test on both 
platforms above for more than 72 hours.
Comment 10 Larry Woodman 2003-03-31 16:11:39 EST
First of all, I don't think that the "if (!order || free_shortage())"
is wrong.  By the time you get to this test, previously in the __alloc_pages()
routine, you have woken up kswapd and yield()'d and you have drained any 
pages on the inactive clean list onto the free list without satisfying the 
order>0 allocation.  Looping back up and around again won't do anything for an 
order>0 allocation unless there is a free shortage. Don't forget that order>0
allocations must come from the buddy allocator free lists; the inactive free
pages can't be used for order>0 allocations. 

I already applied the equivalent of your patch 1 in the 2.4.9-e.14.1 kernel.
Although I don't instantly wake up every process waiting inside wakeup_kswapd,
I don't let any process sleep any longer than 30 iterations of kswapd.  This
is less aggressive but provides the same functionality.
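
The "no longer than 30 iterations" bound might be modelled as below. This is only a guess at the shape of the mechanism, not the 2.4.9-e.14.1 code: struct mem_waiter, kswapd_pass(), MAX_KSWAPD_PASSES and the exact point at which a waiter is released are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KSWAPD_PASSES 30   /* the "30 iterations" bound; name is hypothetical */

/* Hypothetical record for one process blocked waiting for memory. */
struct mem_waiter {
    int asleep;          /* 1 while blocked waiting for memory */
    int passes_waited;   /* kswapd passes seen since it went to sleep */
};

/*
 * One modelled kswapd pass over the waiters. A waiter is woken when
 * memory is no longer short, or once it has slept through
 * MAX_KSWAPD_PASSES passes. Returns how many waiters were woken.
 */
size_t kswapd_pass(struct mem_waiter *w, size_t n, int memory_short)
{
    size_t woken = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!w[i].asleep)
            continue;
        w[i].passes_waited++;
        if (!memory_short || w[i].passes_waited >= MAX_KSWAPD_PASSES) {
            w[i].asleep = 0;          /* wake it so it can see pending signals */
            w[i].passes_waited = 0;
            woken++;
        }
    }
    return woken;
}
```

Under a persistent shortage a waiter in this model stays asleep for 29 passes and is woken on the 30th, so a process that has been sent a kill signal eventually gets a chance to handle it.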

I have included the equivalent of patch 3 in the latest kernel, with the 
exception of the free_shortage() logic described above.  The current logic is:

    if (order == 0) goto try_again
    if (order <= 3 && process is not being oom killed) goto try_again
    if (order > 3 && haven't tried 3 times yet) goto try_again
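
Written out as a plain C function, the three rules above might look like the following. This is a sketch of the logic as described in this comment, not the kernel source: retry_policy(), the enum and the parameter names are hypothetical, and treating every remaining case as a failed allocation is an assumption.

```c
#include <assert.h>

enum alloc_action { ALLOC_RETRY, ALLOC_FAIL };

/*
 * order            - allocation order being requested
 * being_oom_killed - 1 if the requesting process has been picked by oom_kill
 * attempts         - how many times this allocation has already been tried
 */
enum alloc_action retry_policy(unsigned int order, int being_oom_killed,
                               unsigned int attempts)
{
    assert(being_oom_killed == 0 || being_oom_killed == 1);

    if (order == 0)
        return ALLOC_RETRY;                 /* always retry single pages */
    if (order <= 3 && !being_oom_killed)
        return ALLOC_RETRY;                 /* small orders retry unless dying */
    if (order > 3 && attempts < 3)
        return ALLOC_RETRY;                 /* large orders get three tries */
    return ALLOC_FAIL;                      /* assumed: everything else fails */
}
```

In this model a process that is being oom killed stops looping on order 1 to 3 requests, so it can get out of the allocator and exit.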

I have not included patch 2 because of the potential undesirable side
effects.  With this patch, it is possible to oom kill many processes
that are not currently in a killable state.  Once one process is killable
and it gets killed, the system will kill all other processes that were
previously marked to be killed.  This is very undesirable, especially for
an edge condition that you really have to forcefully try to get the system

Larry Woodman

Comment 11 Venkatesh Pallipadi 2003-03-31 17:48:29 EST
Thanks for the detailed explanation.

The kernel that was used for the weekend tests (e.16.3) does have a 
variant of patch 1. But I didn't see the "order <= 3 && process is not being 
oom killed" check that you have mentioned above. Can we get the latest kernel, 
so that we can rerun the tests and see the progress?

As you say, patch 3 is really aggressive. But the good thing about it 
is that it won't hang with this workload, as eventually it would run out of 
killable processes and call panic(). But as you have mentioned, that may not 
help much in a normal workload.

Another thing that we found during our analysis here:
The problem does not necessarily come from one single process that is waiting 
forever in alloc_pages. The process that is waiting forever is unfortunately 
also holding some file system lock (inode, page), and there are a bunch of 
other processes in uninterruptible sleep on these locks. This way none of the 
processes in this bunch can ever get killed either.

Thanks again.
Comment 12 Venkatesh Pallipadi 2003-04-04 17:52:52 EST
Did the test with the latest update kernel. 

We successfully completed a 72-hour test run on the Dell 870 based platform. No 
failures were seen on the other platforms either.

We can go ahead and close this bug now.
Thanks for all the support.
Comment 13 Larry Troan 2003-04-14 09:27:11 EDT
PER ABOVE COMMENT BY  Venkatesh Pallipadi of Intel on 2003-04-04 17:52, CLOSING
