Bug 208210 - Run Bonnie++ for memory test fail and system shows error message "out of memory"
Status: CLOSED DUPLICATE of bug 193542
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Brian Brock
Reported: 2006-09-26 22:13 EDT by allance
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2006-12-08 08:36:46 EST


Attachments
log file as OOM happen (76.54 KB, text/plain), 2006-09-27 22:13 EDT, allance
Display of killing process (147.15 KB, image/pjpeg), 2006-09-27 22:15 EDT, allance
logfile of OOM after installing hugmem kernel (122.46 KB, text/plain), 2006-10-03 05:41 EDT, allance
log file after reclaim lowmem (77.61 KB, text/plain), 2006-10-04 03:15 EDT, allance
log file for hugmem kernel load (78.61 KB, text/plain), 2006-10-10 22:53 EDT, allance

Description allance 2006-09-26 22:13:16 EDT
Description of problem:
I used bonnie++ 1.03 to run a memory test on a system with 20GB of memory under
RH3.0 U7 32-bit (with the Hugemem rpm installed), RH4.0 U3 64-bit, and SLES10 32-bit.
All of these work well; the test fails only on RH4.0 U3 32-bit.
So I think it may be an RH4.0 U3 32-bit issue.

Besides, on RH4.0 U3 32-bit, the system kills other processes to free up
memory when it reports "Out of memory". That's not correct.

BTW, it's weird that RH3.0 U7 32-bit shows only 4GB of system memory
if I don't install the Hugemem rpm, but RH4.0 U3 32-bit always shows 20GB of system
memory regardless of whether Hugemem is installed.


Comment 1 Larry Woodman 2006-09-27 14:57:33 EDT
Pere, please attach the show_mem() output that was written to /var/log/messages
when the OOM kill occurs.

Larry
Comment 2 allance 2006-09-27 22:13:34 EDT
Created attachment 137282 [details]
log file as OOM happen
Comment 3 allance 2006-09-27 22:15:59 EDT
Created attachment 137283 [details]
Display of killing process

Hi Larry, I have attached the information you asked for. Please help us find
the root cause and a solution soon. Thanks.
Comment 4 Larry Woodman 2006-09-28 10:36:20 EDT
The problem is that you are running the smp kernel on a 32GB system, and that's
not supported.  Please install the hugemem kernel and reboot your system; that
will fix the problem.  Let me know the results as soon as you can do this.

Larry Woodman
Comment 5 allance 2006-10-01 21:29:49 EDT
We did install the hugemem kernel, but the OOM still occurred. Please refer to the
description at the top of this bug. This problem happens on systems with more than
16GB of memory, and only on the RH4 32-bit OS.
Our system supports AMD Opteron rev. F 1207 CPUs + DDR-II 667 registered
memory.

Allance Chen
Comment 6 Larry Woodman 2006-10-02 09:47:30 EDT
Can you attach the logfile from the OOM kill when it was running the hugemem kernel?

Thanks, Larry Woodman
Comment 7 allance 2006-10-03 05:41:32 EDT
Created attachment 137635 [details]
logfile of OOM after installing hugmem kernel
Comment 8 Larry Woodman 2006-10-03 13:57:13 EDT
The problem is that the system consumed most of lowmem with bounce buffers:
 
>>Normal free:640kB min:928kB low:1856kB high:2784kB active:1756kB inactive:1604kB
>>154963 bounce buffer pages

Please try "echo 100 > /proc/sys/vm/lower_zone_protection" to start reclaiming
lowmem earlier and see if that prevents the OOM kills.

Larry Woodman
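
For scale, the bounce buffer page count in the show_mem() output can be converted to a size. This is a minimal sketch, assuming the 4 KiB page size of 32-bit x86; the function name `pages_to_mib` is made up for illustration.

```shell
#!/bin/sh
# Convert a page count from show_mem() output to MiB.
# Assumption: 4 KiB pages, as on 32-bit x86.
pages_to_mib() {
    pages=$1
    echo $(( pages * 4 / 1024 ))   # pages -> KiB -> MiB, integer division
}

pages_to_mib 154963   # page count taken from the log excerpt above
```

This prints 605, so roughly 605 MiB was held in bounce buffers while the Normal zone reported only 640kB free.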
Comment 9 allance 2006-10-04 03:14:25 EDT
OOM kills still happen after applying the setting from your comment. Please check the attached log file.
Comment 10 allance 2006-10-04 03:15:29 EDT
Created attachment 137722 [details]
log file after reclaim lowmem
Comment 11 Larry Woodman 2006-10-05 15:19:42 EDT
Wait!!!  None of these OOM kills happened while running a hugemem kernel.  Please make sure
your /boot/grub/grub.conf file selects the "kernel-hugemem-2.6.9-<whatever>.EL".
It is currently booting the smp kernel.

Sorry I didn't notice that after your comment #7.

Larry Woodman
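
One way to confirm which kernel flavor is actually running is to look at the release string reported by `uname -r`. This is a minimal sketch, assuming the flavor name ("hugemem" or "smp") appears in the release string; the helper name `kernel_flavor` and the sample strings in the comments are hypothetical illustrations, not taken from the attached logs.

```shell
#!/bin/sh
# Classify a kernel release string by flavor.
# Assumption: the flavor name is part of the release string,
# e.g. "2.6.9-NN.ELhugemem" or "2.6.9-NN.ELsmp" (NN is a placeholder).
kernel_flavor() {
    case "$1" in
        *hugemem*) echo hugemem ;;
        *smp*)     echo smp ;;
        *)         echo other ;;
    esac
}

# Report the flavor of the kernel that is running right now.
kernel_flavor "$(uname -r)"
```

Running this after a reboot shows whether the grub.conf change took effect before the test is repeated.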
Comment 12 allance 2006-10-10 22:53:37 EDT
Created attachment 138215 [details]
log file for hugmem kernel load
Comment 13 allance 2006-10-10 22:56:20 EDT
It still fails when booted into the hugemem kernel. Please refer to the attachment.

Comment 14 Larry Woodman 2006-10-11 11:57:13 EDT
Once again, this is due to bounce buffers.

>>>Oct 11 10:11:19 uut432 kernel: 776797 bounce buffer pages

1.) Does this only happen on 32-bit systems? 

2.) Can you try setting /proc/sys/vm/dirty_ratio to 5 and rerunning the test?
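
The dirty_ratio change can be applied at runtime by writing to the procfs file, in the same way as the lower_zone_protection setting earlier in this thread. This is a minimal sketch, assuming root privileges; the helper `set_tunable` is hypothetical and simply reports the old value alongside the new one.

```shell
#!/bin/sh
# Write a new value to a tunable file and print "old -> new".
# $1 = path to the tunable, $2 = new value
set_tunable() {
    old=$(cat "$1") || return 1       # remember the current value
    echo "$2" > "$1" || return 1      # apply the new value
    echo "$old -> $(cat "$1")"        # read back to confirm
}

# Example (as root): set_tunable /proc/sys/vm/dirty_ratio 5
```

A value written this way does not survive a reboot; to keep it, the usual approach is a `vm.dirty_ratio = 5` line in /etc/sysctl.conf.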
Comment 15 allance 2006-10-12 03:48:28 EDT
1.) It doesn't happen on 32-bit systems. It only happens on RH4 64-bit.
2.) The bonnie test passes after setting dirty_ratio to 5. Could you explain
the root cause in detail? And will this be implemented in the next update of
RH4 64-bit?
Comment 16 Larry Woodman 2006-10-12 16:29:40 EDT
Are you sure it's the 64-bit systems that have the problem?  All of the
show_mem() outputs are from 32-bit kernels; in other words, they have lowmem and
highmem.  In this case we can't do IO to highmem, so we use lowmem bounce buffers,
and that exhausts lowmem before highmem, and it can't be reclaimed.  If you lower
dirty_ratio from 40 to 5, the system starts writing out bounce buffers when 5% of
RAM is dirty instead of 40%.  That prevents the system from getting into this
state. Are you OK with this???

Larry
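
To put numbers on this, here is a rough calculation of how much dirty data can accumulate before writeback starts, following the description above of dirty_ratio as a percentage of RAM. It is a sketch assuming the 20GB of memory mentioned in the bug description; the function name `dirty_limit_mib` is made up for illustration.

```shell
#!/bin/sh
# Amount of dirty memory (MiB) at which writeback starts,
# treating dirty_ratio as a plain percentage of total RAM.
# $1 = total RAM in MiB, $2 = dirty_ratio in percent
dirty_limit_mib() {
    echo $(( $1 * $2 / 100 ))
}

dirty_limit_mib 20480 40   # ratio of 40 cited above
dirty_limit_mib 20480 5    # lowered ratio
```

Under these assumptions the threshold drops from 8192 MiB to 1024 MiB, which keeps far fewer pages waiting on lowmem bounce buffers at any one time.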
Comment 17 allance 2006-10-12 21:42:40 EDT
Yes, we tried many times on different systems and configurations across all RH 32/64-bit
OSes. It only happened on RH4 64-bit; we couldn't reproduce it on RH3 64-bit or other 32-bit OSes.

Thanks for debugging and root-causing this problem. But I do believe this is
a kernel bug, so will this issue be fixed in a new RH kernel?
Please advise.
Comment 18 allance 2006-10-18 01:15:53 EDT
Will this workaround/solution be implemented in a new kernel or OS release?
Comment 19 Larry Woodman 2006-12-08 08:36:46 EST

*** This bug has been marked as a duplicate of 193542 ***
