Red Hat Bugzilla – Bug 143628
Out of Memory Killer is triggered
Last modified: 2007-11-30 17:07:15 EST
Description of problem:
Out of Memory Killer is triggered when a stress test is running
Version-Release number of selected component (if applicable):
Steps to Reproduce:
OS version: rhel4-pre-rc1
Could you please test with the kernels in
I think I've already fixed this bug, but the patches just need to make
it into the RHEL4 tree. Please let me know if the bug still exists
with the test kernel.
I still see the OOM killer on a RC2 kernel.
We will test your kernel.
I added another patch (rhel4-vm-extraround.patch) and uploaded a new
test kernel (congest3) to
The previous test kernel could still trigger an OOM in our own
internal tests, though it took a few days for the error to trigger.
Please verify that the congest3 kernel works fine for you.
This kernel hung after more than 2 days of stress testing.
However, this time I did not see the OOM killer.
I saw some
"SCSI error <0 0 0 0> return code = 0x800002"
printed on the screen.
I've uploaded a -congest4 kernel with further fixes
Could you please try that kernel to check how that one behaves?
I will test it.
Do you think the SCSI error message is related to VM?
I have tested the SCSI disk by
"dd if=/dev/sda of=/dev/null" successfully.
So it should not be a hardware problem.
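The sequential-read check above can be sketched as follows; this version runs against a temporary file instead of /dev/sda, so it works anywhere (the original test read the raw disk device):

```shell
# Minimal sketch of the dd sanity check, against a temp file rather
# than /dev/sda so it is safe to run on any machine.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1024 count=16 2>/dev/null
dd if="$tmp" of=/dev/null 2>/dev/null && result="read OK"
echo "$result"
rm -f "$tmp"
```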
I have tested this kernel.
Although I have not seen an oops on screen, the system is really unusable
after 72 hours of running.
It almost stops responding.
Very little free memory is left in the system.
When I do cat /proc/slabinfo
I see a huge amount of size-64 slab objects.
Kernel memory leak?
It should be this bug.
It is a memory leak in sysfs.
The base kernel has already fixed it.
Back to assigned.
I have tested new RHEL4-RC
The memory leak in sysfs is still there.
Simply doing a
ls -lR /sys
will show a leak in slab memory.
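One way to make the growth visible is to compare the active-object count for the size-64 cache between two /proc/slabinfo snapshots. A sketch, using made-up sample lines in place of real snapshots taken before and after the sysfs walk (on an affected kernel the count only ever grows):

```shell
# Hypothetical before/after /proc/slabinfo lines for the size-64
# cache; field 2 is the active-object count (2.6 slabinfo format).
before="size-64  120000 121000 64 61 1"
after="size-64   164522 165000 64 61 1"
b=$(echo "$before" | awk '{print $2}')
a=$(echo "$after"  | awk '{print $2}')
delta=$((a - b))
echo "size-64 grew by $delta objects"
```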
Created attachment 109953 [details]
Patch to fix the sysfs memory leak
Tim, you may want to add this to the Day-0, or at least U1 lists.
Hi, I work for VERITAS and have been seeing the Out of Memory killer being
triggered by our tests. I have investigated further and still get a
memory leak without any of our products loaded.
I am doing 'parted -s /dev/sdb print' in a loop and see memory leaking.
Could this be related?
I am running on 2.6.9-1.648_ELsmp
Andy, that's a very old kernel. Please try again with the latest and greatest
code which has been made available to our partners. The latest kernel available
is 2.6.9-5.EL. Thanks.
Ok I re-installed with 2.6.9-5.ELsmp and am still seeing memory
leakage when running 'parted -s /dev/XXXX print' in a loop.
I also sometimes see the parted command hang - I think this could be
related to 140472. If when parted has hung I do another command to
the disk this will also hang. The processes are unkillable.
Andy - can we get some more info, such as:
- what architecture? (x86, IPF, x86_64)
- what type of disk, what disk driver
- how big is the disk? Does it reproduce on smaller, vs larger disks?
- how much memory in your system
- is it doing anything else at the time?
- please attach a small test script consisting of your parted loop
- how long does it take to reproduce the oops
Created attachment 110496 [details]
Sun Dual Opteron x86_64
2 Gig Memory
LSI53C1030 - Fusion MPT SCSI Host driver 3.01.16
Disk - Vendor: SEAGATE Model: ST373307LC (74 Gig)
Just running the attached script (parted -s /dev/sdb print)
You can watch memory leaking and after about 2 hours it will
start killing processes.
I also observe the memory leak if I do the same to a fibre-attached disk:
qla2300 - 3Pardata array
I guess I should have added that I also see the message:
program parted is using a deprecated SCSI ioctl, please convert it to
on the console continuously while my test is running.
Do you still get the OOM kills if the test is done using the qla driver?
Could #145695 be triggering the same problem?
Andy, in comment #16 you say that with the latest kernel you still see
"leakage". But, are you still seeing the oom kills? Can you better describe
the specific problem exhibited with the latest kernel?
Andy, can you attach the console output (/var/log/messages) from when the OOM kill occurs?
Thanks, Larry Woodman
I run the test script I have attached, which calls parted in a loop
and uses 'top' to display memory usage, which can be seen to decrease.
I have also used 'echo m > /proc/sysrq-trigger' to check the memory.
I booted my box with reduced memory (mem=256M) and this then did hit OOM
I have attached extract from /var/log/messages ...
Created attachment 110548 [details]
extract from /var/log/messages showing OOM killer
OOM killer when running parted -s /dev/sda print in a loop
Andy, you said you booted with 256MB? That's weird; 256MB is 65536 pages, but
your system only has about half of that! First of all, we don't support less than
256MB for any architecture on RHEL4, but this might indicate a problem sizing
memory when it's limited at the boot command line with the mem= option.
DMA: present:16384kB which is 4096 pages
Normal: present:115712kB which is 28928 pages
Highmem: present:0kB which is 0 pages
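The page counts follow directly from the zone sizes: with 4 KB pages, pages = kB / 4. Checking the two non-empty zones above:

```shell
# Zone sizes in kB as reported above; divide by the 4 KB page size.
dma_kb=16384
normal_kb=115712
dma_pages=$((dma_kb / 4))        # DMA zone
normal_pages=$((normal_kb / 4))  # Normal zone
echo "DMA: $dma_pages pages, Normal: $normal_pages pages"
```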
Can you send along the outputs of "cat /proc/meminfo", "cat /proc/slabinfo" and
"cat /proc/cmdline". Also, are you running that memory leak patch that is attached to this bug?
I'm sorry, my mistake I actually booted with mem=128M in order to get the
OOM to happen quicker.
I will retry (again) with mem=256M. Also, I have not tried with the patch
included here (to fix the sysfs memleak) as I don't have a kernel build environment
set up yet for the 2.6.9-5 kernel, and I am not doing anything to /sys.
Does it not look like there is a memory leak when running parted? Surely it
would be a simple exercise for you to try this...
I also see that the parted command sometimes hangs on one of the disks - as I said
above - this seems to be worse on the fibre disks but also happens on the
locally attached disks.
I will attach a new messages file if (when) I get OOM with 256M.
Before you reboot, grab me that /proc/slabinfo data. "slab:26127" is all of
memory, which isn't a surprise when you boot with 128MB.
Created attachment 110555 [details]
mem=256M OOM killer /var/log/messages extract
Booted with mem=256M and ran multiple
parted -s /dev/sda print >/dev/null 2>&1
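The attached repro script is essentially this loop. A hypothetical sketch (the attachment itself is not reproduced here), bounded and with the parted call replaced by a no-op so it runs anywhere; the real test looped indefinitely against a real disk until the OOM killer fired:

```shell
#!/bin/sh
# Hypothetical form of the repro loop; bounded here so it terminates.
i=0
while [ $i -lt 1000 ]; do
    # parted -s /dev/sda print > /dev/null 2>&1   # real call, needs a disk
    :
    i=$((i + 1))
done
echo "completed $i iterations"
```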
OK, please get me that /proc/slabinfo just after an OOM kill happens.
Created attachment 110557 [details]
slabinfo/cmdline/meminfo when booted mem=128M
Information requested when booted mem=128M
Andy, are you sure this /proc/slabinfo was at the time of the OOM kill? All of
the memory is accountable on the page lists and the slabcache is pretty much
empty. I need a /proc/slabinfo output from the time the OOM kills occur to debug this.
MemTotal: 123600 kB
MemFree: 12392 kB
Buffers: 4692 kB
Cached: 58956 kB
SwapCached: 0 kB
Active: 53532 kB
Inactive: 38156 kB
Sorry, I misunderstood; I thought you wanted info on the 128M boot.
I am running another test now and will grab slabinfo when it OOMs.
(It's quite hard to catch this, with all the 'deprecated' noise
on the console.)
This no longer appears to be a problem on RHEL4 pre-RC3.
Originally we were unable to run our test cases to completion (on beta2)
without memory starvation - we saw OOM and even PANICs (kdb_panic()).
I tried to make a test case that showed the problem without any of our products loaded.
I have since ported our code to RC3 and can now run our test cases to
completion, without any apparent memory loss - so whatever the issue was
on beta2, it has now gone.
I also had to apply the patch we have developed for the scsi inquiry hang
issue we have reported as bugzilla 140472
Closing this out based on comment 35.