Red Hat Bugzilla – Bug 805645
Kernel freeze under rapid allocation of memory
Last modified: 2012-03-26 12:01:14 EDT
Created attachment 571801
C program that will allocate and touch memory.
Description of problem:
Running two simultaneous instances of the attached program, each with its argument set to one less than the number of gigabytes of RAM in the system, causes a hard freeze.
Version-Release number of selected component (if applicable):
I have observed this behavior under various kernels on Fedora 15 and under the kernel I currently run on Fedora 16, which is:
Linux version 3.2.9-1.fc16.x86_64 (email@example.com) (gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC) ) #1 SMP Thu Mar 1 01:41:10 UTC 2012
How reproducible:
On hardware where it happens, it is 100% reproducible.
Steps to Reproduce:
1. Compile the attached C program:
gcc -o MemoryCrashProg MemoryCrashProg.c
2. Run two instances of the program with the argument set to one less than the
number of gigabytes of RAM in the system. On my current ThinkPad W520 system
with 16 GB of RAM, run it as:
MemoryCrashProg 15 & MemoryCrashProg 15
3. The usage of memory by the program can be tracked by running htop in another
window. The freeze will happen just as all of the physical RAM is used up.
Actual results:
A hard system freeze where nothing responds.
Expected results:
One or both of the processes will be terminated when available resources are
Additional info:
The program uses malloc to allocate the specified memory and then
forces it to really be available by writing an integer to each location.
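The attachment itself is not reproduced in this report. The following is only a minimal sketch of the allocate-and-touch technique described above, not the attached program: the function names `gib_to_bytes` and `allocate_and_touch` are hypothetical, and treating the command-line argument as a count of GiB is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a size in GiB to bytes.
 * Returns 0 if the result would not fit in size_t. */
size_t gib_to_bytes(size_t gib)
{
    if (gib > (SIZE_MAX >> 30))
        return 0;
    return gib << 30;
}

/* Hypothetical helper: allocate `bytes` bytes with malloc, then write an
 * int to every int-sized slot so that each page has to be backed by real
 * memory instead of remaining an untouched reservation.
 * On success stores the buffer in *out and returns the number of ints
 * written; on failure stores NULL in *out and returns 0. */
size_t allocate_and_touch(size_t bytes, int **out)
{
    assert(out != NULL);
    *out = NULL;

    size_t count = bytes / sizeof(int);
    if (count == 0)
        return 0;

    int *buf = malloc(count * sizeof(int));
    if (buf == NULL)
        return 0;

    /* Touch every location; malloc succeeding does not by itself mean
     * the pages are resident. */
    for (size_t i = 0; i < count; i++)
        buf[i] = 1;

    *out = buf;
    return count;
}
```

A driver for this sketch would parse the argument with `strtoul`, call `allocate_and_touch(gib_to_bytes(n), &buf)`, and keep the buffer alive long enough to observe memory usage in htop.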
I have seen the freeze happen on 3 different systems with 4, 8, and
48 processors and varying amounts of RAM and swap. Two systems I have had
access to did not show this crash. One was a single-processor 32-bit machine and
the other was a dual-core 64-bit machine.
The /proc/meminfo output for the current machine I see this on is:
MemTotal: 16317252 kB
MemFree: 14243252 kB
Buffers: 161732 kB
Cached: 846476 kB
SwapCached: 0 kB
Active: 894748 kB
Inactive: 788028 kB
Active(anon): 686108 kB
Inactive(anon): 149968 kB
Active(file): 208640 kB
Inactive(file): 638060 kB
Unevictable: 3512 kB
Mlocked: 3512 kB
SwapTotal: 34832380 kB
SwapFree: 34832380 kB
Dirty: 276 kB
Writeback: 0 kB
AnonPages: 678132 kB
Mapped: 172284 kB
Shmem: 159240 kB
Slab: 117280 kB
SReclaimable: 59944 kB
SUnreclaim: 57336 kB
KernelStack: 3296 kB
PageTables: 53700 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 42991004 kB
Committed_AS: 1856096 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 401436 kB
VmallocChunk: 34359250620 kB
HardwareCorrupted: 0 kB
AnonHugePages: 348160 kB
Hugepagesize: 2048 kB
DirectMap4k: 114688 kB
DirectMap2M: 16553984 kB
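As a rough check on these numbers: two instances asking for 15 each request more than MemTotal but less than MemTotal plus SwapTotal, so the workload would be expected to spill heavily into swap rather than fail outright. The sketch below illustrates that comparison. It assumes the argument means GiB, and the function names `meminfo_kb` and `classify_request` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up `key` (for example "MemTotal") in
 * /proc/meminfo-style text and return its value in kB, or -1 if the key
 * is missing or its value cannot be parsed. */
long long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require the colon right after the key so that "Active" does
         * not match "Active(anon)". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long long kb;
            if (sscanf(line + klen + 1, "%lld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: classify a total request against RAM and swap.
 * 0 = fits in RAM, 1 = needs swap, 2 = exceeds RAM plus swap. */
int classify_request(long long request_kb, long long mem_kb, long long swap_kb)
{
    if (request_kb <= mem_kb)
        return 0;
    if (request_kb <= mem_kb + swap_kb)
        return 1;
    return 2;
}
```

With the values above, two 15 GiB requests total 31457280 kB, which falls between MemTotal (16317252 kB) and MemTotal plus SwapTotal (51149632 kB).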
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.
Seems to work as designed here. Once the system runs out of memory, it kills the process hogging it all...
[ 7264.912293] Out of memory: Kill process 4387 (a.out) score 545 or sacrifice child
[ 7264.912919] Killed process 4387 (a.out) total-vm:7344156kB, anon-rss:4490552kB, file-rss:4kB
Thanks for the pointers. I upgraded to the new kernel. It did change at least my perception of what was happening. The system was sluggish but it still showed signs of life. That made me wonder whether I had been duped into thinking the system had frozen when the problem was just that the X server was not updating the screen or responding to keystrokes. So I went back to the previous kernel and ran from a text console. There I was still able to switch consoles, and the one running htop showed updates. So I again tried running under X. It showed the symptoms I had seen before. I have been reluctant to just let the system run to see if it comes out of it on its own, because the lockout could stop the fan control from responding to a hot component. So it might be that it would have recovered given enough time.
Bottom line is that I was wrong about the freeze. However, something has changed between the two kernels that changes the perception under X of what has happened. The newer kernel allows the X server to update the screen occasionally. It is sluggish, as might be expected on a heavily loaded machine, but it still shows signs of life. The previous kernel left a situation where even ctrl-alt-Fx combinations seemed to be ignored. Much earlier I saw this same thing on a machine I was connected to over the network. That machine also stopped responding. So it is not just the X server that was affected.
It seems that what happens is not a freeze, so my title is inaccurate; the problem I was having is more of a lockout of processes. That seems to have been mitigated with the current kernel. As far as I am concerned, this bug report can be closed.