805645 – Kernel freeze under rapid allocation of memory

Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	16
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance

Reported:	2012-03-21 18:01 UTC by Bob Fries
Modified:	2012-03-26 16:01 UTC
CC List:	5 users
Last Closed:	2012-03-26 16:01:14 UTC
Type:	---

Attachments
C program that will allocate and touch memory. (586 bytes, text/x-csrc) 2012-03-21 18:01 UTC, Bob Fries

Description Bob Fries 2012-03-21 18:01:26 UTC

Created attachment 571801
C program that will allocate and touch memory.

Description of problem:
Running two simultaneous instances of the attached program, with the argument set to one less than the number of gigabytes of RAM in the system, causes a hard freeze.


Version-Release number of selected component (if applicable):
I have observed this behavior under various Fedora 15 kernels and under the kernel I am currently running on Fedora 16, which is:

Linux version 3.2.9-1.fc16.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC) ) #1 SMP Thu Mar 1 01:41:10 UTC 2012

How reproducible:
On hardware where it happens it is 100% reproducible.

Steps to Reproduce:
1. Compile the attached C program:
    gcc -o MemoryCrashProg MemoryCrashProg.c
2. Run two instances of the program with the argument set to one less than the
   number of gigabytes of RAM in the system. On my current ThinkPad W520 system
   with 16 GB of RAM, I run it as
   ./MemoryCrashProg 15 & ./MemoryCrashProg 15
3. The program's memory usage can be tracked by running htop in another
   window. The freeze happens just as all of the physical RAM is used up.

  
Actual results:
A hard system freeze where nothing responds.

Expected results:
One or both of the processes will be terminated when available resources are 
exceeded.

Additional info:
   The program uses malloc to allocate the specified amount of memory and then
forces it to actually be backed by memory by writing an integer to each location.
    I have seen the freeze happen on 3 different systems with 4, 8, and
48 processors and varying amounts of RAM and swap. Two systems I have had
access to did not show this crash: one was a single-processor 32-bit machine and
the other was a dual-core 64-bit machine.
    The /proc/meminfo output for the machine I currently see this on is:
MemTotal:       16317252 kB
MemFree:        14243252 kB
Buffers:          161732 kB
Cached:           846476 kB
SwapCached:            0 kB
Active:           894748 kB
Inactive:         788028 kB
Active(anon):     686108 kB
Inactive(anon):   149968 kB
Active(file):     208640 kB
Inactive(file):   638060 kB
Unevictable:        3512 kB
Mlocked:            3512 kB
SwapTotal:      34832380 kB
SwapFree:       34832380 kB
Dirty:               276 kB
Writeback:             0 kB
AnonPages:        678132 kB
Mapped:           172284 kB
Shmem:            159240 kB
Slab:             117280 kB
SReclaimable:      59944 kB
SUnreclaim:        57336 kB
KernelStack:        3296 kB
PageTables:        53700 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    42991004 kB
Committed_AS:    1856096 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      401436 kB
VmallocChunk:   34359250620 kB
HardwareCorrupted:     0 kB
AnonHugePages:    348160 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      114688 kB
DirectMap2M:    16553984 kB

Comment 1 Dave Jones 2012-03-22 17:08:54 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 4 Dave Jones 2012-03-22 18:34:59 UTC

Seems to work as designed here. Once the system runs out of memory, it kills the process hogging it all...

[ 7264.912293] Out of memory: Kill process 4387 (a.out) score 545 or sacrifice child
[ 7264.912919] Killed process 4387 (a.out) total-vm:7344156kB, anon-rss:4490552kB, file-rss:4kB

Comment 5 Bob Fries 2012-03-26 01:56:12 UTC

Dave,

Thanks for the pointers. I upgraded to the new kernel, and it changed, at least, my perception of what was happening. The system was sluggish, but it still showed signs of life. That made me wonder whether I had been fooled into thinking the system had frozen when the real problem was that the X server was not updating the screen or responding to keystrokes. So I went back to the previous kernel and ran the test from a text console. There I was still able to switch consoles, and the one running htop kept updating. I then tried again under X, and it showed the same symptoms I had seen before. I have been reluctant to just let the system run to see whether it recovers on its own, because the lockout could stop the fan control from responding to a hot component. So it may have recovered given enough time.

The bottom line is that I was wrong about the freeze. However, something changed between the two kernels that affects how the situation looks under X. The newer kernel lets the X server update the screen occasionally; it is sluggish, as might be expected on a heavily loaded machine, but it still shows signs of life. The previous kernel left a situation where even Ctrl-Alt-Fx combinations seemed to be ignored. Much earlier, I saw the same thing on a machine I was connected to over the network. That machine stopped responding as well, so it is not just the X server that was affected.

It seems that what happens is not a freeze, so my title is inaccurate; the problem I was having is more of a lockout of processes, and it seems to have been mitigated in the current kernel. As far as I am concerned, this bug report can be closed.

-bob
