Bug 102650

Summary:	Memory leaks cause system to freeze
Product:	[Retired] Red Hat Linux	Reporter:	Hrunting Johnson <hrunting>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED WONTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	9	CC:	riel
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-09-30 15:41:27 UTC	Type:	---

Description Hrunting Johnson 2003-08-19 13:53:10 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a)
Gecko/20030729 Mozilla Firebird/0.6.1

Description of problem:
We are running a system without swap (necessary because Linux swaps aggressively
and disk utilization is at a premium).  If a program leaks memory, the system
eventually freezes in what looks like a kswapd runaway.  Here is top output from
moments before the freeze:

 08:29:02  up 3 min,  3 users,  load average: 0.21, 0.11, 0.04
52 processes: 50 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states:   0.0% user   0.0% system    0.0% nice   0.0% iowait 100.0% idle
CPU1 states:   8.0% user  91.0% system    0.0% nice   0.0% iowait   0.0% idle
CPU2 states:   0.0% user   1.0% system    0.0% nice   0.0% iowait  98.0% idle
CPU3 states:   0.0% user  34.0% system    0.0% nice   0.0% iowait  65.0% idle
Mem:  2063936k av, 2053552k used,   10384k free,       0k shrd,     208k buff
      1947248k active,               1660k inactive
Swap:       0k av,       0k used,       0k free                   34196k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2115 root      25   0 1864M 1.8G    32 R    98.6 92.5   0:06   1 usemm
   11 root      15   0     0    0     0 DW   34.6  0.0   0:00   3 kswapd
 1876 root      15   0  1412 1412  1176 S     1.7  0.0   0:00   1 sshd
 2091 root      15   0   728  728   540 R     1.7  0.0   0:00   2 top
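
The memory figures in this snapshot could also be logged continuously, so that the last values before a freeze survive on another machine or a serial console. A minimal sketch in C, assuming the text comes from /proc/meminfo in its usual "Key:  value kB" layout; meminfo_kb is a hypothetical helper, not something used in this report:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: search /proc/meminfo-style text for a line of the
 * form "Key:   12345 kB" and return the number, or -1 if the key is absent.
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long value;

        /* The key must start the line and be followed directly by ':'. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':' &&
            sscanf(line + keylen + 1, "%ld", &value) == 1)
            return value;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

A caller would read /proc/meminfo into a buffer once per interval and print meminfo_kb(buffer, "MemFree") with a timestamp.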

And here is the source code of the usemm program, designed to make the system
freeze:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, strtoul, exit */
#include <string.h>   /* strerror */

int main(int argc, char *argv[]) {
  size_t chunksize;
  unsigned char *cptr;
  size_t i;

  /* the chunk size in bytes is the only argument */
  if (argc < 2) {
    fprintf(stderr, "usage: %s chunksize\n", argv[0]);
    exit(1);
  }
  chunksize = strtoul(argv[1], NULL, 10);

  while (1) {
    cptr = malloc(chunksize);
    if (cptr == NULL) {
      fprintf(
        stderr,
        "Unable to allocate %lu bytes: %s\n",
        (unsigned long)chunksize, strerror(errno)
      );
      exit(1);
    }

    /* now make it all dirty. */
    for (i = 0; i < chunksize; i++) {
      cptr[i] = (unsigned char)(i & 0xff);
    }

    /* do it again; let our memory leak! */
  }
}

This is reproducible whether the system is running under load or has just booted
up.  Now I understand that running out of memory on a system is a Bad Thing(tm),
but the kernel should be able to kill these processes, no?  If anything, when
the program calls malloc and no memory is free, the call should fail and the
program should exit, but that never happens.  It looks like the kernel goes into
a state where it's either trying to kill the process and can't, or doing so much
work trying to free memory that it doesn't have time for anything else.

What's worse is that while this program chews through the memory and freezes the
system in seconds, a slow memory leak (literally over the span of weeks) causes
the same behavior.  It seems that no matter how slow you go, the process never
fails to allocate memory; the system always freezes first.
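
For comparison, malloc can be made to fail in the way described above by capping the process's address space. The sketch below is not part of usemm and is not a fix for the freeze; it assumes a system that supports setrlimit with RLIMIT_AS, and chunks_until_enomem is a hypothetical helper name. It lowers the soft limit, leaks chunks until malloc returns NULL, then frees them and restores the limit.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of usemm): cap the address space at
 * limit_bytes, then allocate chunksize-byte chunks until malloc() fails.
 * Returns the number of chunks obtained, or -1 if the limit could not
 * be read or set. chunksize must be at least sizeof(void *).
 */
long chunks_until_enomem(size_t limit_bytes, size_t chunksize)
{
    struct rlimit saved, capped;
    void *head = NULL;   /* newest chunk; each chunk stores the previous one */
    long count = 0;

    if (chunksize < sizeof(void *))
        return -1;
    if (getrlimit(RLIMIT_AS, &saved) != 0)
        return -1;
    capped = saved;
    capped.rlim_cur = (rlim_t)limit_bytes;   /* lower only the soft limit */
    if (setrlimit(RLIMIT_AS, &capped) != 0)
        return -1;

    for (;;) {
        void *chunk = malloc(chunksize);
        if (chunk == NULL)
            break;                           /* expected: errno == ENOMEM */
        memset(chunk, 0xff, chunksize);      /* write to the chunk, as usemm does */
        memcpy(chunk, &head, sizeof head);   /* remember the previous chunk */
        head = chunk;
        count++;
    }

    /* Undo the experiment: release the chunks and restore the old limit. */
    while (head != NULL) {
        void *prev;
        memcpy(&prev, head, sizeof prev);
        free(head);
        head = prev;
    }
    setrlimit(RLIMIT_AS, &saved);
    return count;
}
```

With a 128 MB cap and 1 MB chunks, the call returns after fewer than 128 allocations because the process's existing mappings count against the limit. The same cap can be applied from a shell with ulimit -v before starting a leaking program.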

Version-Release number of selected component (if applicable):
kernel-2.4.20-19.9

How reproducible:
Always

Steps to Reproduce:
1. Run the usemm program, passing a chunk size in bytes as its only argument.

Actual Results:  System freezes.  kswapd is usually using a lot of CPU before
the system becomes unresponsive.

Expected Results:  Process should die when it allocates more memory than it's
able to.  Either that or the system should panic or something.  The box
shouldn't freeze up like this.

Additional info:

While we tested on systems without swap, we were able to reproduce the problem
in the early stages on systems with swap.  Thinking that excessive swapping may
have been the problem, we disabled swap and got the same results.

sysctl vm settings:
vm.max_map_count = 65536
vm.max-readahead = 31
vm.min-readahead = 3
vm.page-cluster = 3
vm.pagetable_cache = 25 50
vm.kswapd = 512 32 8
vm.overcommit_memory = 0
vm.bdflush = 30 500 0 0 50 300 60 20 0
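
One setting in this list bears on the expectation that malloc should fail: vm.overcommit_memory = 0 selects heuristic overcommit, under which the kernel refuses only obviously excessive requests, so allocations generally succeed until pages are actually written. The sketch below maps the value to its meaning as documented for current Linux kernels; whether mode 2 exists on the 2.4.20 kernel in this report is not verified here, and both function names are hypothetical.

```c
#include <stdio.h>

/* Meaning of vm.overcommit_memory as documented for current Linux kernels. */
const char *describe_overcommit(int mode)
{
    switch (mode) {
    case 0:  return "heuristic: refuse only obvious overcommits";
    case 1:  return "always: never refuse an allocation";
    case 2:  return "strict: refuse allocations beyond the commit limit";
    default: return "unknown";
    }
}

/* Read the mode from a sysctl file; returns -1 if it cannot be read or parsed. */
int read_overcommit_mode(const char *path)
{
    FILE *fp = fopen(path, "r");
    int mode = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;
    fclose(fp);
    return mode;
}
```

On a running system, describe_overcommit(read_overcommit_mode("/proc/sys/vm/overcommit_memory")) reports the active policy.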

Comment 1 Bugzilla owner 2004-09-30 15:41:27 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/