47017 – hang/deadlock under heavy memory use

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-07-02 18:21 UTC by Norm Murray
Modified:	2008-08-01 16:22 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:39:03 UTC
Embargoed:

Description Norm Murray 2001-07-02 18:21:08 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.2.19-6.2.1smp i686)

Description of problem:
Running 2.4.3-12enterprise, I'm seeing fairly reproducible hangs under heavy
memory usage.  The exact point of failure varies, though the mode of failure
is consistent.

How reproducible:
Always

Steps to Reproduce:
Simulate load on the memory system using the C program below:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>
#include <string.h>

/* want to stress memory allocation and deallocation, preferably while
 * many other tests are going in the background
 */


#define MB (1024 * 1024)

/* Sets maximum number of 1MB blocks to try to allocate */
/* Currently fixed to 2100 blocks, about 2GB total */
#define MAX 2100


int main (int argc, char **argv) {
  struct timeval pre,post;
  int i;
  long TotalTime;
  void *MemBlocks[MAX];
  long TimeDiff;
  for (;;) {
    /* allocate */
    i=0;
    TotalTime = 0;
    gettimeofday(&pre, NULL);
    while ( (i < MAX) && ((MemBlocks[i] = malloc(MB)) != NULL)) {
      /* write to the memory to force the grab; otherwise it's a lazy
       * grab in the kernel and the pages won't really be allocated */
      memset(MemBlocks[i],0,MB);
      gettimeofday(&post, NULL);
      /* elapsed microseconds; the tv_usec difference may be negative
       * when a second boundary was crossed, which the sum handles */
      TimeDiff = ((post.tv_sec - pre.tv_sec) * 1000000)
                 + (post.tv_usec - pre.tv_usec);
      printf("Allocation of block %i succeeded and took %ld usec\n",i,
             TimeDiff);
      TotalTime += TimeDiff;
      i++;
      gettimeofday(&pre, NULL);
    }

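    if (i < MAX)
      printf("Allocation of block %d failed. Deallocating all memory.\n",i);
    else
      printf("Reached the limit of %d blocks. Deallocating all memory.\n",i);
    printf("Total of %ld usec spent in allocation\n",TotalTime);
    if (i > 0) /* avoid dividing by zero if the very first malloc failed */
      printf("Average of %ld usec/MB\n",TotalTime/i);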
    fflush(stdout);
    sleep (5);

    /* deallocate */
    i--;
    for (; i>=0; i--) {
      free(MemBlocks[i]);
    }
    printf("Done freeing, pausing before starting again.\n");
    fflush(stdout);
    sleep (5);


  }

  return 0;
}
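
The elapsed-time arithmetic in the program above can be isolated into a small
helper. This is only a sketch: elapsed_usec is a hypothetical name that does
not appear in the original program, and it assumes the interval is short
enough that the microsecond count fits in a long.

```c
#include <sys/time.h>

/* Hypothetical helper (not part of the original report): microseconds
 * elapsed from *start to *end. The tv_usec difference may be negative
 * when a second boundary is crossed; adding it to the scaled tv_sec
 * difference gives the correct result in both cases. */
static long elapsed_usec(const struct timeval *start,
                         const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
           + (long)(end->tv_usec - start->tv_usec);
}
```

Inside the allocation loop, a call such as elapsed_usec(&pre, &post) would
produce the same value the program stores in TimeDiff.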

I'm running two instances of the resulting program, with the intended effect
of taking the system deep into swap. I have 128MB of RAM and 6GB of swap,
enabled as one 2GB swap partition and two 2GB swap files.

Actual Results:  Somewhere in the allocation loop, so far always after both
instances of the above code have allocated more than 1GB, the system
hangs. I have also seen both instances cycle once through allocation and
deallocation and then fail in the second allocation loop.

Additional info:

The one bit of consistency I've seen so far is that the memory info dump
triggered through Alt-SysRq has always shown
Free Pages:  1396kB (0 kB High Mem)
The first memory line =512 kB   (seen once as 2=256kb and the rest as
1=512kB)
The second consistently totals 884kB

Comment 1 Arjan van de Ven 2001-07-02 18:24:03 UTC

How much ram and swap do you have ?

Comment 2 Norm Murray 2001-07-02 18:33:35 UTC

As stated in the above report 
128 MB Ram
6GB Swap broken into
	2GB Partition
	2GB file
	2GB file

Comment 3 Norm Murray 2001-07-03 16:20:40 UTC

I got this to reproduce on another system with 2.4.3-12enterprise

4x PIII 550
256 MB RAM
5GB Swap
[root@dhcpd134 /root]# cat /proc/meminfo 
        total:    used:    free:  shared: buffers:  cached:
Mem:  261275648 237207552 24068096    49152 192446464 15745024
Swap: 953319424        0 953319424
MemTotal:       255152 kB
MemFree:         23504 kB
MemShared:          48 kB
Buffers:        187936 kB
Cached:          15376 kB
Active:         203248 kB
Inact_dirty:       112 kB
Inact_clean:         0 kB
Inact_target:     6112 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       255152 kB
LowFree:         23504 kB
SwapTotal:     5125280 kB
SwapFree:      5125280 kB

[root@dhcpd134 /root]# swapon -s
Filename			Type		Size	Used	Priority
/dev/sda2                       partition	1028152	0	-1
/dev/sdb1                       partition	2097136	0	-2
/swap/swap                      file		1999992	0	-3
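
The two listings above can be cross-checked with a little arithmetic. The
sketch below sums the three Size values from swapon -s and compares the
result with SwapTotal. It also tests one possible explanation for the much
smaller figure on the "Swap:" line: the assumption, not stated anywhere in
this report, that the byte count on that line was held in a 32-bit value and
wrapped around. The function names are hypothetical.

```c
#include <stdint.h>

/* Sum of the three "Size" values from the swapon -s listing, in kB. */
static uint64_t swap_total_kb(void)
{
    return 1028152ULL + 2097136ULL + 1999992ULL;
}

/* Assumption: the byte-count line comes from a 32-bit value, so the
 * true byte count is reduced modulo 2^32. */
static uint32_t wrap_to_32_bits(uint64_t bytes)
{
    return (uint32_t)bytes;
}
```

Under that assumption, swap_total_kb() equals the reported SwapTotal of
5125280 kB, and wrap_to_32_bits(swap_total_kb() * 1024) equals the 953319424
shown on the "Swap:" line, which suggests the swap space itself was
configured as listed.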

The system locked up with one memgobble instance having allocated 1849 blocks
and the other 1798.

I don't think this is a peak condition on either system as I have seen both
memgobble processes finish the allocation loop and deallocate and then lock the
system in their second allocation run.

Comment 4 Norm Murray 2001-07-09 20:17:04 UTC

As a point of reference, I ran the same load on the original system running
2.2.14-5. It's gone through 3 alloc and dealloc cycles so far without failing.
My observation is that it is MUCH slower than 2.4, sometimes taking as much as
45 seconds to get a block of memory.

Comment 5 Yue Shi Lai 2001-07-23 07:55:44 UTC

I have a similar problem with the 2.4.3 (plain) kernel: running XMMS and Opera
at the same time causes a reproducible hang due to heavy swapping (memory hole
in the kernel?).

I used the 2.4.3 kernel with SGI XFS patches, but it should be very unlikely
that this behaviour is related to the XFS patches.

Comment 6 Arjan van de Ven 2001-07-23 08:02:13 UTC

Well, the XFS patches cause a totally different VM load pattern, so this could
very well be related to those patches.

Comment 7 Rob McMillin 2001-12-28 22:46:24 UTC

I also see *similar* results with this running 2.4.16 on a four-processor Dell
8450 with 4 GB RAM and 2GB swap.  It takes seconds to get memory; in a
high-volume transaction system, this could effectively hang the server, which
is indeed the result we see in at least one case.

Comment 8 Bugzilla owner 2004-09-30 15:39:03 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
