Bug 91090 - mm_struct leak in custom kernel

Summary: mm_struct leak in custom kernel

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	9
Hardware:	athlon
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	CambridgeTarget

Reported:	2003-05-17 18:45 UTC by Bernie Innocenti
Modified:	2015-01-04 22:02 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:40:57 UTC
Embargoed:

Attachments
this is the .config file for which the bug shows (27.63 KB, text/plain) 2003-05-17 18:48 UTC, Bernie Innocenti
this is the .config file for the computer which doesn't show the bug (28.53 KB, text/plain) 2003-05-17 18:49 UTC, Bernie Innocenti
An hourly concatenation of /proc/slabinfo showing the increasing mm_struct (20.81 KB, text/plain) 2004-01-08 21:13 UTC, William Gorder

Description Bernie Innocenti 2003-05-17 18:45:11 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030509

Description of problem:
After one day of heavy activity such as compiling KDE,
my system becomes terribly swappy and unusable.

Inspecting /proc/slabinfo reveals several hundred
thousand allocated mm_struct objects, which would
exhaust almost all my memory.

I've analyzed the problem on my custom version of
2.4.20-9, but I've seen the same behavior for some
months on several other kernel versions from RedHat,
both older and newer.

I'm not sure whether the leak is being caused by
creating processes, filesystem activity or even
network activity (I'm using distcc and I keep the
source tree on NFS). I'm also using ReiserFS.

This doesn't happen on my server which is also running
2.4.20-9 with a slightly different configuration.

I'm attaching my .config file for reference.


Version-Release number of selected component (if applicable):
kernel-source-2.4.20-9 (and several other versions)

How reproducible:
Always

Steps to Reproduce:
1. run big compiling job
2. cat /proc/slabinfo and look at mm_struct
3. you should see _MANY_ instances of mm_struct (>1000)
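
Step 2 can be scripted. The following is a minimal sketch, not a tool from this report: it assumes the 2.4-style /proc/slabinfo layout (cache name, active objects, total objects, object size, active slabs, total slabs, pages per slab), and the function names `find_cache` and `looks_leaky` are hypothetical. The 1000-object threshold is taken from step 3.

```python
def find_cache(slabinfo_text, cache_name="mm_struct"):
    """Return (active_objs, total_objs) for cache_name, or None if absent.

    Assumes 2.4-style columns: name, active objects, total objects, ...
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == cache_name:
            return int(fields[1]), int(fields[2])
    return None


def looks_leaky(active_objs, threshold=1000):
    """Apply the rule of thumb from step 3: more than ~1000 mm_structs."""
    return active_objs > threshold


def read_slabinfo(path="/proc/slabinfo"):
    """Read the live slab statistics from the running kernel."""
    with open(path) as f:
        return f.read()
```

On an affected machine, `find_cache(read_slabinfo())` would be expected to return an active-object count far above the threshold after a large build.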

Comment 1 Bernie Innocenti 2003-05-17 18:48:48 UTC

Created attachment 91758
this is the .config file for which the bug shows

Comment 2 Bernie Innocenti 2003-05-17 18:49:46 UTC

Created attachment 91759
this is the .config file for the computer which doesn't show the bug

Comment 3 Bernie Innocenti 2003-08-01 22:58:36 UTC

This appears to be fixed in 2.4.21-20.1.2024.2.1.nptl;
you could probably close this bug now.

Comment 4 Henrik Størner 2003-10-20 08:22:21 UTC

There appears to be some similar leak in the currently shipping kernel for Red
Hat 9 (kernel-2.4.20-20.9). Tracking the mm_struct "allocated pages" value from
/proc/slabinfo shows a steady growth over time. See for instance
http://tyge.sslug.dk/bb-cgi/larrd-grapher.cgi?host=tyge.sslug.dk&service=slabinfo&graph=daily
which is a graph of the slabinfo values sampled every 5 minutes.

Some software appears to trigger this leak. It has been the subject of much
discussion on the "Big Brother network Monitor" mailing list (http://bb4.com/)
since the code implementing the Big Brother paging scheme appears to trigger
this leak quite often. So systems running this monitoring system gradually go
into a thrashing mode, where everything gets swapped out and the system requires
a reboot.

Comment 5 yuval yeret 2003-11-26 13:51:20 UTC

I'm seeing a constant leak in size-4096 on a machine running
2.4.20-18 SMP BIGMEM, which may or may not be related to the machine
finally running out of memory and hanging.

I'm trying 2.4.20-20.9 now to see if it helps.

At first look, the problem appears to be gone. I will
update with long-term results.

Comment 6 Klaus Wurmstein 2003-12-03 13:40:41 UTC

The problem is still present in kernel 2.4.20-24.9smp.

I'm using the "BigBrother network monitor" as described above
by Henrik Størner.

I am forced to make "planned reboots" on my RedHat box every
3-4 days (as is usual for the OS from Redmond).

Comment 7 William Gorder 2004-01-08 21:13:13 UTC

I have also seen this problem on a RH9 (2.4.20-28.9) system running
BigBrother. I have to reboot the server roughly every two days.

mm_struct values increase until the system becomes unresponsive and
must be rebooted. In my case the system is a 200 MHz Pentium.

I have attached a file containing /proc/slabinfo for a few hours,
showing the increasing mm_struct. The system has been up for roughly
26 hours at this point and free memory is already down 50MB. This
system is a server running BigBrother, Apache and not much else.
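
A log of repeated /proc/slabinfo dumps can be reduced to a growth series. This is only a sketch under assumptions: it presumes the log is a plain concatenation of 2.4-style dumps (the layout of the attached file has not been checked), and the helper names `cache_series` and `growth_per_sample` are hypothetical.

```python
def cache_series(concatenated_dumps, cache_name="mm_struct"):
    """Active-object count of cache_name from each dump, in file order.

    Assumes the second column of a cache line is the active-object count,
    as in 2.4-style /proc/slabinfo output.
    """
    counts = []
    for line in concatenated_dumps.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == cache_name and fields[1].isdigit():
            counts.append(int(fields[1]))
    return counts


def growth_per_sample(counts):
    """Differences between consecutive samples.

    Deltas that stay positive from sample to sample are consistent with a leak.
    """
    return [later - earlier for earlier, later in zip(counts, counts[1:])]
```

With hourly samples, each delta is the number of mm_struct objects gained in that hour.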

Comment 8 William Gorder 2004-01-08 21:13:59 UTC

Created attachment 96838
An hourly concatenation of /proc/slabinfo showing the increasing mm_struct

Comment 9 Joe Orton 2004-01-21 15:30:54 UTC

I'm also reproducing this regularly on a RHL9 build machine:

mm_struct          56138  56145    256 3743 3743    1

the machine is used as a nightly build system, copying a bunch of
large tar files off NFS and building them.

(this is also an Athlon box)
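
To put the slabinfo line above in perspective, it can be converted to bytes. This is a rough sketch that assumes the 2.4-style column order (name, active objects, total objects, object size, active slabs, total slabs, pages per slab) and 4 KiB pages, as on 32-bit x86; `slab_footprint` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86


def slab_footprint(line, page_size=PAGE_SIZE):
    """Estimate memory held by one cache from a 2.4-style slabinfo line."""
    fields = line.split()
    (active_objs, total_objs, obj_size,
     active_slabs, total_slabs, pages_per_slab) = map(int, fields[1:7])
    return {
        "name": fields[0],
        # Bytes occupied by objects currently in use.
        "object_bytes": active_objs * obj_size,
        # Bytes of page memory backing all slabs of this cache.
        "slab_bytes": total_slabs * pages_per_slab * page_size,
    }


info = slab_footprint("mm_struct          56138  56145    256 3743 3743    1")
print(info["object_bytes"], info["slab_bytes"])  # 14371328 15331328
```

Under these assumptions the cache pins about 14.6 MiB of slab pages on this machine.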

Comment 10 Joe Orton 2004-01-21 15:31:52 UTC

Dave, if you want access to the box, let me know.

Comment 11 Tom 2004-05-05 00:22:06 UTC

Comment #5 mentions size-4096. There is a leak in ext3. See 
http://marc.theaimsgroup.com/?l=linux-kernel&m=106637047820058&w=2 
for a patch.

Comment 12 Bernie Innocenti 2004-05-05 14:59:53 UTC

This can't be the cause of the original bug report.
At the time, I was exclusively using ReiserFS on the
machine where the bug had shown up.

Also, it seems unlikely that a filesystem
allocates mm_struct objects.

BTW, since I switched to 2.6.x, I've never
seen this problem again.

Comment 13 Bugzilla owner 2004-09-30 15:40:57 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
