Bug 50711 - mm problems under heavy load
mm problems under heavy load
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686 Linux
Priority: medium  Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Depends On:
Blocks:
Reported: 2001-08-02 10:34 EDT by Renato
Modified: 2008-08-01 12:22 EDT (History)
2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2004-09-30 11:39:06 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

  None
Description Renato 2001-08-02 10:34:14 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
I'm using kernel-2.4.3-12smp.

The server is based on qmail, which means a lot of small processes (qmail-smtpd,
qmail-pop3d, etc.) are forked and finished in milliseconds.

After 1-2 days, one of the CPUs is 100% in use by the system (as can be
seen using 'top') and the network starts to fail (pinging my machine, I
see some packets being lost).

I see the following messages when the system starts to slow down:
mm: critical shortage of bounce buffers
net: 23 messages suppressed
...

After this I have to reboot the machine. 

I have other machines with the same configuration and even more load
running kernel-2.2.19 (Red Hat 6.2 based) with no problems and uptimes >
100 days.

How reproducible:
Always

Steps to Reproduce:
1. Set up a qmail-loaded server.
2. Start thousands of small processes over several hours.
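The two steps above can be sketched as a tiny load generator (a hypothetical script, not from the report; the process count is illustrative, and a real reproduction would loop this for hours as the reporter describes):

```shell
#!/bin/sh
# Fork many short-lived processes, roughly mimicking qmail's
# per-connection churn (qmail-smtpd / qmail-pop3d forked per delivery).
i=0
while [ "$i" -lt 1000 ]; do
    /bin/true &          # each child lives for only milliseconds
    i=$((i + 1))
done
wait                     # reap all children
echo "spawned $i short-lived processes"
```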

Actual Results:  I see the following messages when the system starts to
slow down:
mm: critical shortage of bounce buffers
net: 23 messages suppressed
...

After this I have to reboot the machine. 



Expected Results:  The machine shouldn't have to be rebooted.

Additional info:

Modules loaded:

Module                  Size  Used by
eepro100               16144   1  (autoclean)
ipchains               32000   0  (unused)
usb-uhci               21392   0  (unused)
usbcore                50560   1  [usb-uhci]
raid1                  13408   2
aic7xxx               113840   6
sd_mod                 11040   6
scsi_mod               88864   2  [aic7xxx sd_mod]
Comment 1 Renato 2001-08-02 10:38:18 EDT
I also tried the latest Raw Hide kernel, 2.4.6-3.1smp, but with that one
I get a kernel panic after a couple of hours.
Comment 2 Jackie Meese 2001-08-04 20:15:16 EDT
I have the same error using the 2.4.3-12enterprise kernel on a Dell PowerEdge 6400 with the megaraid driver for the RAID controller. It has 2GB of RAM. This occurs under any heavy load and effectively kills the server. This is a newly "upgraded" server (formatted RH 6.2 and put on a new install of 7.1). This occurs when I issue an rsync command to get the data from its partner in a failover cluster:

rsync -ave ssh --exclude="/.../" 192.168.1.1::home/ /home/

It then begins transferring 93GB on the rsync, but the machine is so drained as to be unusable during this time. The "tigger" message, as it has been nicknamed, then begins to come up on the console. (Glad I don't do this often...) Simply killing the rsync process removes the errors and performance returns to normal. This did not occur with the 2.2.19 kernel used prior to the 7.1 install.
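A generic way to confirm the kernel is still emitting this warning while the rsync runs (a hypothetical check, not something from the report; dmesg and grep are standard tools, and the message text is quoted verbatim from above):

```shell
# Scan the kernel ring buffer for the bounce-buffer warning; print a
# fallback line if none is found (or if dmesg is unreadable).
dmesg 2>/dev/null | grep "critical shortage of bounce buffers" \
    || echo "no bounce-buffer warnings logged"
```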

Research turned up this discussion: http://lwn.net/2001/0607/kernel.php3, but the success of that patch is not discussed later.
Comment 3 Bugzilla owner 2004-09-30 11:39:06 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
