150130 – e1000 has memory leak when run continuously getting new dhcp leases.

Summary: e1000 has memory leak when run continuously getting new dhcp leases.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	John W. Linville
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	156320

Reported:	2005-03-02 20:05 UTC by David Knierim
Modified:	2007-11-30 22:07 UTC
CC List:	3 users
Fixed In Version:	RHSA-2005-663
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-09-28 14:50:20 UTC
Target Upstream Version:
Embargoed:

Attachments
Script to reproduce failure. (609 bytes, text/plain) 2005-03-14 13:46 UTC, David Knierim	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2005:663	0	qe-ready	SHIPPED_LIVE	Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6	2005-09-28 04:00:00 UTC

Description David Knierim 2005-03-02 20:05:11 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
As part of a diagnostic suite, we have a script that shuts down all
interfaces and then brings them all back up configured to get IP
information from a DHCP server.   Once a good IP address is found, the
script then brings the interfaces down and the process is repeated.

Initially, this code runs without any error, but after running this
code for a while, errors like this are displayed when the test
attempts to bring up the interface:
"e1000: ethX: e1000_setup_tx_resources: Unble to Allocate Memory for
the Transmit descriptor ring" or a similar message for the Receive
descriptor ring.

We had no problem with Red Hat 9 with the same hardware and test
scripts.  

I upgraded the e1000 driver to the latest version from Intel (5.7.6)
and the problem did not occur.

The problem seems to happen more frequently if the box in question
has more interfaces under test.

Version-Release number of selected component (if applicable):
kernel 2.4.21-27.0.2.ELsmp, e1000 version 5.3.19-k2-NAPI

How reproducible:
Always

Steps to Reproduce:
1. Configure a network with a DHCP server.
2. Attach the test client to the network.
3. Configure the test client to get IP addresses from DHCP.
4. Have the client repeatedly bring its interfaces up and down.
5. Eventually, it will fail.
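The reproduction steps above amount to a loop like the one below. This is a minimal Python 3 sketch, not the script attached in comment 4 (whose contents are not reproduced here). It assumes Red Hat style `ifdown`/`ifup` commands, with each interface's ifcfg file set to `BOOTPROTO=dhcp` so that `ifup` requests a new lease; the helper names `cycle_commands` and `run_loop` are hypothetical. It has to run as root, and the interface used to reach the box should be left out of the list, as the reporter did with eth6.

```python
import subprocess
from typing import Callable, List, Sequence


def cycle_commands(interfaces: Sequence[str]) -> List[List[str]]:
    """One pass: take every interface down, then bring every one back up.

    Hypothetical helper; assumes Red Hat style ifdown/ifup, where ifup
    starts a DHCP client when the ifcfg file says BOOTPROTO=dhcp.
    """
    down = [["ifdown", iface] for iface in interfaces]
    up = [["ifup", iface] for iface in interfaces]
    return down + up


def run_loop(interfaces: Sequence[str], iterations: int,
             runner: Callable = subprocess.run) -> int:
    """Repeat the down/up pass; return how many full passes completed.

    Stops at the first command that exits non-zero. This assumes a failed
    descriptor-ring allocation makes ifup return a non-zero status.
    """
    for loop in range(iterations):
        for cmd in cycle_commands(interfaces):
            result = runner(cmd)
            if result.returncode != 0:
                print(f"pass {loop}: {' '.join(cmd)} failed")
                return loop
    return iterations
```

For example, `run_loop(["eth0", "eth1"], 500)` cycles two interfaces up to 500 times and reports the pass on which a command first fails; the driver's "Unble to Allocate Memory" message itself would show up in the kernel log (`dmesg`). The `runner` parameter exists only so the loop can be exercised with a stub instead of real interfaces.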
    

Actual Results:  Eventually, you will get an error: "e1000: ethX:
e1000_setup_tx_resources: Unble to Allocate Memory for the Transmit
descriptor ring" and you will no longer be able to bring up any
interfaces that are not already up.   Even something as simple as
"ifconfig eth0 up" will fail.

Expected Results:  Interfaces should be able to be brought up and down
indefinitely.

Additional info:

Comment 1 John W. Linville 2005-03-03 19:18:38 UTC

David,

Can you try kernel 2.4.21-28.ELsmp?  It includes an update of the
e1000 driver to 5.6.10.1-k2-NAPI.

If you can't get hold of that version, the update is available in the
test kernels here:

   http://people.redhat.com/linville/kernels/RHEL3/

Please attempt to recreate the issue and report the results.  Thanks!

Comment 2 David Knierim 2005-03-08 13:36:58 UTC

I ran the test profile against kernel
kernel-smp-2.4.21-28.EL.jwltest.3.i686.rpm.  It failed in the same
manner as 2.4.21-27.0.2.ELsmp.

Comment 3 John W. Linville 2005-03-08 21:53:05 UTC

David, can I persuade you to attach the test script you are using?

Does the problem still occur if static IP addresses are used?

Comment 4 David Knierim 2005-03-14 13:46:01 UTC

Created attachment 111977
Script to reproduce failure.

This script is hard coded to test the available interfaces on my box. I do
not test eth6, since it's how I access the box :^).

Comment 5 John W. Linville 2005-03-14 16:10:08 UTC

David,

Thanks for the script...at least now I know you aren't doing anything
crazy... :-)

Any word on whether or not it happens when using only static IP addresses?

Comment 6 David Knierim 2005-03-15 18:38:55 UTC

I have retested using static IP addresses. The same problem occurs.
It took 3 hours, 51 minutes to occur.

Comment 7 John W. Linville 2005-03-18 18:33:08 UTC

David,

I have posted some test kernels here:

   http://people.redhat.com/linville/kernels/rhel3/

These include patches to update the e1000 driver to what is currently upstream,
as well as a few other e1000 fixes.  I'd like to start with this as a baseline.

Would you mind testing these kernels to see if you can recreate the issue? 
Please post the results.  Thanks!

Comment 8 David Knierim 2005-03-21 13:35:46 UTC

I tested kernel 2.4.21-31.EL.jwltest.9.1smp; it ran for 8 hours, 55 minutes
before failing.  It successfully ran 416 up/down loops before failing.  FYI, my
box is now configured with 15 e1000 interfaces that are being brought up and
down.   I don't know if this matters, but the box is based on the Intel 7520
chipset with two 3.20GHz hyperthreaded Xeon processors. The box has 2GB of DRAM.

Comment 9 John W. Linville 2005-05-03 20:57:40 UTC

Well, after all this time I wish I had something more concrete to offer, 
however... 
 
I do have test kernels w/ yet another updated e1000 driver available at the 
same URL referenced in comment 7.  I'd appreciate it if you could give those a 
try in the hopes that Intel already fixed this issue for us... :-)  Please 
post the results here.  Thanks!

Comment 10 David Knierim 2005-05-17 16:56:40 UTC

I retried this test with the latest kernel from your test area
(2.4.21-32.3.EL.jwltest.24smp) with the code commented out as requested in
comment #26 of bug 151054.  The test has been running for just under 1 day, 4
hours without error.

Comment 11 John W. Linville 2005-06-07 19:23:08 UTC

Marking this as duplicate of bug 151054 as they appear to have the same root 
cause...solution remains elusive... 

*** This bug has been marked as a duplicate of 151054 ***

Comment 12 Ernie Petrides 2005-07-12 01:08:41 UTC

A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.10.EL).

Comment 14 Ernie Petrides 2005-07-22 00:05:13 UTC


*** This bug has been marked as a duplicate of 151054 ***

Comment 15 Red Hat Bugzilla 2005-09-28 14:50:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html
