From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
As part of a diagnostic suite, we have a script that shuts down all interfaces and then brings them all back up, configured to get IP information from a DHCP server. Once a good IP address is found, the script brings the interfaces down again and the process is repeated. Initially this runs without any error, but after it has been running for a while, errors like this are displayed when the test attempts to bring up an interface:

"e1000: ethX: e1000_setup_tx_resources: Unable to Allocate Memory for the Transmit descriptor ring"

or a similar message for the Receive descriptor ring. We had no problem with Red Hat 9 with the same hardware and test scripts. I upgraded the e1000 driver to the latest version from Intel (5.7.6) and the problem did not occur. The problem seems to happen more frequently if the box in question has more interfaces under test.

Version-Release number of selected component (if applicable):
kernel 2.4.21-27.0.2.ELsmp, e1000 version 5.3.19-k2-NAPI

How reproducible:
Always

Steps to Reproduce:
1. Configure a network with a DHCP server.
2. Attach the test client to the network.
3. Configure the test client to get IP addresses from DHCP.
4. Repeatedly have the client bring its interfaces down and up.
5. Eventually, it will fail.

Actual Results:
Eventually, you will get an error:

"e1000: ethX: e1000_setup_tx_resources: Unable to Allocate Memory for the Transmit descriptor ring"

and you will no longer be able to bring up any interfaces that are not already up. Even something as simple as "ifconfig eth0 up" will fail.

Expected Results:
Should be able to bring interfaces up and down indefinitely.

Additional info:
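The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual test script: the function name `cycle_interfaces`, the injectable `run` parameter, and the `pause` setting are hypothetical, and it uses plain `ifconfig ethX down` / `ifconfig ethX up` instead of a DHCP client, whose exact invocation is not given in this report.

```python
import subprocess
import time


def cycle_interfaces(interfaces, run=subprocess.call, max_loops=None, pause=0.0):
    """Take all interfaces down, bring them back up, and repeat.

    Returns (completed_loops, failed_interface). failed_interface is None
    if max_loops was reached without any "up" command failing.
    """
    loops = 0
    while max_loops is None or loops < max_loops:
        # Shut down every interface under test first.
        for name in interfaces:
            run(["ifconfig", name, "down"])
        # Bring them back up; a non-zero exit status is treated as the
        # failure described in this report (assumption).
        for name in interfaces:
            if run(["ifconfig", name, "up"]) != 0:
                return loops, name
        loops += 1
        time.sleep(pause)
    return loops, None
```

Run as root with a list such as `["eth0", "eth1"]` (hypothetical names), the function would loop until an `up` command fails and report how many full loops completed first. Passing a different `run` callable allows the loop logic to be exercised without touching real interfaces.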
David, Can you try kernel 2.4.21-28.ELsmp? It includes an update of the e1000 driver to 5.6.10.1-k2-NAPI. If you can't get hold of that version, the update is available in the test kernels here: http://people.redhat.com/linville/kernels/RHEL3/ Please attempt to recreate the issue and report the results. Thanks!
I ran the test profile against kernel-smp-2.4.21-28.EL.jwltest.3.i686.rpm. It failed in the same manner as 2.4.21-27.0.2.ELsmp.
David, can I persuade you to attach the test script you are using? Does the problem still occur if static IP addresses are used?
Created attachment 111977 Script to reproduce failure. This script is hard-coded to test the available interfaces on my box. I do not test eth6, since that is how I access the box :^).
David, Thanks for the script...at least now I know you aren't doing anything crazy... :-) Any word on whether or not it happens when using only static IP addresses?
I have retested using static IP addresses. The same problem occurs. It took 3 hours, 51 minutes to occur.
David, I have posted some test kernels here: http://people.redhat.com/linville/kernels/rhel3/ These include patches to update the e1000 driver to what is currently upstream, as well as a few other e1000 fixes. I'd like to start with this as a baseline. Would you mind testing these kernels to see if you can recreate the issue? Please post the results. Thanks!
I tested kernel 2.4.21-31.EL.jwltest.9.1smp. It ran for 8 hours, 55 minutes and successfully completed 416 up/down loops before failing. FYI, my box is now configured with 15 e1000 interfaces that are being brought up and down. I don't know if this matters, but the box is based on the Intel 7520 chipset with two 3.20GHz hyperthreaded Xeon processors. The box has 2GB of DRAM.
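For scale, the figures in this comment can be turned into a rough rate. The calculation below is only a back-of-the-envelope sketch and assumes that each of the 416 loops brought all 15 interfaces down and up exactly once, which the comment does not state explicitly.

```python
# Figures taken from the comment above.
run_seconds = 8 * 3600 + 55 * 60   # 8 hours, 55 minutes
loops = 416
interfaces = 15

# Assumption: every loop cycled every interface exactly once.
seconds_per_loop = run_seconds / loops
up_operations = loops * interfaces

print(round(seconds_per_loop, 1))  # average seconds per loop
print(up_operations)               # successful "up" operations before failure
```

Under that assumption, the run averaged about 77 seconds per loop and completed roughly 6,240 interface bring-ups before the allocation failure appeared.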
Well, after all this time I wish I had something more concrete to offer, however... I do have test kernels w/ yet another updated e1000 driver available at the same URL referenced in comment 7. I'd appreciate it if you could give those a try in the hopes that Intel already fixed this issue for us... :-) Please post the results here. Thanks!
I retried this test with the latest kernel from your test area (2.4.21-32.3.EL.jwltest.24smp) with the code commented out as requested in comment #26 of bug 151054. The test has been running for just under 1 day, 4 hours without error.
Marking this as duplicate of bug 151054 as they appear to have the same root cause...solution remains elusive... *** This bug has been marked as a duplicate of 151054 ***
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.10.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html