Bug 227786 - Network install over e1000 is very slow
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Linda Wang
QA Contact: Brian Brock
Reported: 2007-02-07 23:04 EST by John Walicki
Modified: 2007-11-30 17:07 EST (History)
Last Closed: 2007-06-20 12:07:28 EDT
Description John Walicki 2007-02-07 23:04:49 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20070206 (CK-IBM) Firefox/2.0.0.1

Description of problem:
E1000 RHEL 5 network install issue.  A network install on a machine with an e1000 network card may take 12 hours to complete.  Other machines complete in approximately 1 hour.



Problem: Network install times for machines using the Intel e1000 network driver are extremely slow.  The issue seems to be specifically the time it takes to reconnect to the server to download each RPM required for the install.  As the build progresses, the time it takes to make a connection increases.  We are certain the network card is the problem: when we install an external (PCMCIA) network card in the same machine, the install completes without issue.



We performed a packet capture on this machine and compared the results from a normally running machine with those from an e1000 NIC machine.  The following is the major difference:



FTP - Response: 22
    Time delta from previous packet: 0.003043000 seconds

UDP Source port: 2967 Destination port: 2967
    Time delta from previous packet: 0.226744000 seconds

[TCP Retransmission] Response: 22
    Time delta from previous packet: 0.696182000 seconds



The above happens in almost every cycle (FTP connect, download RPM, disconnect) on the slow build.  On a normal build it almost never happens.  The "Time delta from previous packet" values above show why the build is much slower.  Here are some other time deltas pulled at random from the capture file of "normal" traffic:



Time delta from previous packet: 0.001271000 seconds

Time delta from previous packet: 0.000113000 seconds

Time delta from previous packet: 0.000081000 seconds

Time delta from previous packet: 0.000513000 seconds
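
The comparison above can be repeated on a text export of a capture. The following is only a minimal sketch, assuming the export contains "Time delta from previous packet" lines in the form quoted in this report; the function name `slow_deltas` and the 0.1 second threshold are illustrative choices, not part of the original analysis.

```python
import re

# Matches lines such as "Time delta from previous packet: 0.696182000 seconds"
DELTA_RE = re.compile(r"Time delta from previous packet:\s*([0-9.]+)\s*seconds")


def slow_deltas(capture_text, threshold=0.1):
    """Return the time deltas (in seconds) that exceed the threshold.

    The threshold of 0.1 s is an assumed cut-off for illustration only.
    """
    deltas = [float(m.group(1)) for m in DELTA_RE.finditer(capture_text)]
    return [d for d in deltas if d > threshold]


# Values copied from this report: three from the slow cycle, two "normal" ones.
sample = """\
Time delta from previous packet: 0.003043000 seconds
Time delta from previous packet: 0.226744000 seconds
Time delta from previous packet: 0.696182000 seconds
Time delta from previous packet: 0.001271000 seconds
Time delta from previous packet: 0.000113000 seconds
"""

print(slow_deltas(sample))
```

With the values quoted in this report, only the UDP packet and the TCP retransmission are flagged; the "normal" deltas are all well under the assumed threshold.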





Workarounds tried (none fixed the problem):

1. Different amounts of physical memory: 512 MB, 1 GB, 2 GB

2. Different FTP servers and versions: vsftpd and proftpd were both slower than on other machines (vsftpd seems faster than proftpd, but still much slower than on other machines)

3. Different protocols: HTTP vs. FTP

4. Moved the install server out from behind the firewall

5. Different cables and a dedicated switch (only the server and client on the switch)

6. Updated the machine BIOS

7. RHEL 4 vs. RHEL 5 kernel

8. Verified the connection speed was 100 Mb/s, full duplex



IBM machines on which the problem has been reproduced, all with the Intel e1000 network card:

1. T60 ThinkPad

2. T60p ThinkPad

3. X60s ThinkPad



Version-Release number of selected component (if applicable):
kernel-2.6.18-8.el5

How reproducible:
Always


Steps to Reproduce:
1. Boot the RHEL 5 install boot CD
2. Set up a network install, either FTP or HTTP
3. The install will take many hours on T60/T60p/X60s ThinkPads that have e1000

Actual Results:


Expected Results:


Additional info:
Comment 1 Jason Baron 2007-02-22 17:06:48 EST
Hmmm, this looks like a RHEL 5 kernel issue. However, the version field has 4.4
selected. Does this also apply to RHEL 4? Just trying to get this properly
assigned. Thanks.
Comment 2 John Walicki 2007-02-22 17:28:39 EST
Sorry - There was no way for me to tag this against the RHEL5 kernel.   It
definitely affects RHEL5 boot media installs.
Comment 5 John Walicki 2007-03-21 14:38:20 EDT
The various teams have traced this back to the Intel 82573L NIC (device id
8086:109a) and that chipset's interaction with Anaconda.

There is a suggested partial workaround that modifies the RxIntDelay value by
passing the following additional boot options (and modprobe options once the
system is installed):


modprobe e1000 RxIntDelay=8



See the following for additional info:


http://www.thinkwiki.org/wiki/Problem_with_e1000:_Open_issue_with_latency

http://bugme.osdl.org/show_bug.cgi?id=6929
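
To tell whether a given machine has the NIC named above, the device id can be looked up in `lspci -n` style output. The following is only a rough sketch: the helper name `has_device` is hypothetical, and the sample lines are made up for illustration rather than taken from one of the affected ThinkPads.

```python
import re

TARGET_ID = "8086:109a"  # Intel 82573L, device id as given in this report


def has_device(lspci_n_output, device_id=TARGET_ID):
    """Return True if any line of `lspci -n` style text contains device_id."""
    pattern = re.compile(r"\b" + re.escape(device_id) + r"\b", re.IGNORECASE)
    return any(pattern.search(line) for line in lspci_n_output.splitlines())


# Illustrative input only; real output would come from running `lspci -n`.
sample = "02:00.0 0200: 8086:109a\n00:1f.2 0101: 8086:27c4 (rev 02)\n"

print(has_device(sample))
```

On a live system the text could be obtained by running `lspci -n` and passing its output to the function.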

Comment 6 John Walicki 2007-03-28 10:35:11 EDT
This problem has been resolved by an EEPROM fix for the Intel 82573L NIC (device id
8086:109a) that modifies the Active State Power Management (ASPM) behavior.
Lenovo / Intel provided the EEPROM fix.

We have tested this fix across a wide variety of ThinkPad T60/T60p/X60 machines that
include the Intel 82573L NIC (device id 8086:109a).  The problem has been solved.

The bugzilla ticket can be closed.
