Bug 55178 - e1000 driver included in distribution performs poorly
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: ia64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-10-26 17:33 UTC by Matthew Tolentino
Modified: 2005-10-31 22:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2001-12-06 17:15:23 UTC



Description Matthew Tolentino 2001-10-26 17:33:47 UTC
Description of Problem:

The e1000 driver (version 3.1.23) for the Intel Pro/1000 Gigabit Ethernet 
Adapter exhibits extremely low performance.  Pings to other systems often 
take as long as 1 second.  Ftp and telnet sessions to other systems across 
the adapter are extremely slow.  

Version-Release number of selected component (if applicable):
driver version 3.1.23

How Reproducible:

Always

Steps to Reproduce:
1. load e1000 driver for Pro/1000 Gb NIC
2. ping <other_system_ip> OR
3. telnet/ftp to <other_system_ip>

Actual Results:
System appears extremely sluggish.  Almost unusable.

Expected Results:
pings should succeed.  Also, telnet/ftp sessions should be reasonably 
fast - at least as fast as across a 10/100 NIC.

Additional Information:
	
I downloaded the same driver version (v3.1.23) from the intel.com/support 
website, built it standalone, and found it to function as expected.  This 
points to a problem specific to the driver included in the RH distribution 
tree.

Comment 1 Arjan van de Ven 2001-10-26 17:37:44 UTC
This is interesting as we ship the same driver except for the Intel Proprietary
and Non-Free parts. Which kernel is this exactly ?

Comment 2 Matthew Tolentino 2001-10-26 17:41:02 UTC
This is the default 2.4.9-0.12smp kernel.

Comment 3 Scott Feldman 2001-10-27 02:09:28 UTC
The RH merge of the Intel Proprietary driver (version 3.1.23) into the kernel 
breaks the driver on IA-64.  Several critical hardware structures have been 
redefined by RH's merge script as 64-bit values rather than the intended 
32-bit values: an unsigned long on IA-64 is 64 bits wide.  (Note that the 
kernel driver is not broken on IA-32.)

The patch below corrects the driver on IA-64.  This patch should be 
incorporated into the final build.

--- drivers/addon/e1000/e1000_fxhw.h	Tue Sep 25 09:19:30 2001
+++ drivers/addon/e1000.fixed/e1000_fxhw.h	Fri Oct 26 18:14:28 2001
@@ -195,20 +195,20 @@ typedef struct _E1000_TRANSMIT_DESCRIPTO
 	E1000_64_BIT_PHYSICAL_ADDRESS BufferAddress;
 
 	union {
-		unsigned long DwordData;
+		u32 DwordData;
 		struct _TXD_FLAGS {
-			unsigned short Length;
-			unsigned char Cso;
-			unsigned char Cmd;
+			u16 Length;
+			u8 Cso;
+			u8 Cmd;
 		} Flags;
 	} Lower;
 
 	union {
-		unsigned long DwordData;
+		u32 DwordData;
 		struct _TXD_FIELDS {
-			unsigned char TransmitStatus;
-			unsigned char Css;
-			unsigned short Special;
+			u8 TransmitStatus;
+			u8 Css;
+			u16 Special;
 		} Fields;
 	} Upper;
 
@@ -216,31 +216,31 @@ typedef struct _E1000_TRANSMIT_DESCRIPTO
 
 typedef struct _E1000_TCPIP_CONTEXT_TRANSMIT_DESCRIPTOR {
 	union {
-		unsigned long IpXsumConfig;
+		u32 IpXsumConfig;
 		struct _IP_XSUM_FIELDS {
-			unsigned char Ipcss;
-			unsigned char Ipcso;
-			unsigned short Ipcse;
+			u8 Ipcss;
+			u8 Ipcso;
+			u16 Ipcse;
 		} IpFields;
 	} LowerXsumSetup;
 
 	union {
-		unsigned long TcpXsumConfig;
+		u32 TcpXsumConfig;
 		struct _TCP_XSUM_FIELDS {
-			unsigned char Tucss;
-			unsigned char Tucso;
-			unsigned short Tucse;
+			u8 Tucss;
+			u8 Tucso;
+			u16 Tucse;
 		} TcpFields;
 	} UpperXsumSetup;
 
-	unsigned long CmdAndLength;
+	u32 CmdAndLength;
 
 	union {
-		unsigned long DwordData;
+		u32 DwordData;
 		struct _TCP_SEG_FIELDS {
-			unsigned char Status;
-			unsigned char HdrLen;
-			unsigned short Mss;
+			u8 Status;
+			u8 HdrLen;
+			u16 Mss;
 		} Fields;
 	} TcpSegSetup;
 
@@ -251,20 +251,20 @@ typedef struct _E1000_TCPIP_DATA_TRANSMI
 	E1000_64_BIT_PHYSICAL_ADDRESS BufferAddress;
 
 	union {
-		unsigned long DwordData;
+		u32 DwordData;
 		struct _TXD_OD_FLAGS {
-			unsigned short Length;
-			unsigned char TypLenExt;
-			unsigned char Cmd;
+			u16 Length;
+			u8 TypLenExt;
+			u8 Cmd;
 		} Flags;
 	} Lower;
 
 	union {
-		unsigned long DwordData;
+		u32 DwordData;
 		struct _TXD_OD_FIELDS {
-			unsigned char TransmitStatus;
-			unsigned char Popts;
-			unsigned short Special;
+			u8 TransmitStatus;
+			u8 Popts;
+			u16 Special;
 		} Fields;
 	} Upper;
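
To illustrate why the types in this patch matter, here is a minimal sketch that compares the size of a transmit descriptor declared with unsigned long against one declared with fixed-width types. The struct and member names are hypothetical stand-ins rather than the driver's own, uint64_t is an assumed substitute for E1000_64_BIT_PHYSICAL_ADDRESS, and the 16-byte target size is what the fixed-width layout adds up to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the descriptor in e1000_fxhw.h. */

/* Layout as merged: the dword views are unsigned long (8 bytes on LP64). */
struct txd_merged {
	uint64_t buffer_addr;	/* assumed stand-in for the 64-bit address */
	union {
		unsigned long dword;
		struct {
			unsigned short length;
			unsigned char cso;
			unsigned char cmd;
		} flags;
	} lower;
	union {
		unsigned long dword;
		struct {
			unsigned char status;
			unsigned char css;
			unsigned short special;
		} fields;
	} upper;
};

/* Layout after the patch: every field has an explicit width. */
struct txd_fixed {
	uint64_t buffer_addr;
	union {
		uint32_t dword;
		struct {
			uint16_t length;
			uint8_t cso;
			uint8_t cmd;
		} flags;
	} lower;
	union {
		uint32_t dword;
		struct {
			uint8_t status;
			uint8_t css;
			uint16_t special;
		} fields;
	} upper;
};

/* The fixed-width layout is 8 + 4 + 4 bytes regardless of the ABI. */
static_assert(sizeof(struct txd_fixed) == 16, "descriptor must be 16 bytes");

size_t txd_merged_size(void) { return sizeof(struct txd_merged); }
size_t txd_fixed_size(void)  { return sizeof(struct txd_fixed); }

/* Byte offset of the status/upper dword in each layout. */
size_t txd_merged_upper_offset(void) { return offsetof(struct txd_merged, upper); }
size_t txd_fixed_upper_offset(void)  { return offsetof(struct txd_fixed, upper); }
```

On an LP64 target such as IA-64, txd_merged_size() should come out as 24 and the upper union should sit at offset 16 instead of 12, so the status field would no longer be where a 16-byte descriptor expects it. On IA-32 both layouts are 16 bytes, which is consistent with the driver working there.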
 


Comment 4 Arjan van de Ven 2001-11-02 21:59:06 UTC
Fixed in 2.4.9-12 and later, released as erratum for 7.1

Comment 5 Matt Domsch 2001-11-05 21:39:40 UTC
In 2.4.9-13, this part of the patch didn't make it in:
-	unsigned long CmdAndLength;
+	u32 CmdAndLength;

It's part of a hardware descriptor, and only a u32 is written into it.  For 
correctness on IA-64, this needs to be included.
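
A rough sketch of what the remaining unsigned long does to the context descriptor, assuming the unions around it are already 32 bits wide as in the rest of the patch. The struct names and flattened members below are hypothetical simplifications of _E1000_TCPIP_CONTEXT_TRANSMIT_DESCRIPTOR, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view: unions replaced by their 32-bit dword. */

/* Partially patched layout: only CmdAndLength is still unsigned long. */
struct ctx_partial {
	uint32_t ip_xsum_config;
	uint32_t tcp_xsum_config;
	unsigned long cmd_and_length;	/* 8 bytes on LP64 */
	uint32_t tcp_seg_setup;
};

/* Fully patched layout: four consecutive 32-bit words. */
struct ctx_fixed {
	uint32_t ip_xsum_config;
	uint32_t tcp_xsum_config;
	uint32_t cmd_and_length;
	uint32_t tcp_seg_setup;
};

static_assert(sizeof(struct ctx_fixed) == 16, "context descriptor is 4 dwords");

/* Offset of the last dword, which follows CmdAndLength in memory. */
size_t ctx_partial_seg_offset(void) { return offsetof(struct ctx_partial, tcp_seg_setup); }
size_t ctx_fixed_seg_offset(void)   { return offsetof(struct ctx_fixed, tcp_seg_setup); }

size_t ctx_partial_size(void) { return sizeof(struct ctx_partial); }
```

On LP64, ctx_partial_seg_offset() should return 16 rather than 12, so the field after CmdAndLength lands outside the first 16 bytes of the descriptor and bytes 12-15 hold the upper half of the 64-bit CmdAndLength instead.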


Comment 6 Scott Feldman 2001-11-07 20:57:55 UTC
CmdAndLength is used for Tx Checksum offloading and consequently ZEROCOPY.  
This needs to be typed u32, per the patch given earlier.

Comment 7 Arjan van de Ven 2001-11-07 21:47:20 UTC
Fixed in my current tree; e1000 is still a tad slow, see the nttcp results below
(on a Dell IA64 machine with 4 CPUs and 32 GB RAM;
the 5th column is Mbit/s of TCP payload)

l 83886080    0.76    0.72    884.8381    929.8965   20480  27003.12   28378.2
1 83886080    0.76    0.75    883.9884    894.7849   21147  27855.79   28196.0
l 83886080    0.76    0.73    883.7637    924.8907   20480  26970.33   28225.4
1 83886080    0.76    0.73    883.0183    919.2995   21211  27909.43   29056.2

eepro100 is now also fixed in my tree for this machine:

l 83886080    7.14    0.33     94.0244   2033.1214   20480   2869.40   62045.9
1 83886080    7.15    1.64     93.8813    409.2004   57921   8102.80   35317.7
l 83886080    7.14    0.32     94.0540   2095.1027   20480   2870.30   63937.5
1 83886080    7.15    1.50     93.8907    447.3924   57911   8102.22   38607.3


Comment 8 Arjan van de Ven 2001-11-07 22:34:00 UTC
update: the poor e1000 performance seems to be caused by the other side of the
test (a bcm5700) that doesn't go fast enough to max out the e1000

Comment 9 Clay Cooper 2001-11-13 14:35:57 UTC
w/ qa1108 (2.4.9-13.3smp), running nttcp between two e1000 nics in two
bordeauxs, I'm seeing 400-500Mb/s sustained, with infrequent peaks of 800Mb/s.

Each bordeaux has 4 CPUs.  Client machine has 64GB RAM, and server has 32GB RAM.

Cpu utilization on both machines is 75% idle.

Comment 10 Arjan van de Ven 2001-11-13 14:37:43 UTC
This is bad performance. Is there a switch involved in the setup ?

Comment 11 Clay Cooper 2001-11-14 15:47:36 UTC
Initially yes.  I just tried with nothing between but a crossover cable and got
the same speed as going through a gigabit switch.

Comment 12 Scott Feldman 2001-11-19 18:49:23 UTC
Run more nttcp sessions (~10) to get an aggregate performance.  A single nttcp 
session over gigabit is like trying to flood a culvert with a garden hose.

Comment 13 Clay Cooper 2001-11-19 19:58:54 UTC
Ok, aggregate performance with 2 e1000's and multiple nttcp sessions is roughly
1000-1200Mb/s.  It seems to exceed the gigabit bandwidth, but the total is
fairly consistent as sessions are added:

1 session  -- 500Mb/s
2 sessions -- 500Mb/s each
3 sessions -- 400Mb/s each
4 sessions -- 300Mb/s each
5 sessions -- 225Mb/s each
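
For reference, the totals behind these figures can be worked out with a small sketch like the one below. The struct and function names are made up for illustration, and the numbers are simply the per-session rates listed above.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the table above: session count and per-session rate in Mb/s. */
struct nttcp_sample {
	unsigned sessions;
	unsigned mbit_per_session;
};

static const struct nttcp_sample samples[] = {
	{ 1, 500 }, { 2, 500 }, { 3, 400 }, { 4, 300 }, { 5, 225 },
};

/* Aggregate rate is the per-session rate times the number of sessions. */
unsigned aggregate_mbit(struct nttcp_sample s)
{
	return s.sessions * s.mbit_per_session;
}

/* Print one line per row with the computed total. */
void print_aggregates(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%u sessions -- %u Mb/s total\n",
		       samples[i].sessions, aggregate_mbit(samples[i]));
}
```

This gives 500, 1000, 1200, 1200 and 1125 Mb/s for one through five sessions, which matches the roughly 1000-1200Mb/s aggregate once more than one session is running.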

Comment 14 Michael K. Johnson 2001-12-06 17:06:39 UTC
Sounds fixed to me.

Anyone disagree?

Comment 15 Clay Cooper 2001-12-06 17:15:18 UTC
I agree

Comment 16 John A. Hull 2001-12-06 17:17:19 UTC
Closing.

