50829 – eepro100 driver fails after start of nttcp on IA64

Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Version:	7.3
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brock Organ

Reported:	2001-08-03 19:18 UTC by Clay Cooper
Modified:	2007-04-18 16:35 UTC
CC List:	5 users
Last Closed:	2001-11-05 16:39:33 UTC

Description Clay Cooper 2001-08-03 19:18:06 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2-2 i686; en-US; m18)
Gecko/20010131 Netscape6/6.01

Description of problem:
Observed on a pe7150 running fairfax beta3 with bios X15; the onboard nic is
controlled by the eepro100 driver.  The network interface fails immediately
after nttcp-1.46 is started between the onboard nic and a client machine.  By
failure, I mean that the client can no longer communicate with the eepro100
onboard nic, and the eepro100 interface can no longer ping other machines.
Ifconfig shows the interface still configured and still lists an IP address,
but at this point you can't ping anything.
Simply bringing the interface down and up restores function.

This does not occur when using the e100 driver, nor does it occur when
using eepro100 on an i386 platform.
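
The failure signature described above (interface still configured with an IP address, yet unable to ping) can be told apart from a plainly unconfigured interface. The following is only a minimal Python sketch of that distinction, not part of the original report. The function names and the sample text are hypothetical, and it assumes net-tools style ifconfig output with an "inet addr:" field.

```python
import re


def has_ipv4_address(ifconfig_output: str) -> bool:
    """Return True if the ifconfig text lists an IPv4 address.

    Assumes the net-tools 'inet addr:a.b.c.d' form; the newer
    'inet a.b.c.d' form is accepted as well. 'inet6' lines do not match.
    """
    pattern = r"\binet (addr:)?\d+\.\d+\.\d+\.\d+"
    return re.search(pattern, ifconfig_output) is not None


def classify_interface(ifconfig_output: str, ping_succeeded: bool) -> str:
    """Classify an interface as 'unconfigured', 'ok', or 'hung'.

    'hung' is the state described in this report: an address is still
    assigned, but traffic no longer gets through.
    """
    if not has_ipv4_address(ifconfig_output):
        return "unconfigured"
    return "ok" if ping_succeeded else "hung"


# Hypothetical sample text, not captured from the affected machine.
SAMPLE = (
    "eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:00\n"
    "          inet addr:192.168.0.10  Bcast:192.168.0.255  Mask:255.255.255.0\n"
)
```

Calling classify_interface(SAMPLE, ping_succeeded=False) reports "hung", which is the state in which bringing the interface down and up was needed.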

How reproducible:
Always

Steps to Reproduce:
1. Install fairfax beta3 on a pe7150 and use eepro100 for the onboard Intel nic.
2. Start nttcp-1.46 between the client and the onboard nic; note the failure.

Actual Results:  Network interface fails and must be restarted.		

Expected Results:  Normal network activity

Additional info:

Comment 1 Bill Nottingham 2001-08-06 00:28:31 UTC

Has the network card eeprom been fixed?

Comment 2 Matt Domsch 2001-08-06 15:57:48 UTC

If by fixed you mean that the eeprom sleep bit has been disabled (either using
the e100 driver or eepro100-diag -Gwww), yes.

Comment 3 Arjan van de Ven 2001-08-06 16:01:43 UTC

fyi, the e100 driver in roswell does change the eeprom and Intel is going to
support the eepro100-diag tool for this.

Comment 4 Glen Foster 2001-08-06 22:49:47 UTC

This defect is considered MUST-FIX for Fairfax

Comment 5 Matt Domsch 2001-08-10 18:05:50 UTC

Bug reported to Intel (Walker, Timothy E [timothy.e.walker]) on 10 
August 2001.  Tim said he'd pass this on for investigation by the appropriate 
driver team.  Since Intel didn't write the driver, it's unlikely they'll do 
anything about it, he said.

Comment 6 Arjan van de Ven 2001-08-14 19:13:24 UTC

I've worked on this driver today and made some tentative hardening patches to
it. Currently, nttcp has been running between my ia64 and my desktop (both
eepro100) for over an hour without problems (reporting >95Mbit). I'll leave it
running overnight and see tomorrow whether it survives that long.

Was there any other special load on the machines during the test?

Comment 7 Clay Cooper 2001-08-14 19:17:24 UTC

Negative, just nttcp.  Sounds like it's working.

Comment 8 Arjan van de Ven 2001-08-15 21:11:17 UTC

Still running fine after about 30 hours.
So either it's really fixed or there's duff hardware involved (NIC or
switch/router).

Fix will be in 2.4.7-2.2 or later; if you want to test (yes please) let me know

Comment 9 Clay Cooper 2001-08-21 20:52:44 UTC

Checked on two different machines w/ RC1 using onboard nic.  One worked fine,
the other continued to exhibit the error.  I'll continue to investigate, but it
has to be hardware-related at this point.

Comment 10 Arjan van de Ven 2001-08-21 20:57:15 UTC

Note that RC1 has an older kernel without the fixes.
But the fact that it works on one machine and not on the other points to at
least a hardware influence. Things to check, off the top of my head:
* swap cables
* are both machines attached to the same hub/switch/router 
* full or half duplex

Comment 11 Bill Nottingham 2001-08-23 03:03:41 UTC

Both machines were Bordeaux, right (i.e., not ia64 workstations)?

Comment 12 Matt Domsch 2001-08-23 03:45:20 UTC

Right.  Server, not workstations.

Comment 13 Matt Domsch 2001-08-23 03:48:53 UTC

Actually, I'll let Clay properly respond (I shouldn't do this stuff from home), 
as I think one system was a 32-bit server or at least a non-IA-64 system, but 
the failure was always seen on the IA-64 server.

Comment 14 Clay Cooper 2001-08-23 19:50:15 UTC

Latest testing with the 2.4.7-2.3smp kernel:
Three different Bordeaux systems using the eepro100 driver with the onboard
nic fail immediately after nttcp starts.  The Bordeaux systems run the server
script and another machine acts as the client.

The failure occurs across all of the following:
* Bordeaux bios X15 and X16
* i386 and ia64 clients
* Bordeaux systems running rc1 and 2.4.7-2.3smp
* clients running 7.1sbe and 7.2
* clients on a different switch than the server and on the same switch
* Bordeaux systems with 2.5GB, 16GB, and 32GB

E100 does not have the same problem.

Comment 15 Arjan van de Ven 2001-09-14 16:56:21 UTC

I've been running nttcp on a B3 lion for a long time now; maybe your server has
a different chip?

00:05.0 Class 0200: 8086:1229 (rev 08)
        Subsystem: 8086:3000
        Flags: bus master, medium devsel, latency 64, IRQ 55
        Memory at 00000000f3f90000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 6f00 [size=64]
        Memory at 00000000f3e00000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
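
When comparing chips across machines, the identifying fields of an lspci listing like the one above can be extracted and compared field by field. This is a rough Python sketch only. The helper name parse_lspci_ids is hypothetical, and it assumes the numeric "Class xxxx: vvvv:dddd (rev rr)" layout shown in this comment.

```python
import re


def parse_lspci_ids(text):
    """Extract class, vendor, device, revision, and subsystem IDs.

    Assumes the numeric lspci layout quoted in this comment.
    Returns None when no device line is found.
    """
    device = re.search(
        r"Class\s+([0-9a-f]{4}):\s+([0-9a-f]{4}):([0-9a-f]{4})"
        r"\s+\(rev\s+([0-9a-f]{2})\)",
        text,
    )
    if device is None:
        return None
    ids = {
        "class": device.group(1),
        "vendor": device.group(2),
        "device": device.group(3),
        "rev": device.group(4),
    }
    # The Subsystem line is optional in this sketch.
    subsystem = re.search(r"Subsystem:\s+([0-9a-f]{4}):([0-9a-f]{4})", text)
    if subsystem is not None:
        ids["subsystem_vendor"] = subsystem.group(1)
        ids["subsystem_device"] = subsystem.group(2)
    return ids


# The first two lines of the listing quoted in this comment.
LISTING = (
    "00:05.0 Class 0200: 8086:1229 (rev 08)\n"
    "        Subsystem: 8086:3000\n"
)
```

Two machines carry the same chip revision if parse_lspci_ids returns equal dictionaries for both listings; vendor ID 8086 is Intel's PCI vendor ID.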

Comment 16 Matt Domsch 2001-09-20 20:27:06 UTC

Ours reports information identical to that above.

Comment 17 Clay Cooper 2001-10-08 16:02:49 UTC

With the 2.4.9-0.18smp kernel, the eepro100 driver still fails immediately
after nttcp starts.  The e100 driver was loaded first to ensure the sleep bit
was disabled.

Comment 18 Bill Nottingham 2001-10-08 16:27:20 UTC

Have you done any further testing to see if the switch involved is related?

Comment 19 Clay Cooper 2001-10-08 16:35:00 UTC

I have tried a different switch of the same type (Foundry FastIron Workgroup). 
I have also tried putting the client on the same switch as the server.  Let me
try a crossover cable to make it as simple as possible.

Comment 20 Clay Cooper 2001-10-09 15:12:42 UTC

The eepro100 driver still fails with kernel 2.4.9-0.18smp and a direct
crossover cable connection to the client machine.

Comment 21 Clay Cooper 2001-10-15 15:20:37 UTC

Failure reproduced with kernel 2.4.9-0.18smp using a PCI Intel PRO/100 card
rather than the onboard nic.

Comment 22 Matt Domsch 2001-11-05 16:39:25 UTC

Trimming the cc: list.

Comment 23 Michael K. Johnson 2001-11-08 15:57:55 UTC

Seems to be fixed by our swiommu fix; will be in 2.4.9-13.2 or later.
We were able to reproduce it on the bordeaux you sent, and now with
that fix we are getting ~94Mbit sustained.
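
For scale, the ~94Mbit sustained figure can be converted to bytes per second and to a fraction of link capacity. The short Python sketch below assumes a 100 Mbit/s Fast Ethernet link, which is an assumption and is not stated in this comment; the function names are hypothetical.

```python
# Assumed link speed in Mbit/s (Fast Ethernet); not stated in the report.
ASSUMED_LINK_MBIT = 100.0


def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def link_utilization(mbit_per_s: float, link_mbit: float = ASSUMED_LINK_MBIT) -> float:
    """Return throughput as a fraction of the assumed link capacity."""
    return mbit_per_s / link_mbit
```

Under that assumption, 94 Mbit/s is 11.75 MB/s, or 94% of the link, so a healthy run on this hardware sits close to line rate.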

Comment 24 Clay Cooper 2001-11-13 13:51:09 UTC

With qa1108 (2.4.9-13.3smp), nttcp ran overnight on two eepro100-controlled
Bordeaux onboard nics at speeds of 90-95 Mbit/s.  I'm satisfied.
