Bug 50829 - eepro100 driver fails after start of nttcp on IA64
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-08-03 15:18 EDT by Clay Cooper
Modified: 2007-04-18 12:35 EDT
Doc Type: Bug Fix
Last Closed: 2001-11-05 11:39:33 EST
Description Clay Cooper 2001-08-03 15:18:06 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2-2 i686; en-US; m18)
Gecko/20010131 Netscape6/6.01

Description of problem:
On a pe7150 running fairfax beta3 with BIOS X15, the onboard NIC is
controlled by the eepro100 driver.  The network interface fails immediately
after nttcp-1.46 is started between the onboard NIC and a client machine.
By failure, I mean that the client can no longer communicate with the
eepro100 onboard NIC, and the eepro100 interface can no longer ping out to
another machine.  Ifconfig shows the interface still configured and still
lists an IP address, but at this point you can't ping anything.
Simply bringing the interface down and up restores function.

This does not occur when using the e100 driver, nor does it occur when
using eepro100 on an i386 platform.

How reproducible:
Always

Steps to Reproduce:
1. Install fairfax beta3 on a pe7150 and use eepro100 for the onboard Intel NIC.
2. Start nttcp-1.46 between the client and the onboard NIC and note the failure.

Actual Results:  Network interface fails and must be restarted.

Expected Results:  Normal network activity

Additional info:
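The symptom check and workaround described above (ping fails, interface
down/up restores function) could be scripted roughly as follows. This is
only a sketch, not something used in the original testing: the interface
name, the target host, and the helper names are all hypothetical, and it
assumes a Linux ping that accepts -c and -W plus the classic ifconfig tool.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(list(cmd))


def ping_cmd(host: str) -> List[str]:
    # One echo request with a 2 second timeout (assumes Linux iputils ping).
    return ["ping", "-c", "1", "-W", "2", host]


def bounce_cmds(iface: str) -> List[List[str]]:
    # The workaround from the report: take the interface down, then up again.
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def check_and_recover(iface: str, host: str, runner: Runner = run) -> bool:
    """Return True if the ping succeeded, False if the interface was bounced."""
    if runner(ping_cmd(host)) == 0:
        return True
    for cmd in bounce_cmds(iface):
        runner(cmd)
    return False
```

Calling check_and_recover("eth0", "192.0.2.1") as root would ping once and
bounce eth0 only if the ping fails; both arguments are placeholders. The
runner parameter exists so the logic can be exercised without touching a
real interface.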
Comment 1 Bill Nottingham 2001-08-05 20:28:31 EDT
Has the network card eeprom been fixed?
Comment 2 Matt Domsch 2001-08-06 11:57:48 EDT
If by fixed you mean has the eeprom sleep bit been disabled (either using the 
e100 driver or eepro100-diag -Gwww), yes.
Comment 3 Arjan van de Ven 2001-08-06 12:01:43 EDT
fyi, the e100 driver in roswell does change the eeprom and Intel is going to
support the eepro100-diag tool for this.
Comment 4 Glen Foster 2001-08-06 18:49:47 EDT
This defect is considered MUST-FIX for Fairfax
Comment 5 Matt Domsch 2001-08-10 14:05:50 EDT
Bug reported to Intel (Walker, Timothy E [timothy.e.walker@intel.com]) on 10 
August 2001.  Tim said he'd pass this on for investigation by the appropriate 
driver team.  Since Intel didn't write the driver, it's unlikely they'll do 
anything about it, he said.
Comment 6 Arjan van de Ven 2001-08-14 15:13:24 EDT
I've worked on this driver today and made some tentative hardening patches to
it. nttcp has now been running between my ia64 and my desktop (both eepro100)
for over an hour without problems (reporting >95Mbit). I'll leave it running
overnight and see tomorrow whether it survives that long.

Was there any other special load on the machines during the test?
Comment 7 Clay Cooper 2001-08-14 15:17:24 EDT
Negative, just nttcp.  Sounds like it's working.
Comment 8 Arjan van de Ven 2001-08-15 17:11:17 EDT
Still running fine after about 30 hours.
So either it's really fixed or there's duff hardware involved (NIC or
switch/router).

Fix will be in 2.4.7-2.2 or later; if you want to test (yes please) let me know.
Comment 9 Clay Cooper 2001-08-21 16:52:44 EDT
Checked on two different machines w/ RC1 using onboard nic.  One worked fine,
the other continued to exhibit the error.  I'll continue to investigate, but it
has to be hardware-related at this point.
Comment 10 Arjan van de Ven 2001-08-21 16:57:15 EDT
Note that RC1 has an older kernel without the fixes.
But it working on one machine and not on the other points to at least a hardware
influence. Things to check off the top of my head:
* swap cables
* are both machines attached to the same hub/switch/router
* full or half duplex
Comment 11 Bill Nottingham 2001-08-22 23:03:41 EDT
Both machines were Bordeaux, right (i.e., not ia64 workstations)?
Comment 12 Matt Domsch 2001-08-22 23:45:20 EDT
Right.  Server, not workstations.
Comment 13 Matt Domsch 2001-08-22 23:48:53 EDT
Actually, I'll let Clay properly respond (I shouldn't do this stuff from home), 
as I think one system was a 32-bit server or at least a non-IA-64 system, but 
the failure was always seen on the IA-64 server.
Comment 14 Clay Cooper 2001-08-23 15:50:15 EDT
Latest testing with 2.4.7-2.3smp kernel:
Three different Bordeaux systems using the eepro100 driver w/ onboard NIC fail
immediately after nttcp starts.  The Bordeaux systems are running the server
script and another machine is acting as the client.

Failure occurs w/ Bordeaux BIOS X15 and X16.

Clients have been i386 and ia64.

Bordeaux systems have been running RC1 and 2.4.7-2.3smp.

E100 does not have the same problem.

Clients have been 7.1sbe and 7.2.

Three different Bordeaux systems exhibit the same problem.

Clients have been on a different switch than the server and on the same switch
as the server.

Bordeaux systems have had 2.5GB, 16GB, and 32GB.
Comment 15 Arjan van de Ven 2001-09-14 12:56:21 EDT
I've been running nttcp on a B3 lion for a long time now; maybe your server has
a different chip?

00:05.0 Class 0200: 8086:1229 (rev 08)
        Subsystem: 8086:3000
        Flags: bus master, medium devsel, latency 64, IRQ 55
        Memory at 00000000f3f90000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 6f00 [size=64]
        Memory at 00000000f3e00000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
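
For comparing two such listings to see whether the machines carry the same
chip, a small parser along these lines might help. It is a sketch under
assumptions: the regular expressions only cover the numeric header and
Subsystem lines in the format shown above, and the function names are
hypothetical.

```python
import re
from typing import Dict, Optional

# Header line of a numeric lspci entry, e.g.
# "00:05.0 Class 0200: 8086:1229 (rev 08)".
_HEADER = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"Class\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-f]{2})\))?",
    re.IGNORECASE,
)
# Indented "Subsystem: vvvv:dddd" line that follows the header.
_SUBSYS = re.compile(
    r"^\s*Subsystem:\s+([0-9a-f]{4}):([0-9a-f]{4})", re.IGNORECASE
)


def parse_first_device(text: str) -> Optional[Dict[str, str]]:
    """Extract the IDs of the first device in the listing, or None."""
    info: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            if info is not None:
                break  # a second device starts here; keep only the first
            info = {k: (v or "").lower() for k, v in header.groupdict().items()}
            continue
        subsys = _SUBSYS.match(line)
        if subsys and info is not None:
            info["subsystem"] = (subsys.group(1) + ":" + subsys.group(2)).lower()
    return info


def same_chip(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Compare the fields that identify the chip, ignoring the bus slot."""
    keys = ("vendor", "device", "rev", "subsystem")
    return all(a.get(k) == b.get(k) for k in keys)
```

Feeding the listing from each machine to parse_first_device and passing the
two results to same_chip gives a quick yes/no on vendor, device, revision,
and subsystem IDs.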
Comment 16 Matt Domsch 2001-09-20 16:27:06 EDT
Ours reports identical information as that above.
Comment 17 Clay Cooper 2001-10-08 12:02:49 EDT
2.4.9-0.18smp kernel eepro100 driver still failing immediately after start of
nttcp.  E100 driver was loaded first to ensure the sleep-bit was disabled.
Comment 18 Bill Nottingham 2001-10-08 12:27:20 EDT
Have you done any further testing to see if the switch involved is related?
Comment 19 Clay Cooper 2001-10-08 12:35:00 EDT
I have tried a different switch of the same type (Foundry FastIron Workgroup). 
I have also tried putting the client on the same switch as the server.  Let me
try a crossover cable to make it as simple as possible.
Comment 20 Clay Cooper 2001-10-09 11:12:42 EDT
Eepro100 driver still fails with kernel 2.4.9-0.18smp and direct crossover cable
connection to client machine.
Comment 21 Clay Cooper 2001-10-15 11:20:37 EDT
Failure reproduced with kernel 2.4.9-0.18smp using a PCI Intel PRO/100 card
rather than the onboard NIC.
Comment 22 Matt Domsch 2001-11-05 11:39:25 EST
Trimming the cc: list.
Comment 23 Michael K. Johnson 2001-11-08 10:57:55 EST
Seems to be fixed by our swiommu fix; will be in 2.4.9-13.2 or later.
We were able to reproduce it on the bordeaux you sent, and now with
that fix we are getting ~94Mbit sustained.
Comment 24 Clay Cooper 2001-11-13 08:51:09 EST
w/ qa1108 (2.4.9-13.3smp), nttcp ran overnight on two eepro100-controlled
Bordeaux onboard NICs with speeds of 90-95 Mbit/s.  I'm satisfied.
