Red Hat Bugzilla – Bug 50829
eepro100 driver fails after start of nttcp on IA64
Last modified: 2007-04-18 12:35:27 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2-2 i686; en-US; m18)
Description of problem:
On a pe7150 running fairfax beta3 (bios X15) with the onboard nic controlled
by the eepro100 driver, the network interface fails immediately after
nttcp-1.46 is started between the onboard nic and a client machine. By
failure, I mean that the client can't continue communicating with the
eepro100 onboard nic, and the eepro100 interface can no longer ping out to
another machine. Ifconfig shows the interface still configured and still
lists an IP address, but at this point you can't ping anything.
Simply bringing the interface down and up regains function.
This does not occur when using the e100 driver, nor does it occur when
using eepro100 on an i386 platform.
Steps to Reproduce:
1. Install fairfax beta3 on a pe7150 and use eepro100 for the onboard Intel nic.
2. Start nttcp-1.46 between the client and the onboard nic and note the failure.
Actual Results: Network interface fails and must be restarted.
Expected Results: Normal network activity
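The failure check and the down/up workaround described in this report can be sketched roughly as follows. This is only an illustration, not a script anyone in this report used: the function names are hypothetical, and it assumes the Linux `ping` and `ifconfig` commands and root privileges. The report does not say which command was used to bounce the interface; Red Hat's `ifdown`/`ifup` scripts would be an alternative.

```python
import subprocess


def can_ping(host, run=subprocess.run):
    """Return True if a single ICMP echo to `host` gets a reply."""
    # -c 1: send one packet; -W 2: wait up to 2 seconds for a reply (Linux ping).
    result = run(["ping", "-c", "1", "-W", "2", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def bounce_interface(iface, run=subprocess.run):
    """Bring the interface down and back up (needs root)."""
    run(["ifconfig", iface, "down"], check=True)
    run(["ifconfig", iface, "up"], check=True)


def check_and_recover(iface, peer, run=subprocess.run):
    """Return 'ok', 'recovered' or 'still-down' for the given interface."""
    if can_ping(peer, run):
        return "ok"
    # The interface looks configured but passes no traffic: apply the workaround.
    bounce_interface(iface, run)
    return "recovered" if can_ping(peer, run) else "still-down"
```

Something like `check_and_recover("eth0", "192.0.2.1")` would be the call, where both the interface name and the peer address are placeholders. The `run` parameter exists only so the command runner can be swapped out when trying the logic without touching a real interface.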
Has the network card eeprom been fixed?
If by fixed you mean has the eeprom sleep bit been disabled (either using the
e100 driver or eepro100-diag -Gwww), yes.
fyi, the e100 driver in roswell does change the eeprom and Intel is going to
support the eepro100-diag tool for this.
This defect is considered MUST-FIX for Fairfax
Bug reported to Intel (Walker, Timothy E [firstname.lastname@example.org]) on 10
August 2001. Tim said he'd pass this on for investigation by the appropriate
driver team. Since Intel didn't write the driver, it's unlikely they'll do
anything about it, he said.
I've worked on this driver today and made some tentative hardening patches to
it. Currently, nttcp has been running between my ia64 and my desktop (both
eepro100) for over an hour without problems (reporting >95Mbit). I'll leave it
running for the night and see tomorrow whether it survives that long.
Was there any other special load on the machines during the test?
Negative, just nttcp. Sounds like it's working.
Still running fine after about 30 hours.
So either it's really fixed or there's duff hardware involved (NIC or
Fix will be in 2.4.7-2.2 or later; if you want to test (yes please), let me know.
Checked on two different machines w/ RC1 using onboard nic. One worked fine,
the other continued to exhibit the error. I'll continue to investigate, but it
has to be hardware-related at this point.
Note that RC1 has an older kernel without the fixes.
But the fact that it works on one machine and not on the other points to at
least a hardware influence. Things to check off the top of my head:
* swap cables
* are both machines attached to the same hub/switch/router
* full or half duplex
Both machines were Bordeaux, right (i.e., not ia64 workstations?)
Right. Server, not workstations.
Actually, I'll let Clay properly respond (I shouldn't do this stuff from home),
as I think one system was a 32-bit server or at least a non-IA-64 system, but
the failure was always seen on the IA-64 server.
Latest testing with 2.4.7-2.3smp kernel:
Three different bordeauxs using the eepro100 driver w/ onboard nic fail
immediately after nttcp starts. The bordeauxs are running the server script and
another machine is acting as the client.
Failure occurs w/ bordeaux bios X15 and X16.
Clients have been i386 and ia64
Bordeauxs have been running rc1 and 2.4.7-2.3smp
E100 does not have the same problem.
Clients have been 7.1sbe and 7.2
Three different bordeauxs exhibit same problem.
Clients have been on different switch than server and same switch with server.
Bordeauxs have had 2.5GB, 16GB, and 32GB.
I've been running nttcp on a B3 lion for a long time now; maybe your server has
a different chip?
00:05.0 Class 0200: 8086:1229 (rev 08)
Flags: bus master, medium devsel, latency 64, IRQ 55
Memory at 00000000f3f90000 (32-bit, non-prefetchable) [size=4K]
I/O ports at 6f00 [size=64]
Memory at 00000000f3e00000 (32-bit, non-prefetchable) [size=1M]
Capabilities: [dc] Power Management version 2
Ours reports information identical to that above.
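For anyone repeating the "different chip?" comparison on other machines, a minimal sketch of checking the vendor:device:revision fields programmatically is given below. It assumes the numeric summary-line format shown above (`00:05.0 Class 0200: 8086:1229 (rev 08)`); the function names are hypothetical. A match only shows that the PCI IDs agree and does not rule out other hardware differences.

```python
import re

# Matches a numeric lspci summary line such as
# "00:05.0 Class 0200: 8086:1229 (rev 08)".
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+Class\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-f]{2})\))?",
    re.IGNORECASE,
)


def chip_id(line):
    """Return (vendor, device, rev) as lowercase hex strings, or None if unparsable."""
    m = LSPCI_LINE.match(line.strip())
    if m is None:
        return None
    return (m.group("vendor").lower(),
            m.group("device").lower(),
            (m.group("rev") or "").lower())


def same_chip(line_a, line_b):
    """True if both lines parse and report the same vendor, device and revision."""
    a, b = chip_id(line_a), chip_id(line_b)
    return a is not None and a == b
```

The slot address is deliberately left out of the comparison, since the same chip can sit at a different bus position on another machine.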
2.4.9-0.18smp kernel eepro100 driver still failing immediately after start of
nttcp. E100 driver was loaded first to ensure the sleep-bit was disabled.
Have you done any further testing to see if the switch involved is related?
I have tried a different switch of the same type (Foundry FastIron Workgroup).
I have also tried putting the client on the same switch as the server. Let me
try a crossover cable to make it as simple as possible.
Eepro100 driver still fails with kernel 2.4.9-0.18smp and direct crossover cable
connection to client machine.
Failure reproduced with kernel 2.4.9-0.18smp using a pci Intel pro 100 card
rather than the onboard nic.
Trimming the cc: list.
Seems to be fixed by our swiommu fix; will be in 2.4.9-13.2 or later.
We were able to reproduce it on the bordeaux you sent, and now with
that fix we are getting ~94Mbit sustained.
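To put the ~94Mbit figure in context, here is a small conversion sketch. The 100 Mbit/s link speed is an assumption (the thread never states the negotiated speed, though these are Intel PRO/100 parts), and the helper names are hypothetical.

```python
LINK_MBIT = 100.0  # assumed link speed in Mbit/s; not stated in the thread


def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def utilisation(mbit_per_s, link_mbit=LINK_MBIT):
    """Fraction of the assumed link speed that the measured rate represents."""
    return mbit_per_s / link_mbit
```

Under that assumption, 94 Mbit/s works out to 11.75 MB/s and about 94% of the nominal link rate.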
w/ qa1108 (2.4.9-13.3smp), nttcp ran overnight on two eepro100 controlled
bordeaux onboard nics with speeds of 90-95 Mbit/s. I'm satisfied.