From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2-2 i686; en-US; m18) Gecko/20010131 Netscape6/6.01

Description of problem:
On a pe7150 running fairfax beta3, bios X15, onboard nic controlled by eepro100 driver. Network interface fails immediately after initiation of nttcp-1.46 between onboard nic and a client machine. By failure, I mean that the client can't continue communication with the eepro100 onboard nic, and the eepro100 interface is no longer able to ping out to another machine. Ifconfig shows the interface still configured and still lists an IP address, but at this point you can't ping anything. Simply bringing the interface down and up regains function. This does not occur when using the e100 driver, nor does it occur when using eepro100 on an i386 platform.

How reproducible:
Always

Steps to Reproduce:
1. Install fairfax beta3 on a pe7150 and use eepro100 for onboard intel nic.
2. Start nttcp-1.46 between client and onboard nic...note failure.

Actual Results:
Network interface fails and must be restarted.

Expected Results:
Normal network activity

Additional info:
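For anyone who needs to keep a test box reachable while this is open, the down/up workaround described above could be automated roughly as follows. This is only a sketch under assumptions: the interface name `eth0`, the peer address, the iputils-style `ping -c 1 -W` flags, and the helper names `ping_command`, `bounce_commands` and `check_and_recover` are all illustrative and do not come from the report. It hides the symptom and does nothing about the driver problem itself.

```python
import subprocess


def ping_command(peer, timeout_s=2):
    """Build a single-probe ping command (Linux iputils-style flags, assumed)."""
    return ["ping", "-c", "1", "-W", str(timeout_s), peer]


def bounce_commands(iface):
    """Commands for the down/up cycle that the report says restores function."""
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def check_and_recover(iface, peer, run=subprocess.call):
    """Ping the peer once; if the ping fails, bounce the interface.

    `run` takes an argument list and returns an exit status, which makes
    it easy to substitute a fake for dry runs.
    Returns True if the interface had to be bounced, False otherwise.
    """
    if run(ping_command(peer)) == 0:
        return False
    for cmd in bounce_commands(iface):
        run(cmd)
    return True


if __name__ == "__main__":
    # Hypothetical interface name and peer address; needs root to take effect.
    bounced = check_and_recover("eth0", "192.0.2.1")
    print("interface bounced" if bounced else "link looks fine")
```

Note that a plain `ifconfig` down/up may drop routes that were attached to the interface, so a distribution's own network scripts may be the better choice in practice.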
Has the network card eeprom been fixed?
If by fixed you mean has the eeprom sleep bit been disabled (either using the e100 driver or eepro100-diag -Gwww), yes.
fyi, the e100 driver in roswell does change the eeprom and Intel is going to support the eepro100-diag tool for this.
This defect is considered MUST-FIX for Fairfax
Bug reported to Intel (Walker, Timothy E [timothy.e.walker]) on 10 August 2001. Tim said he'd pass this on for investigation by the appropriate driver team. Since Intel didn't write the driver, it's unlikely they'll do anything about it, he said.
I've worked on this driver today and made some tentative hardening patches to it. Currently, nttcp has been running between my ia64 and my desktop (both eepro100) for over an hour without problems (reporting >95Mbit). I'll leave it running for the night and see tomorrow if it survives that long. Was there any other special load on the machines during the test?
Negative, just nttcp. Sounds like it's working.
Still running fine after about 30 hours. So either it's really fixed or there's duff hardware involved (NIC or switch/router). The fix will be in 2.4.7-2.2 or later; if you want to test (yes please), let me know.
Checked on two different machines w/ RC1 using onboard nic. One worked fine, the other continued to exhibit the error. I'll continue to investigate, but it has to be hardware-related at this point.
Note that RC1 has an older kernel without the fixes. But the fact that it works on one machine and not on the other points to at least a hardware influence. Things to check, off the top of my head:
* swap cables
* are both machines attached to the same hub/switch/router?
* full or half duplex
Both machines were Bordeaux, right (i.e., not ia64 workstations?)
Right. Server, not workstations.
Actually, I'll let Clay properly respond (I shouldn't do this stuff from home), as I think one system was a 32-bit server or at least a non-IA-64 system, but the failure was always seen on the IA-64 server.
Latest testing with 2.4.7-2.3smp kernel:
Three different bordeauxs using eepro100 driver w/ onboard nic fail immediately after nttcp starts. The bordeauxs are running the server script and another machine is acting as the client.
Failure occurs w/ bordeaux bios X15 and X16.
Clients have been i386 and ia64.
Bordeauxs have been running rc1 and 2.4.7-2.3smp.
E100 does not have the same problem.
Clients have been 7.1sbe and 7.2.
Three different bordeauxs exhibit same problem.
Clients have been on different switch than server and same switch with server.
Bordeauxs have had 2.5GB, 16GB, and 32GB.
I've been running nttcp on a B3 lion for a long time now; maybe your server has a different chip?

00:05.0 Class 0200: 8086:1229 (rev 08)
        Subsystem: 8086:3000
        Flags: bus master, medium devsel, latency 64, IRQ 55
        Memory at 00000000f3f90000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 6f00 [size=64]
        Memory at 00000000f3e00000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
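If it helps to compare adapters across several machines without reading the listings by eye, the identifying fields can be pulled out of numeric lspci output like the one above. A rough sketch, assuming the text has the same shape as that listing; the function name `pci_identity` and the regular expressions are illustrative only, and other lspci versions may format the header line differently.

```python
import re

# Header line, e.g. "00:05.0 Class 0200: 8086:1229 (rev 08)"
HEADER = re.compile(
    r"^(?P<slot>\S+) Class (?P<cls>[0-9a-f]{4}): "
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?: \(rev (?P<rev>[0-9a-f]{2})\))?",
    re.M,
)
# Subsystem line, e.g. "Subsystem: 8086:3000"
SUBSYS = re.compile(r"Subsystem: (?P<svendor>[0-9a-f]{4}):(?P<sdevice>[0-9a-f]{4})")


def pci_identity(text):
    """Return vendor, device, revision and subsystem IDs, or None if no header line is found."""
    head = HEADER.search(text)
    if head is None:
        return None
    ident = {key: head.group(key) for key in ("vendor", "device", "rev")}
    sub = SUBSYS.search(text)
    ident["subsystem"] = (
        sub.group("svendor") + ":" + sub.group("sdevice") if sub else None
    )
    return ident


if __name__ == "__main__":
    # First two lines of the listing quoted in this comment.
    sample = "00:05.0 Class 0200: 8086:1229 (rev 08)\n        Subsystem: 8086:3000\n"
    print(pci_identity(sample))
```

Two machines whose dictionaries compare equal have the same chip, revision and subsystem IDs as far as lspci reports them; that does not rule out differences in board wiring or firmware.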
Ours reports identical information as that above.
2.4.9-0.18smp kernel eepro100 driver still failing immediately after start of nttcp. E100 driver was loaded first to ensure the sleep-bit was disabled.
Have you done any further testing to see if the switch involved is related?
I have tried a different switch of the same type (Foundry FastIron Workgroup). I have also tried putting the client on the same switch as the server. Let me try a crossover cable to make it as simple as possible.
Eepro100 driver still fails with kernel 2.4.9-0.18smp and direct crossover cable connection to client machine.
Failure reproduced with kernel 2.4.9-0.18smp using a pci Intel pro 100 card rather than the onboard nic.
Trimming the cc: list.
Seems to be fixed by our swiommu fix; will be in 2.4.9-13.2 or later. We were able to reproduce it on the bordeaux you sent, and now with that fix we are getting ~94Mbit sustained.
w/ qa1108 (2.4.9-13.3smp), nttcp ran overnight on two eepro100-controlled bordeaux onboard nics with speeds of 90-95 Mbit/s. I'm satisfied.