Bug 61816

Summary: (NET EEPRO100) Update to kernel hoses networking on Acer Travelmate
Product: [Retired] Red Hat Linux
Reporter: Michal Jaegermann <michal>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED UPSTREAM
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 7.2
CC: peterm
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-03-03 05:56:27 UTC
Attachments:
dmesg output from an NFS session (flags: none)

Description Red Hat Bugzilla 2002-03-24 20:59:40 UTC
Description of Problem:

Kernels 2.4.9-31 and 2.4.9-21 make the Ethernet interface (eepro100) in
an Acer TravelMate laptop unusable.

I have such a laptop here; the top of its 'dmidecode' output shows:

Handle 0x0000
	DMI type 0, 19 bytes.
	BIOS Information Block
		Vendor: ACER
		Version: V3.3 R01-A2j  EN                        
		Release: 08/10/2001
		BIOS base: 0xF0000
		ROM size: 448K
		Capabilities:
			Flags: 0x000000007F399F90
Handle 0x0100
	DMI type 1, 25 bytes.
	System Information Block
		Vendor: Acer            
		Product: TravelMate 740  
		Version: -1
		Serial Number: 9142R012051410013AM000          

It has a built-in Ethernet card, which is detected as eepro100:
"eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
<saw.com.sg> and others
PCI: Guessed IRQ 10 for device 02:08.0"

'lspci -tv' and 'lspci -tvn' show these:

-[00]-+-00.0  Intel Corporation: Unknown device 3575
      +-01.0-[01]----00.0  ATI Technologies Inc: Unknown device 4c59
      +-1d.0  Intel Corporation: Unknown device 2482
      +-1d.1  Intel Corporation: Unknown device 2484
      +-1d.2  Intel Corporation: Unknown device 2487
      +-1e.0-[02]--+-08.0  Intel Corporation: Unknown device 1031
      |            +-09.0  O2 Micro, Inc. OZ6933 Cardbus Controller
      |            \-09.1  O2 Micro, Inc. OZ6933 Cardbus Controller
      +-1f.0  Intel Corporation: Unknown device 248c
      +-1f.1  Intel Corporation: Unknown device 248a
      +-1f.3  Intel Corporation: Unknown device 2483
      +-1f.5  Intel Corporation: Unknown device 2485
      \-1f.6  Intel Corporation: Unknown device 2486


-[00]-+-00.0  8086:3575
      +-01.0-[01]----00.0  1002:4c59
      +-1d.0  8086:2482
      +-1d.1  8086:2484
      +-1d.2  8086:2487
      +-1e.0-[02]--+-08.0  8086:1031
      |            +-09.0  1217:6933
      |            \-09.1  1217:6933
      +-1f.0  8086:248c
      +-1f.1  8086:248a
      +-1f.3  8086:2483
      +-1f.5  8086:2485
      \-1f.6  8086:2486

With the 2.4.9-31 kernel (and at least 2.4.9-21 as well) from the 7.2
distro updates, Ethernet connections go away rather quickly, filling
the logs with messages:
.......
eepro100: wait_for_cmd_done timeout!
eepro100: wait_for_cmd_done timeout!
.......

"Quickly" means in this context that an attempt to grab over NFS
a binary rpm of a kernel require multiple executions (three, four,...)
of 'ifdown eth0; ifup eth0' after which transfers usually resume.
ssh connection lock up unpredictably and often even before completing
a login sequence or soon thereafter.

'mii-tool -v' invariably shows:
eth0: no autonegotiation, 10baseT-HD, link ok
  product info: vendor 00:aa:00, model 51 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 10baseT-HD
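
A note on reading this output: the adapter advertises all four modes, but the link partner advertises only 10baseT-HD, so 10baseT-HD is the only mode the two ends have in common. The sketch below illustrates that selection. The function name and the priority list are illustrative assumptions (the list is limited to the four modes shown above); this is not code from mii-tool or the driver.

```python
# Best mode first; limited to the four modes that appear in the mii-tool output.
PRIORITY = ["100baseTx-FD", "100baseTx-HD", "10baseT-FD", "10baseT-HD"]


def resolve_link_mode(advertised, partner):
    """Return the best mode both ends advertise, or None if they share none."""
    common = set(advertised) & set(partner)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None


# Values copied from the 'advertising' and 'link partner' lines above.
local = ["100baseTx-FD", "100baseTx-HD", "10baseT-FD", "10baseT-HD"]
partner = ["10baseT-HD"]
print(resolve_link_mode(local, partner))  # 10baseT-HD
```

So the slow half-duplex link is consistent with what the other end offers and is probably not itself a symptom of the driver problem.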

The only possible clue comes from 'eepro100-diag'.  It has to be
called with '-p 0x7000' because it does not find the card by itself;
with the '-mm' flags it gives:

eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
 http://www.scyld.com/diag/index.html
Assuming a Intel i82557/8/9 EtherExpressPro100 adapter at 0x7000.
EEPROM contents, size 64x16:
    00: 0000 52e2 bf23 1a03 0000 0201 4701 0000
  0x08: 0000 0000 49a2 1017 1025 0000 0000 0000
  0x10: 0000 0000 0000 0000 0000 0000 0000 1031
      ...
  0x20: 0000 0000 0000 1031 0000 0000 0000 0000
      ...
  0x30: 002c 0000 0000 0000 0000 0000 0000 0000
  0x38: 0000 0000 0000 4030 0000 0000 0000 7b14
 The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
  Station address 00:00:E2:52:23:BF.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
   Sleep mode is enabled.  This is not recommended.
   Under high load the card may not respond to
    PCI requests, and thus cause a master abort.

It is not clear what enables this "sleep mode" or how it could be
disabled.
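
As a sanity check on the EEPROM dump, the station address printed by the tool can be reproduced from the first three words of the dump, reading each 16-bit word low byte first. This is a minimal sketch; the function name is made up for illustration, and the byte order is inferred from the dump and the printed address rather than taken from the tool's source.

```python
def station_address(eeprom_words):
    """Decode the MAC address from the first three 16-bit EEPROM words.

    Each word holds two address bytes, low byte first.
    """
    octets = []
    for word in eeprom_words[:3]:
        octets.append(word & 0xFF)         # low byte comes first
        octets.append((word >> 8) & 0xFF)  # then the high byte
    return ":".join(f"{b:02X}" for b in octets)


# First row of the dump above (offset 00).
row0 = [0x0000, 0x52E2, 0xBF23, 0x1A03, 0x0000, 0x0201, 0x4701, 0x0000]
print(station_address(row0))  # 00:00:E2:52:23:BF
```

The result matches the "Station address" line, which suggests the tool is reading the EEPROM correctly at 0x7000.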

The problem does not seem to occur with the original 2.4.7-10 kernel
from the distribution CDs: I ran the whole update of this installation
from an NFS-mounted directory without any incidents.

It does not seem to be a problem with the network driver alone: with
both offending kernels, screen updates from time to time slow down
to a ridiculous pace, and one can see a terminal window "scrolling" in
distinct multiple "waves".

The current solution seems to be to replace the whole kernel with the
one from the "skipjack" public beta, even though nominally the same
version of the network driver is in use.  I just copied around 1.5 GB of
data from an NFS-mounted directory to a destination on the laptop and it
survived.  Not everything is perfect: there was one incident during
this transfer when the network temporarily wedged, but it recovered by
itself.  I attach the dmesg output from this exercise.  All transfers
are on my "home" network, between machines sitting side by side on
a table.

  Michal

Version-Release number of selected component (if applicable):
kernel 2.4.9-31

Comment 1 Red Hat Bugzilla 2002-03-24 21:12:05 UTC
Created attachment 50004 [details]
dmesg output from an NFS session

Comment 2 Red Hat Bugzilla 2002-03-24 21:14:50 UTC
On a less optimistic note, an ssh session _to_ this laptop
running 2.4.18-0.4 just wedged, and it does not look like it is going to
recover. :-(   The network has already been dead for a few minutes.

Comment 3 Red Hat Bugzilla 2002-03-24 21:19:30 UTC
One more possible clue.  After I did 'ifdown eth0; ifup eth0' on the laptop
the network came back, and dmesg reports a bunch of

eepro100: wait_for_cmd_done timeout!

("last message repeated 4 times" and "last message repeated 13 times")
followed by

eth0: 0 multicast blocks dropped.
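
When comparing kernels it may help to count these timeouts rather than eyeball them, since syslog collapses repeats into "last message repeated N times". The following is a rough sketch only: the function name is made up, and the sample lines are a hypothetical arrangement of the messages quoted above (the exact interleaving in the real log is not given here).

```python
import re

TIMEOUT = "eepro100: wait_for_cmd_done timeout!"
REPEAT = re.compile(r"last message repeated (\d+) times?")


def count_timeouts(lines):
    """Count driver timeouts, expanding 'last message repeated N times'."""
    total = 0
    last_was_timeout = False
    for line in lines:
        match = REPEAT.search(line)
        if match:
            # A repeat marker refers to the message logged just before it.
            if last_was_timeout:
                total += int(match.group(1))
            continue
        last_was_timeout = TIMEOUT in line
        if last_was_timeout:
            total += 1
    return total


# Hypothetical ordering of the messages quoted in this comment.
sample = [
    "eepro100: wait_for_cmd_done timeout!",
    "last message repeated 4 times",
    "eepro100: wait_for_cmd_done timeout!",
    "last message repeated 13 times",
    "eth0: 0 multicast blocks dropped.",
]
print(count_timeouts(sample))  # 19
```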

Comment 4 Red Hat Bugzilla 2002-04-09 15:47:01 UTC
I'm getting this same thing with kernel 2.4.8-0.16, under RH7.2.  Everything
works fine after a reboot/startup, but after suspending and unsuspending, I get
(copied from /var/log/messages):



Apr  8 10:44:07 rain kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
Apr  8 10:44:07 rain kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw.com.sg> and others
Apr  8 10:44:07 rain kernel: PCI: Enabling device 08:04.0 (0000 -> 0003)
Apr  8 10:44:07 rain kernel: PCI: Setting latency timer of device 08:04.0 to 64
Apr  8 10:44:07 rain kernel: eth0: Invalid EEPROM checksum 0xff00, check settings before activating this device!
Apr  8 10:44:07 rain kernel: eth0: OEM i82557/i82558 10/100 Ethernet, FF:FF:FF:FF:FF:FF, IRQ 10.
Apr  8 10:44:07 rain kernel:   Board assembly ffffff-255, Physical connectors present: RJ45 BNC AUI MII
Apr  8 10:44:07 rain kernel:   Primary interface chip unknown-15 PHY #31.
Apr  8 10:44:07 rain kernel:     Secondary interface chip i82555.
Apr  8 10:44:07 rain kernel: Self test failed, status ffffffff:
Apr  8 10:44:07 rain kernel:  Failure to initialize the i82557.
Apr  8 10:44:07 rain kernel:  Verify that the card is a bus-master capable slot.

diff -u of 'lspci -vv' from pre and post suspend shows a number of differences:

--- ee.pre	Tue Apr  9 11:40:57 2002
+++ ee.post	Tue Apr  9 11:40:35 2002
@@ -1,15 +1,13 @@
 08:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
 	Subsystem: Action Tec Electronics Inc: Unknown device 1100
-	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
-	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort
-- <MAbort- >SERR- <PERR-
-	Latency: 32 (2000ns min, 14000ns max), cache line size 08
+	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
+	Latency: 64 (2000ns min, 14000ns max)
 	Interrupt: pin A routed to IRQ 10
-	Region 0: Memory at f8fff000 (32-bit, non-prefetchable) [size=4K]
+	Region 0: [virtual] Memory at f8fff000 (32-bit, non-prefetchable) [size=4K]
 	Region 1: I/O ports at ecc0 [size=64]
-	Region 2: Memory at f8e00000 (32-bit, non-prefetchable) [size=1M]
+	Region 2: [virtual] Memory at f8e00000 (32-bit, non-prefetchable) [size=1M]
 	Expansion ROM at f9000000 [disabled] [size=1M]
 	Capabilities: [dc] Power Management version 2
-		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot
-+,D3cold+)
-		Status: D2 PME-Enable- DSel=0 DScale=2 PME-
+		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
+		Status: D0 PME-Enable+ DSel=0 DScale=2 PME-

(I hope bugzilla doesn't mangle this)
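
To pick out which PCI command bits actually changed across suspend, the two Control lines from the 'lspci -vv' output can be compared flag by flag. The sketch below does that; the helper names are made up for illustration, and it assumes every flag is written as a name followed by '+' or '-', as in the Control lines quoted above.

```python
def parse_flags(line):
    """Map each lspci flag name to its trailing '+' or '-' state."""
    flags = {}
    for token in line.split():
        if token[-1] in "+-":
            flags[token[:-1]] = token[-1]
    return flags


def changed_flags(before, after):
    """Return {name: (old_state, new_state)} for flags whose state differs."""
    old, new = parse_flags(before), parse_flags(after)
    return {name: (old[name], new[name])
            for name in old if name in new and old[name] != new[name]}


# Control lines from the pre- and post-suspend listings.
pre = ("I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- "
       "Stepping- SERR+ FastB2B-")
post = ("I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- "
        "Stepping- SERR- FastB2B-")
print(changed_flags(pre, post))  # {'MemWINV': ('+', '-'), 'SERR': ('+', '-')}
```

Only MemWINV and SERR differ in the Control word; the other visible changes are the latency timer, the missing cache line size, the "[virtual]" memory regions, and the power state.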

Comment 5 Red Hat Bugzilla 2002-04-09 16:42:21 UTC
2.4.18-0.13 from the current "public beta" included e100 among "addon"
drivers.  This one seems to work without going away after a while.
It also survives suspend.

Comment 6 Red Hat Bugzilla 2003-06-07 22:25:37 UTC
The value of this report seems to be mostly historical.  I have not heard
from the owner about network problems, but he is using the e100 driver.

Comment 7 Red Hat Bugzilla 2004-03-03 05:56:27 UTC
Ok, closing since e100 appears to work.  We are deprecating eepro100
in favor of e100.