Bug 41294 - bug in e100 driver on Intel 815 Chipset in 10M Full Duplex
Summary: bug in e100 driver on Intel 815 Chipset in 10M Full Duplex
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-05-18 15:31 UTC by Need Real Name
Modified: 2007-04-18 16:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-06-06 12:47:06 UTC
Embargoed:


Attachments
Turn off some options for e100 that might be causing perf loss (1.20 KB, patch)
2001-07-09 21:58 UTC, K. Spoon

Description Need Real Name 2001-05-18 15:31:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

Description of problem:
The driver fails under mild load in 10Mb full duplex (configured on the
switch and auto-sensed, or set in modules.conf). FTPing a few files across
the local network is enough to load the system. The driver then produces
e100_wait_exec_cmd: Wait failed. scb cmd=0xf0 (the last number changes).

How reproducible:
Always

Steps to Reproduce:
1. Load the e100 driver in 10Mb full duplex on a mainboard with the Intel
815 integrated Ethernet, such as the AOpen MX3S or the Intel D815EEA.
2. FTP a few files across the network (a flood ping can also work).
3. The FTP transfer will hang. Move to another console window and ping any
device on the network; e100_wait_exec_cmd: Wait failed. scb cmd=0xf0 will
appear.
	

Actual Results:  The driver produces an e100_wait_exec_cmd: Wait failed. scb
cmd=0xf0 error.
/etc/rc.d/init.d/network restart fixes the problem for a few minutes.
The restart reports the error "Multicast setup Failed" while stopping the
network.
Sometimes the box locks up completely, with no keyboard activity.
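
Because the value after "scb cmd=" changes between occurrences, it may help to collect the distinct values from the kernel log. The following is only a minimal sketch in Python, assuming the message format is exactly as quoted above; the function name and the sample lines are made up for illustration and are not taken from a real log.

```python
import re
from collections import Counter

# Pattern for the message quoted in this report. Whether other driver
# versions print it in exactly this form is an assumption.
WAIT_FAILED = re.compile(
    r"e100_wait_exec_cmd: Wait failed\. scb cmd=(0x[0-9a-fA-F]+)"
)

def count_scb_cmds(lines):
    """Count how often each 'scb cmd' value appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = WAIT_FAILED.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

# Hypothetical sample input, for illustration only.
sample = [
    "kernel: e100_wait_exec_cmd: Wait failed. scb cmd=0xf0",
    "kernel: e100_wait_exec_cmd: Wait failed. scb cmd=0xf0",
    "kernel: e100_wait_exec_cmd: Wait failed. scb cmd=0x10",
    "kernel: eth0: link up",
]
print(dict(count_scb_cmds(sample)))
```

Feeding it the lines of /var/log/messages instead of the sample list would show which values actually occur on an affected machine.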

Expected Results:  It should work?

Additional info:

Bug reproduced on driver version 1.5.6 as shipped in Red Hat 7.1, kernel
2.4.2-2.
It also fails on 1.6.5, the current version on Intel's website.
The system is fairly basic (tried AOpen MX3S and Intel D815EEA boards, with
a Pentium III 533 and a Celeron 733): 256MB RAM (PC100), IDE disk, no other
cards in the system.
The switch is a Cisco 2924 XL.

Comment 1 Arjan van de Ven 2001-05-18 15:36:17 UTC
Could you try using the eepro100 driver instead ?

Comment 2 Need Real Name 2001-05-18 16:03:30 UTC
We have used that driver as well. It fails when you go to 10M HALF duplex,
which is a much more common occurrence.
I did the work on that about a week ago, but if you need it I can collect
the information again. The error message was very similar; the main
difference was that it failed on a different duplex setting.
I've also tried the e100 driver with a standard Intel network card (82559
based) on the same mainboard and it works fine.
I've also just emailed linux.nics.

Comment 3 Gernot 2001-05-19 18:09:18 UTC
Hi,
We seem to be having the same problem, this is what a colleague sent to 
linux.nics :

Hi,

We've got a bunch of servers in with the following configuration:

lspci -v 
00:00.0 Host bridge: Intel Corporation 82815 815 Chipset Host Bridge and Memory 
Controller Hub (rev 02)
        Flags: bus master, fast devsel, latency 0
        Capabilities: [88] #09 [f104]

00:02.0 VGA compatible controller: Intel Corporation 82815 CGC [Chipset 
Graphics Controller]  (rev 02) (prog-if 00 [VGA])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 11
        Memory at f8000000 (32-bit, prefetchable) [size=64M]
        Memory at ffa80000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: [dc] Power Management version 2

00:1e.0 PCI bridge: Intel Corporation 82820 820 (Camino 2) Chipset PCI (rev 01) 
(prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: ff800000-ff8fffff
        Prefetchable memory behind bridge: f6a00000-f6afffff

00:1f.0 ISA bridge: Intel Corporation 82820 820 (Camino 2) Chipset ISA Bridge 
(ICH2) (rev 01)
        Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corporation 82820 820 (Camino 2) Chipset IDE U100 
(rev 01) (prog-if 80 [Master])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0
        I/O ports at ffa0 [size=16]

00:1f.2 USB Controller: Intel Corporation 82820 820 (Camino 2) Chipset USB (Hub 
A) (rev 01) (prog-if 00 [UHCI])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0, IRQ 11
        I/O ports at ef40 [size=32]

00:1f.3 SMBus: Intel Corporation 82820 820 (Camino 2) Chipset SMBus (rev 01)
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: medium devsel, IRQ 9
        I/O ports at efa0 [size=16]

00:1f.4 USB Controller: Intel Corporation 82820 820 (Camino 2) Chipset USB (Hub 
B) (rev 01) (prog-if 00 [UHCI])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0, IRQ 10
        I/O ports at ef80 [size=32]

00:1f.5 Multimedia audio controller: Intel Corporation: Unknown device 2445 
(rev 01)
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0, IRQ 9
        I/O ports at e800 [size=256]
        I/O ports at ef00 [size=64]

01:08.0 Ethernet controller: Intel Corporation 82820 820 (Camino 2) Chipset 
Ethernet (rev 01)
        Subsystem: Intel Corporation: Unknown device 3013
        Flags: bus master, medium devsel, latency 64, IRQ 11
        Memory at ff8ff000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at df00 [size=64]
        Capabilities: [dc] Power Management version 2


--------
The problem is that the system (Red Hat 7.1 with the standard 2.4.2 RH kernel)
hangs completely when doing FTP transfers, both with the e100 1.5.6 driver
shipped with RH and with the e100 1.6.5 driver, which we installed after we
found out about these problems.
Unfortunately there are no error messages in any system log; logging simply
stops after the message for the successful FTP login.
Afterwards the server needs a hard reboot.

Now, I tried to run this with the eepro100 driver and finally got some
error messages shortly after the system came up:

May 19 09:34:36 catwalk kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
May 19 09:34:36 catwalk kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
May 19 09:34:36 catwalk kernel: PCI: Found IRQ 11 for device 01:08.0
May 19 09:34:36 catwalk kernel: eth0: Intel Corporation 82820 820 (Camino 2) Chipset Ethernet, 00:D0:B7:E3:09:CC, I/O at 0xdf00, IRQ 11.
May 19 09:34:36 catwalk kernel:   Board assembly 000000-000, Physical connectors present: RJ45
May 19 09:34:36 catwalk kernel:   Primary interface chip i82555 PHY #1.
May 19 09:34:36 catwalk keytable:
May 19 09:34:36 catwalk kernel:   General self-test: passed.
May 19 09:34:36 catwalk kernel:   Serial sub-system self-test: passed.
May 19 09:34:36 catwalk kernel:   Internal registers self-test: passed.
May 19 09:34:36 catwalk kernel:   ROM checksum self-test: passed (0x04f4518b).

The errors start a little while afterwards:

May 19 09:34:47 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:34:48 catwalk last message repeated 25 times
May 19 09:34:51 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:34:51 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
1223/1251 command 000c0000.
May 19 09:34:56 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:34:57 catwalk last message repeated 25 times
May 19 09:34:59 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:34:59 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
1566/1594 command 000c0000.
May 19 09:35:24 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:26 catwalk last message repeated 25 times
May 19 09:35:29 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:29 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
5314/5342 command 000c0000.
May 19 09:35:35 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:35 catwalk last message repeated 25 times
May 19 09:35:39 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:39 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
6083/6111 command 000c0000.
May 19 09:35:43 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:44 catwalk last message repeated 25 times
May 19 09:35:47 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:47 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
6407/6435 command 000c0000.
May 19 09:35:54 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:55 catwalk last message repeated 25 times
May 19 09:35:57 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:57 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
7163/7191 command 000c0000.
May 19 09:36:13 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:14 catwalk last message repeated 25 times
May 19 09:36:17 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:17 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
9752/9780 command 200c0000.
May 19 09:36:27 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:28 catwalk last message repeated 25 times
May 19 09:36:31 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:31 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
11412/11440 command 000c0000.
May 19 09:36:43 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:43 catwalk last message repeated 25 times
May 19 09:36:47 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:47 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
13311/13339 command 000c0000.
May 19 09:36:51 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:52 catwalk last message repeated 25 times
May 19 09:36:55 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:55 catwalk kernel: eth0: Transmit timed out: status 0050  0c80 at 
13632/13660 command 200c0000.
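
To see how often the watchdog fires, the gaps between consecutive "NETDEV WATCHDOG" entries can be computed from the timestamps. This is a minimal sketch in Python, assuming the syslog layout shown above and that all entries fall on the same day; the function name is made up for illustration.

```python
import re

# Matches the time of day on watchdog lines such as
# "May 19 09:34:51 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out".
WATCHDOG = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) \S+ kernel: NETDEV WATCHDOG: \w+: transmit timed out"
)

def watchdog_gaps(lines):
    """Return the seconds between consecutive watchdog timeouts.

    Assumes all lines belong to the same day (no midnight rollover).
    """
    times = []
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
            times.append(hours * 3600 + minutes * 60 + seconds)
    return [later - earlier for earlier, later in zip(times, times[1:])]

# The first three watchdog entries from the log above, plus one other line.
excerpt = [
    "May 19 09:34:47 catwalk kernel: eepro100: wait_for_cmd_done timeout!",
    "May 19 09:34:51 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "May 19 09:34:59 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "May 19 09:35:29 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out",
]
print(watchdog_gaps(excerpt))  # [8, 30]
```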


-----------------

I believe that the problems of both drivers are somehow related to each other 
and would like to ask if you are aware of this problem?
I know that eepro100 is not directly maintained by Intel but I believe e100 is 
having the same problems, the system just crashes before any error can be 
logged.

If you need further information about the system please let me know.

Thanks,
Michael

Comment 4 Arjan van de Ven 2001-05-19 18:27:34 UTC
Just to get one more datapoint: does this happen with the kernel in rawhide
(2.4.3-5) too ?
(this kernel does not have the 1.6.5 driver yet, the next version will)

Comment 5 Gernot 2001-05-19 21:27:59 UTC
I'm sorry, there is currently no spare system of this kind available to us 
which we could use rawhide kernels on and the customers on the servers affected 
are already, understandably, becoming a little nervous :)

However, I just saw that our problem is occurring in 10M Half Duplex mode as
well, so this doesn't seem to be limited to 10M full duplex.

Just something I want to add:
On other servers running RH 7.0 with kernel 2.2.17 (I believe) and e100 1.5.5a
we had the same or a very similar problem (it was discussed on Intel's board
as well: http://intelforums.com/cgi-bin/WebX.fcgi?13^5@.eea54ad ), which was
fixed by downgrading the e100 version to 1.3.20, as shipped with the
kernel-2.2.19-7.0.1 standard kernel. That server is running in 100M Full
Duplex mode though, and since then there has not been a single minute of
downtime; it has been running well without any further occurrence of this
error.

Unfortunately I'm unable to install the e100 1.3.20 driver on the servers
where this problem is occurring now, as I cannot find that old version
anywhere.

Anyway, to get back to your question: I don't believe that this rawhide kernel
will work, because the normal e100 driver shipped with RH 7.1 (1.5.6) doesn't
work either and shows exactly the same symptoms.

I hope this helps a little.

Gernot

Comment 6 Need Real Name 2001-05-23 15:54:47 UTC
I was asked by issuppor.intel.com (Joe) to try making the following 
changes in e100.h and recompiling the module. 

change 
#define TX_FRAME_CNT   7
to
#define TX_FRAME_CNT   1

change
#define CFG_BYTE_PARM6         0x32
to
#define CFG_BYTE_PARM6         0x3a

This didn't fix the problem.
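
For reference, the two CFG_BYTE_PARM6 values differ in a single bit, which simple arithmetic confirms. The sketch below, in Python, shows only that calculation; what the bit controls in the controller's configuration is not stated in this report, and the helper name is made up for illustration.

```python
# Values quoted in the comment above; only the arithmetic is shown here.
OLD_CFG_BYTE_PARM6 = 0x32
NEW_CFG_BYTE_PARM6 = 0x3A

def changed_bits(old, new):
    """Return the bit positions (0 = least significant) that differ."""
    diff = old ^ new
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]

print(f"{OLD_CFG_BYTE_PARM6:08b} -> {NEW_CFG_BYTE_PARM6:08b}")
print(changed_bits(OLD_CFG_BYTE_PARM6, NEW_CFG_BYTE_PARM6))  # [3]
```

So the suggested change sets bit 3 (mask 0x08) and leaves the other bits as they were.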

I've also tried the 2.4.3-5 kernel (from rawhide); this has no effect on the
e100 driver (at 10M FD) or on the eepro100 driver (at 10M HD).
For reference, the eepro100 driver error message is:
eepro100: wait_for_cmd_done timeout!

Comment 7 Need Real Name 2001-05-29 15:20:30 UTC
Two new things tried (both unsuccessfully) 

e100 driver version 1.6.6.
The new version of driver made no difference.

I also saw some postings about an old bug in the eepro100 driver with the same 
symptoms that could be fixed by turning multicast off using the following:
insmod eepro100 multicast_filter_limit=0
This made no difference either.


Comment 8 K. Spoon 2001-07-09 21:58:10 UTC
Created attachment 23127 [details]
Turn off some options for e100 that might be causing perf loss

Comment 9 K. Spoon 2001-07-09 22:04:56 UTC
Greetings,

We were having the same problem over here, and I noticed that it was mainly on
non-Intel servers (AMD Athlon based machines).  The attached patch turns off the
CPU cycle saver feature, as that sounded kind of suspicious, and it seems to have
fixed both the duplex problems and a nasty NFS lockup that was occurring during
kickstarts.

I'd be interested to hear if it helps out (and if anyone with all Intel hardware
has been having the same problems).

And in case I flubbed the attachment, the patch is available at
http://spoonix.com/RS-e100_nfsfix.patch

K. Spoon

Comment 10 Need Real Name 2001-10-12 11:48:57 UTC
Has anyone tried to replicate the fault with kernel-2.4.3-12?
Maria

Comment 11 Need Real Name 2001-11-06 10:18:11 UTC
The above-mentioned symptoms should have been taken care of in the latest e100 driver, version 1.6.22 and
up. Please try this driver; it should solve this issue.

