From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

Description of problem:
Under mild load at 10Mb full duplex (configured from the switch with auto-sensing, or mixed in modules.conf), the driver fails. Transferring a few files by ftp across the local network is enough to load the system. The driver will produce:

e100_wait_exec_cmd: Wait failed. scb cmd=0xf0

(The last number changes.)

How reproducible:
Always

Steps to Reproduce:
1. Load the e100 driver at 10Mb full duplex on a mainboard with the Intel 815 integrated Ethernet, such as the Aopen MX3S or the Intel D815EEA.
2. ftp a few files across the network (a flood ping can also work).
3. ftp will hang. Move to another console window and ping any device on the network, and "e100_wait_exec_cmd: Wait failed. scb cmd=0xf0" will appear.

Actual Results:
The driver produces "e100_wait_exec_cmd: Wait failed. scb cmd=0xf0". Running /etc/rc.d/init.d/network restart fixes the problem for a few minutes. It produces the error "Multicast setup Failed" when stopping the network. Sometimes the box locks up completely, with no keyboard activity.

Expected Results:
It should work.

Additional info:
Bug reproduced on driver version 1.5.6 as shipped in Red Hat 7.1 (kernel 2.4.2-2). It also fails on 1.6.5, the current version on Intel's website.

The system is fairly basic (tried Aopen MX3S and Intel D815EEA): Pentium III 533 and Celeron 733 tried, 256MB RAM (PC100), IDE disk, no other cards in the system. The switch is a Cisco 2924 XL.
Could you try using the eepro100 driver instead?
We had used that driver as well. It fails when you go to 10M half duplex, a much more common occurrence. I did the work on that a week or so ago, but if you need it I can collect the information again. The error message was very similar; the main difference was that it failed on a different duplex setting.

I've also tried the e100 driver on a standard Intel network card (82559 based) on the same mainboard and it works fine. I've also just emailed linux.nics.
Hi,

We seem to be having the same problem. This is what a colleague sent to linux.nics:

Hi,

We've got a bunch of servers in with the following configuration:

lspci -v

00:00.0 Host bridge: Intel Corporation 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
        Flags: bus master, fast devsel, latency 0
        Capabilities: [88] #09 [f104]

00:02.0 VGA compatible controller: Intel Corporation 82815 CGC [Chipset Graphics Controller] (rev 02) (prog-if 00 [VGA])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 11
        Memory at f8000000 (32-bit, prefetchable) [size=64M]
        Memory at ffa80000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: [dc] Power Management version 2

00:1e.0 PCI bridge: Intel Corporation 82820 820 (Camino 2) Chipset PCI (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: ff800000-ff8fffff
        Prefetchable memory behind bridge: f6a00000-f6afffff

00:1f.0 ISA bridge: Intel Corporation 82820 820 (Camino 2) Chipset ISA Bridge (ICH2) (rev 01)
        Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corporation 82820 820 (Camino 2) Chipset IDE U100 (rev 01) (prog-if 80 [Master])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0
        I/O ports at ffa0 [size=16]

00:1f.2 USB Controller: Intel Corporation 82820 820 (Camino 2) Chipset USB (Hub A) (rev 01) (prog-if 00 [UHCI])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0, IRQ 11
        I/O ports at ef40 [size=32]

00:1f.3 SMBus: Intel Corporation 82820 820 (Camino 2) Chipset SMBus (rev 01)
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: medium devsel, IRQ 9
        I/O ports at efa0 [size=16]

00:1f.4 USB Controller: Intel Corporation 82820 820 (Camino 2) Chipset USB (Hub B) (rev 01) (prog-if 00 [UHCI])
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0, IRQ 10
        I/O ports at ef80 [size=32]

00:1f.5 Multimedia audio controller: Intel Corporation: Unknown device 2445 (rev 01)
        Subsystem: Intel Corporation: Unknown device 4541
        Flags: bus master, medium devsel, latency 0, IRQ 9
        I/O ports at e800 [size=256]
        I/O ports at ef00 [size=64]

01:08.0 Ethernet controller: Intel Corporation 82820 820 (Camino 2) Chipset Ethernet (rev 01)
        Subsystem: Intel Corporation: Unknown device 3013
        Flags: bus master, medium devsel, latency 64, IRQ 11
        Memory at ff8ff000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at df00 [size=64]
        Capabilities: [dc] Power Management version 2

--------

The problem is that the system (Red Hat 7.1 with the standard 2.4.2 RH kernel) hangs completely when doing FTP transfers, both with the e100 1.5.6 driver that shipped with RH and with the e100 1.6.5 driver that we installed after we found out about these problems. Unfortunately there are no error messages in any system log; it just stops logging after the FTP login message, which completed successfully. Afterwards the server has to be hard-rebooted.

I then tried to run this with the eepro100 driver, and I finally got some error messages shortly after the system came up:

May 19 09:34:36 catwalk kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
May 19 09:34:36 catwalk kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
May 19 09:34:36 catwalk kernel: PCI: Found IRQ 11 for device 01:08.0
May 19 09:34:36 catwalk kernel: eth0: Intel Corporation 82820 820 (Camino 2) Chipset Ethernet, 00:D0:B7:E3:09:CC, I/O at 0xdf00, IRQ 11.
May 19 09:34:36 catwalk kernel: Board assembly 000000-000, Physical connectors present: RJ45
May 19 09:34:36 catwalk kernel: Primary interface chip i82555 PHY #1.
May 19 09:34:36 catwalk keytable:
May 19 09:34:36 catwalk kernel: General self-test: passed.
May 19 09:34:36 catwalk kernel: Serial sub-system self-test: passed.
May 19 09:34:36 catwalk kernel: Internal registers self-test: passed.
May 19 09:34:36 catwalk kernel: ROM checksum self-test: passed (0x04f4518b).

The errors start a little while afterwards:

May 19 09:34:47 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:34:48 catwalk last message repeated 25 times
May 19 09:34:51 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:34:51 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 1223/1251 command 000c0000.
May 19 09:34:56 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:34:57 catwalk last message repeated 25 times
May 19 09:34:59 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:34:59 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 1566/1594 command 000c0000.
May 19 09:35:24 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:26 catwalk last message repeated 25 times
May 19 09:35:29 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:29 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 5314/5342 command 000c0000.
May 19 09:35:35 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:35 catwalk last message repeated 25 times
May 19 09:35:39 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:39 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 6083/6111 command 000c0000.
May 19 09:35:43 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:44 catwalk last message repeated 25 times
May 19 09:35:47 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:47 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 6407/6435 command 000c0000.
May 19 09:35:54 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:35:55 catwalk last message repeated 25 times
May 19 09:35:57 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:35:57 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 7163/7191 command 000c0000.
May 19 09:36:13 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:14 catwalk last message repeated 25 times
May 19 09:36:17 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:17 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 9752/9780 command 200c0000.
May 19 09:36:27 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:28 catwalk last message repeated 25 times
May 19 09:36:31 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:31 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 11412/11440 command 000c0000.
May 19 09:36:43 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:43 catwalk last message repeated 25 times
May 19 09:36:47 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:47 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 13311/13339 command 000c0000.
May 19 09:36:51 catwalk kernel: eepro100: wait_for_cmd_done timeout!
May 19 09:36:52 catwalk last message repeated 25 times
May 19 09:36:55 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 09:36:55 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 13632/13660 command 200c0000.

-----------------

I believe that the problems with both drivers are somehow related, and would like to ask whether you are aware of this problem. I know that eepro100 is not directly maintained by Intel, but I believe e100 is having the same problems; the system just crashes before any error can be logged.

If you need further information about the system please let me know.

Thanks,
Michael
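For anyone comparing systems, a rough way to quantify the eepro100 excerpt quoted above is to tally the two recurring messages. The following is only a minimal sketch: the function name `tally` and the keys `cmd_timeout` and `tx_watchdog` are made up for illustration, and it assumes that a syslog "last message repeated N times" line refers to the most recently counted driver message, which may not hold on a busy log.

```python
import re
from collections import Counter

# Message texts copied from the log excerpt in this report.
TIMEOUT = "eepro100: wait_for_cmd_done timeout!"
WATCHDOG = "NETDEV WATCHDOG: eth0: transmit timed out"
REPEATED = re.compile(r"last message repeated (\d+) times")


def tally(lines):
    """Count driver timeouts, expanding 'repeated N times' lines (assumption:
    the repeat applies to the last message this function counted)."""
    counts = Counter()
    last = None  # key of the most recently counted message, if any
    for line in lines:
        if TIMEOUT in line:
            last = "cmd_timeout"
            counts[last] += 1
        elif WATCHDOG in line:
            last = "tx_watchdog"
            counts[last] += 1
        else:
            match = REPEATED.search(line)
            if match and last is not None:
                counts[last] += int(match.group(1))
            else:
                last = None  # unrelated line: stop attributing repeats
    return counts


sample = [
    "May 19 09:34:47 catwalk kernel: eepro100: wait_for_cmd_done timeout!",
    "May 19 09:34:48 catwalk last message repeated 25 times",
    "May 19 09:34:51 catwalk kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "May 19 09:34:51 catwalk kernel: eth0: Transmit timed out: status 0050 0c80 at 1223/1251 command 000c0000.",
]
print(tally(sample))
```

On a real system the same function could be fed the lines of the syslog file instead of the `sample` list; the file location varies by setup and is not taken from this report.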
Just to get one more datapoint: does this happen with the kernel in rawhide (2.4.3-5) too? (This kernel does not have the 1.6.5 driver yet; the next version will.)
I'm sorry, there is currently no spare system of this kind available to us on which we could use rawhide kernels, and the customers on the affected servers are already, understandably, becoming a little nervous :)

However, I just saw that our problem is occurring in 10M half duplex mode as well, so this doesn't seem to be limited to 10M full duplex.

Just something I want to add: on other servers running RH 7.0 with kernel 2.2.17 (I believe) and e100 1.5.5a we had the same or a very similar problem (it was discussed on Intel's board as well: http://intelforums.com/cgi-bin/WebX.fcgi?13^5@.eea54ad), which was fixed by downgrading e100 to version 1.3.20, as shipped with the standard kernel-2.2.19-7.0.1 kernel. That server is running in 100M full duplex mode, though. Since then there has not been a single minute of downtime, and it has been running well without any further occurrence of this error.

Unfortunately I'm unable to install the e100 1.3.20 driver on the servers where this problem is occurring now, as I cannot find that old version anywhere.

Anyway, to get back to your question: I don't believe that this rawhide kernel will work, because the normal e100 driver shipped with RH 7.1 (1.5.6) doesn't work either, with exactly the same occurrences.

I hope this helps a little.

Gernot
I was asked by issuppor.intel.com (Joe) to try making the following changes in e100.h and recompiling the module.

Change
#define TX_FRAME_CNT 7
to
#define TX_FRAME_CNT 1

Change
#define CFG_BYTE_PARM6 0x32
to
#define CFG_BYTE_PARM6 0x3a

This didn't fix the problem.

I've also tried using the 2.4.3-5 kernel (from rawhide). This has no effect on the e100 driver (at 10M FD) or on the eepro100 driver (at 10M HD).

For reference, the eepro100 driver error message is:
eepro100: wait_for_cmd_done timeout!
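For anyone who wants to repeat this experiment across several driver versions, the two macro edits can be scripted rather than made by hand. This is only a sketch under stated assumptions: the helper name `patch_defines` is made up, the two-line `before` string is a hypothetical excerpt rather than the real e100.h, and it assumes each macro is defined on a single line in the form shown in this comment.

```python
import re

# The two macro values suggested in this comment for e100.h.
CHANGES = {"TX_FRAME_CNT": "1", "CFG_BYTE_PARM6": "0x3a"}


def patch_defines(text, changes=CHANGES):
    """Return text with the value of each listed single-line #define replaced."""
    for name, value in changes.items():
        # Capture "#define NAME " and replace only the value token after it.
        pattern = re.compile(
            r"^(\s*#define\s+" + re.escape(name) + r"\s+)\S+", re.MULTILINE
        )
        text = pattern.sub(lambda m, v=value: m.group(1) + v, text)
    return text


# Hypothetical excerpt; the real header contains much more around these lines.
before = "#define TX_FRAME_CNT 7\n#define CFG_BYTE_PARM6 0x32\n"
print(patch_defines(before), end="")
```

The module still has to be recompiled and reloaded after the header is changed, as described above.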
Two new things tried (both unsuccessfully):

1. e100 driver version 1.6.6. The new version of the driver made no difference.
2. I also saw some postings about an old bug in the eepro100 driver with the same symptoms, which could be fixed by turning multicast off using the following:

insmod eepro100 multicast_filter_limit=0

This made no difference either.
Created attachment 23127
Turn off some options for e100 that might be causing perf loss
Greetings,

We were having the same problem over here, and I noticed that it was mainly on non-Intel servers (AMD Athlon based machines). The attached patch turns off the CPU cycle saver feature, as that sounded kind of suspicious, and it seems to have fixed both the duplex problems and a nasty NFS lockup that was occurring during kickstarts.

I'd be interested to hear if it helps out (and if anyone with all-Intel hardware has been having the same problems).

And in case I flubbed the attachment, the patch is available at http://spoonix.com/RS-e100_nfsfix.patch

K. Spoon
Has anyone tried to replicate the fault with kernel-2.4.3-12?

Maria
The above-mentioned symptoms should have been taken care of in the latest e100 driver, version 1.6.22 and up. Please try this driver; it should solve this issue.