Bug 67966

Summary: (NET 3C59X) Transmit timeouts on eth0 on Dell Inspiron 3500 laptop
Product: [Retired] Red Hat Linux
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Diego Novillo <dnovillo>
Assignee: Jeff Garzik <jgarzik>
QA Contact: Brian Brock <bbrock>
CC: gowdy, peterm
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:44 UTC

Description Diego Novillo 2002-07-04 21:27:35 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513

Description of problem:
This laptop has a 3c575 PCMCIA Ethernet adapter that connects to the network using
dhcpcd.  It usually works fine, but every now and then I get transmit
timeouts and get disconnected from the network.

Since I'm not always inside the network, I have configured eth0 to not come up
automatically.  Instead, I bring it up using 'ifup eth0'.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Configure eth0 to not come up automatically.
2. Configure eth0 as a DHCP connection.
3. Bring it up with ifup eth0.
	

Actual Results:  The device comes up, but every now and then it gets a
transmission timeout that shows up in /var/log/messages as follows:

Jul  4 15:24:33 frodo kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul  4 15:24:33 frodo kernel: eth0: transmit timed out, tx_status 00 status e601.
Jul  4 15:24:33 frodo kernel:   diagnostics: net 0cc2 media a800 dma 0000003a fifo 8000
Jul  4 15:24:33 frodo kernel: eth0: Interrupt posted but not delivered -- IRQ blocked by another device?
Jul  4 15:24:33 frodo kernel:   Flags; bus-master 1, dirty 10870(6) current 10870(6)
Jul  4 15:24:33 frodo kernel:   Transmit list 00000000 vs. c0379380.
Jul  4 15:24:33 frodo kernel:   0: @c0379200  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   1: @c0379240  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   2: @c0379280  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   3: @c03792c0  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   4: @c0379300  length 8000002a status 8001002a
Jul  4 15:24:33 frodo kernel:   5: @c0379340  length 8000002a status 8001002a
Jul  4 15:24:33 frodo kernel:   6: @c0379380  length 80000042 status 00010042
Jul  4 15:24:33 frodo kernel:   7: @c03793c0  length 80000042 status 00010042
Jul  4 15:24:33 frodo kernel:   8: @c0379400  length 8000004a status 0001004a
Jul  4 15:24:33 frodo kernel:   9: @c0379440  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   10: @c0379480  length 8000004a status 0001004a
Jul  4 15:24:33 frodo kernel:   11: @c03794c0  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   12: @c0379500  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   13: @c0379540  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   14: @c0379580  length 8000002a status 0001002a
Jul  4 15:24:33 frodo kernel:   15: @c03795c0  length 8000002a status 0001002a
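
For anyone comparing several of these dumps, the ring entries can be pulled out
mechanically.  The following is only a minimal Python sketch, not part of the
3c59x driver or of any Red Hat tool; the names parse_tx_ring and top_bit_set and
the regular expression are assumptions based solely on the line format shown
above.  It does not try to interpret the status bits.  It just reports which
entries have the top bit (0x80000000) set in the status word, which in the dump
above are entries 4 and 5.

```python
import re

# Matches ring lines such as:
#   "Jul  4 15:24:33 frodo kernel:   4: @c0379300  length 8000002a status 8001002a"
# (format assumed from the dump in this report)
RING_RE = re.compile(
    r"kernel:\s+(\d+):\s+@([0-9a-f]+)\s+length\s+([0-9a-f]+)\s+status\s+([0-9a-f]+)"
)

def parse_tx_ring(lines):
    """Return a list of (index, address, length, status) tuples as integers."""
    entries = []
    for line in lines:
        m = RING_RE.search(line)
        if m:
            idx = int(m.group(1))
            addr, length, status = (int(g, 16) for g in m.group(2, 3, 4))
            entries.append((idx, addr, length, status))
    return entries

def top_bit_set(entries):
    """Indices of entries whose status word has bit 31 set (meaning not interpreted)."""
    return [idx for idx, _addr, _length, status in entries if status & 0x80000000]

# Four lines copied from the dump above.
sample = [
    "Jul  4 15:24:33 frodo kernel:   3: @c03792c0  length 8000002a status 0001002a",
    "Jul  4 15:24:33 frodo kernel:   4: @c0379300  length 8000002a status 8001002a",
    "Jul  4 15:24:33 frodo kernel:   5: @c0379340  length 8000002a status 8001002a",
    "Jul  4 15:24:33 frodo kernel:   6: @c0379380  length 80000042 status 00010042",
]
ring = parse_tx_ring(sample)
print(top_bit_set(ring))  # prints [4, 5]
```

Lines that are not ring entries (the "Transmit list" and "diagnostics" lines,
for example) are simply skipped.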


Additional info:

$ lspci

00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)
00:04.0 CardBus bridge: Texas Instruments PCI1220 (rev 02)
00:04.1 CardBus bridge: Texas Instruments PCI1220 (rev 02)
00:07.0 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 02)
01:00.0 VGA compatible controller: Neomagic Corporation [MagicMedia 256AV] (rev 20)
01:00.1 Multimedia audio controller: Neomagic Corporation [MagicMedia 256AV Audio] (rev 20)
06:00.0 Ethernet controller: 3Com Corporation 3c575 [Megahertz] 10/100 LAN CardBus (rev 01)

$ lsmod
Module                  Size  Used by    Tainted: P  
ad1848                 25664   0 (autoclean) (unused)
sound                  71692   0 (autoclean) [ad1848]
soundcore               6500   2 (autoclean) [sound]
cisco_ipsec           364128   0
autofs                 11972   0 (autoclean) (unused)
3c59x                  28680   1
ds                      8608   2
yenta_socket           12384   2
pcmcia_core            50752   0 [ds yenta_socket]
ipchains               43464   9
ide-cd                 30272   0 (autoclean)
cdrom                  32032   0 (autoclean) [ide-cd]
usb-uhci               24484   0 (unused)
usbcore                71904   1 [usb-uhci]
ext3                   67328   2
jbd                    49496   2 [ext3]


$ lsdev
Device            DMA   IRQ  I/O Ports
------------------------------------------------
06:00.0                      4800-487f
cascade             4     2 
dma                          0080-008f
dma1                         0000-001f
dma2                         00c0-00df
eth0                     10 
fpu                          00f0-00ff
ide0                     14  01f0-01f7 03f6-03f6 fcd0-fcd7
ide1                     15  0170-0177 0376-0376 fcd8-fcdf
Intel                        2180-219f 8000-803f fcd0-fcdf fce0-fcff
keyboard                  1  0060-006f
Mouse                    12 
PCI                          0cf8-0cff 4000-40ff 4400-44ff 4800-48ff 4800-487f 4c00-4cff
pic1                         0020-003f
pic2                         00a0-00bf
rtc                       8  0070-007f
serial                    4  03f8-03ff
timer                     0  0040-005f
usb-uhci                 11  fce0-fcff
vga+                         03c0-03df

Comment 1 Arjan van de Ven 2002-07-05 09:53:23 UTC
>$ lsmod
>Module                  Size  Used by    Tainted: P  
>cisco_ipsec           364128   0

Does this still happen without binary-only modules loaded?

Comment 2 Diego Novillo 2002-07-08 18:33:18 UTC
Yup.  I removed the cisco module, restarted the network and 2 hours later I got
the exact same timeout.


Comment 3 Diego Novillo 2002-07-10 13:48:33 UTC
Some more data that may be relevant.  I got a new error today that froze
evolution's IMAP connections:

Jul 10 09:35:03 frodo dhcpcd[1139]: sendto: Socket operation on non-socket 
Jul 10 09:35:03 frodo dhcpcd[1139]: sendto: Socket operation on non-socket 
Jul 10 09:35:03 frodo dhcpcd[1139]: dhcpStop: ioctl SIOCSIFADDR: Inappropriate ioctl for device
Jul 10 09:35:03 frodo dhcpcd[1139]: dhcpStop: ioctl SIOCSIFFLAGS: Inappropriate ioctl for device


The strange thing is that this only happens when I have the laptop plugged in at
the office (i.e., with no VPN modules loaded).  This never happens at home, where I'm
always connected via the VPN running on top of my ADSL provider.

This new error doesn't seem to kill dhcpcd; it merely confuses clients that have
open connections.

Comment 4 Arjan van de Ven 2002-07-10 13:51:39 UTC
I get that too once in a while; it actually is a bug in the dhcp server where it
gives you a new, different!! ip while you're still using the old one....

Comment 5 Arjan van de Ven 2002-07-10 13:53:06 UTC
As for the 3c59x bug: please try the kernel at

http://people.redhat.com/arjanv/testkernels

it has a nasty 3c59x bug fixed...

Comment 6 Diego Novillo 2002-07-10 14:33:20 UTC
The new kernel failed again, unfortunately.  Five minutes after rebooting the
system.  I was downloading the SRPM for the kernel when it happened.  Strangely,
the download kept going for a while longer and then froze with a new 'NETDEV
WATCHDOG: eth0: transmit timed out' message.

I can consistently bring the connection down by trying to download the SRPM from
your webpage.  The only difference between my home and office network is that in
the office I'm on a 100Mbit network.

Could this be a hardware problem?  It never happened with 7.2, though, or when the
machine was running Windows 2000.

Comment 7 Jason Merrill 2002-07-27 09:32:45 UTC
Arjan, I'm also seeing the dhcpcd problem on my Dell Latitude C600, with the
mini-PCI 3c556 ethernet

00:10.0 Ethernet controller: 3Com Corporation 3c556 Hurricane CardBus (rev 10)

which also uses the 3c59x driver.  I don't think it's a server issue; my DHCP
server hasn't changed since my laptop was running 7.2, which worked fine.  I
have a bit more information in my /var/log/messages, perhaps because I'm
invoking dhcpcd with -d:

Jul 27 03:12:01 prospero dhcpcd[1884]: sendto: Socket operation on non-socket 
Jul 27 03:12:01 prospero dhcpcd[1884]: sendto: Socket operation on non-socket 
Jul 27 03:12:01 prospero dhcpcd[1884]: dhcpStop: ioctl SIOCSIFADDR: Inappropriate ioctl for device
Jul 27 03:12:01 prospero dhcpcd[1884]: dhcpStop: ioctl SIOCSIFFLAGS: Inappropriate ioctl for device
Jul 27 03:12:01 prospero dhcpcd: MAC address = 00:01:03:82:da:a0 dhcpcd: your IP address = 192.168.64.100
Jul 27 03:12:03 prospero dhcpcd: MAC address = 00:01:03:82:da:a0 dhcpcd: your IP address = 192.168.64.100 dhcpcd: MAC address = 00:01:03:82:da:a0 dhcpcd: your IP address = 192.168.64.101
Jul 27 03:12:04 prospero logger: upping...

Where .100 was my old IP address, and .101 is a new and different IP address. 
"upping" is my ifup-local script telling me it's been invoked.  Why would dhcp
get an IP address three times?
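
The address change in the excerpt above can be spotted mechanically.  Here is a
minimal Python sketch that assumes only the "your IP address = ..." wording
that shows up in the log lines above; ip_changes is a hypothetical helper name,
not part of dhcpcd.

```python
import re

# Wording assumed from the dhcpcd -d log lines in this report; \s+ tolerates
# a log line that was wrapped in the middle of the phrase.
IP_RE = re.compile(r"your IP\s+address\s+=\s+(\d+\.\d+\.\d+\.\d+)")

def ip_changes(log_text):
    """Return (old, new) pairs wherever consecutive reported addresses differ."""
    addresses = IP_RE.findall(log_text)
    return [(old, new) for old, new in zip(addresses, addresses[1:]) if old != new]

# Log text copied from the excerpt above.
sample = (
    "Jul 27 03:12:01 prospero dhcpcd: MAC address = 00:01:03:82:da:a0 "
    "dhcpcd: your IP address = 192.168.64.100\n"
    "Jul 27 03:12:03 prospero dhcpcd: MAC address = 00:01:03:82:da:a0 "
    "dhcpcd: your IP address = 192.168.64.100 dhcpcd: MAC address = "
    "00:01:03:82:da:a0 dhcpcd: your IP address = 192.168.64.101\n"
)
print(ip_changes(sample))  # prints [('192.168.64.100', '192.168.64.101')]
```

Running something like this over /var/log/messages would at least show when the
lease flipped, even if it says nothing about why.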

Anyway, I'm going to try your test kernel and see if it works better for me.


Comment 8 Jason Merrill 2002-07-27 21:50:54 UTC
Nope, I still get the dhcp timeout with your 2.4.18-5e kernel.  To clarify, I'm
not seeing Diego's other problem.


Comment 9 Stephen J. Gowdy 2003-10-03 13:23:29 UTC
For a long time I've seen the DHCP problem. Eventually I noticed that I had two
dhcpcd processes running on my Dell Latitude C400 after a suspend/resume
cycle. This only happens at work, and I'm using the TruMobile 1150 802.11b card
both at work and at home. At home I've configured my DHCP server to always give my
MAC address the same IP number, so perhaps that is why I don't see this at home.
If I killed both processes and did "ifup eth0" things were happy.
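
A check for the duplicate-process situation could look roughly like the Python
sketch below.  It assumes the two-column output of "ps -eo pid,comm"; pids_for
is a hypothetical helper name, and the dhcpcd PIDs in the sample are made up for
illustration (only the apmd and cardmgr PIDs match the logs further down).

```python
def pids_for(ps_output, name):
    """Return the PIDs whose command name equals `name`.

    `ps_output` is assumed to be the text printed by `ps -eo pid,comm`,
    i.e. a header line followed by one "PID COMMAND" pair per line.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == name:
            pids.append(int(fields[0]))
    return pids

# Illustrative sample only: the two dhcpcd PIDs are invented.
sample = (
    "  PID COMMAND\n"
    "  827 apmd\n"
    "  936 cardmgr\n"
    " 1001 dhcpcd\n"
    " 1002 dhcpcd\n"
)
duplicates = pids_for(sample, "dhcpcd")
if len(duplicates) > 1:
    print("more than one dhcpcd running:", duplicates)
```

With real data the text would come from running ps after a resume; more than one
PID in the result corresponds to the state described above.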

Anyway, in reading other bug reports I concluded that starting dhcp somehow
depends on RH modifications to the hotplug scripts. I had been using the latest
version from linux-hotplug.sf.net rather than the RH ones. So earlier this week
I reverted to the RH version and now I no longer get two processes starting (or
one not exiting, I never actually tested that). However, now I get the same old
problem with my IP address changing while running. I have the same messages in
my logs:

Oct  3 05:20:07 antonia apmd[827]: Normal Resume after 00:14:54 (100% 2:00) AC power
Oct  3 05:20:18 antonia dhcpcd[24376]: dhcpStart: ioctl SIOCSIFFLAGS: Device or resource busy
Oct  3 05:20:28 antonia cardmgr[936]: executing: './network check eth0'
Oct  3 05:20:28 antonia cardmgr[936]: executing: './network stop eth0'
Oct  3 05:20:29 antonia cardmgr[936]: executing: 'modprobe -r orinoco_cs'
Oct  3 05:20:29 antonia /etc/hotplug/net.agent: NET unregister event not supported
Oct  3 05:20:33 antonia cardmgr[936]: socket 1: Intersil PRISM2 11 Mbps Wireless Adapter
Oct  3 05:20:33 antonia cardmgr[936]: executing: 'modprobe orinoco_cs'
Oct  3 05:20:33 antonia cardmgr[936]: executing: './network start eth0'
Oct  3 05:20:33 antonia /etc/hotplug/net.agent: invoke ifup eth0
Oct  3 05:20:33 antonia kernel: eth0: New link status: Connected (0001)
Oct  3 05:20:37 antonia dhcpcd[24524]: DHCP_NAK server response received: requested address not available
Oct  3 05:51:01 antonia dhcpcd[24528]: sendto: Socket operation on non-socket 
Oct  3 05:51:01 antonia dhcpcd[24528]: sendto: Socket operation on non-socket 
Oct  3 05:51:01 antonia dhcpcd[24528]: dhcpStop: ioctl SIOCSIFADDR: Inappropriate ioctl for device
Oct  3 05:51:01 antonia dhcpcd[24528]: dhcpStop: ioctl SIOCSIFFLAGS: Inappropriate ioctl for device

So at 05:20:37 I received one IP address and then for some reason at 05:51:01
I got a new one:

bash-2.05a# ls -l dhcpcd-eth0.info*
-rw-r--r--    1 root     root          501 Oct  3 05:51 dhcpcd-eth0.info
-rw-r--r--    1 root     root          501 Oct  3 05:20 dhcpcd-eth0.info.old

As you can see the process ID of the dhcpcd is the same now.

At least with the linux-hotplug.sf.net version of the hotplug scripts I had a
(not very nice) workaround. Perhaps I should go back to that. I'll see if the same
problem occurs later today (I usually run a lot of X windows from other machines,
so if my IP number changes I could lose a lot of time).

Not sure if this is a kernel issue though... I'm running 2.4.20-20.7 and dhcpcd
is 1.3.22pl1-7.

Comment 10 Jason Merrill 2003-10-03 15:23:49 UTC
FYI, I don't think I've ever had this problem under RHL9.

Comment 11 Bugzilla owner 2004-09-30 15:39:44 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/