130623 – Update to kernel-2.6.8-1.521 makes ethernet (8139too) seize up


Summary: Update to kernel-2.6.8-1.521 makes ethernet (8139too) seize up

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock

Reported:	2004-08-22 23:33 UTC by Michal Jaegermann
Modified:	2015-01-04 22:09 UTC
CC List:	3 users
Last Closed:	2004-11-28 08:30:07 UTC


Description Michal Jaegermann 2004-08-22 23:33:37 UTC

Description of problem:

After an update to 2.6.8-1.521 on an Acer TravelMate laptop with
built-in "RTL-8139/8139C/8139C+" ethernet, network traffic
stops after a while.  This apparently requires a sizeable
transfer; just moving in some 15M of binaries for an old kernel
luckily is not enough to lock up the ethernet. :-)

There is not much visible on the machine, I am afraid.
'mii-tool' reports "eth0: autonegotiation failed, link ok",
which is the normal state, and none of the usual tricks, such as
'mii-tool -R', taking the link down, unloading the module and
trying to restart eth0, helps.  Everything appears to work, up to a
point, but no packets move on the wire until the machine is rebooted.

Logs show repeated messages:
NETDEV WATCHDOG: eth0: transmit timed out
eth0: link up, 10Mbps, half-duplex, lpa 0x0000

There is another clue that '8139too' may not be responsible
for this: when it happens, a USB mouse
("Bus 002 Device 004: ID 046d:c00e Logitech, Inc. Optical Mouse")
that happens to be connected to the laptop stops working too,
and nothing can revive it save a reboot.

As mentioned before, this does not happen with a "normal" amount
of network traffic, but attempts to do a backup by rsyncing to
another machine killed networking, and the mouse, twice in a row.
Only after backing off to kernel-2.6.7-1.494.2.2 was I able to
perform that task.  To give an idea of the scale, this is
the message from rsync on completion:

wrote 570694893 bytes  read 3025682 bytes  405026.88 bytes/sec
total size is 4596947042  speedup is 8.01
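To make the scale of that rsync run concrete, its summary figures can be checked with a little arithmetic. This Python sketch assumes rsync's usual definitions: "speedup" is the total size of the file set divided by the bytes actually written plus read, and the bytes/sec figure is that same byte count divided by the elapsed time, so the duration of the transfer can be estimated back from it. The variable names are illustrative only.

```python
# Figures copied from the rsync summary quoted in the report.
WROTE = 570_694_893          # bytes sent over the network
READ = 3_025_682             # bytes received over the network
RATE = 405_026.88            # bytes/sec as printed by rsync
TOTAL_SIZE = 4_596_947_042   # total size of the file set being synced

# Assumption: rsync's "speedup" is total size / bytes on the wire.
wire_bytes = WROTE + READ
speedup = TOTAL_SIZE / wire_bytes

# Assumption: bytes/sec is wire_bytes / elapsed time, so dividing back
# gives an estimate of how long the transfer ran.
elapsed_s = wire_bytes / RATE

print(f"bytes on the wire: {wire_bytes / 1e6:.0f} MB")
print(f"speedup: {speedup:.2f}")
print(f"elapsed: about {elapsed_s / 60:.0f} minutes")
```

Under those assumptions the speedup comes out at 8.01, matching rsync's own line, and the backup amounts to roughly 574 MB of real network traffic sustained for about 24 minutes.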

The interrupt layout with 2.6.7-1.494.2.2 looks like this:

# cat /proc/interrupts 
           CPU0       
  0:    3182848          XT-PIC  timer
  1:        182          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
  9:          4          XT-PIC  acpi
 10:          5          XT-PIC  uhci_hcd, yenta, Intel 82801DB-ICH4
 11:     709722          XT-PIC  uhci_hcd, uhci_hcd, eth0
 12:       2588          XT-PIC  i8042
 14:     225937          XT-PIC  ide0
 15:        921          XT-PIC  ide1
NMI:          0 
ERR:          0

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.521

How reproducible:
See above.

Comment 1 Gabriel M. Elder 2004-09-28 19:07:45 UTC

One of my machines is having an almost identical problem. The network
interface works fine until I log out of X (and return to the GDM
welcome screen). I'm not sure how or if these two events are related,
but the NIC seizure happens consistently when, and (so far) only
when, I log out of GNOME.

This does not seem to be isolated to the 2.6.8-1.521 kernel. I had the
same problem with the 2.6.7-1.494.2.2 kernel. I am using the 8139too
driver.

For this network card, lspci -v shows:
02:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8139
        Flags: bus master, medium devsel, latency 64, IRQ 3
        I/O ports at dc00 [size=ff800000]
        Memory at ff7ffc00 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at 00010000 [disabled]
        Capabilities: [50] Power Management version 2

cat /proc/pci shows for this device:
  Bus  2, device   7, function  0:
    Class 0200: PCI device 10ec:8139 (rev 16).
      IRQ 3.
      Master Capable.  Latency=64.  Min Gnt=32.Max Lat=64.
      I/O at 0xdc00 [0xdcff].
      Non-prefetchable 32 bit memory at 0xff7ffc00 [0xff7ffcff].

Removing the driver with rmmod and reloading it doesn't help. 'service
network restart' hangs when it tries to bring the interface back up.
Only rebooting makes the interface function correctly again. If I get a
chance, I may look at the kernel driver source. I hope other people
are looking into this as well, as any assistance is much appreciated
at this point.

Comment 2 Gabriel M. Elder 2004-10-01 04:03:25 UTC

Some additional info here. I now suspect that my first post on this
bug should have been a separate bug. I dunno; my apologies if that's the case.

The problem I submitted is more of a hardware issue than a Linux
issue, stemming primarily from an IRQ resource conflict between the
NIC and the video controller. The system is a Dell Dimension 4300S.
The BIOS (rather foolishly) INSISTS on binding the video IRQ to the
NIC, so that changing one always changes the other to match,
despite there being a few other available IRQs. Seems kinda stupid to
me. At any rate, the result was that no matter which NIC I
installed on the motherboard, it would always freeze when I logged
out of GNOME/X. Since I wasn't able to alter the interrupts to my
satisfaction in the BIOS, and wasn't sure whether there was a way to select
an interrupt for either PCI device from within the OS (using pcitools
or something), I found a suitable workaround: editing
/etc/X11/gdm/gdm.conf to set AlwaysRestartServer=true. This seemed to
prevent the NIC from freezing as a result of the shared IRQ
contention. (Some kind of race condition that this solution helps avoid?)

I would point out that the originator of this bug is also suffering
from an IRQ resource conflict:

11:     709722          XT-PIC  uhci_hcd, uhci_hcd, eth0

and so his problem might be avoided if the devices in question could
be set to use different IRQs, maybe in the BIOS.
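The shared-IRQ observation can be checked mechanically. The following Python sketch, an illustration only, parses text in the /proc/interrupts layout quoted in the original report and lists every numbered IRQ that has more than one handler attached. It assumes the single-CPU, 2.6-era XT-PIC format shown there (one count column per CPU, a one-word controller name, then a comma-separated handler list); the function name `shared_irqs` is hypothetical, and newer kernels print extra columns, so the parsing would need adjusting for them.

```python
# Sketch: find IRQ lines with more than one handler in text laid out like
# the 2.6-era /proc/interrupts output quoted in this report.

SAMPLE = """\
           CPU0
  0:    3182848          XT-PIC  timer
  1:        182          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
  9:          4          XT-PIC  acpi
 10:          5          XT-PIC  uhci_hcd, yenta, Intel 82801DB-ICH4
 11:     709722          XT-PIC  uhci_hcd, uhci_hcd, eth0
 12:       2588          XT-PIC  i8042
 14:     225937          XT-PIC  ide0
 15:        921          XT-PIC  ide1
NMI:          0
ERR:          0
"""


def shared_irqs(text, ncpus=1):
    """Return {irq: [handler, ...]} for numbered IRQs with 2+ handlers.

    Assumes each numbered line holds ncpus count columns, a one-word
    controller name, then a comma-separated handler list.
    """
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and the NMI/ERR summary lines
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue  # no handler list on this line
        handlers = [h.strip() for h in fields[ncpus + 1].split(",")]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


if __name__ == "__main__":
    for irq, handlers in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq}: {', '.join(handlers)}")
```

On the sample this flags IRQ 10 and IRQ 11, the latter carrying both uhci_hcd controllers and eth0. On a live machine one would pass the contents of /proc/interrupts instead, with `ncpus` set to the number of CPU columns. Sharing an interrupt line is permitted for PCI devices, so a hit only marks a candidate worth ruling out, not a proven conflict.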

The problem I was having would happen regardless of which IRQ the two
devices were using because, thanks to BIOS restrictions, they were
always using the same IRQ (!). It'd be nice if the BIOS were
open sourced...

Perhaps there are some kernel-level changes that could be thought of
in regards to PCI IRQ sharing and contention?

Comment 3 Michal Jaegermann 2004-10-01 04:18:13 UTC

> and so his problem might be avoided if the devices in question could
> be set to use different IRQs, maybe in the BIOS.

I wish.  This is a laptop, and what can be done in the BIOS is, ahem,
not that overwhelming.  Moreover, it is likely ACPI that assigns
those interrupts, and without ACPI that laptop does not work too well.
In any case, I did not observe the problem with kernels older than
2.6.8-1.521.

Comment 4 Dave Jones 2004-11-27 20:27:35 UTC

mass update for old bugs:

Is this still a problem with the 2.6.9-based update kernel?

Comment 5 Michal Jaegermann 2004-11-28 01:30:08 UTC

A very recent update of the FC2 kernel to 2.6.9-1.6_FC2 apparently
solved that problem for me.  I have not tried too many times,
but so far backups over the network have been successful.

Comment 6 Warren Togami 2004-11-28 08:30:07 UTC

REOPEN if this issue comes back.
