538920 – r8169 netdev timeout when aspm is enabled

Summary: r8169 netdev timeout when aspm is enabled

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	13
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	urgent
Target Milestone:	---
Assignee:	Kyle McMartin
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	547517 555154 576058
Depends On:
Blocks:	619880 627569

Reported:	2009-11-19 16:02 UTC by Igor
Modified:	2015-09-01 03:53 UTC
CC List:	55 users
Fixed In Version:
Clone Of:
Clones:	619880 627569
Environment:
Last Closed:	2011-06-27 14:33:05 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
eth0 stops working (2.27 KB, text/plain) 2009-12-14 16:09 UTC, Alan Altmann
call trace of rtl8111/8168 failure (r8169 driver) (2.80 KB, text/plain) 2009-12-18 11:39 UTC, Eric Smith
Output of lspci -vn (6.39 KB, text/plain) 2010-07-27 21:09 UTC, Adrian
Output of acpidump (135.83 KB, text/plain) 2010-07-27 21:11 UTC, Adrian
Output of lspci -vn (6.94 KB, text/plain) 2010-07-27 21:14 UTC, Richard Körber
Output of acpidump (155.30 KB, text/plain) 2010-07-27 21:15 UTC, Richard Körber
Output of lspci -vvvn (17.95 KB, text/plain) 2010-08-05 14:31 UTC, Adrian
Output of lspci -vvvn (11.96 KB, text/plain) 2010-08-05 14:46 UTC, Richard Körber
Output of lspci -vvvnn (26.02 KB, text/plain) 2010-08-28 23:24 UTC, Pavel Holica
Output of acpidump (134.52 KB, text/plain) 2010-08-28 23:34 UTC, Pavel Holica
/var/log/messages (113.71 KB, text/plain) 2010-10-03 21:28 UTC, Frantisek Hanzlik

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Linux Kernel	15232	0	None	None	None	2019-04-02 03:36:38 UTC

Internal Links: 548860

Description Igor 2009-11-19 16:02:52 UTC

Description of problem:

The r8169 kernel module randomly stops working during normal desktop use.

Version-Release number of selected component (if applicable):

2.6.31.5-127.fc12.x86_64

Error:

WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Not tainted)
Hardware name: EX58-UD3R
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse bridge stp llc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath xts gf128mul kvm_intel kvm uinput snd_hda_codec_realtek snd_hda_intel snd_usb_audio snd_usb_lib snd_hda_codec snd_rawmidi snd_hwdep gspca_pac7311 gspca_main snd_seq videodev v4l1_compat firewire_ohci snd_seq_device snd_pcm i2c_i801 v4l2_compat_ioctl32 i2c_core snd_timer firewire_core crc_itu_t r8169 mii iTCO_wdt wmi iTCO_vendor_support joydev snd soundcore snd_page_alloc serio_raw cryptd aes_x86_64 aes_generic cbc dm_crypt pata_acpi ata_generic pata_jmicron [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8138e561>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff8138e6cb>] dev_watchdog+0xf3/0x164
 [<ffffffff81064008>] ? __queue_work+0x3a/0x43
 [<ffffffff8106c3f0>] ? sched_clock_cpu+0x18/0x176
 [<ffffffff8105be64>] run_timer_softirq+0x19f/0x21c
 [<ffffffff8106e897>] ? clocksource_read+0xf/0x11
 [<ffffffff810256aa>] ? apic_write+0x16/0x18
 [<ffffffff810575b4>] __do_softirq+0xdd/0x1ad
 [<ffffffff81012eac>] call_softirq+0x1c/0x30
 [<ffffffff810143fb>] do_softirq+0x47/0x8d
 [<ffffffff810572c6>] irq_exit+0x44/0x86
 [<ffffffff8141eab2>] smp_apic_timer_interrupt+0x86/0x94
 [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff81019021>] ? mwait_idle+0x33/0xae
 [<ffffffff8141d079>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff814145be>] ? start_secondary+0x1f3/0x234


How reproducible:

Occurs completely at random.

Steps to Reproduce:
1. Start FC12
2. Work as Desktop
3. System loses network with error
  
Actual results:

FC11 was okay -- never got this error before

Expected results:


Additional info:
00:00.0 Host bridge: Intel Corporation X58 I/O Hub to ESI Port (rev 12)
00:01.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root Port 1 (rev 12)
00:03.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root Port 3 (rev 12)
00:05.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root Port 5 (rev 12)
00:07.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root Port 7 (rev 12)
00:09.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root Port 9 (rev 12)
00:10.0 PIC: Intel Corporation X58 Physical and Link Layer Registers Port 0 (rev 12)
00:10.1 PIC: Intel Corporation X58 Routing and Protocol Layer Registers Port 0 (rev 12)
00:11.0 PIC: Intel Corporation QuickPath Interconnect Physical and Link Layer Registers Port 1 (rev 12)
00:11.1 PIC: Intel Corporation QuickPath Interconnect Routing and Protocol Layer Registers Port 1 (rev 12)
00:13.0 PIC: Intel Corporation X58 I/O Hub I/OxAPIC Interrupt Controller (rev 12)
00:14.0 PIC: Intel Corporation X58 I/O Hub System Management Registers (rev 12)
00:14.1 PIC: Intel Corporation X58 I/O Hub GPIO and Scratch Pad Registers (rev 12)
00:14.2 PIC: Intel Corporation X58 I/O Hub Control Status and RAS Registers (rev 12)
00:15.0 PIC: Intel Corporation X58 Trusted Execution Technology Registers (rev 12)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 1
00:1c.1 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 2
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 5
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller
02:00.0 VGA compatible controller: nVidia Corporation GeForce 9600 GSO (rev a1)
07:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial ATA Controller (rev 02)
07:00.1 IDE interface: JMicron Technologies, Inc. 20360/20363 Serial ATA Controller (rev 02)
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
09:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

Comment 1 pholdaway 2009-11-25 19:48:19 UTC

I have the same sort of problem...

Nov 25 03:37:34 backuppc kernel: WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Not tainted)
Nov 25 03:37:34 backuppc kernel: Hardware name: X8SIE
Nov 25 03:37:34 backuppc kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Nov 25 03:37:34 backuppc kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table xfs exportfs dm_multipath uinput e1000e i2c_i801 i2c_core joydev raid0 raid1 mptsas mptscsih mptbase usb_storage scsi_transport_sas [last unloaded: microcode]
Nov 25 03:37:34 backuppc kernel: Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Nov 25 03:37:34 backuppc kernel: Call Trace:
Nov 25 03:37:34 backuppc kernel: <IRQ>  [<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
Nov 25 03:37:34 backuppc kernel: [<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
Nov 25 03:37:34 backuppc kernel: [<ffffffff8138e561>] ? netif_tx_lock+0x44/0x6d
Nov 25 03:37:34 backuppc kernel: [<ffffffff8138e6cb>] dev_watchdog+0xf3/0x164
Nov 25 03:37:34 backuppc kernel: [<ffffffff8105bbf2>] ? internal_add_timer+0xcf/0xd1
Nov 25 03:37:34 backuppc kernel: [<ffffffff8105bcab>] ? cascade+0x6a/0x84
Nov 25 03:37:34 backuppc kernel: [<ffffffff8105be64>] run_timer_softirq+0x19f/0x21c
Nov 25 03:37:34 backuppc kernel: [<ffffffff8106ae2b>] ? hrtimer_interrupt+0x13c/0x153
Nov 25 03:37:34 backuppc kernel: [<ffffffff810575b4>] __do_softirq+0xdd/0x1ad
Nov 25 03:37:34 backuppc kernel: [<ffffffff81026976>] ? apic_write+0x16/0x18
Nov 25 03:37:34 backuppc kernel: [<ffffffff81012eac>] call_softirq+0x1c/0x30
Nov 25 03:37:34 backuppc kernel: [<ffffffff810143fb>] do_softirq+0x47/0x8d
Nov 25 03:37:34 backuppc kernel: [<ffffffff810572c6>] irq_exit+0x44/0x86
Nov 25 03:37:34 backuppc kernel: [<ffffffff8141ea15>] do_IRQ+0xa5/0xbc
Nov 25 03:37:34 backuppc kernel: [<ffffffff810126d3>] ret_from_intr+0x0/0x11
Nov 25 03:37:34 backuppc kernel: <EOI>  [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
Nov 25 03:37:34 backuppc kernel: [<ffffffff81019021>] ? mwait_idle+0x33/0xae
Nov 25 03:37:34 backuppc kernel: [<ffffffff8141d079>] ? atomic_notifier_call_chain+0x13/0x15
Nov 25 03:37:34 backuppc kernel: [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
Nov 25 03:37:34 backuppc kernel: [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
Nov 25 03:37:34 backuppc kernel: [<ffffffff814145be>] ? start_secondary+0x1f3/0x234
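
Note that the watchdog line in this trace names e1000e rather than r8169, so it may be a different problem with the same symptom. When sorting through many such reports, it can help to pull the interface and driver out of each NETDEV WATCHDOG line. The following is a minimal sketch in Python; the function name parse_watchdog and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches the message format seen in this report, e.g.
#   "NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out"
# and the older form without a queue number, e.g.
#   "NETDEV WATCHDOG: eth1 (cdc_ether): transmit timed out".
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit (?:queue (?P<queue>\d+) )?timed out"
)


def parse_watchdog(line):
    """Return (interface, driver, queue) for a watchdog line, or None.

    queue is None when the message does not include a queue number.
    """
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    queue = match.group("queue")
    return (
        match.group("iface"),
        match.group("driver"),
        int(queue) if queue is not None else None,
    )
```

Feeding each line of /var/log/messages through parse_watchdog and grouping by driver would show at a glance which reports actually involve r8169.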

Comment 2 fred2 2009-11-26 14:31:14 UTC

same problem, maybe slightly different trace:

executable kernel
kernel 2.6.31.5-127.fc12.x86_64 #1 SMP 
package kernel
component

how to reproduce:
1. occurs randomly
2. symptom: network connection fails (every few minutes). 
$ ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
From 192.168.10.196 icmp_seq=2 Destination Host Unreachable
From 192.168.10.196 icmp_seq=3 Destination Host Unreachable
3. service NetworkManager restart seems to have no effect. must reboot to restore network connection.
4. (abrt reports this, but i assume does not / cannot automatically send oops.)
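
To record when the connection drops, the ping output shown in step 2 can be scanned for unreachable replies. This is only a sketch, assuming the ping output has already been captured as text; the helper name unreachable_sequences is hypothetical.

```python
import re

# Matches lines such as
#   "From 192.168.10.196 icmp_seq=2 Destination Host Unreachable"
UNREACHABLE_RE = re.compile(r"icmp_seq=(\d+) Destination Host Unreachable")


def unreachable_sequences(ping_output):
    """Return the icmp_seq numbers that ping reported as unreachable."""
    return [int(seq) for seq in UNREACHABLE_RE.findall(ping_output)]
```

An empty list means no unreachable replies were seen in the captured text.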

Nov 25 21:23:54 localhost kernel: ------------[ cut here ]------------
Nov 25 21:23:54 localhost kernel: WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Not tainted)
Nov 25 21:23:54 localhost kernel: Hardware name: M51Ta              
Nov 25 21:23:54 localhost kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Nov 25 21:23:54 localhost kernel: Modules linked in: fuse sunrpc ipt_MASQUERADE iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath uinput snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 ecb ath9k snd_hwdep snd_seq mac80211 snd_seq_device uvcvideo snd_pcm sdhci_pci videodev sdhci ath firewire_ohci mmc_core firewire_core v4l1_compat r8169 amd64_edac_mod v4l2_compat_ioctl32 snd_timer cfg80211 edac_core mii snd i2c_piix4 shpchp soundcore rfkill snd_page_alloc serio_raw crc_itu_t joydev asus_laptop ata_generic pata_acpi pata_atiixp video output radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Nov 25 21:23:54 localhost kernel: Pid: 2040, comm: gnome-system-mo Not tainted 2.6.31.5-127.fc12.x86_64 #1
Nov 25 21:23:54 localhost kernel: Call Trace:
Nov 25 21:23:54 localhost kernel: <IRQ>  [<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
Nov 25 21:23:54 localhost kernel: [<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
Nov 25 21:23:54 localhost kernel: [<ffffffff8138e561>] ? netif_tx_lock+0x44/0x6d
Nov 25 21:23:54 localhost kernel: [<ffffffff8138e6cb>] dev_watchdog+0xf3/0x164
Nov 25 21:23:54 localhost kernel: [<ffffffff8106eb1f>] ? getnstimeofday+0x5b/0xaf
Nov 25 21:23:54 localhost kernel: [<ffffffff81064008>] ? __queue_work+0x3a/0x43
Nov 25 21:23:54 localhost kernel: [<ffffffff8105be64>] run_timer_softirq+0x19f/0x21c
Nov 25 21:23:54 localhost kernel: [<ffffffff8106e897>] ? clocksource_read+0xf/0x11
Nov 25 21:23:54 localhost kernel: [<ffffffff810256aa>] ? apic_write+0x16/0x18
Nov 25 21:23:54 localhost kernel: [<ffffffff810575b4>] __do_softirq+0xdd/0x1ad
Nov 25 21:23:54 localhost kernel: [<ffffffff81012eac>] call_softirq+0x1c/0x30
Nov 25 21:23:54 localhost kernel: [<ffffffff810143fb>] do_softirq+0x47/0x8d
Nov 25 21:23:54 localhost kernel: [<ffffffff810572c6>] irq_exit+0x44/0x86
Nov 25 21:23:54 localhost kernel: [<ffffffff8141eab2>] smp_apic_timer_interrupt+0x86/0x94
Nov 25 21:23:54 localhost kernel: [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
Nov 25 21:23:54 localhost kernel: <EOI> 
Nov 25 21:23:54 localhost kernel: ---[ end trace 2630126d8eeeb4a2 ]---

Comment 3 delusion_master 2009-11-29 10:21:11 UTC

I have a similar problem (same card, same kernel, x86_64), with the difference that I disabled NetworkManager and only use the network service. Besides, I don't seem to get any network failure every few minutes, but it systematically fails at system startup.

When I boot up, the eth0 link is not up (even though the network is configured), and I have to force an 'ifdown eth0' 'ifup eth0' to get it working again. The problem is with ethtool I guess, since even when the network works it's slower than it should be (100mbit instead of 1000). Every attempt at forcing it differently by means of ethtool causes the link to go down again, which means I need to ifdown/ifup again.

I posted some details here, in case they can be of help to anybody
http://forums.fedoraforum.org/showthread.php?t=235218

Comment 4 Adrian 2009-12-03 09:45:52 UTC

Same problem here. Unfortunately, it renders F12 completely useless to me.

WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Not tainted)
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: vfat fat usb_storage fuse sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath uinput usblp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio snd_seq uvcvideo videodev v4l1_compat r8169 snd_pcm amd64_edac_mod snd_usb_lib v4l2_compat_ioctl32 mii shpchp ppdev snd_rawmidi snd_timer snd_seq_device parport_pc snd_hwdep parport snd snd_page_alloc joydev edac_core asus_atk0110 i2c_piix4 k8temp soundcore serio_raw pata_acpi ata_generic pata_atiixp floppy radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.31.6-145.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8138e671>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff8138e7db>] dev_watchdog+0xf3/0x164
 [<ffffffff8106eb3b>] ? getnstimeofday+0x5b/0xaf
 [<ffffffff8101831c>] ? native_sched_clock+0x2d/0x61
 [<ffffffff811ef39b>] ? blk_rq_timed_out_timer+0xee/0xfb
 [<ffffffff8106c562>] ? sched_clock_cpu+0x16e/0x176
 [<ffffffff8105bec4>] run_timer_softirq+0x19f/0x21c
 [<ffffffff8106e8b3>] ? clocksource_read+0xf/0x11
 [<ffffffff8102566a>] ? apic_write+0x16/0x18
 [<ffffffff81057614>] __do_softirq+0xdd/0x1ad
 [<ffffffff81012eac>] call_softirq+0x1c/0x30
 [<ffffffff810143fb>] do_softirq+0x47/0x8d
 [<ffffffff81057326>] irq_exit+0x44/0x86
 [<ffffffff8141ebc2>] smp_apic_timer_interrupt+0x86/0x94
 [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8102e23d>] ? native_safe_halt+0xb/0xd
 [<ffffffff81018e1f>] ? default_idle+0x47/0x6d
 [<ffffffff81018f40>] ? c1e_idle+0xfb/0x102
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff814146ce>] ? start_secondary+0x1f3/0x234

Comment 5 Adam Williamson 2009-12-03 21:06:32 UTC

Looks like Ubuntu has the same problem:

https://bugs.launchpad.net/bugs/472057

I don't see an upstream report yet, though.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 6 Pavel V. Stepanov 2009-12-04 18:42:52 UTC

Hello all!

I have the same problem. At first I thought it was my fault and tried checking software and configs, but then I changed my mind. The problem description and steps to reproduce are:


Description:

Switching between desktops while downloading something hangs the machine completely (in my case). If I download a file and DO NOT switch to another desktop, the download mostly completes successfully, but it can also hang.

Reproduce:

1. Start any ftp/torrent/samba manager
2. Start any download
3. Switch several times between desktops

In 99% of cases it hangs.

The driver my system uses is r8169.

---

Addition:

I've tried the driver from the Realtek website (r8168). It compiled and installed normally. With this driver the situation is mostly the same, except that the system keeps running. The network adapter goes down. 'ifdown eth0 && ifup eth0' recovers it, but within the next few minutes, during download and desktop switching, it goes down again.

----

I don't think this is a problem with the driver, so I include only the r8169 dumps.

If any additional information is needed, I will provide it. Let me know.

-----

Hardware is ASUS Notebook W2J series.

----------------------

[master@notebook ~]$ sudo lspci |grep Ether
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
[master@notebook ~]$


[master@notebook ~]$ uname -r
2.6.31.5-127.fc12.i686
[master@notebook ~]$


[master@notebook ~]$ cat /var/log/r8169.log
Nov 29 02:35:16 notebook kernel: ------------[ cut here ]------------
Nov 29 02:35:16 notebook kernel: WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xc6/0x12d() (Not tainted)
Nov 29 02:35:16 notebook kernel: Hardware name: W2J
Nov 29 02:35:16 notebook kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Nov 29 02:35:16 notebook kernel: Modules linked in: nls_utf8 cifs fuse rfcomm sco bridge stp llc bnep l2cap autofs4 sunrpc ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput iTCO_wdt iTCO_vendor_support btusb arc4 saa7134_alsa ecb mt352 saa7134_dvb videobuf_dvb dvb_core snd_hda_codec_si3054 tuner_xc2028 snd_hda_codec_realtek snd_hda_intel tuner snd_hda_codec iwl3945 snd_hwdep snd_seq saa7134 ir_common iwlcore snd_seq_device mac80211 v4l2_common sdhci_pci snd_pcm videodev r8169 mii sdhci v4l1_compat snd_timer mmc_core videobuf_dma_sg ricoh_mmc cfg80211 snd videobuf_core firewire_ohci tveeprom firewire_core soundcore crc_itu_t snd_page_alloc bluetooth joydev rfkill asus_laptop serio_raw tpm_infineon video output usb_storage radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Nov 29 02:35:16 notebook kernel: Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.i686 #1
Nov 29 02:35:16 notebook kernel: Call Trace:
Nov 29 02:35:16 notebook kernel: [<c0436d93>] warn_slowpath_common+0x70/0x87
Nov 29 02:35:16 notebook kernel: [<c06ee200>] ? dev_watchdog+0xc6/0x12d
Nov 29 02:35:16 notebook kernel: [<c0436de8>] warn_slowpath_fmt+0x29/0x2c
Nov 29 02:35:16 notebook kernel: [<c06ee200>] dev_watchdog+0xc6/0x12d
Nov 29 02:35:16 notebook kernel: [<c0450405>] ? getnstimeofday+0x57/0xe0
Nov 29 02:35:16 notebook kernel: [<c0416a0a>] ? apic_write+0x14/0x16
Nov 29 02:35:16 notebook kernel: [<c0416c20>] ? lapic_next_event+0x14/0x18
Nov 29 02:35:16 notebook kernel: [<c045338e>] ? clockevents_program_event+0xbf/0xcd
Nov 29 02:35:16 notebook kernel: [<c06ee13a>] ? dev_watchdog+0x0/0x12d
Nov 29 02:35:16 notebook kernel: [<c043fd21>] run_timer_softirq+0x14e/0x1af
Nov 29 02:35:16 notebook kernel: [<c043c042>] __do_softirq+0xb1/0x157
Nov 29 02:35:16 notebook kernel: [<c043c11e>] do_softirq+0x36/0x41
Nov 29 02:35:16 notebook kernel: [<c043c210>] irq_exit+0x2e/0x61
Nov 29 02:35:16 notebook kernel: [<c04173ab>] smp_apic_timer_interrupt+0x6d/0x7b
Nov 29 02:35:16 notebook kernel: [<c0403f95>] apic_timer_interrupt+0x31/0x38
Nov 29 02:35:16 notebook kernel: [<c044007b>] ? mod_timer_pending+0xc/0x16
Nov 29 02:35:16 notebook kernel: [<c05f20c9>] ? acpi_idle_enter_simple+0x102/0x135
Nov 29 02:35:16 notebook kernel: [<c06bba42>] cpuidle_idle_call+0x65/0x9b
Nov 29 02:35:16 notebook kernel: [<c04026ff>] cpu_idle+0x96/0xaf
Nov 29 02:35:16 notebook kernel: [<c0761353>] start_secondary+0x1f5/0x233
Nov 29 02:35:16 notebook kernel: ---[ end trace ee3762c52d6f6676 ]---
[master@notebook ~]$


[master@notebook ~]$ sudo ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Half
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: pg
        Current message level: 0x00000033 (51)
        Link detected: no
[master@notebook ~]$
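
The ethtool output above reports 10Mb/s, half duplex and no link on an adapter that advertises 1000baseT. Checking this by hand after every failure is tedious, so here is a minimal Python sketch that extracts the relevant fields from captured ethtool text. The function names are hypothetical, and the expectation of 1000Mb/s is an assumption that only holds for a gigabit NIC on a gigabit switch.

```python
def parse_ethtool(output):
    """Extract selected 'Key: value' fields from `ethtool <iface>` output."""
    wanted = {"Speed", "Duplex", "Auto-negotiation", "Link detected"}
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only; values never need further splitting.
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:
            fields[key] = value.strip()
    return fields


def link_problem(fields):
    """Describe a degraded link, or return None if it looks healthy.

    Assumption: a healthy link here means link detected at 1000Mb/s.
    """
    if fields.get("Link detected") != "yes":
        return "no link detected"
    if fields.get("Speed") != "1000Mb/s":
        return "negotiated " + fields.get("Speed", "unknown speed")
    return None
```

For the output shown above, link_problem would report that no link is detected; for a link stuck at 100Mb/s it would report the negotiated speed instead.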





[master@notebook ~]$ lsmod | grep r8169
r8169                  28252  0
mii                     4120  1 r8169
[master@notebook ~]$

-------------------------------
P.S. In Fedora 11 everything worked properly. It broke after the upgrade to F12.

Comment 7 Adrian 2009-12-06 18:56:30 UTC

NOTE: This only applies to r8168
--------------------------------

I decided to try r8168 (from the realtek website) and had a similar problem to the one(s) described above. The call traces were different, though. 

I thought that I might include them in case they help to shed some light:


Dec  6 20:27:13 localhost kernel: BUG: sleeping function called from invalid context at mm/slub.c:1697
Dec  6 20:27:13 localhost kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
Dec  6 20:27:13 localhost kernel: Pid: 0, comm: swapper Not tainted 2.6.31.6-145.fc12.x86_64 #1
Dec  6 20:27:13 localhost kernel: Call Trace:
Dec  6 20:27:13 localhost kernel: <IRQ>  [<ffffffff8104024e>] __might_sleep+0xe6/0xe8
Dec  6 20:27:13 localhost kernel: [<ffffffff810f32d0>] kmem_cache_alloc_notrace+0x3b/0xc2
Dec  6 20:27:13 localhost kernel: [<ffffffff8109ad00>] request_threaded_irq+0xa2/0x192
Dec  6 20:27:13 localhost kernel: [<ffffffffa011e4e0>] ? rtl8168_interrupt+0x0/0x226 [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffffa011ea6e>] ? rtl8168_esd_timer+0x0/0x2da [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffffa011e88c>] rtl8168_open+0x77/0x259 [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffffa011ed17>] rtl8168_esd_timer+0x2a9/0x2da [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffff81017b9d>] ? sched_clock+0x9/0xd
Dec  6 20:27:13 localhost kernel: [<ffffffff8105bec4>] run_timer_softirq+0x19f/0x21c
Dec  6 20:27:13 localhost kernel: [<ffffffff8106e8b3>] ? clocksource_read+0xf/0x11
Dec  6 20:27:13 localhost kernel: [<ffffffff8102566a>] ? apic_write+0x16/0x18
Dec  6 20:27:13 localhost kernel: [<ffffffff81057614>] __do_softirq+0xdd/0x1ad
Dec  6 20:27:13 localhost kernel: [<ffffffff81012eac>] call_softirq+0x1c/0x30
Dec  6 20:27:13 localhost kernel: [<ffffffff810143fb>] do_softirq+0x47/0x8d
Dec  6 20:27:13 localhost kernel: [<ffffffff81057326>] irq_exit+0x44/0x86
Dec  6 20:27:13 localhost kernel: [<ffffffff8141ebc2>] smp_apic_timer_interrupt+0x86/0x94
Dec  6 20:27:13 localhost kernel: [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
Dec  6 20:27:13 localhost kernel: <EOI>  [<ffffffff8102e23d>] ? native_safe_halt+0xb/0xd
Dec  6 20:27:13 localhost kernel: [<ffffffff81018e1f>] ? default_idle+0x47/0x6d
Dec  6 20:27:13 localhost kernel: [<ffffffff81018f40>] ? c1e_idle+0xfb/0x102
Dec  6 20:27:13 localhost kernel: [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
Dec  6 20:27:13 localhost kernel: [<ffffffff814146ce>] ? start_secondary+0x1f3/0x234
Dec  6 20:27:13 localhost kernel: IRQ handler type mismatch for IRQ 26
Dec  6 20:27:13 localhost kernel: current handler: eth0
Dec  6 20:27:13 localhost kernel: Pid: 0, comm: swapper Not tainted 2.6.31.6-145.fc12.x86_64 #1
Dec  6 20:27:13 localhost kernel: Call Trace:
Dec  6 20:27:13 localhost kernel: <IRQ>  [<ffffffff8109a8e7>] __setup_irq+0x25e/0x2c3
Dec  6 20:27:13 localhost kernel: [<ffffffff8109ad8d>] request_threaded_irq+0x12f/0x192
Dec  6 20:27:13 localhost kernel: [<ffffffffa011e4e0>] ? rtl8168_interrupt+0x0/0x226 [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffffa011ea6e>] ? rtl8168_esd_timer+0x0/0x2da [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffffa011e88c>] rtl8168_open+0x77/0x259 [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffffa011ed17>] rtl8168_esd_timer+0x2a9/0x2da [r8168]
Dec  6 20:27:13 localhost kernel: [<ffffffff81017b9d>] ? sched_clock+0x9/0xd
Dec  6 20:27:13 localhost kernel: [<ffffffff8105bec4>] run_timer_softirq+0x19f/0x21c
Dec  6 20:27:13 localhost kernel: [<ffffffff8106e8b3>] ? clocksource_read+0xf/0x11
Dec  6 20:27:13 localhost kernel: [<ffffffff8102566a>] ? apic_write+0x16/0x18
Dec  6 20:27:13 localhost kernel: [<ffffffff81057614>] __do_softirq+0xdd/0x1ad
Dec  6 20:27:13 localhost kernel: [<ffffffff81012eac>] call_softirq+0x1c/0x30
Dec  6 20:27:13 localhost kernel: [<ffffffff810143fb>] do_softirq+0x47/0x8d
Dec  6 20:27:13 localhost kernel: [<ffffffff81057326>] irq_exit+0x44/0x86
Dec  6 20:27:13 localhost kernel: [<ffffffff8141ebc2>] smp_apic_timer_interrupt+0x86/0x94
Dec  6 20:27:13 localhost kernel: [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
Dec  6 20:27:13 localhost kernel: <EOI>  [<ffffffff8102e23d>] ? native_safe_halt+0xb/0xd
Dec  6 20:27:13 localhost kernel: [<ffffffff81018e1f>] ? default_idle+0x47/0x6d
Dec  6 20:27:13 localhost kernel: [<ffffffff81018f40>] ? c1e_idle+0xfb/0x102
Dec  6 20:27:13 localhost kernel: [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
Dec  6 20:27:13 localhost kernel: [<ffffffff814146ce>] ? start_secondary+0x1f3/0x234

Comment 8 Alan Altmann 2009-12-14 16:09:38 UTC

Created attachment 378257
eth0 stops working

Same problem.  Seems to only happen under heavy use.  Error log attached.

Comment 9 Eric Smith 2009-12-18 11:39:32 UTC

Created attachment 379192
call trace of rtl8111/8168 failure (r8169 driver)

Same problem here, on a Shuttle X27D running F12 x86_64.  The Ethernet seems to go deaf (and/or dumb?) every few days.  I don't know whether the traffic is particularly heavy at the time of failure.  ifdown/ifup doesn't recover it, have to reboot.

After the failure the system log gets "kernel: r8169: eth0: link up" messages about once a minute.
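
To put a number on "about once a minute", the gaps between consecutive "link up" messages can be measured from the system log. The sketch below is an illustration only: the function name, the year parameter (syslog timestamps carry no year) and the assumption of English month abbreviations are mine, not something taken from the attached log.

```python
import re
from datetime import datetime

# Matches syslog lines such as
#   "Dec 18 03:00:00 host kernel: r8169: eth0: link up"
LINK_UP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) .*r8169: (?P<iface>\S+): link up"
)


def link_up_intervals(lines, year=2009):
    """Return the seconds between consecutive r8169 'link up' messages.

    The year is an assumed parameter because syslog omits it; month names
    are parsed with %b, which assumes an English (C) locale.
    """
    times = []
    for line in lines:
        match = LINK_UP_RE.match(line)
        if match:
            stamp = f"{year} {match.group('stamp')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])
    ]
```

Intervals that cluster around 60 seconds after the failure would match the behaviour described here.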

lspci reports:

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3166
	Flags: bus master, fast devsel, latency 0, IRQ 25
	I/O ports at de00 [size=256]
	Memory at fddff000 (64-bit, non-prefetchable) [size=4K]
	Memory at fdef0000 (64-bit, prefetchable) [size=64K]
	[virtual] Expansion ROM at fde00000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel <?>
	Capabilities: [160] Device Serial Number ee-00-00-00-68-4c-e0-00
	Kernel driver in use: r8169
	Kernel modules: r8169

Comment 10 Igor 2009-12-18 15:54:54 UTC

Same severe error with latest FC12 kernel: 2.6.31.6-166.fc12.x86_64


WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Tainted: P          )
Hardware name: EX58-UD3R
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath xts gf128mul kvm_intel kvm uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer nvidia(P) i2c_i801 firewire_ohci snd soundcore snd_page_alloc firewire_core crc_itu_t r8169 mii i2c_core joydev iTCO_wdt iTCO_vendor_support serio_raw wmi cryptd aes_x86_64 aes_generic cbc dm_crypt pata_acpi ata_generic pata_jmicron [last unloaded: microcode]
Pid: 0, comm: swapper Tainted: P           2.6.31.6-166.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8138e831>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff8138e99b>] dev_watchdog+0xf3/0x164
 [<ffffffff8105bc52>] ? internal_add_timer+0xcf/0xd1
 [<ffffffff8105bd0b>] ? cascade+0x6a/0x84
 [<ffffffff8105bec4>] run_timer_softirq+0x19f/0x21c
 [<ffffffff8106e8b3>] ? clocksource_read+0xf/0x11
 [<ffffffff8102566a>] ? apic_write+0x16/0x18
 [<ffffffff81057614>] __do_softirq+0xdd/0x1ad
 [<ffffffff81012eac>] call_softirq+0x1c/0x30
 [<ffffffff810143fb>] do_softirq+0x47/0x8d
 [<ffffffff81057326>] irq_exit+0x44/0x86
 [<ffffffff8141ed92>] smp_apic_timer_interrupt+0x86/0x94
 [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff81019021>] ? mwait_idle+0x33/0xae
 [<ffffffff8141d359>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff8141489e>] ? start_secondary+0x1f3/0x234

Comment 11 Antoine Martin 2009-12-27 11:31:38 UTC

Same issue with 2.6.31.9-174.fc12.x86_64,
Tested with onboard NIC and PCIe card which has the same chipset:
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Trivial to reproduce in my case:
rsync someotherhost:/somelargefile ./

The kernel hangs after a few dozen MB are transferred.

Comment 12 klaas.de.waal 2009-12-31 21:39:21 UTC

I've solved this problem by switching to the "vanilla" linux kernel 2.6.32.2. 
The r8169 driver is significantly changed w.r.t. the r8169 driver in the F12 2.6.31.something kernel.
Note that with the vanilla kernel the "nouveau" driver is then missing (but that can be fixed by loading the nv driver or the NVIDIA binary).

Comment 13 Michal Hlavinka 2010-01-04 13:14:22 UTC

I can confirm this. I've used the kernel from rawhide (2.6.32.2-15) and there is a significant improvement. r8169 still stopped working once during a 200 GB transfer, but compared to hanging every time within a few GB, it's a great improvement!

Comment 14 Marko Karg 2010-01-08 12:37:46 UTC

I've seen the r8169 stopping even with the rawhide kernel while copying just a few MBs, so I don't regard this as a great improvement yet, sorry.

Comment 15 fred2 2010-01-10 20:24:43 UTC

i downgraded from fc12 to fc11 to avoid this problem. but, it may be creeping into fc11 kernels also. the following occurred on the fc11 host while usb networking w/ an attached device. (did not lose host eth0 networking, just lost usb networking (so, at least the abrt did generate and send an oops)).

or, maybe an expert could clarify whether this is a separate bug - same basic error, but i assume it doesn't involve r8169 explicitly?

[localhost]$ uname -a
Linux localhost.localdomain 2.6.30.10-105.fc11.x86_64 #1 SMP Thu Dec 24 16:41:51 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xde/0x14f() (Not tainted)
Hardware name: M51Ta              
NETDEV WATCHDOG: eth1 (cdc_ether): transmit timed out
Modules linked in: cdc_ether usbnet vfat fat usb_storage fuse nfsd lockd nfs_acl auth_rpcgss exportfs sco bridge stp llc bnep l2cap bluetooth sunrpc ipt_MASQUERADE iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath uinput snd_hda_codec_atihdmi snd_hda_codec_realtek uvcvideo videodev snd_hda_intel firewire_ohci arc4 v4l1_compat firewire_core sdhci_pci snd_hda_codec ecb sdhci v4l2_compat_ioctl32 ata_generic snd_hwdep mmc_core pata_acpi snd_seq crc_itu_t snd_seq_device joydev serio_raw pcspkr ath9k snd_pcm pata_atiixp i2c_piix4 mac80211 snd_timer video rfkill snd r8169 soundcore output asus_laptop snd_page_alloc cfg80211 shpchp mii radeon drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.30.10-105.fc11.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81049505>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81049574>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81350409>] ? netif_tx_lock+0x44/0x6c
 [<ffffffff8135055c>] dev_watchdog+0xde/0x14f
 [<ffffffff8135047e>] ? dev_watchdog+0x0/0x14f
 [<ffffffff81052a0d>] run_timer_softirq+0x194/0x210
 [<ffffffff810640b3>] ? getnstimeofday+0x5b/0xaf
 [<ffffffff81021f78>] ? lapic_next_event+0x15/0x19
 [<ffffffff8104e824>] __do_softirq+0xb9/0x18d
 [<ffffffff81011e0c>] call_softirq+0x1c/0x30
 [<ffffffff81013233>] do_softirq+0x47/0x8d
 [<ffffffff8104e564>] irq_exit+0x44/0x81
 [<ffffffff81022894>] smp_apic_timer_interrupt+0x86/0x94
 [<ffffffff81011813>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8102b259>] ? native_safe_halt+0xb/0xd
 [<ffffffff81017aa8>] ? default_idle+0x47/0x6d
 [<ffffffff81017bdb>] ? c1e_idle+0x10d/0x114
 [<ffffffff8100fcdd>] ? cpu_idle+0xa6/0xe9
 [<ffffffff813d51a7>] ? start_secondary+0x1f3/0x234
---[ end trace dccec5442c9fe65e ]---
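
The watchdog line itself names the driver that timed out, which is one way to tell a report like the one above (cdc_ether) apart from the r8169 ones. A minimal Python sketch, assuming only the two message formats that appear in this bug; the name `parse_watchdog` is made up for illustration and is not part of any existing tool:

```python
import re

# Matches both formats seen in this bug:
#   NETDEV WATCHDOG: eth1 (cdc_ether): transmit timed out
#   NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit (?:queue (?P<queue>\d+) )?timed out"
)


def parse_watchdog(line):
    """Return (iface, driver, queue) or None if the line is not a watchdog message.

    queue is None for the older format that does not print a queue number.
    """
    m = WATCHDOG_RE.search(line)
    if m is None:
        return None
    queue = m.group("queue")
    return m.group("iface"), m.group("driver"), int(queue) if queue is not None else None
```

Filtering a log for entries whose driver field is `r8169` would separate the reports that belong to this bug from look-alike timeouts in other drivers.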

Comment 16 Hideki Yoshida 2010-01-11 02:08:18 UTC

I am getting this dev_watchdog error for r8169 on the following kernels:
kernel-2.6.31.1-56.fc12.i686
kernel-2.6.31.9-174.fc12.i686
kernel-2.6.32.3-10.fc12.i686
kernel-PAE-2.6.31.6-166.fc12.i686
kernel-PAE-2.6.31.9-174.fc12.i686

Bursty outbound traffic, like
ssh host 'dd if=/dev/zero count=1M' >/dev/null (from another host)
triggers this error.

I switched to kernel-PAE-2.6.30-6.fc12.i686 which seems to be free from this bug.

Comment 17 Michal Hlavinka 2010-01-12 10:53:00 UTC

Thanks for the info, Hideki.

I've tried kernel-2.6.30-6.fc12 and it seems to be working now, but traffic is also slower than usual. With 2.6.32.3-10 I get about 7 MB/s (using an smb transfer for testing), but with 2.6.30-6 only 3.7 MB/s. Also, in some other bug reports one person has reported that this network card works when traffic is reduced. So maybe 2.6.30-6 does not contain any fix, but only another bug that makes this card work slower? (I guess 2.6.30-6.fc12 is from early F-12 development and has some kernel debugging turned on, which could slow it down.)

Also, I have one r8168/8111b card in my laptop, and it's working fine. I've found that lspci reports my non-working card as "... RTL8111/8168B ... (rev 02)", but dmesg | grep XID says "eth0: RTL8168c/8111c".

This has also been reported upstream several times, for example:

http://bugzilla.kernel.org/show_bug.cgi?id=14985
http://bugzilla.kernel.org/show_bug.cgi?id=14962
http://bugzilla.kernel.org/show_bug.cgi?id=14709
http://bugzilla.kernel.org/show_bug.cgi?id=12296
...

hopefully, this will get fixed soon

Comment 18 Frantisek Hanzlik 2010-01-26 10:00:02 UTC

NB Acer TravelMate TM8471-353G32Mn: network install wasn't possible due to kernel / r8169 crashes. After a DVD install I tested it with kernels 2.6.31.9-174.fc12.i686 and 2.6.31.12-174.2.3.fc12.i686 from the Fedora repos, and with 2.6.32.4-29.fc12.i686 and 2.6.32.4-30.fc12.i686 from koji. The network works at only 0.5 - 2 kB/sec, which is quite useless.
A fresh Ubuntu 9.10 install seems to be working OK.

Comment 19 Mace Moneta 2010-01-27 22:55:27 UTC

Seeing this on the Koji 2.6.32.6-36.fc12.x86_64 kernel as well.

Comment 20 Igor 2010-01-27 23:39:36 UTC

Again, on latest FC12 kernel (2.6.31.12-174.2.3.fc12.x86_64):

WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Tainted: P          )
Hardware name: EX58-UD3R
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_multipath xts gf128mul kvm_intel kvm uinput snd_hda_codec_realtek r8169 snd_hda_intel nvidia(P) iTCO_wdt firewire_ohci snd_hda_codec snd_hwdep snd_seq snd_seq_device mii snd_pcm iTCO_vendor_support i2c_i801 firewire_core crc_itu_t snd_timer joydev snd soundcore snd_page_alloc i2c_core serio_raw wmi cryptd aes_x86_64 aes_generic cbc dm_crypt pata_acpi ata_generic pata_jmicron [last unloaded: microcode]
Pid: 0, comm: swapper Tainted: P           2.6.31.12-174.2.3.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81051710>] warn_slowpath_common+0x84/0x9c
 [<ffffffff8105177f>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81391135>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff8139129f>] dev_watchdog+0xf3/0x164
 [<ffffffff8106409c>] ? __queue_work+0x3a/0x43
 [<ffffffff8106c43c>] ? sched_clock_cpu+0x18/0x176
 [<ffffffff8105bee0>] run_timer_softirq+0x19f/0x21c
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8102569a>] ? apic_write+0x16/0x18
 [<ffffffff81057630>] __do_softirq+0xdd/0x1ad
 [<ffffffff81012eac>] call_softirq+0x1c/0x30
 [<ffffffff810143fb>] do_softirq+0x47/0x8d
 [<ffffffff81057342>] irq_exit+0x44/0x86
 [<ffffffff81421672>] smp_apic_timer_interrupt+0x86/0x94
 [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff81019021>] ? mwait_idle+0x33/0xae
 [<ffffffff8141fc39>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff8141717e>] ? start_secondary+0x1f3/0x234
---[ end trace 35e2b4fdee3f8507 ]---

Comment 21 Mace Moneta 2010-01-27 23:54:06 UTC

On 2.6.32.6-36.fc12.x86_64:

Jan 27 17:41:32 slayer kernel:------------[ cut here ]------------
Jan 27 17:41:32 slayer kernel:WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Jan 27 17:41:32 slayer kernel:Hardware name: C2SEA
Jan 27 17:41:32 slayer kernel:NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Jan 27 17:41:32 slayer kernel:Modules linked in: ppdev parport fuse nvidia(P) ipt_MASQUERADE iptable_nat nf_nat rfcomm sco bridge stp llc bnep l2cap w83627ehf hwmon_vid coretemp cpufreq_ondemand acpi_cpufreq freq_table ipv6 xt_physdev kvm_intel kvm uinput snd_hda_codec_realtek uvcvideo snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_usb_audio snd_mixer_oss snd_pcm videodev v4l1_compat v4l2_compat_ioctl32 btusb snd_usb_lib r8169 bluetooth i2c_i801 snd_rawmidi rfkill snd_timer snd_seq_device pl2303 mii iTCO_wdt usbserial snd_page_alloc snd_hwdep iTCO_vendor_support snd soundcore raid0 raid1 pata_acpi ata_generic dm_multipath firewire_ohci pata_it8213 firewire_core crc_itu_t i2c_core [last unloaded: vmnet]
Jan 27 17:41:32 slayer kernel:Pid: 0, comm: swapper Tainted: P           2.6.32.6-36.fc12.x86_64 #1
Jan 27 17:41:32 slayer kernel:Call Trace:
Jan 27 17:41:32 slayer kernel: <IRQ>  [<ffffffff810562b4>] warn_slowpath_common+0x7c/0x94
Jan 27 17:41:32 slayer kernel: [<ffffffff81056323>] warn_slowpath_fmt+0x41/0x43
Jan 27 17:41:32 slayer kernel: [<ffffffff813c535f>] ? netif_tx_lock+0x44/0x6d
Jan 27 17:41:32 slayer kernel: [<ffffffff813c54c9>] dev_watchdog+0xf3/0x164
Jan 27 17:41:32 slayer kernel: [<ffffffff810793a1>] ? sched_clock_local+0x1c/0x82
Jan 27 17:41:32 slayer kernel: [<ffffffff81064d8e>] ? internal_add_timer+0xcf/0xd1
Jan 27 17:41:32 slayer kernel: [<ffffffff81064e5c>] ? cascade+0x6a/0x86
Jan 27 17:41:32 slayer kernel: [<ffffffff8106503c>] run_timer_softirq+0x1c4/0x268
Jan 27 17:41:32 slayer kernel: [<ffffffff81026b5e>] ? apic_write+0x16/0x18
Jan 27 17:41:32 slayer kernel: [<ffffffff8105d8ec>] __do_softirq+0xe5/0x1a9
Jan 27 17:41:32 slayer kernel: [<ffffffff810804f2>] ? tick_program_event+0x2a/0x2c
Jan 27 17:41:32 slayer kernel: [<ffffffff81012e2c>] call_softirq+0x1c/0x30
Jan 27 17:41:32 slayer kernel: [<ffffffff810143aa>] do_softirq+0x46/0x86
Jan 27 17:41:32 slayer kernel: [<ffffffff8105d72a>] irq_exit+0x3b/0x7d
Jan 27 17:41:32 slayer kernel: [<ffffffff8145922a>] smp_apic_timer_interrupt+0x86/0x94
Jan 27 17:41:32 slayer kernel: [<ffffffff810127f3>] apic_timer_interrupt+0x13/0x20
Jan 27 17:41:32 slayer kernel: <EOI>  [<ffffffff81019148>] ? mwait_idle+0x7a/0x88
Jan 27 17:41:32 slayer kernel: [<ffffffff810190fa>] ? mwait_idle+0x2c/0x88
Jan 27 17:41:32 slayer kernel: [<ffffffff81387c11>] ? cpuidle_idle_call+0x38/0xf3
Jan 27 17:41:32 slayer kernel: [<ffffffff81010ca5>] ? cpu_idle+0xaa/0xe4
Jan 27 17:41:32 slayer kernel: [<ffffffff8144ce98>] ? start_secondary+0x1f2/0x233
Jan 27 17:41:32 slayer kernel:---[ end trace 68ff79fa8a42ca1e ]---
Jan 27 17:41:32 slayer kernel:r8169: eth0: link up
Jan 27 17:41:44 slayer kernel:r8169: eth0: link up
Jan 27 17:41:56 slayer kernel:r8169: eth0: link up
Jan 27 17:42:08 slayer kernel:r8169: eth0: link up
Jan 27 17:42:20 slayer kernel:r8169: eth0: link up
Jan 27 17:42:32 slayer kernel:r8169: eth0: link up
Jan 27 17:42:44 slayer kernel:r8169: eth0: link up
Jan 27 17:42:56 slayer kernel:r8169: eth0: link up
Jan 27 17:43:08 slayer kernel:r8169: eth0: link up
Jan 27 17:43:20 slayer kernel:r8169: eth0: link up
Jan 27 17:43:32 slayer kernel:r8169: eth0: link up
Jan 27 17:43:44 slayer kernel:r8169: eth0: link up
Jan 27 17:43:56 slayer kernel:r8169: eth0: link up

# lspci -vvv -nn -s 03:00.0
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 02)
        Subsystem: Super Micro Computer Inc Device [15d9:8168]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 28
        Region 0: I/O ports at d800 [size=256]
        Region 2: Memory at feaff000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at f8ff0000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at feac0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0200c  Data: 41a1
        Capabilities: [70] Express (v1) Endpoint, MSI 01
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Unknown small resource type 05, will not decode more.
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntrySize=0
                Arb:    Fixed- WRR32- WRR64- WRR128- 100ns- - - onfig- TableOffset=0
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Fixed- RR32-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 68-00-00-00-68-4c-e0-00
        Kernel driver in use: r8169
        Kernel modules: r8169
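
Since the summary of this bug ties the timeout to ASPM, the interesting part of the output above is the LnkCap/LnkCtl pair. A rough Python sketch for pulling those two fields out of saved `lspci -vvv` text, so that outputs from different machines can be compared; the function name `link_aspm` is hypothetical, and the patterns assume the line layout shown above:

```python
import re


def link_aspm(lspci_text):
    """Extract ASPM capability and current setting from `lspci -vvv` text.

    Returns a dict with 'supported' (from LnkCap) and 'enabled' (from LnkCtl);
    a value is None when the corresponding line is not present.
    """
    # e.g. "LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency ..."
    cap = re.search(r"LnkCap:.*?ASPM ([^,]+),", lspci_text)
    # e.g. "LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes ..."
    ctl = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_text)
    return {
        "supported": cap.group(1).strip() if cap else None,
        "enabled": ctl.group(1).strip() if ctl else None,
    }
```

For the device above this would report `L0s L1` as supported and `L0s L1 Enabled` as the current setting.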

Comment 22 Evan Klitzke 2010-02-07 20:12:37 UTC

I am also affected by this problem. Unlike most of the others commenting on this bug, I'm running a 386 (well, 686 I suppose) kernel.

Does anyone know if the abrt oops reports will be sent upstream once this happens? I see an entry in /var/log/messages about an abrt report being generated, but once I get the oops networking is no longer functional. Presumably the abrt stuff could send the report upstream after I reboot, however, if it was smart.

I'm wondering because if it's the case that the reports don't end up getting sent upstream, the problem could be more widespread than the kernel developers might otherwise think.

Comment 23 Evan Klitzke 2010-02-08 18:24:20 UTC

I believe that this is related to, or the same as, https://bugzilla.redhat.com/show_bug.cgi?id=518801

Comment 24 break19 2010-02-13 01:04:28 UTC

lspci -v output:
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
        Subsystem: ASUSTeK Computer Inc. Device 82f1
        Flags: bus master, 66MHz, medium devsel, latency 0
        Capabilities: [c4] HyperTransport: Slave or Primary Interface
        Capabilities: [54] HyperTransport: UnitID Clumping
        Capabilities: [40] HyperTransport: Retry Mode
        Capabilities: [9c] HyperTransport: #1a
        Capabilities: [f8] HyperTransport: #1c

00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fbe00000-fbefffff
        Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot+), MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [b0] Subsystem: ASUSTeK Computer Inc. Device 82f1
        Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [110] Virtual Channel
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fbf00000-fbffffff
        Prefetchable memory behind bridge: 00000000faf00000-00000000faffffff
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot+), MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [b0] Subsystem: ASUSTeK Computer Inc. Device 82f1
        Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [110] Virtual Channel
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 22
        I/O ports at c000 [size=8]
        I/O ports at b000 [size=4]
        I/O ports at a000 [size=8]
        I/O ports at 9000 [size=4]
        I/O ports at 8000 [size=16]
        Memory at fbdff800 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [60] Power Management version 2
        Capabilities: [70] SATA HBA v1.0
        Kernel driver in use: ahci

00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
        Memory at fbdfe000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
        Memory at fbdfd000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller (prog-if 20 [EHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 17
        Memory at fbdff000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [c0] Power Management version 2
        Capabilities: [e4] Debug port: BAR=1 offset=00e0
        Kernel driver in use: ehci_hcd

00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
        Memory at fbdfc000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
        Memory at fbdfb000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller (prog-if 20 [EHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
        Memory at fbdfa800 (32-bit, non-prefetchable) [size=256]
        Capabilities: [c0] Power Management version 2
        Capabilities: [e4] Debug port: BAR=1 offset=00e0
        Kernel driver in use: ehci_hcd

00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: 66MHz, medium devsel
        Capabilities: [b0] HyperTransport: MSI Mapping Enable- Fixed+
        Kernel driver in use: piix4_smbus
        Kernel modules: i2c-piix4

00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller (prog-if 8a [Master SecP PriP])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
        I/O ports at 01f0 [size=8]
        I/O ports at 03f4 [size=1]
        I/O ports at 0170 [size=8]
        I/O ports at 0374 [size=1]
        I/O ports at ff00 [size=16]
        Capabilities: [70] MSI: Enable- Count=1/1 Maskable- 64bit-
        Kernel driver in use: pata_atiixp
        Kernel modules: ata_generic, pata_acpi, pata_atiixp

00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
        Subsystem: ASUSTeK Computer Inc. Device 8230
        Flags: bus master, slow devsel, latency 64, IRQ 16
        Memory at fbdf4000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 2
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 0

00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (prog-if 01 [Subtractive decode])
        Flags: bus master, 66MHz, medium devsel, latency 64
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=64

00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. Device 82ef
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
        Memory at fbdf9000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
        Flags: fast devsel
        Capabilities: [80] HyperTransport: Host or Secondary Interface

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Flags: fast devsel

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
        Flags: fast devsel
        Kernel modules: amd64_edac_mod

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
        Flags: fast devsel
        Capabilities: [f0] Secure device <?>
        Kernel driver in use: k8temp
        Kernel modules: k8temp

01:00.0 VGA compatible controller: ATI Technologies Inc RV730 PRO [Radeon HD 4650] (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited Device e100
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fbef0000 (64-bit, non-prefetchable) [size=64K]
        I/O ports at d000 [size=256]
        Expansion ROM at fbec0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon
        Kernel modules: radeon

01:00.1 Audio device: ATI Technologies Inc R700 Audio Device [Radeon HD 4000 Series]
        Subsystem: PC Partner Limited Device aa38
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at fbeec000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
        Subsystem: ASUSTeK Computer Inc. Device 82c6
        Flags: fast devsel, IRQ 28
        I/O ports at e800 [disabled] [size=256]
        Memory at fbfff000 (64-bit, non-prefetchable) [disabled] [size=4K]
        Memory at faff0000 (64-bit, prefetchable) [disabled] [size=64K]
        [virtual] Expansion ROM at fbfc0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 00-e0-4c-68-00-00-00-01
        Kernel driver in use: r8169
        Kernel modules: r8169

Backtrace: 
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: sunrpc cpufreq_ondemand powernow_k8 freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 snd_hwdep snd_seq ecb usblp joydev snd_seq_device rtl8187 asus_atk0110 mac80211 cfg80211 snd_pcm rfkill eeprom_93cx6 edac_core k8temp edac_mce_amd serio_raw snd_timer snd soundcore i2c_piix4 snd_page_alloc r8169 mii ata_generic pata_acpi dm_multipath pata_atiixp radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core usb_storage [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.32.8-48.rc2.fc12.x86_64 #1
Call Trace:
<IRQ>  [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94
[<ffffffff810563b7>] warn_slowpath_fmt+0x41/0x43
[<ffffffff813c5beb>] ? netif_tx_lock+0x44/0x6d
[<ffffffff813c5d55>] dev_watchdog+0xf3/0x164
[<ffffffff81079419>] ? sched_clock_local+0x1c/0x82
[<ffffffff810eb797>] ? sync_supers_timer_fn+0x0/0x1c
[<ffffffff81065a00>] ? mod_timer+0x23/0x25
[<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268
[<ffffffff81026b8e>] ? apic_write+0x16/0x18
[<ffffffff8105d964>] __do_softirq+0xe5/0x1a9
[<ffffffff810805c2>] ? tick_program_event+0x2a/0x2c
[<ffffffff81012e6c>] call_softirq+0x1c/0x30
[<ffffffff810143ea>] do_softirq+0x46/0x86
[<ffffffff8105d7a2>] irq_exit+0x3b/0x7d
[<ffffffff814599fa>] smp_apic_timer_interrupt+0x86/0x94
[<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
<EOI>  [<ffffffff8103020d>] ? native_safe_halt+0xb/0xd
[<ffffffff81018f37>] ? default_idle+0x36/0x53
[<ffffffff8101904f>] ? c1e_idle+0xfb/0x102
[<ffffffff81010cc8>] ? cpu_idle+0xaa/0xe4
[<ffffffff8143e9b7>] ? rest_init+0x6b/0x6d
[<ffffffff81815de2>] ? start_kernel+0x3f4/0x3ff
[<ffffffff818152c1>] ? x86_64_start_reservations+0xac/0xb0
[<ffffffff818153bd>] ? x86_64_start_kernel+0xf8/0x107

I even tried upgrading to the 2.6.32-rc2 koji kernel which was suggested by someone in #Fedora - no change.

Comment 25 Nerijus Baliūnas 2010-02-17 19:37:57 UTC

2.6.31.12-174.2.19.fc12 theoretically should fix this bug:
* Sat Feb  6 2010 Chuck Ebbert <cebbert>  2.6.31.12-174.2.8

- CVE-2009-4537 kernel: r8169 issue reported at 26c3

  (fix taken from Red Hat/CentOS 5.4)

(see http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-4537 )

But it does not - PC either hangs or network is lost after transferring ~100 MB.
2.6.33-0.44.rc8.git0.fc13.i686 seems to be a bit better - I can transfer up to 1 GB, but it still stops working after some time.

Comment 26 Mace Moneta 2010-02-17 19:50:51 UTC

Does setting txqueuelen to 100 help?  I haven't had a reoccurrence since I made the change, but two machines isn't much of a sample:

/sbin/ifconfig eth0 txqueuelen 100
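
To confirm that the setting actually took effect before drawing conclusions, the current value can be read back from sysfs. A small Python sketch, assuming the standard `/sys/class/net/<iface>/tx_queue_len` attribute; the function name and the `sysfs_root` parameter are illustrative only (the parameter exists so the function can be pointed at a test directory):

```python
from pathlib import Path


def tx_queue_len(iface, sysfs_root="/sys/class/net"):
    """Read the current transmit queue length for iface from sysfs."""
    return int((Path(sysfs_root) / iface / "tx_queue_len").read_text().strip())
```

After the ifconfig command above, `tx_queue_len("eth0")` should return 100.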

Comment 27 Nerijus Baliūnas 2010-02-17 20:47:20 UTC

No, setting txqueuelen to 100 didn't help.

Comment 28 Mace Moneta 2010-02-23 16:37:57 UTC

Re-occurrence on kernel 2.6.32.8-58.fc12.x86_64:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Hardware name: C2SEA
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse w83627ehf hwmon_vid coretemp vmnet vmblock vsock vmci vmmon cpufreq_ondemand acpi_cpufreq freq_table ipv6 kvm_intel kvm uinput snd_hda_codec_realtek usblp snd_hda_intel nvidia(P) snd_hda_codec snd_hwdep ppdev snd_seq parport_pc snd_seq_device parport snd_pcm snd_timer i2c_i801 r8169 snd iTCO_wdt iTCO_vendor_support soundcore mii snd_page_alloc raid0 raid1 pata_acpi ata_generic dm_multipath firewire_ohci firewire_core crc_itu_t pata_it8213 i2c_core [last unloaded: i2c_algo_bit]
Pid: 7, comm: ksoftirqd/1 Tainted: P           2.6.32.8-58.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94
 [<ffffffff810563b7>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff813c5e0b>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff813c5f75>] dev_watchdog+0xf3/0x164
 [<ffffffff8107071a>] ? __queue_work+0x3a/0x41
 [<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268
 [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9
 [<ffffffff81012e6c>] call_softirq+0x1c/0x30
 <EOI>  [<ffffffff810143ea>] do_softirq+0x46/0x86
 [<ffffffff8105d49c>] ksoftirqd+0x65/0xee
 [<ffffffff8105d437>] ? ksoftirqd+0x0/0xee
 [<ffffffff810745b6>] kthread+0x7f/0x87
 [<ffffffff81012d6a>] child_rip+0xa/0x20
 [<ffffffff81074537>] ? kthread+0x0/0x87
 [<ffffffff81012d60>] ? child_rip+0x0/0x20
---[ end trace 23eb60a0516292f4 ]---
r8169: eth0: link up

Comment 29 Evan Klitzke 2010-02-23 17:17:37 UTC

In desperation I grabbed kernel-2.6.33-0.46.rc8.git1.fc13.i686.rpm, the latest kernel in the f13 directories on the mirrors, and am still having this problem. I don't have a call trace handy, but I can add it if anyone thinks it would be helpful.

Comment 30 Nerijus Baliūnas 2010-02-23 17:42:44 UTC

It seems the latest F11 kernel 2.6.30.10-105.2.23.fc11.i586 works! I have transferred 8 GB already and it still hasn't crashed or lost the network.

Comment 31 Antonio Gallardo 2010-02-25 21:17:18 UTC

(In reply to comment #30)
> It seems the latest F11 kernel 2.6.30.10-105.2.23.fc11.i586 works! I transferred
> 8 GB already and it still didn't crash or lose network.

We just had a crash with the above kernel. The weird thing is that there is no stack trace any more, just a lot of "r8169: eth1: link up" messages:

Feb 25 14:37:55 ags01 kernel: r8169: eth1: link up
Feb 25 14:38:07 ags01 kernel: r8169: eth1: link up
Feb 25 14:38:19 ags01 kernel: r8169: eth1: link up
Feb 25 14:38:31 ags01 kernel: r8169: eth1: link up
Feb 25 14:38:43 ags01 kernel: r8169: eth1: link up
Feb 25 14:38:55 ags01 kernel: r8169: eth1: link up
Feb 25 14:39:07 ags01 kernel: r8169: eth1: link up

The info of my machine is at:

https://bugzilla.redhat.com/show_bug.cgi?id=518801#c2
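
The "link up" messages above repeat at a fixed interval, which may be a useful signature when comparing logs. A minimal Python sketch that measures the spacing, assuming syslog-prefixed lines like the ones above (lines without a "Mon DD HH:MM:SS" prefix, such as raw dmesg output, are not handled); the function name is hypothetical, and the `year` argument is an assumption needed because syslog timestamps carry no year:

```python
from datetime import datetime


def link_up_intervals(lines, year=2010):
    """Return the seconds between consecutive 'link up' messages."""
    stamps = []
    for line in lines:
        if "link up" not in line:
            continue
        # The first three fields are the syslog timestamp, e.g. "Feb 25 14:37:55".
        stamp = " ".join(line.split()[:3])
        stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
```

A list of identical values would suggest the driver is resetting the link on a timer rather than reacting to random cable or switch events.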

Comment 32 eeriegeek 2010-02-28 06:32:52 UTC

I seem to have this problem as well. I just downloaded the 3.3 GB Fedora 12 DVD over this network link without trouble and installed some new hardware. When running yum update, the link to the router froze about every 50 MB. Resetting the router, or sometimes just unplugging and reconnecting the ethernet cable, usually gets it going again. Messages like this appear in dmesg:

r8169: eth0: link down
r8169: eth0: link up

kernel version: 2.6.31.12-174.2.22.fc12.x86_64

eth0 is onboard MB Gigabyte EP45-UD3P

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:04:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:04:00.0: setting latency timer to 64
  alloc irq_desc for 28 on node 0
  alloc kstat_irqs on node 0
r8169 0000:04:00.0: irq 28 for MSI/MSI-X
eth0: RTL8168c/8111c at 0xffffc90000674000, 00:24:1d:83:64:17, XID 3c4000c0 IRQ 28

Comment 33 Ryan Pisani 2010-03-06 17:34:43 UTC

This issue is still present in 2.6.32.9-67.fc12.x86_64.

Under heavy load the link drops; a few minutes later the adapter resets and the link comes back up. It happens sporadically when watching HD mpg files over the network.


WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Hardware name: Studio Hybrid 140g
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep r8169 snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm mii snd_timer i2c_i801 iTCO_wdt iTCO_vendor_support lirc_mceusb snd soundcore snd_page_alloc dcdbas ssb mmc_core lirc_dev serio_raw joydev firewire_ohci firewire_core crc_itu_t usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.32.9-67.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94
 [<ffffffff810563b7>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff813c5ec3>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff813c602d>] dev_watchdog+0xf3/0x164
 [<ffffffff8107071a>] ? __queue_work+0x3a/0x41
 [<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268
 [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9
 [<ffffffff810acd61>] ? handle_IRQ_event+0x60/0x121
 [<ffffffff81012e6c>] call_softirq+0x1c/0x30
 [<ffffffff810143ea>] do_softirq+0x46/0x86
 [<ffffffff8105d7a2>] irq_exit+0x3b/0x7d
 [<ffffffff81459c5d>] do_IRQ+0xa5/0xbc
 [<ffffffff81012693>] ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff81294de7>] ? acpi_idle_enter_simple+0x109/0x13d
 [<ffffffff81294de0>] ? acpi_idle_enter_simple+0x102/0x13d
 [<ffffffff81294b0b>] ? acpi_idle_enter_bm+0xd8/0x2ab
 [<ffffffff81458177>] ? notifier_call_chain+0x14/0x63
 [<ffffffff81388776>] ? cpuidle_idle_call+0x99/0xf3
 [<ffffffff81010cc8>] ? cpu_idle+0xaa/0xe4
 [<ffffffff8143ecb7>] ? rest_init+0x6b/0x6d
 [<ffffffff81817de2>] ? start_kernel+0x3f4/0x3ff
 [<ffffffff818172c1>] ? x86_64_start_reservations+0xac/0xb0
 [<ffffffff818173bd>] ? x86_64_start_kernel+0xf8/0x107
---[ end trace 4764d7d8c429a209 ]---
r8169: eth0: link up

Comment 34 fred2 2010-03-10 16:34:44 UTC

problem is present in fedora 13 alpha:
Bug 572252 - NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
kernel version is: 2.6.33-0.52.rc8.git6.fc13.x86_64

Comment 35 Sergio Augusto Vladisauskis 2010-03-10 18:06:35 UTC

I have the same bug!
My network card is a Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02).
I am using Fedora 12 x86_64.

My /var/log/messages:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Tainted: G       W )
Hardware name: A790GXM-A
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: usblp xpad sunrpc ipv6 cpufreq_ondemand powernow_k8 freq_table microcode fuse snd_usb_audio snd_usb_lib snd_rawmidi gspca_sonixj gspca_main videodev v4l1_compat v4l2_compat_ioctl32 dm_multipath uinput joydev snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer amd64_edac_mod r8169 snd edac_core soundcore mii snd_page_alloc i2c_piix4 serio_raw wmi ata_generic pata_acpi pata_atiixp usb_storage radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Tainted: G        W  2.6.31.9-174.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81051710>] warn_slowpath_common+0x84/0x9c
 [<ffffffff8105177f>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81391009>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff81391173>] dev_watchdog+0xf3/0x164
 [<ffffffff8106409c>] ? __queue_work+0x3a/0x43
 [<ffffffff8107199f>] ? clockevents_program_event+0x7a/0x83
 [<ffffffff8105bee0>] run_timer_softirq+0x19f/0x21c
 [<ffffffff81057630>] __do_softirq+0xdd/0x1ad
 [<ffffffff81026966>] ? apic_write+0x16/0x18
 [<ffffffff81012eac>] call_softirq+0x1c/0x30
 [<ffffffff810143fb>] do_softirq+0x47/0x8d
 [<ffffffff81057342>] irq_exit+0x44/0x86
 [<ffffffff814214b5>] do_IRQ+0xa5/0xbc
 [<ffffffff810126d3>] ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff8102e23d>] ? native_safe_halt+0xb/0xd
 [<ffffffff81018e1f>] ? default_idle+0x47/0x6d
 [<ffffffff81018f1e>] ? c1e_idle+0xd9/0x102
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff8141705e>] ? start_secondary+0x1f3/0x234
---[ end trace 501f7ea14d007e33 ]---
r8169: eth0: link up
r8169: eth0: link up

Comment 36 unknown32 2010-03-14 16:12:13 UTC

Here is my list.
Since installing FC12 I have yet to have a stable system that stays up for more than 24 hours. This also affects rawhide.
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse sunrpc cpufreq_ondemand powernow_k8 freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 microcode uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev shpchp parport_pc parport snd_timer edac_core asus_atk0110 edac_mce_amd snd r8169 mii i2c_piix4 soundcore snd_page_alloc pata_acpi ata_generic dm_multipath pata_atiixp radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.32.9-70.fc12.x86_64 #1
Call Trace:
<IRQ>  [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94
[<ffffffff810563b7>] warn_slowpath_fmt+0x41/0x43
[<ffffffff813c5f03>] ? netif_tx_lock+0x44/0x6d
[<ffffffff813c606d>] dev_watchdog+0xf3/0x164
[<ffffffff81079325>] ? sched_clock_local+0x1c/0x82
[<ffffffff81079451>] ? sched_clock_cpu+0xc6/0xd1
[<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268
[<ffffffff81026b8e>] ? apic_write+0x16/0x18
[<ffffffff8105d964>] __do_softirq+0xe5/0x1a9
[<ffffffff810804ce>] ? tick_program_event+0x2a/0x2c
[<ffffffff81012e6c>] call_softirq+0x1c/0x30
[<ffffffff810143ea>] do_softirq+0x46/0x86
[<ffffffff8105d7a2>] irq_exit+0x3b/0x7d
[<ffffffff81459d3a>] smp_apic_timer_interrupt+0x86/0x94
[<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
<EOI>  [<ffffffff8103020d>] ? native_safe_halt+0xb/0xd
[<ffffffff81018f37>] ? default_idle+0x36/0x53
[<ffffffff8101904f>] ? c1e_idle+0xfb/0x102
[<ffffffff81010cc8>] ? cpu_idle+0xaa/0xe4
[<ffffffff8143ecf7>] ? rest_init+0x6b/0x6d
[<ffffffff81817de2>] ? start_kernel+0x3f4/0x3ff
[<ffffffff818172c1>] ? x86_64_start_reservations+0xac/0xb0
[<ffffffff818173bd>] ? x86_64_start_kernel+0xf8/0x107

Comment 37 fred2 2010-04-14 15:03:54 UTC

problem is still present in fedora 13 beta livecd x86_64
$ uname -a
Linux localhost.localdomain 2.6.33.1-24.fc13.x86_64 #1 SMP Tue Mar 30 18:21:22 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

see also https://bugzilla.redhat.com/show_bug.cgi?id=576058

Comment 38 Eric Smith 2010-04-25 09:25:01 UTC

*** Bug 576058 has been marked as a duplicate of this bug. ***

Comment 39 Kevin White 2010-04-30 00:03:21 UTC

This problem also affects me, running Fedora 12 i386.  After a fresh install, I can't even "yum install iperf"...the NIC hangs before the metadata can be downloaded.  And I'm using a default yum setup, so the yum data is coming from the Internet, not a local mirror.
I find it odd that the _install_ works: that is done across the local, gigabit network, to a NFS share.  But as soon as the OS reboots, it gets crazy.

I'm using a Jetway NF76-N1GL-LF with a VIA Nano U2300, VIA VX800, and Realtek RTL8111C PCI-E Gigabit Ethernet.

There _is_ a known BIOS issue with this particular board and CPU.  I am running a BIOS (A05) that fixes that known problem.

Once I get into failure state, the machine doesn't even notice link down.  I can unplug the network cable and ethtool still shows link up.

processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 15
model name      : VIA Nano processor U2300@1000MHz
stepping        : 2
cpu MHz         : 533.000
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush acpi mmx fxsr sse sse2 ss tm syscall nx fxsr_opt rdtscp lm constant_tsc up rep_good pni monitor est tm2 ssse3 cx16 xtpr rng rng_en ace ace_en ace2 phe phe_en lahf_lm
bogomips        : 1994.75
clflush size    : 64
power management:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xc6/0x12d() (Not tainted)
Hardware name: VX800
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: sunrpc ip6_tables cpufreq_ondemand acpi_cpufreq dm_multipath uinput snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer i2c_viapro snd soundcore snd_page_alloc r8169 serio_raw i2c_core mii pata_acpi ata_generic pata_via [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.i686.PAE #1
Call Trace:
 [<c043db03>] warn_slowpath_common+0x70/0x87
 [<c06ff9d0>] ? dev_watchdog+0xc6/0x12d
 [<c043db58>] warn_slowpath_fmt+0x29/0x2c
 [<c06ff9d0>] dev_watchdog+0xc6/0x12d
 [<c04471bd>] ? mod_timer+0x20/0x27
 [<c0680021>] ? usb_hcd_poll_rh_status+0x126/0x12e
 [<c06ff90a>] ? dev_watchdog+0x0/0x12d
 [<c0446e91>] run_timer_softirq+0x14e/0x1af
 [<c0442daa>] __do_softirq+0xb1/0x157
 [<c0442e86>] do_softirq+0x36/0x41
 [<c0442f78>] irq_exit+0x2e/0x61
 [<c041cf17>] smp_apic_timer_interrupt+0x6d/0x7b
 [<c04099b5>] apic_timer_interrupt+0x31/0x38
 [<c040f34b>] ? mwait_idle+0x67/0x85
 [<c040811f>] cpu_idle+0x96/0xaf
 [<c0765784>] rest_init+0x58/0x5a
 [<c09a78c3>] start_kernel+0x32b/0x330
 [<c09a7081>] i386_start_kernel+0x70/0x77
---[ end trace 8eb83276b856939b ]---

Comment 40 Kyle McMartin 2010-04-30 13:47:48 UTC

http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commit;h=630b943c182d1aed69f244405131902fbcba7ec6

I think this may possibly be related...

Comment 41 Kyle McMartin 2010-05-01 23:41:11 UTC

Care to try this build? http://koji.fedoraproject.org/koji/taskinfo?taskID=2154510

It's a revert of all the r8169 code since 2.6.30, hopefully that will tell us if it's a bug in the driver, or if the driver is using core functionality wrong (which is what I think is going on... I've inspected all the changes to the code since and there's nothing even remotely suspicious.)

regards, Kyle

Comment 42 Norm Lunde 2010-05-03 21:16:43 UTC

I've been having similar problems on my FC12 host (2.6.32.11-99.fc12.x86_64) but only when running with bridged networking (to support KVM/QEMU guests.)

When I try to scp a large file to or from the host system, the console goes blank and the machine locks up completely (no ping, no response to keyboard or mouse events, etc.)   The lockup consistently occurs after a few MB of data have been transferred.  This happens even when none of the guest VMs are running.

I tried the latest r8168 and r8169 drivers from Realtek's web site, but they did not help.

When I return to the original, non-bridged configuration and reboot, the problem goes away.

Next I plan to try the bridge setup with a different NIC.

Norm

Comment 43 Michal Hlavinka 2010-05-03 21:51:49 UTC

(In reply to comment #41)
> Care to try this build?
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2154510

I've tried to use this kernel, but the only hardware where I see this bug is my headless router+NAS running F-12 (I've bought extra Marvell NICs as a workaround). I installed this kernel after updating linux-firmware and grubby to the F-13 versions, as required by rpm, but this kernel does not boot at all. This is a headless server and I have only laptops and no LCD available, so I can't check why this kernel does not boot; the logs are empty. Sorry, maybe someone else can test it.

Comment 44 Norm Lunde 2010-05-03 21:58:38 UTC

(In reply to comment #42)
> Next I plan to try the bridge setup with a different NIC.

With an Intel e1000 NIC and bridged networking I don't see the problem.  I'm able to do the same large file copies with scp that locked up the machine previously.

Norm

Comment 45 Kevin White 2010-05-04 02:38:30 UTC

I tried the Koji kernel...no luck.

Interesting data points.  One, I installed Fedora 13 beta to do it.  I did a network install, like I had done many times with Fedora 12, only this time I even did VNC.  One of the things that baffled me before was that an F12 network install worked fine...repeatedly.  Well, this F13 beta network install, using VNC, did not.  The NIC died before package selection and the VNC session was terminated.

So, I had to write the F13 ISO to a USB flash drive and do the install that way.  After the box came up, I did a "yum update", and was excited that I actually got to the package download stage, but it died while downloading packages.

So, I copied the Koji kernel to a USB flash drive and installed it...rebooted, and resumed the yum update...death again 50% of the way in.

Now, just to throw some more uncertainty into the mix: there _could_ be a hardware issue.  The reason I say that is that I'm doing all of this on a Jetway NF76 board.  However, my main desktop is a Foxconn A79A-S AM2+ board, using the 8111B NIC.  The Jetway NF76 board uses an 8111C...could be a B to C difference, maybe...but they both use the same driver.  And I've never had any trouble with the A79A-S.

I contacted Jetway support and they tell me that their techs tested after I contacted them and didn't see the problem.  However, I haven't gotten confirmation of exactly what they tested (as this chain shows, if they didn't try a new enough Linux install, they might not see the problem.)  I'm going to continue to press them for more data, as well as push to get a replacement board.

Maybe there's something stupid about implementing the 8111C that makes it somehow "fragile".

Comment 46 Michal Hlavinka 2010-05-04 07:05:54 UTC

Kevin, as I wrote in comment #17, I suspect all these issues occur only on 8111C hardware (sometimes reported as 8111B by lspci, but correctly identified as 8111C in dmesg). I know of 3 machines using the same kernel with the 8111B and all of them work fine. Only the 8111C is causing this trouble.

Comment 47 Mace Moneta 2010-05-04 07:36:41 UTC

Comment #21 is a B, if I'm not mistaken (see the lspci), exhibiting the same problem (on two machines, same motherboard: Supermicro C2SEA).

Comment 48 Mace Moneta 2010-05-04 07:38:12 UTC

Correction; dmesg identifies it as a C:

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
r8169 0000:03:00.0: setting latency timer to 64
  alloc irq_desc for 28 on node -1
  alloc kstat_irqs on node -1
r8169 0000:03:00.0: irq 28 for MSI/MSI-X
eth0: RTL8168c/8111c at 0xffffc90000c76000, 00:30:48:b0:96:f0, XID 1c4000c0 IRQ 28
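
Since lspci and dmesg can disagree about the chip revision, it may help to pull the chip name and XID straight out of the driver's probe line when comparing reports. The following is a minimal sketch, assuming the probe line has the same layout as the ones pasted in this thread; the function name parse_probe_line is made up for illustration, and other driver versions may print a different format.

```python
import re

# Layout assumed from the probe lines quoted in this thread, e.g.
# "eth0: RTL8168c/8111c at 0xffffc90000c76000, 00:30:48:b0:96:f0, XID 1c4000c0 IRQ 28"
PROBE_RE = re.compile(
    r"(?P<iface>eth\d+): (?P<chip>RTL\S+) at 0x[0-9a-f]+, "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), "
    r"XID (?P<xid>[0-9a-f]+) IRQ (?P<irq>\d+)"
)


def parse_probe_line(line):
    """Return the fields of an r8169 probe line as a dict, or None if no match.

    search() is used rather than match() so that syslog prefixes
    (timestamp, hostname, "kernel:") in front of the message are tolerated.
    """
    m = PROBE_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["irq"] = int(info["irq"])
    return info


if __name__ == "__main__":
    sample = ("eth0: RTL8168c/8111c at 0xffffc90000c76000, "
              "00:30:48:b0:96:f0, XID 1c4000c0 IRQ 28")
    print(parse_probe_line(sample))
```

Feeding it the output of dmesg line by line and keeping the non-None results gives one record per Realtek interface, which is easier to compare across machines than the lspci description.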

Comment 49 Pavel Holica 2010-05-04 08:15:47 UTC

I've encountered similar problems on http://www.smolts.org/client/show/pub_c6f3438d-74fb-41da-9152-7e7cd595d38b

Apr 18 20:22:37 janet kernel: eth1: RTL8168d/8111d at 0xffffc90000676000, 00:27:0e:03:60:a8, XID 081000c0 IRQ 29

Here is error message from log:

Apr  1 18:39:11 janet kernel: ------------[ cut here ]------------
Apr  1 18:39:11 janet kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Apr  1 18:39:11 janet kernel: Hardware name:
Apr  1 18:39:11 janet kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Apr  1 18:39:11 janet kernel: Modules linked in: sunrpc p4_clockmod freq_table speedstep_lib nf_conntrack_ftp ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 tda18271 af9013 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer r8169 snd mii ppdev parport_pc parport i2c_i801 soundcore dvb_usb_af9015 snd_page_alloc dvb_usb serio_raw dvb_core dm_multipath i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: freq_table]
Apr  1 18:39:11 janet kernel: Pid: 0, comm: swapper Not tainted 2.6.32.10-90.fc12.x86_64 #1
Apr  1 18:39:11 janet kernel: Call Trace:
Apr  1 18:39:11 janet kernel: <IRQ>  [<ffffffff81056350>] warn_slowpath_common+0x7c/0x94
Apr  1 18:39:11 janet kernel: [<ffffffff810563bf>] warn_slowpath_fmt+0x41/0x43
Apr  1 18:39:11 janet kernel: [<ffffffff813c610b>] ? netif_tx_lock+0x44/0x6d
Apr  1 18:39:11 janet kernel: [<ffffffff813c6275>] dev_watchdog+0xf3/0x164
Apr  1 18:39:11 janet kernel: [<ffffffff8107932d>] ? sched_clock_local+0x1c/0x82
Apr  1 18:39:11 janet kernel: [<ffffffff81079459>] ? sched_clock_cpu+0xc6/0xd1
Apr  1 18:39:11 janet kernel: [<ffffffff810650bc>] run_timer_softirq+0x1c4/0x268
Apr  1 18:39:11 janet kernel: [<ffffffff81026b9e>] ? apic_write+0x16/0x18
Apr  1 18:39:11 janet kernel: [<ffffffff8105d96c>] __do_softirq+0xe5/0x1a9
Apr  1 18:39:11 janet kernel: [<ffffffff810804d6>] ? tick_program_event+0x2a/0x2c
Apr  1 18:39:11 janet kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30
Apr  1 18:39:11 janet kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86
Apr  1 18:39:11 janet kernel: [<ffffffff8105d7aa>] irq_exit+0x3b/0x7d
Apr  1 18:39:11 janet kernel: [<ffffffff81459f4a>] smp_apic_timer_interrupt+0x86/0x94
Apr  1 18:39:11 janet kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
Apr  1 18:39:11 janet kernel: <EOI>  [<ffffffff81019157>] ? mwait_idle+0x7a/0x88
Apr  1 18:39:11 janet kernel: [<ffffffff81019109>] ? mwait_idle+0x2c/0x88
Apr  1 18:39:11 janet kernel: [<ffffffff81010cc8>] ? cpu_idle+0xaa/0xe4
Apr  1 18:39:11 janet kernel: [<ffffffff8143ef07>] ? rest_init+0x6b/0x6d
Apr  1 18:39:11 janet kernel: [<ffffffff81817de2>] ? start_kernel+0x3f4/0x3ff
Apr  1 18:39:11 janet kernel: [<ffffffff818172c1>] ? x86_64_start_reservations+0xac/0xb0
Apr  1 18:39:11 janet kernel: [<ffffffff818173bd>] ? x86_64_start_kernel+0xf8/0x107
Apr  1 18:39:11 janet kernel: ---[ end trace 91df6b381cb83a69 ]---
Apr  1 18:39:11 janet kernel: r8169: eth0: link up

Comment 50 Adrian 2010-05-04 09:56:56 UTC

For what it's worth, I do not have the same problems in Arch (2.6.33.3-1) or Ubuntu 10.04.

Comment 51 Kyle McMartin 2010-05-06 04:47:59 UTC

Thanks Adrian, that info helped. Try this scratch build please (when it completes):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2166491

Hopefully that will fix the issue.

regards, Kyle

Comment 52 Kevin White 2010-05-06 19:21:27 UTC

Sadly, this scratch build didn't fix the problem for me.  I took my same Fedora 13 Beta install that has never had a "yum update" successfully run, I installed kernel-2.6.33.3-84.bz538920.fc13.src.rpm, i686 on it, did "yum clean all" then "yum update", and the NIC went to 0 bytes at about 30% downloaded of the new RPMs.

I'm only one data point, and I'm beginning to believe that the board I'm using might have issues...

I've been working on getting the box to run Ubuntu as well, but had trouble getting it to install and boot on the same hard drive that has three other Red Hat products on it.  :)  I need to throw another hard drive in and see if I can confirm Ubuntu working for me.

Comment 53 Kevin White 2010-05-06 19:45:27 UTC

One more thing: with this test kernel, the network watchdog never fires.  I let yum sit there for 45 minutes getting 0 bytes of data, and the watchdog never fired.
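
When the watchdog stays silent like this, the stall can still be seen from the interface byte counters, which stop moving. Below is a rough sketch of that idea, assuming the standard /sys/class/net/<iface>/statistics counters are available; the helper names and the sampling scheme are illustrative only.

```python
def read_counter(iface, name, root="/sys/class/net"):
    """Read one statistics counter (e.g. 'rx_bytes') for an interface."""
    with open(f"{root}/{iface}/statistics/{name}") as f:
        return int(f.read())


def stalled_intervals(samples):
    """Find intervals in which no traffic moved.

    samples is a list of (rx_bytes, tx_bytes) tuples taken at regular
    intervals. The result lists every index i for which sample i is
    identical to sample i-1, i.e. neither counter advanced.
    """
    return [i for i in range(1, len(samples)) if samples[i] == samples[i - 1]]


if __name__ == "__main__":
    import time

    iface = "eth0"  # assumption: adjust to the affected interface
    samples = []
    for _ in range(6):
        samples.append((read_counter(iface, "rx_bytes"),
                        read_counter(iface, "tx_bytes")))
        time.sleep(10)
    print("stalled intervals:", stalled_intervals(samples))
```

A run of consecutive stalled intervals during an active download is a reasonable hint that the NIC has hung even though no NETDEV WATCHDOG message was logged; an idle link produces the same pattern, so the check only means something while traffic is expected.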

Comment 54 Kyle McMartin 2010-05-07 04:48:40 UTC

Thanks, can people on F-12 and F-13 try these builds with code the upstream maintainer of the driver has asked for testing of:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2170701 for F-13
http://koji.fedoraproject.org/koji/taskinfo?taskID=2170695 for F-12

Thanks! Kyle.

Comment 55 Michal Hlavinka 2010-05-07 07:02:22 UTC

Hi Kyle, 

> http://koji.fedoraproject.org/koji/taskinfo?taskID=2170695 for F-12

this build failed

Comment 56 Dennis Gilmore 2010-05-07 14:12:56 UTC

On F-13 I got:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x192()
Hardware name: X7SLA
NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Modules linked in: tun sit tunnel4 ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv6 ip6t_REJECT ip6table_filter ip6_tables ipv6 r8169 mii serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.33.3-85.bz538920.fc13.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
 [<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e
 [<ffffffff8139caa3>] ? netif_tx_lock+0x3f/0x68
 [<ffffffff8139cbbc>] dev_watchdog+0xf0/0x192
 [<ffffffff8106956d>] ? sched_clock_local+0x1c/0x82
 [<ffffffff81069696>] ? sched_clock_cpu+0xc3/0xce
 [<ffffffff810583d6>] run_timer_softirq+0x1ba/0x25e
 [<ffffffff8106c299>] ? ktime_get+0x60/0xb9
 [<ffffffff810516c5>] __do_softirq+0xe0/0x1a1
 [<ffffffff810702d8>] ? tick_program_event+0x25/0x27
 [<ffffffff8100aa1c>] call_softirq+0x1c/0x30
 [<ffffffff8100c21d>] do_softirq+0x41/0x7e
 [<ffffffff81051518>] irq_exit+0x36/0x78
 [<ffffffff81020234>] smp_apic_timer_interrupt+0x89/0x97
 [<ffffffff8100a4d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101138b>] ? mwait_idle+0x75/0x83
 [<ffffffff8101133d>] ? mwait_idle+0x27/0x83
 [<ffffffff81008bfd>] cpu_idle+0xa5/0xdf
 [<ffffffff81413f55>] rest_init+0x79/0x7b
 [<ffffffff81ba6df8>] start_kernel+0x40e/0x419
 [<ffffffff81ba62bc>] x86_64_start_reservations+0xa7/0xab
 [<ffffffff81ba63b8>] x86_64_start_kernel+0xf8/0x107
---[ end trace 1d732f2b440b4f24 ]---
r8169: eth1: link up
r8169: eth1: link up
r8169: eth1: link up

eth0, which is the same hardware, was still OK and working fine.  eth1 stopped working at that point.

Comment 57 unknown32 2010-05-19 02:07:48 UTC

I went to the FC13 beta and see the same issue as before, when I reported on the FC12 release and rawhide.

Here is my latest backtrace:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x192()
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse sunrpc cpufreq_ondemand powernow_k8 freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev snd_timer snd r8169 soundcore shpchp parport_pc mii edac_core parport edac_mce_amd snd_page_alloc i2c_piix4 microcode k10temp asus_atk0110 ata_generic pata_acpi pata_atiixp radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.33.3-85.fc13.x86_64 #1
Call Trace:
<IRQ>  [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e
[<ffffffff8139caa3>] ? netif_tx_lock+0x3f/0x68
[<ffffffff8139cbbc>] dev_watchdog+0xf0/0x192
[<ffffffff8106956d>] ? sched_clock_local+0x1c/0x82
[<ffffffff8106117c>] ? __queue_work+0x35/0x3c
[<ffffffff810583d6>] run_timer_softirq+0x1ba/0x25e
[<ffffffff8106c299>] ? ktime_get+0x60/0xb9
[<ffffffff810516c5>] __do_softirq+0xe0/0x1a1
[<ffffffff810702d8>] ? tick_program_event+0x25/0x27
[<ffffffff8100aa1c>] call_softirq+0x1c/0x30
[<ffffffff8100c21d>] do_softirq+0x41/0x7e
[<ffffffff81051518>] irq_exit+0x36/0x78
[<ffffffff81020234>] smp_apic_timer_interrupt+0x89/0x97
[<ffffffff8100a4d3>] apic_timer_interrupt+0x13/0x20
<EOI>  [<ffffffff81028384>] ? native_safe_halt+0x6/0x8
[<ffffffff8101117a>] default_idle+0x31/0x4e
[<ffffffff8101128d>] c1e_idle+0xf6/0xfd
[<ffffffff81008bfd>] cpu_idle+0xa5/0xdf
[<ffffffff814238cd>] start_secondary+0x1f2/0x233

Comment 58 Jason Priebe 2010-05-19 10:21:27 UTC

FWIW -- I've been testing the drivers provided by Realtek (version 8.018.00, 2010-04-01).

These drivers have been MUCH more stable for me under Fedora 11.

I've been stress testing two systems by moving large files over HTTP (approximately 1 GB each).  One system has the stock driver provided by Fedora 11.  The other has the vendor driver.  With the stock driver, my test would result in network blips about 5 times per day.  With the vendor driver, I've seen no blips.

With the stock driver, I've seen a mix of the symptoms described here.  Sometimes I get just "eth0: link up" in the log.  Occasionally, I would get the full watchdog error with call trace.  Also, most of the time, the interruption in the network was brief (probably under 10 seconds).  But sometimes (twice in 6 weeks), the network would go down for minutes, or even hours, before recovering.

I've been testing the realtek driver for about 5 days, and I haven't seen any of these problems.  Still crossing my fingers, though, because in that relatively short timeframe, the stock driver didn't exhibit the "full" network outage problem, either (I've only seen that twice in about 6 weeks of prior testing).  So I guess it's still possible that whatever causes that could still happen with the vendor driver.
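
To compare drivers over a multi-day test like this, the two symptoms (bare "link up" messages and full watchdog timeouts) can be tallied per interface from the log. The sketch below assumes the message wording seen in the traces in this thread; the function name count_events is hypothetical, and the vendor driver may word its messages differently.

```python
import re
from collections import Counter

# Message formats taken from the logs quoted in this thread.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\w+) \(\w+\): transmit queue \d+ timed out")
LINK_UP_RE = re.compile(r"(\w+): link up")


def count_events(lines):
    """Count watchdog timeouts and link-up messages per interface.

    Returns a pair of Counters: (timeouts, link_ups), keyed by interface name.
    """
    timeouts = Counter()
    link_ups = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = LINK_UP_RE.search(line)
        if m:
            link_ups[m.group(1)] += 1
    return timeouts, link_ups


if __name__ == "__main__":
    # Path is an assumption; reading it usually requires root.
    with open("/var/log/messages", errors="replace") as log:
        timeouts, link_ups = count_events(log)
    print("watchdog timeouts:", dict(timeouts))
    print("link up messages:", dict(link_ups))
```

Running it against the logs of both test systems gives a per-interface count that can be compared directly between the stock and vendor drivers.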

Comment 59 nayfield 2010-05-21 01:06:49 UTC

I am having the same issue on the latest f12 PAE kernel, but I have a newer realtek than the 8111C.  System is an asus eee box.  Happens when viewing video over the network.


from dmesg:
eth0: RTL8168d/8111d at 0xf8788000, 48:5b:39:08:d5:59, XID 083000c0 IRQ 28

trace:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xc6/0x12d()
Hardware name: EB1501
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse nfs lockd fscache nfs_acl auth_rpcgss sunrpc p4_clockmod ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput nvidia(P) snd_hda_codec_nvhdmi snd_hda_codec_realtek arc4 ecb snd_hda_intel snd_hda_codec ath9k snd_hwdep mac80211 snd_seq snd_seq_device snd_pcm snd_timer ath snd cfg80211 soundcore i2c_nforce2 rfkill snd_page_alloc r8169 i2c_core lirc_streamzap serio_raw mii lirc_dev wmi asus_atk0110 dm_multipath usb_storage [last unloaded: microcode]
Pid: 0, comm: swapper Tainted: P           2.6.32.11-99.fc12.i686.PAE #1
Call Trace:
 [<c04412bd>] warn_slowpath_common+0x6a/0x81
 [<c072c2d7>] ? dev_watchdog+0xc6/0x12d
 [<c0441312>] warn_slowpath_fmt+0x29/0x2c
 [<c072c2d7>] dev_watchdog+0xc6/0x12d
 [<c044e632>] ? __mod_timer+0x100/0x10b
 [<c0447c30>] ? local_bh_enable_ip+0xd/0xf
 [<c07a625b>] ? _spin_unlock_bh+0x13/0x15
 [<fa21c429>] ? fib6_run_gc+0xb7/0xbe [ipv6]
 [<c044e310>] run_timer_softirq+0x16d/0x1f0
 [<c072c211>] ? dev_watchdog+0x0/0x12d
 [<c04479aa>] __do_softirq+0xb1/0x157
 [<c0447a86>] do_softirq+0x36/0x41
 [<c0447b79>] irq_exit+0x2e/0x61
 [<c041e036>] smp_apic_timer_interrupt+0x6d/0x7b
 [<c0409ab5>] apic_timer_interrupt+0x31/0x38
 [<c040f281>] ? mwait_idle+0x61/0x6c
 [<c0408212>] cpu_idle+0x96/0xb0
 [<c0793104>] rest_init+0x58/0x5a
 [<c09fa8ee>] start_kernel+0x33c/0x341
 [<c09fa0af>] i386_start_kernel+0x9e/0xa5
---[ end trace f3fe1b39ab9788b2 ]---

Comment 60 frollic nilsson 2010-05-23 19:31:16 UTC

problem still occurs in 2.6.32.12-115.fc12.i686

Comment 61 John Foderaro 2010-05-24 00:44:11 UTC

kernel: 2.6.32.12-115.fc12.x86_64
motherboard: GA-790FXTA-UD5
lan: two Realtek RTL8111Ds on the motherboard
cpu: quad core AMD 64 bit phenom

eth0: on 1Gbps LAN
eth1: on 100Mbps link to cable modem

When I transfer a large file into my machine over the 1Gbps LAN (eth0)
I find that within 15 seconds eth1 appears to be unresponsive and unwilling
to send or receive packets.  Within two minutes I get the watchdog
crash on eth1 (as others have seen).

eth0 continues to work fine however and will complete the data transfer
as long as it takes.

The only problem for me is that eth1 is non-functional and I need to
reboot to bring it back to life.


WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Hardware name: GA-790FXTA-UD5
NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Modules linked in: bluetooth rfkill r8169 mii fuse vboxnetadp vboxnetflt vboxdrv autofs4 it87 hwmon_vid sunrpc cpufreq_ondemand powernow_k8 freq_table ipt_LOG ipt_MASQUERADE iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 microcode dm_multipath kvm_amd kvm uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd ppdev soundcore parport_pc parport xhci snd_page_alloc i2c_piix4 edac_core edac_mce_amd firewire_ohci ata_generic firewire_core pata_acpi crc_itu_t pata_atiixp pata_jmicron nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 4816, comm: sshd Not tainted 2.6.32.12-115.fc12.x86_64 #1
Call Trace:
<IRQ>  [<ffffffff81056370>] warn_slowpath_common+0x7c/0x94
[<ffffffff810563df>] warn_slowpath_fmt+0x41/0x43
[<ffffffff813c6d17>] ? netif_tx_lock+0x44/0x6d
[<ffffffff813c6e81>] dev_watchdog+0xf3/0x164
[<ffffffff81046073>] ? task_tick_fair+0x2d/0x127
[<ffffffff810650ee>] run_timer_softirq+0x1c4/0x268
[<ffffffff81026b6e>] ? apic_write+0x16/0x18
[<ffffffff8105d998>] __do_softirq+0xe5/0x1a9
[<ffffffff810806aa>] ? tick_program_event+0x2a/0x2c
[<ffffffff81012e6c>] call_softirq+0x1c/0x30
[<ffffffff810143ea>] do_softirq+0x46/0x86
[<ffffffff8105d7d6>] irq_exit+0x3b/0x7d
[<ffffffff8145adda>] smp_apic_timer_interrupt+0x86/0x94
[<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
<EOI>

Comment 62 Thorsten Krohn 2010-05-24 12:54:01 UTC

Kernel: 2.6.32.13-120.fc12.i686
Hardware: Asus EEE-Box-PC 1501

Same problem as the others: sending large files over the NIC hangs after about 400 MB. Same situation with the newest kernel, 2.6.34-11.fc14.i686.


------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xc6/0x12d()
Hardware name: EB1501
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: ftdi_sio nfnetlink_queue nfnetlink nfsd lockd nfs_acl auth_rpcgss exportfs xt_NFQUEUE sunrpc aes_i586 aes_generic tun p4_clockmod ipv6 dm_multipath arc4 ecb ath9k mac80211 ath cfg80211 rfkill r8169 wmi usbserial serio_raw mii i2c_nforce2 asus_atk0110 usb_storage nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: ftdi_sio]
Pid: 0, comm: swapper Not tainted 2.6.32.13-120.fc12.i686 #1
Call Trace:
 [<c043a4a1>] warn_slowpath_common+0x6a/0x81
 [<c071ab9b>] ? dev_watchdog+0xc6/0x12d
 [<c043a4f6>] warn_slowpath_fmt+0x29/0x2c
 [<c071ab9b>] dev_watchdog+0xc6/0x12d
 [<c045691e>] ? hrtimer_forward+0x114/0x128
 [<c0417a26>] ? apic_write+0x14/0x16
 [<c0417c3c>] ? lapic_next_event+0x14/0x18
 [<c045e70d>] ? clockevents_program_event+0xbf/0xcd
 [<c0447038>] run_timer_softirq+0x16d/0x1f0
 [<c071aad5>] ? dev_watchdog+0x0/0x12d
 [<c0440b8e>] __do_softirq+0xb1/0x157
 [<c0440c6a>] do_softirq+0x36/0x41
 [<c0440d5d>] irq_exit+0x2e/0x61
 [<c04183e2>] smp_apic_timer_interrupt+0x6d/0x7b
 [<c0403fb5>] apic_timer_interrupt+0x31/0x38
 [<c0409649>] ? mwait_idle+0x61/0x6c
 [<c0402712>] cpu_idle+0x96/0xb0
 [<c078ebd8>] start_secondary+0x1f5/0x233
---[ end trace a1c573e4c750bda4 ]---
r8169: eth0: link up

Comment 63 Thorsten Krohn 2010-05-25 19:43:51 UTC

I have now upgraded to FC13 ... it's the same situation :-(

WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xc1/0x150()
Hardware name: EB1501
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: nfnetlink_queue nfnetlink ftdi_sio nfsd lockd nfs_acl auth_rpcgss exportfs xt_NFQUEUE sunrpc aes_i586 aes_generic tun p4_clockmod ipv6 arc4 ecb wmi ath9k ath9k_common mac80211 asus_atk0110 ath9k_hw ath cfg80211 rfkill microcode serio_raw i2c_nforce2 usbserial r8169 mii usb_storage nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.33.4-95.fc13.i686 #1
Call Trace:
 [<c0436dfd>] warn_slowpath_common+0x65/0x7c
 [<c06f9450>] ? dev_watchdog+0xc1/0x150
 [<c0436e48>] warn_slowpath_fmt+0x24/0x27
 [<c06f9450>] dev_watchdog+0xc1/0x150
 [<c04417f9>] ? internal_add_timer+0x8e/0x92
 [<c044188b>] ? cascade+0x4b/0x5e
 [<c0441a01>] run_timer_softirq+0x163/0x1e6
 [<c06f938f>] ? dev_watchdog+0x0/0x150
 [<c043c1d9>] __do_softirq+0xac/0x152
 [<c043c2b0>] do_softirq+0x31/0x3c
 [<c043c3c4>] irq_exit+0x29/0x5c
 [<c0417fa7>] smp_apic_timer_interrupt+0x6f/0x7d
 [<c0771255>] apic_timer_interrupt+0x31/0x38
 [<c04091f3>] ? mwait_idle+0x5c/0x67
 [<c04024b8>] cpu_idle+0x91/0xad
 [<c076c4ad>] start_secondary+0x1f5/0x233
---[ end trace cb513a5390649b7c ]---
r8169: eth0: link up

Comment 64 vadim 2010-05-28 15:55:11 UTC

https://bugzilla.kernel.org/show_bug.cgi?id=12411
https://bugzilla.kernel.org/show_bug.cgi?id=14962
https://bugzilla.redhat.com/show_bug.cgi?id=573150
https://bugzilla.redhat.com/show_bug.cgi?id=530052

These all seem to be the same issue.
I hope it will be fixed, at least in rawhide, with the latest 2.6.34 kernels.
I tested with 2.6.34-rc7 and it works; it may have been working since 2.6.34-rc4.

Comment 65 Johann 2010-06-01 07:47:01 UTC

I've had similar problems with RTL8111/8168 onboard NICs.
Using Realtek's 8.018.00 driver removed the NETDEV WATCHDOG messages but I still was getting hangs. These later hangs turned out not to be network hangs but system hangs.
Now, adding hpet=disable to the kernel line of grub.conf removed the hangs.
I am using kernel 2.6.31.12-rt21.

Hope this helps.

Comment 66 Mace Moneta 2010-06-29 12:03:45 UTC

I just had a re-occurrence using Realtek's 8.018.00 driver and the 2.6.33.5-124.fc13.x86_64 kernel.


Jun 29 03:05:48 slayer kernel:------------[ cut here ]------------
Jun 29 03:05:48 slayer kernel:WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x192()
Jun 29 03:05:48 slayer kernel:Hardware name: C2SEA
Jun 29 03:05:48 slayer kernel:NETDEV WATCHDOG: eth0 (r8168): transmit queue 0 timed out
Jun 29 03:05:48 slayer kernel:Modules linked in: sit tunnel4 fuse w83627ehf hwmon_vid coretemp cpufreq_ondemand acpi_cpufreq freq_table ipv6 kvm_intel kvm uinput snd_hda_codec_realtek nvidia(P) snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss uvcvideo snd_seq_midi_event snd_seq snd_usb_audio snd_pcm_oss snd_hwdep snd_mixer_oss snd_usb_lib snd_pcm snd_rawmidi snd_seq_device snd_timer videodev pl2303 v4l1_compat snd usbserial v4l2_compat_ioctl32 r8168 soundcore i2c_i801 iTCO_wdt snd_page_alloc i2c_core iTCO_vendor_support microcode raid0 raid1 pata_acpi ata_generic firewire_ohci firewire_core crc_itu_t pata_it8213 [last unloaded: scsi_wait_scan]
Jun 29 03:05:48 slayer kernel:Pid: 0, comm: swapper Tainted: P           2.6.33.5-124.fc13.x86_64 #1
Jun 29 03:05:48 slayer kernel:Call Trace:
Jun 29 03:05:48 slayer kernel: <IRQ>  [<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f
Jun 29 03:05:48 slayer kernel: [<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e
Jun 29 03:05:48 slayer kernel: [<ffffffff8139cb23>] ? netif_tx_lock+0x3f/0x68
Jun 29 03:05:48 slayer kernel: [<ffffffff8139cc3c>] dev_watchdog+0xf0/0x192
Jun 29 03:05:48 slayer kernel: [<ffffffff81069561>] ? sched_clock_local+0x1c/0x82
Jun 29 03:05:48 slayer kernel: [<ffffffff81058132>] ? internal_add_timer+0xca/0xcc
Jun 29 03:05:48 slayer kernel: [<ffffffff810581f6>] ? cascade+0x65/0x7f
Jun 29 03:05:48 slayer kernel: [<ffffffff810583ca>] run_timer_softirq+0x1ba/0x25e
Jun 29 03:05:48 slayer kernel: [<ffffffff8106c28d>] ? ktime_get+0x60/0xb9
Jun 29 03:05:48 slayer kernel: [<ffffffff810516b9>] __do_softirq+0xe0/0x1a1
Jun 29 03:05:48 slayer kernel: [<ffffffff810702cc>] ? tick_program_event+0x25/0x27
Jun 29 03:05:48 slayer kernel: [<ffffffff8100aa1c>] call_softirq+0x1c/0x30
Jun 29 03:05:48 slayer kernel: [<ffffffff8100c21d>] do_softirq+0x41/0x7e
Jun 29 03:05:48 slayer kernel: [<ffffffff8105150c>] irq_exit+0x36/0x78
Jun 29 03:05:48 slayer kernel: [<ffffffff81020234>] smp_apic_timer_interrupt+0x89/0x97
Jun 29 03:05:48 slayer kernel: [<ffffffff8100a4d3>] apic_timer_interrupt+0x13/0x20
Jun 29 03:05:48 slayer kernel: <EOI>  [<ffffffff8101138d>] ? mwait_idle+0x75/0x83
Jun 29 03:05:48 slayer kernel: [<ffffffff8101133f>] ? mwait_idle+0x27/0x83
Jun 29 03:05:48 slayer kernel: [<ffffffff81360208>] cpuidle_idle_call+0x33/0xef
Jun 29 03:05:48 slayer kernel: [<ffffffff81008bfd>] cpu_idle+0xa5/0xdf
Jun 29 03:05:48 slayer kernel: [<ffffffff814239d5>] start_secondary+0x1f2/0x233
Jun 29 03:05:48 slayer kernel:---[ end trace 576087bcac0c6e9e ]---

Comment 67 maverick.pt 2010-06-29 14:12:19 UTC

I'm using Fedora 12, and it happens when I try to copy a large file over the network. The weirdest part is that it only happens on eth1; with eth0 there is no problem, even though both NICs are the same chip and both are onboard (Gigabyte GA-790FXTA-UD5). Could it be a hardware problem?

dmesg seems to be the same as other people reported:

Jun 29 15:06:07 virt kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Jun 29 15:06:07 virt kernel: Hardware name: GA-790FXTA-UD5
Jun 29 15:06:07 virt kernel: NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Jun 29 15:06:07 virt kernel: Modules linked in: nls_utf8 cifs ipt_MASQUERADE iptable_nat nf_nat sunrpc bridge stp llc xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 microcode dm_multipath kvm_amd kvm xhci r8169 edac_core edac_mce_amd i2c_piix4 mii ppdev parport_pc parport pata_acpi ata_generic firewire_ohci firewire_core crc_itu_t pata_atiixp pata_jmicron nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: freq_table]
Jun 29 15:06:07 virt kernel: Pid: 0, comm: swapper Not tainted 2.6.32.14-127.fc12.x86_64 #1
Jun 29 15:06:07 virt kernel: Call Trace:
Jun 29 15:06:07 virt kernel: <IRQ>  [<ffffffff81056364>] warn_slowpath_common+0x7c/0x94
Jun 29 15:06:07 virt kernel: [<ffffffff810563d3>] warn_slowpath_fmt+0x41/0x43
Jun 29 15:06:07 virt kernel: [<ffffffff813c6c97>] ? netif_tx_lock+0x44/0x6d
Jun 29 15:06:07 virt kernel: [<ffffffff813c6e01>] dev_watchdog+0xf3/0x164
Jun 29 15:06:07 virt kernel: [<ffffffff81079435>] ? sched_clock_local+0x1c/0x82
Jun 29 15:06:07 virt kernel: [<ffffffff81064e34>] ? internal_add_timer+0xcf/0xd1
Jun 29 15:06:07 virt kernel: [<ffffffff81064f02>] ? cascade+0x6a/0x86
Jun 29 15:06:07 virt kernel: [<ffffffff810650e2>] run_timer_softirq+0x1c4/0x268
Jun 29 15:06:07 virt kernel: [<ffffffff81026b6e>] ? apic_write+0x16/0x18
Jun 29 15:06:07 virt kernel: [<ffffffff8105d98c>] __do_softirq+0xe5/0x1a9
Jun 29 15:06:07 virt kernel: [<ffffffff8108069e>] ? tick_program_event+0x2a/0x2c
Jun 29 15:06:07 virt kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30
Jun 29 15:06:07 virt kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86
Jun 29 15:06:07 virt kernel: [<ffffffff8105d7ca>] irq_exit+0x3b/0x7d
Jun 29 15:06:07 virt kernel: [<ffffffff8145ae5a>] smp_apic_timer_interrupt+0x86/0x94
Jun 29 15:06:07 virt kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
Jun 29 15:06:07 virt kernel: <EOI>  [<ffffffff81030229>] ? native_safe_halt+0xb/0xd
Jun 29 15:06:07 virt kernel: [<ffffffff81018f39>] ? default_idle+0x36/0x53
Jun 29 15:06:07 virt kernel: [<ffffffff81010cdd>] ? cpu_idle+0xaa/0xe4
Jun 29 15:06:07 virt kernel: [<ffffffff8144eaef>] ? start_secondary+0x1f2/0x233
Jun 29 15:06:07 virt kernel: ---[ end trace 16063383841d7c13 ]---

I'm going to install Windows on this machine and run some tests to see whether the problem is the hardware or the r8169 Linux driver.

Comment 68 John Foderaro 2010-06-29 14:22:27 UTC

I have the exact same board (Gigabyte GA-790FXTA-UD5) and was trying to run Fedora 12.  I concluded that the Realtek drivers simply do not work having tried the ones that came with Fedora 12 and the ones from the Realtek web site.

There has been some activity with this driver in newer kernels so maybe a fix is on the way.

My solution was to buy a network card and disable the onboard Realteks.

Comment 69 maverick.pt 2010-06-29 14:26:55 UTC

I've installed a D-Link network card (DGE-528T), which uses the same driver (r8169), and with this one I have no problems. Of the onboard NICs, only the second one is giving me trouble. It's really weird :(

Comment 70 Jonathan Larmour 2010-06-29 15:51:50 UTC

I have noticed something about this which is admittedly unusual. First of all as background, in normal operation, this problem is normally uncommon for me. And when it does happen, the driver eventually recovers, although it may take a few minutes. It seems to be the case that the first time it occurs I get the full NETDEV_WATCHDOG message with backtrace etc. as reported by others. But subsequently all I see from dmesg after the fact is:
r8169: eth0: link up
At a guess something stops it from spamming syslog with the backtraces all the time. But it does, eventually, recover.
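
To check how often the link recovers without a fresh backtrace, the two message types can be counted in a saved log. The following is only a rough sketch: the patterns are copied from the messages quoted in this bug, and the function name `summarize` is made up for illustration. Other kernels or drivers may word the messages differently.

```python
import re
from collections import Counter

# Patterns based on the log lines quoted in this bug report (an assumption:
# other kernel versions may format these messages differently).
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\S+) \((\w+)\): transmit queue (\d+) timed out"
)
# Matches both "r8169: eth0: link up" and "r8169 0000:04:00.0: eth0: link up".
LINK_UP_RE = re.compile(r"r8169(?: [0-9a-f:.]+)?: (\S+): link up")


def summarize(lines):
    """Count watchdog timeouts and r8169 link-up events per interface."""
    timeouts = Counter()
    link_ups = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = LINK_UP_RE.search(line)
        if match:
            link_ups[match.group(1)] += 1
    return timeouts, link_ups
```

Feeding it the lines of /var/log/messages (for example `summarize(open("/var/log/messages", errors="replace"))`) gives two counters; many more link-up events than watchdog warnings would match the behaviour described above.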

My machine has an overnight backup running (using amanda FWIW), which obviously heavily uses the ethernet to send the backup data to the remote server. That usually succeeds fine.

*Unless* I am also, simultaneously, logged in remotely via ssh during the backup process. In which case it tends to happen very frequently indeed.

I can't be certain, but viewing files with "less" seems to really provoke it, specifically paging through large text files.

I would say it's so bizarre that it must surely be coincidence, but perhaps there is a link with packet sizes? Heavy TCP traffic will obviously tend to produce large packets close to the 1500-ish Ethernet MTU (no jumbo frames here). Perhaps a particular mixture of large and small frames upsets the driver in some way?

Anyway, I'm mentioning this just in case, based on empirical observation.

Comment 71 Thorsten Krohn 2010-06-30 07:45:00 UTC

I have now found a solution that works for me:

I disabled ACPI for the kernel and the NIC is working!

With minimal ACPI (kernel parameter acpi=ht) there is also no problem with the NIC.

My hardware is an Asus Eee Box 1501.

Comment 72 Mace Moneta 2010-06-30 10:45:34 UTC

I've gone as long as three weeks between events (machine up 24x7).  Let us know how long you've been running without a re-occurrence if you think you have a solution.

Comment 73 maverick.pt 2010-06-30 23:54:54 UTC

Regarding my hardware: I installed Windows 7 64-bit and had no problems using both onboard NICs. I tried both the default Windows driver and the Realtek driver, and both worked fine; I transferred large files without trouble on both NICs.

Back on Fedora 12, I tried acpi=off and acpi=ht with no luck; eth1 still breaks after a few seconds of heavy transfer activity :(

What I really don't understand is this: if it's a driver problem, why does it only happen with the second NIC? Both are onboard and use the same chip.

Comment 74 Mace Moneta 2010-07-11 12:27:10 UTC

*** Bug 547517 has been marked as a duplicate of this bug. ***

Comment 75 Antoine Martin 2010-07-11 12:31:52 UTC

FYI: I've flashed a new BIOS and not had the problem since... (hence I am unsubscribing from this bug)
My original report was in comment #11

Comment 76 Richard Körber 2010-07-26 21:21:55 UTC

Same problem here, Fedora 13 i686:

Jul 23 00:59:03 lucifer kernel: ------------[ cut here ]------------
Jul 23 00:59:03 lucifer kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xc1/0x150()
Jul 23 00:59:03 lucifer kernel: Hardware name: ION-MB330-1
Jul 23 00:59:03 lucifer kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Jul 23 00:59:03 lucifer kernel: Modules linked in: nf_conntrack_netbios_ns sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 snd_hda_codec_nvhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm r8169 serio_raw mii snd_timer wmi snd soundcore snd_page_alloc i2c_nforce2 ext2 nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core [last unloaded: scsi_wait_scan]
Jul 23 00:59:03 lucifer kernel: Pid: 0, comm: swapper Not tainted 2.6.33.3-85.fc13.i686.PAE #1
Jul 23 00:59:03 lucifer kernel: Call Trace:
Jul 23 00:59:03 lucifer kernel: [<c043d625>] warn_slowpath_common+0x65/0x7c
Jul 23 00:59:03 lucifer kernel: [<c070a33c>] ? dev_watchdog+0xc1/0x150
Jul 23 00:59:03 lucifer kernel: [<c043d670>] warn_slowpath_fmt+0x24/0x27
Jul 23 00:59:03 lucifer kernel: [<c070a33c>] dev_watchdog+0xc1/0x150
Jul 23 00:59:03 lucifer kernel: [<c045520f>] ? hrtimer_forward+0x10f/0x123
Jul 23 00:59:03 lucifer kernel: [<c041cd0d>] ? lapic_next_event+0x16/0x1a
Jul 23 00:59:03 lucifer kernel: [<c04486d1>] run_timer_softirq+0x163/0x1e6
Jul 23 00:59:03 lucifer kernel: [<c070a27b>] ? dev_watchdog+0x0/0x150
Jul 23 00:59:03 lucifer kernel: [<c0442a01>] __do_softirq+0xac/0x152
Jul 23 00:59:03 lucifer kernel: [<c0442ad8>] do_softirq+0x31/0x3c
Jul 23 00:59:03 lucifer kernel: [<c0442bec>] irq_exit+0x29/0x5c
Jul 23 00:59:03 lucifer kernel: [<c041d687>] smp_apic_timer_interrupt+0x6f/0x7d
Jul 23 00:59:03 lucifer kernel: [<c078306d>] apic_timer_interrupt+0x31/0x38
Jul 23 00:59:03 lucifer kernel: [<c040e8d3>] ? mwait_idle+0x5c/0x67
Jul 23 00:59:03 lucifer kernel: [<c0407a78>] cpu_idle+0x91/0xad
Jul 23 00:59:03 lucifer kernel: [<c077e288>] start_secondary+0x1f5/0x233
Jul 23 00:59:03 lucifer kernel: ---[ end trace 86e551f450c50322 ]---
Jul 23 00:59:03 lucifer kernel: r8169: eth0: link up

Every time the "link up" message appears, the system is unreachable from the network for a few seconds. Sometimes the system even hard-freezes.

In my desperation I installed 2.6.35-0.56.rc6.git1.fc14.i686.PAE from rawhide. There has been only one hard freeze since then, which could be a result of my experiments with the Realtek driver. However, there are still "link up" messages and disconnects with the shipped r8169 module (as well as with Realtek's r8168 module).

The error usually occurs when there is heavy network AND disk load. When I try to download a file from the system, it usually breaks after a few hundred megabytes. This renders my NAS unusable.

With the rawhide kernel and "pci=nomsi noacpi" boot options, there were no hard-freezes and no disconnects so far, but still this error message:

Jul 26 23:00:35 lucifer kernel: ------------[ cut here ]------------
Jul 26 23:00:35 lucifer kernel: WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xc6/0x12e()
Jul 26 23:00:35 lucifer kernel: Hardware name: ION-MB330-1
Jul 26 23:00:35 lucifer kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Jul 26 23:00:35 lucifer kernel: Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs coretemp sunrpc p4_clockmod nf_conntrack_netbios_ns ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 snd_hda_codec_nvhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm microcode r8169 mii snd_timer serio_raw snd soundcore snd_page_alloc wmi i2c_nforce2 ext2 nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core [last unloaded: mperf]
Jul 26 23:00:35 lucifer kernel: Pid: 0, comm: swapper Not tainted 2.6.35-0.56.rc6.git1.fc14.i686.PAE #1
Jul 26 23:00:35 lucifer kernel: Call Trace:
Jul 26 23:00:35 lucifer kernel: [<c0441e59>] warn_slowpath_common+0x6a/0x7f
Jul 26 23:00:35 lucifer kernel: [<c075d815>] ? dev_watchdog+0xc6/0x12e
Jul 26 23:00:35 lucifer kernel: [<c0441ee1>] warn_slowpath_fmt+0x2b/0x2f
Jul 26 23:00:35 lucifer kernel: [<c075d815>] dev_watchdog+0xc6/0x12e
Jul 26 23:00:35 lucifer kernel: [<c04699e4>] ? lock_acquire+0xb1/0xbc
Jul 26 23:00:35 lucifer kernel: [<c044cd85>] ? run_timer_softirq+0x123/0x25c
Jul 26 23:00:35 lucifer kernel: [<c044cdfe>] run_timer_softirq+0x19c/0x25c
Jul 26 23:00:35 lucifer kernel: [<c075d74f>] ? dev_watchdog+0x0/0x12e
Jul 26 23:00:35 lucifer kernel: [<c0447678>] __do_softirq+0xc2/0x179
Jul 26 23:00:35 lucifer kernel: [<c0447763>] do_softirq+0x34/0x56
Jul 26 23:00:35 lucifer kernel: [<c04479e0>] irq_exit+0x3d/0x70
Jul 26 23:00:35 lucifer kernel: [<c041fc55>] smp_apic_timer_interrupt+0x65/0x73
Jul 26 23:00:35 lucifer kernel: [<c07e15e6>] apic_timer_interrupt+0x36/0x3c
Jul 26 23:00:35 lucifer kernel: [<c040e71f>] ? mwait_idle+0x62/0x70
Jul 26 23:00:35 lucifer kernel: [<c0407851>] cpu_idle+0x93/0xb4
Jul 26 23:00:35 lucifer kernel: [<c07dbcc4>] start_secondary+0x258/0x298
Jul 26 23:00:35 lucifer kernel: ---[ end trace bf3a5ecb05b84bd3 ]---
Jul 26 23:00:35 lucifer kernel: r8169 0000:04:00.0: eth0: link up

I hope there will be a fix soon, as my NAS case has no space for an external NIC, so I have to use the internal one. Luckily the kernel boot options seem to be an acceptable workaround for a NAS system.

The same hardware worked fine with Fedora 10.

Comment 77 Igor 2010-07-26 21:32:10 UTC

Kyle,

Do we have any information regarding this severe issue?

Upgrading to F13 is still a no-go. Everything was working on F11; these lockups started with F12.

Linux server 2.6.33.6-147.fc13.x86_64 #1 SMP Tue Jul 6 22:32:17 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x192()
Hardware name: EX58-UD3R
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: sit tunnel4 fuse ipv6 aes_x86_64 aes_generic xts gf128mul dm_crypt uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_wdt iTCO_vendor_support snd_timer snd soundcore snd_page_alloc r8169 serio_raw wmi mii joydev i2c_i801 microcode ata_generic firewire_ohci firewire_core pata_acpi crc_itu_t pata_jmicron nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.33.6-147.fc13.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104aecc>] warn_slowpath_common+0x77/0x8f
 [<ffffffff8104af31>] warn_slowpath_fmt+0x3c/0x3e
 [<ffffffff8139bd5f>] ? netif_tx_lock+0x3f/0x68
 [<ffffffff8139be78>] dev_watchdog+0xf0/0x192
 [<ffffffff8100ff01>] ? sched_clock+0x9/0xd
 [<ffffffff81068fbf>] ? sched_clock_cpu+0x44/0xce
 [<ffffffff81057d4a>] run_timer_softirq+0x1ba/0x25e
 [<ffffffff8106bc41>] ? ktime_get+0x60/0xb9
 [<ffffffff81051039>] __do_softirq+0xe0/0x1a1
 [<ffffffff8106fc80>] ? tick_program_event+0x25/0x27
 [<ffffffff8100aa1c>] call_softirq+0x1c/0x30
 [<ffffffff8100c21d>] do_softirq+0x41/0x7e
 [<ffffffff81050e8c>] irq_exit+0x36/0x78
 [<ffffffff81020244>] smp_apic_timer_interrupt+0x89/0x97
 [<ffffffff8100a4d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101139d>] ? mwait_idle+0x75/0x83
 [<ffffffff8101134f>] ? mwait_idle+0x27/0x83
 [<ffffffff81008bfd>] cpu_idle+0xa5/0xdf
 [<ffffffff81422c17>] start_secondary+0x1f2/0x233
---[ end trace 5fa87e8fa16d870c ]---

Thank you!

Comment 78 Kyle McMartin 2010-07-27 00:04:54 UTC

Yeah, cebbert figured out that disabling aspm should work...

Boot your kernel with pci=noaspm and let us know if things are better.

--Kyle

Comment 79 Adrian 2010-07-27 08:54:20 UTC

It seems to be working for me! I have tested it for just over an hour now and so far so good...

Note: I had to use pcie_aspm=off as a kernel parameter (2.6.33.3-85.fc13.x86_64).
(pci=noaspm does not work)
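
To confirm that the parameter actually took effect after a reboot, something along these lines may help. It is a minimal sketch, not a definitive check: it assumes the boot command line is readable at /proc/cmdline and that the kernel exposes the ASPM policy at /sys/module/pcie_aspm/parameters/policy with the active entry in square brackets. The helper names are hypothetical.

```python
from pathlib import Path


def cmdline_params(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def aspm_disabled_on_cmdline(cmdline):
    """True if pcie_aspm=off is present on the given command line."""
    return cmdline_params(cmdline).get("pcie_aspm") == "off"


def active_policy(policy_text):
    """Return the bracketed entry, e.g. 'powersave' from
    'default performance [powersave]', or None if nothing is bracketed."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def report():
    """Print what the running system says (paths are assumptions, see above)."""
    cmdline = Path("/proc/cmdline").read_text()
    print("pcie_aspm=off on command line:", aspm_disabled_on_cmdline(cmdline))
    policy_file = Path("/sys/module/pcie_aspm/parameters/policy")
    if policy_file.exists():
        print("ASPM policy:", active_policy(policy_file.read_text()))
    else:
        print("ASPM policy file not present on this kernel")
```

Calling `report()` on the affected machine prints both values; the parsing helpers can also be run against text pasted from another system.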

Comment 80 Richard Körber 2010-07-27 17:54:00 UTC

Yes, it works great with the rawhide kernel and "pcie_aspm=off" here, too. Thanks!

Comment 81 maverick.pt 2010-07-27 18:54:07 UTC

Yes, works for me too! :)

Should have waited one more week... last week bought 2 intel nics because of this problem :(

Comment 82 Matthew Garrett 2010-07-27 20:28:01 UTC

Can you please attach the output of lspci -vn and acpidump?

Comment 83 Adrian 2010-07-27 21:09:51 UTC

Created attachment 434848 [details]
Output of lspci -vn

Comment 84 Adrian 2010-07-27 21:11:15 UTC

Created attachment 434849 [details]
Output of acpidump

Comment 85 Richard Körber 2010-07-27 21:14:06 UTC

Created attachment 434851 [details]
Output of lspci -vn

Comment 86 Richard Körber 2010-07-27 21:15:48 UTC

Created attachment 434852 [details]
Output of acpidump

It's a Point Of View ION330-1 board.

Comment 87 pholdaway 2010-07-29 17:43:26 UTC

kernel boot parameter pcie_aspm=off works for me too.

Please note:

I was the first to comment on this issue when I tried to install Fedora 12.

This issue is *not* restricted to r8169; I am using an e1000e.

I dropped back to Fedora 11 and network hangs did not occur.

Today I upgraded to Fedora 13, hoping that pcie_aspm=off would work for me too.

Prior to setting this, the network hangs occurred *very* frequently; yum or ssh could trigger them.

After setting this parameter the network is now stable.

Below is output I saw in /var/log/messages.

Jul 29 09:17:10 backuppc kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x192()
Jul 29 09:17:10 backuppc kernel: Hardware name: X8SIE
Jul 29 09:17:10 backuppc kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Jul 29 09:17:10 backuppc kernel: Modules linked in: fuse nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 xfs exportfs
Jul 29 09:17:10 backuppc kernel: Pid: 0, comm: swapper Not tainted 2.6.33.3-85.fc13.x86_64 #1
Jul 29 09:17:10 backuppc kernel: Call Trace:
Jul 29 09:17:10 backuppc kernel: <IRQ>  [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
Jul 29 09:17:10 backuppc kernel: [<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e
Jul 29 09:17:10 backuppc kernel: [<ffffffff8139caa3>] ? netif_tx_lock+0x3f/0x68
Jul 29 09:17:10 backuppc kernel: [<ffffffff8139cbbc>] dev_watchdog+0xf0/0x192
Jul 29 09:17:10 backuppc kernel: [<ffffffff8100ff15>] ? read_tsc+0x9/0x1b
Jul 29 09:17:10 backuppc kernel: [<ffffffff8105813e>] ? internal_add_timer+0xca/0xcc
Jul 29 09:17:10 backuppc kernel: [<ffffffff81058202>] ? cascade+0x65/0x7f
Jul 29 09:17:10 backuppc kernel: [<ffffffff810583d6>] run_timer_softirq+0x1ba/0x25e
Jul 29 09:17:10 backuppc kernel: [<ffffffff810702d8>] ? tick_program_event+0x25/0x27
Jul 29 09:17:10 backuppc kernel: [<ffffffff810516c5>] __do_softirq+0xe0/0x1a1
Jul 29 09:17:10 backuppc kernel: [<ffffffff8109a31a>] ? handle_IRQ_event+0x5b/0x11c
Jul 29 09:17:10 backuppc kernel: [<ffffffff8100aa1c>] call_softirq+0x1c/0x30
Jul 29 09:17:10 backuppc kernel: [<ffffffff8100c21d>] do_softirq+0x41/0x7e
Jul 29 09:17:10 backuppc kernel: [<ffffffff81051518>] irq_exit+0x36/0x78
Jul 29 09:17:10 backuppc kernel: [<ffffffff8100b957>] do_IRQ+0xa7/0xbe
Jul 29 09:17:10 backuppc kernel: [<ffffffff8142b1d3>] ret_from_intr+0x0/0x11
Jul 29 09:17:10 backuppc kernel: <EOI>  [<ffffffff8101138b>] ? mwait_idle+0x75/0x83
Jul 29 09:17:10 backuppc kernel: [<ffffffff8101133d>] ? mwait_idle+0x27/0x83
Jul 29 09:17:10 backuppc kernel: [<ffffffff81008bfd>] cpu_idle+0xa5/0xdf
Jul 29 09:17:10 backuppc kernel: [<ffffffff814238cd>] start_secondary+0x1f2/0x233
Jul 29 09:17:10 backuppc kernel: ---[ end trace 131642cb644a260a ]---

Comment 88 Matthew Garrett 2010-07-29 17:58:22 UTC

Please file a separate bug for the Intel card.

Comment 89 Pat Gunn 2010-07-29 23:25:34 UTC

The kernel parameter did not help at all with my card, which is a:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
        Subsystem: Hewlett-Packard Company Device 3610

Comment 90 Matthew Garrett 2010-08-05 14:18:44 UTC

I'm sorry, I should have asked for lspci -vvvn.

Comment 91 Adrian 2010-08-05 14:31:52 UTC

Created attachment 436865 [details]
Output of lspci -vvvn

Result of "lspci -vvvn"

Comment 92 Richard Körber 2010-08-05 14:46:18 UTC

Created attachment 436868 [details]
Output of lspci -vvvn

Comment 93 Chuck Ebbert 2010-08-06 20:07:22 UTC

(In reply to comment #90)
> I'm sorry, I should have asked for lspci -vvvn.    

You should ask for lspci -vvvnn so you get both the numbers and the names of the devices.
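
For anyone who wants to look at the ASPM state in their own lspci output before attaching it, here is a rough sketch that pulls the per-device link control state out of saved `lspci -vvvnn` text. It assumes the output contains lines of the form `LnkCtl: ASPM <state>; ...` under each PCI Express device, which is how lspci usually prints them when run as root; the function name `aspm_states` is made up for this example.

```python
import re

# Assumed lspci -vv format: "LnkCtl: ASPM Disabled; RCB 64 bytes ..."
LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM ([^;]+);")


def aspm_states(lspci_text):
    """Map PCI address (e.g. '04:00.0') to the ASPM state shown in LnkCtl."""
    states = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # A device header starts in column 0 with the bus address.
            device = line.split(" ", 1)[0]
        else:
            match = LNKCTL_RE.search(line)
            if match and device is not None:
                states[device] = match.group(1).strip()
    return states
```

Devices without a PCI Express link control line simply do not appear in the result.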

Comment 94 fred2 2010-08-25 18:37:04 UTC

(1) fedora 13 x86_64 with pcie_aspm=off works for me. thank you.
(2) fedora 14 alpha livecd x86_64 nominal boot fails (as expected??), abrt output below.
(3) fedora 14 alpha livecd x86_64 with pcie_aspm=off works.


(2) trace:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xfb/0x16f()
Hardware name: M51Ta              
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse cpufreq_ondemand powernow_k8 freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables uinput arc4 ecb ath9k ath9k_common ath9k_hw uvcvideo r852 sm_common snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq nand snd_seq_device videodev ath nand_ids nand_ecc snd_pcm v4l1_compat microcode v4l2_compat_ioctl32 mac80211 mtd snd_timer asus_laptop edac_core r8169 serio_raw shpchp i2c_piix4 joydev k10temp snd mii edac_mce_amd sparse_keymap cfg80211 soundcore snd_page_alloc rfkill ipv6 autofs4 squashfs vfat fat firewire_ohci sdhci_pci pata_acpi sdhci firewire_core ata_generic mmc_core crc_itu_t usb_storage pata_atiixp video output radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.35-0.57.rc6.git1.fc14.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810510ba>] warn_slowpath_common+0x85/0x9d
 [<ffffffff81051175>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff813fe917>] dev_watchdog+0xfb/0x16f
 [<ffffffff8107e36c>] ? lock_release+0x19a/0x1a6
 [<ffffffff8105dcec>] run_timer_softirq+0x237/0x324
 [<ffffffff8105dc52>] ? run_timer_softirq+0x19d/0x324
 [<ffffffff8107e36c>] ? lock_release+0x19a/0x1a6
 [<ffffffff813fe81c>] ? dev_watchdog+0x0/0x16f
 [<ffffffff8100abdc>] ? call_softirq+0x1c/0x30
 [<ffffffff81057674>] __do_softirq+0xfa/0x1cf
 [<ffffffff81078734>] ? tick_dev_program_event+0x36/0xf4
 [<ffffffff8107881c>] ? tick_program_event+0x2a/0x2c
 [<ffffffff8100abdc>] call_softirq+0x1c/0x30
 [<ffffffff8100c375>] do_softirq+0x4b/0xa2
 [<ffffffff8105781a>] irq_exit+0x4a/0x8c
 [<ffffffff8149f09a>] smp_apic_timer_interrupt+0x7e/0x8c
 [<ffffffff8100a693>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff810110f1>] ? default_idle+0x34/0x59
 [<ffffffff8102c865>] ? native_safe_halt+0xb/0xd
 [<ffffffff8107e89a>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810110f6>] default_idle+0x39/0x59
 [<ffffffff81008307>] cpu_idle+0xaf/0xd1
 [<ffffffff8147f46b>] rest_init+0xcf/0xd6
 [<ffffffff8147f39c>] ? rest_init+0x0/0xd6
 [<ffffffff81d78c76>] start_kernel+0x438/0x443
 [<ffffffff81d782c6>] x86_64_start_reservations+0xb1/0xb5
 [<ffffffff81d783c2>] x86_64_start_kernel+0xf8/0x107
---[ end trace 3f9d77f7054fb69c ]---
r8169 0000:04:00.0: eth0: link up

Comment 95 Pavel Holica 2010-08-28 23:24:07 UTC

Created attachment 441739 [details]
Ouput of lspci -vvvnn

OK, I've upgraded to Fedora 13 and still encounter network problems. As I mentioned above, I have an 8111d card:
eth0: RTL8168d/8111d at 0xffffc90015e90000, 00:27:0e:03:60:a8, XID 081000c0 IRQ 29

I've tried pcie_aspm=off and it didn't work.

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x192()
Hardware name:         
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: sunrpc p4_clockmod freq_table speedstep_lib nf_conntrack_ftp ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc tulip r8169 serio_raw mii i2c_i801 microcode i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.33.8-149.fc13.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81049ecc>] warn_slowpath_common+0x77/0x8f
 [<ffffffff81049f31>] warn_slowpath_fmt+0x3c/0x3e
 [<ffffffff8139af77>] ? netif_tx_lock+0x3f/0x68
 [<ffffffff8139b090>] dev_watchdog+0xf0/0x192
 [<ffffffff81067f15>] ? sched_clock_local+0x1c/0x82
 [<ffffffff81056ab2>] ? internal_add_timer+0xca/0xcc
 [<ffffffff81056b76>] ? cascade+0x65/0x7f
 [<ffffffff81056d4a>] run_timer_softirq+0x1ba/0x25e
 [<ffffffff8106ac41>] ? ktime_get+0x60/0xb9
 [<ffffffff8105003a>] __do_softirq+0xe0/0x1a1
 [<ffffffff8106ec80>] ? tick_program_event+0x25/0x27
 [<ffffffff81009a1c>] call_softirq+0x1c/0x30
 [<ffffffff8100b21d>] do_softirq+0x41/0x7e
 [<ffffffff8104fe8d>] irq_exit+0x36/0x78
 [<ffffffff8101f2ad>] smp_apic_timer_interrupt+0x89/0x97
 [<ffffffff810094d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101039d>] ? mwait_idle+0x75/0x83
 [<ffffffff8101034f>] ? mwait_idle+0x27/0x83
 [<ffffffff81007bfd>] cpu_idle+0xa5/0xdf
 [<ffffffff81421e8e>] start_secondary+0x1f2/0x233
---[ end trace bd01ea576d6e52c8 ]---
r8169: eth0: link up

Comment 96 Pavel Holica 2010-08-28 23:34:08 UTC

Created attachment 441741 [details]
Output of acpidump

Comment 97 Pavel Holica 2010-08-28 23:39:24 UTC

pci=noaspm didn't help either

Comment 98 Peter 2010-09-28 15:09:41 UTC

Same problem here. FWIW, I already had pci=noaspm set to fix a separate NIC-related ASPM problem when this problem occurred.

Linux version 2.6.34.6-54.fc13.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC) ) #1 SMP Sun Sep 5 17:16:27 UTC 2010

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xf5/0x197()
Hardware name: X7SPA-HF
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: bluetooth rfkill sit tunnel4 ipt_MASQUERADE nf_nat_snmp_basic nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_h323 nf_conntrack_h323 nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat tun sunrpc p4_clockmod freq_table speedstep_lib ipv6 microcode i2c_i801 e1000e serio_raw i2c_core iTCO_wdt iTCO_vendor_support raid1 [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.34.6-54.fc13.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104d12f>] warn_slowpath_common+0x7c/0x94
 [<ffffffff8104d19e>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff813ba12f>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff813ba24d>] dev_watchdog+0xf5/0x197
 [<ffffffff8106b22d>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8105920c>] ? internal_add_timer+0xcf/0xd1
 [<ffffffff810592db>] ? cascade+0x6a/0x86
 [<ffffffff810594b6>] run_timer_softirq+0x1bf/0x263
 [<ffffffff8106e3d7>] ? ktime_get+0x65/0xbe
 [<ffffffff81053265>] __do_softirq+0xe5/0x1a6
 [<ffffffff810726b0>] ? tick_program_event+0x2a/0x2c
 [<ffffffff8100ab5c>] call_softirq+0x1c/0x30
 [<ffffffff8100c342>] do_softirq+0x46/0x83
 [<ffffffff810530d6>] irq_exit+0x3b/0x7d
 [<ffffffff81452af0>] smp_apic_timer_interrupt+0x8d/0x9b
 [<ffffffff8100a613>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81011879>] ? mwait_idle+0x7a/0x87
 [<ffffffff8101182b>] ? mwait_idle+0x2c/0x87
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81445a5e>] start_secondary+0x253/0x294
---[ end trace 37dc44e121a6e71f ]---
e1000e 0000:02:00.0: eth0: Reset adapter


eth0      Link encap:Ethernet  HWaddr 00:25:90:...
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:32619954 errors:4973572127610 dropped:828928687935 overruns:0 frame:3315714751740
          TX packets:12066383 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15961594725 (14.8 GiB)  TX bytes:1391720792 (1.2 GiB)
          Interrupt:16 Memory:fe9e0000-fea00000

e1000e: Intel(R) PRO/1000 Network Driver - 1.2.10-NAPI
e1000e: Copyright(c) 1999 - 2010 Intel Corporation.
e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
e1000e 0000:02:00.0: setting latency timer to 64
e1000e 0000:02:00.0: irq 28 for MSI/MSI-X
e1000e 0000:02:00.0: irq 29 for MSI/MSI-X
e1000e 0000:02:00.0: Disabling ASPM L0s
e1000e 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:25:90:...
e1000e 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection
e1000e 0000:02:00.0: eth0: MAC: 4, PHY: 8, PBA No: 0101ff-0ff
e1000e 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
e1000e 0000:03:00.0: setting latency timer to 64
e1000e 0000:03:00.0: irq 30 for MSI/MSI-X
e1000e 0000:03:00.0: irq 31 for MSI/MSI-X
e1000e 0000:03:00.0: Disabling ASPM L0s
e1000e 0000:03:00.0: eth1: (PCI Express:2.5GB/s:Width x1) 00:25:90:...
e1000e 0000:03:00.0: eth1: Intel(R) PRO/1000 Network Connection
e1000e 0000:03:00.0: eth1: MAC: 4, PHY: 8, PBA No: 0101ff-0ff
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX


00:00.0 Host bridge: Intel Corporation N10 Family DMI Bridge (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
04:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)

Comment 99 Matthew Garrett 2010-09-28 15:18:14 UTC

You have ASPM disabled and you have an e1000e, which is a pretty strong indication that you have a different bug :)

Comment 100 Stanislaw Gruszka 2010-10-01 17:46:44 UTC

*** Bug 555154 has been marked as a duplicate of this bug. ***

Comment 101 Frantisek Hanzlik 2010-10-03 21:28:48 UTC

Created attachment 451315 [details]
/var/log/messages

There are several kernel oopses (some are not about r8169; I don't know whether they are relevant to this case or not).

Comment 102 Frantisek Hanzlik 2010-10-03 21:49:57 UTC

We probably hit this bug after upgrading from F10 (same HW, worked fine, kernel 2.6.27.41-170.2.117.fc10.i686) to F14 (kernel 2.6.35.4-28.fc14.i686.PAE).
The Gigabyte GA-EP45-DQ6 MB has four integrated 8111C chips (according to the manual); "lspci -vvvnn" for one of them looks like this:

04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 02)
	Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard [1458:e000]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 52
	Region 0: I/O ports at 8000 [size=256]
	Region 2: Memory at f7010000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at f7000000 (64-bit, prefetchable) [size=64K]
	[virtual] Expansion ROM at f7020000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
		Address: 00000000fee0300c  Data: 41d1
	Capabilities: [70] Express (v1) Endpoint, MSI 01
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <8us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [d0] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 12-34-56-78-12-34-56-78
	Kernel driver in use: r8169
	Kernel modules: r8169

It appears that after transferring about 400 MB, the LAN card stops responding and a kernel oops message shows up in /var/log/messages - see attachment.

Curiously, when using an external add-on PCI-Express card with an 8168B chip instead of the integrated one, everything seems to work fine. "lspci -vvvnn" for this card is:

03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 51
	Region 0: I/O ports at 7000 [size=256]
	Region 2: Memory at ed000000 (64-bit, non-prefetchable) [size=4K]
	[virtual] Expansion ROM at ec000000 [disabled] [size=8K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
		Address: 00000000fee0100c  Data: 41c9
	Capabilities: [60] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
			ExtTag+ AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [84] Vendor Specific Information: Len=4c <?>
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [12c v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [148 v1] Device Serial Number c3-0d-00-00-10-ec-81-68
	Capabilities: [154 v1] Power Budgeting <?>
	Kernel driver in use: r8169
	Kernel modules: r8169
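
The difference that matters between the two dumps above is the LnkCtl line: the onboard chip reports "ASPM L0s Enabled", the add-on card "ASPM Disabled". A minimal sketch for pulling that field out of saved "lspci -vvv" output follows. The function name aspm_link_state is made up for illustration, and the parser assumes device headers without a PCI domain prefix, as in the dumps above.

```python
import re

# Device header lines start at column 0 with a PCI address such as "04:00.0".
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$", re.IGNORECASE)
# LnkCtl reports either "ASPM Disabled" or "ASPM <states> Enabled".
LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM\s+(Disabled|.+? Enabled)")


def aspm_link_state(lspci_text):
    """Map each PCI address to the ASPM state printed on its LnkCtl line."""
    states = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            # Remember which device the following indented lines belong to.
            current = header.group(1)
            continue
        found = LNKCTL_RE.search(line)
        if found and current is not None:
            states[current] = found.group(1)
    return states
```

Feeding it the text of both dumps should give "L0s Enabled" for 04:00.0 and "Disabled" for 03:00.0. Devices without a PCI Express capability simply do not appear in the result.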

So far I have run this system without "pcie_aspm=off"; I'll try it later.

Comment 103 Richard Marko 2010-10-03 21:57:17 UTC

Same issue on Lenovo G560, random crashes, reboot required.

A workaround that helps is "acpi=nopci pcie_aspm=off" (I'm not sure whether the former is required too).

Comment 104 Pavel Holica 2010-10-05 06:05:51 UTC

Update to comment 95.

I complained to the vendor of my motherboard (with this onboard NIC) and got a new one, which now works without problems, so in my case it was a damaged NIC.

Comment 105 Frantisek Hanzlik 2010-10-20 10:58:17 UTC

Regarding my comment 102: after adding "pcie_aspm=off" to the kernel cmdline (without any additional acpi=off / acpi=nopci etc.), the PC has worked for two weeks (24x7, several hundred gigabytes through the Realtek interface) without this problem, and there are no oops messages in /var/log/messages. It seems it really is an ASPM-related bug.
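
For anyone comparing results, it helps to confirm that the option actually reached the running kernel. A minimal sketch, assuming the command line is read from /proc/cmdline: kernel_param and aspm_disabled_at_boot are hypothetical helper names, and taking the last occurrence of a repeated option is an assumption of this sketch. Quoted values containing spaces are not handled.

```python
def kernel_param(cmdline, name):
    """Return the value of name=value, "" for a bare flag, None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # Assumption: if the option is repeated, the last one is reported.
            value = val if sep else ""
    return value


def aspm_disabled_at_boot(cmdline):
    """True if the command line carries pcie_aspm=off."""
    return kernel_param(cmdline, "pcie_aspm") == "off"
```

On a live system this could be called as aspm_disabled_at_boot(open("/proc/cmdline").read()).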

Comment 106 lejeczek 2010-12-18 00:14:50 UTC

F14
2.6.35.9-64.fc14.x86_64

It seems worth mentioning that this might have a pattern. As far as I can remember, all the mobos I've had that have dual NICs of the same vendor with MAC numbers one after another, like in this mobo
a) 6C:F0:49:51:B4:DB
b) 6C:F0:49:51:B4:EB
have had this peculiar/unexpected/whimsical attitude towards us users.

in my case, 6C:F0:49:51:B4:EB fails, 6C:F0:49:51:B4:DB stays up/operational

moreover, `ethtool -p`, for instance, fails


[ 4258.704106] ------------[ cut here ]------------
[ 4258.704112] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xf3/0x167()
[ 4258.704114] Hardware name: GA-790FXTA-UD5
[ 4258.704115] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
[ 4258.704117] Modules linked in: nfs fscache sit tunnel4 nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc bonding xt_multiport ipt_MASQUERADE iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm radeon ttm drm_kms_helper snd_hda_codec_atihdmi ppdev drm snd_hda_intel parport_pc parport edac_core r8169 xhci_hcd mii wmi snd_hda_codec edac_mce_amd snd_hwdep e1000 snd_seq snd_seq_device i2c_piix4 i2c_algo_bit i2c_core serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc k10temp joydev tun firewire_ohci ata_generic firewire_core pata_acpi crc_itu_t pata_atiixp pata_jmicron [last unloaded: scsi_wait_scan]
[ 4258.704144] Pid: 0, comm: swapper Not tainted 2.6.35.9-64.fc14.x86_64 #1
[ 4258.704146] Call Trace:
[ 4258.704147]  <IRQ>  [<ffffffff8104d855>] warn_slowpath_common+0x85/0x9d
[ 4258.704153]  [<ffffffff8104d910>] warn_slowpath_fmt+0x46/0x48
[ 4258.704155]  [<ffffffff813d56ba>] ? netif_tx_lock+0x44/0x6d
[ 4258.704158]  [<ffffffff813d5824>] dev_watchdog+0xf3/0x167
[ 4258.704160]  [<ffffffff81059a19>] ? internal_add_timer+0xcf/0xd1
[ 4258.704163]  [<ffffffff81059ae2>] ? cascade+0x65/0x81
[ 4258.704165]  [<ffffffff81059cd4>] run_timer_softirq+0x1d6/0x2a3
[ 4258.704167]  [<ffffffff813d5731>] ? dev_watchdog+0x0/0x167
[ 4258.704170]  [<ffffffff81021ff6>] ? apic_write+0x16/0x18
[ 4258.704172]  [<ffffffff810538e5>] __do_softirq+0xdd/0x199
[ 4258.704175]  [<ffffffff81072584>] ? tick_dev_program_event+0x36/0xf4
[ 4258.704177]  [<ffffffff8107266c>] ? tick_program_event+0x2a/0x2c
[ 4258.704180]  [<ffffffff8100abdc>] call_softirq+0x1c/0x30
[ 4258.704182]  [<ffffffff8100c338>] do_softirq+0x46/0x82
[ 4258.704184]  [<ffffffff81053a45>] irq_exit+0x3b/0x7d
[ 4258.704186]  [<ffffffff8146f90a>] smp_apic_timer_interrupt+0x7e/0x8c
[ 4258.704188]  [<ffffffff8100a693>] apic_timer_interrupt+0x13/0x20
[ 4258.704189]  <EOI>  [<ffffffff8128f58d>] ? raw_local_irq_enable+0xd/0x12
[ 4258.704194]  [<ffffffff8106b478>] ? sched_clock_idle_wakeup_event+0x17/0x1b
[ 4258.704196]  [<ffffffff8129050b>] acpi_idle_enter_simple+0xd7/0x10d
[ 4258.704199]  [<ffffffff81394085>] cpuidle_idle_call+0x8b/0xe9
[ 4258.704201]  [<ffffffff81008325>] cpu_idle+0xaa/0xcc
[ 4258.704204]  [<ffffffff81462746>] start_secondary+0x24d/0x28e
[ 4258.704206] ---[ end trace 0e49e7c7501e3fe3 ]---
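
Since the same "NETDEV WATCHDOG" line recurs across reports in this bug with different interfaces and drivers, a small filter makes it easier to see which NIC and driver are timing out in a long /var/log/messages. This is only a sketch; watchdog_timeouts is a made-up name, and the pattern is taken from the message format shown in the traces here.

```python
import re

# Matches e.g. "NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out".
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def watchdog_timeouts(log_lines):
    """Return (interface, driver, queue) for every watchdog timeout line."""
    hits = []
    for line in log_lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits
```

It accepts any iterable of lines, so an open log file can be passed directly.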


many thanks & regards to open-source fighters!!!
greedy corporations will burn in hell ;) should we boycott Ralink corporation??
cheers

Comment 107 vadim 2010-12-18 01:05:40 UTC

I can confirm #106.
I have fought too many battles with r8169.c. I have never won since 2.6.27.
But now I can say that the problem is not actually in the driver; it is some issue with interrupts going missing, and it is not the driver's or the chip's fault. Disabling ASPM, MSI and so on just changes the time to reproduce.
While trying to modify the driver (forcing MSI for the buggy board or changing the initialisation sequence), I saw a situation where I didn't initialise one NIC correctly: when I didn't init the first chip, the second started to work. But both never work together. The second chip cannot even send an interrupt about a link change (link is always up, but it may work).
This is all about two r8168b NICs (mac 0x14) on the Intel 910GMLE and ICH6M chipset (advantech gmb910board).
I am happy because I found it works with 2.6.36 from rawhide.
I applied patch-crs and crs=on to 2.6.35.9 (the only diff I was able to find), but 2.6.35 still fails to receive interrupts properly. For now I just use 2.6.36 on my board with dual NICs, but I am interested in backporting something to make two NICs work on 2.6.35.

Comment 108 lejeczek 2010-12-18 08:38:32 UTC

only to add, that heavy traffic plays no role for me.
ssh/ping'ing to one NIC suffices to put the second one away, and this happens
instantly.

and we should boycott Realtek and Gigabyte too!!

Comment 109 vadim 2010-12-18 14:43:43 UTC

(In reply to comment #108)
> only to add, that heavy traffic plays no role for me.
> ssh/ping'ing to one NIC suffices to put the second one away, and this happens
> instantly.
> 
> and we should boycott Realtek and Gigabyte too!!

Actually, the kernel driver r8169 exists because of Realtek's open source r8168 driver, so it is not fair to feel hurt. Both drivers do exactly the same things.
Could you try 2.6.36 from rawhide? The NIC driver is unchanged, but some pci or pcix issues are probably solved.

Comment 110 vadim 2010-12-21 09:48:07 UTC

Another solution that works for me is porting drivers/pci/pcix from 2.6.36 to 2.6.35 (and probably to 2.6.34, but I use 2.6.35 now).

Comment 111 lejeczek 2011-01-06 10:45:38 UTC

@vadim

fair enough, but what? should we call it a nice gesture? were Realtek so generous?
you are right, we should not feel hurt, true, we should ask one question instead:

is a hardware vendor's contribution good enough, solid and continuous, and today! not in the past?

if we feel it is not, then we should advise each other not to use this specific vendor/product, using simple means, eg. http://www.linuxquestions.org/hcl/index.php

we would have saved ourselves some headaches if we had known that some things lack support from the places where it should come strongest in the first place.

as of now, I am going to try to stay away from Realtek, and advise everybody who asks to do the same, and it makes sense because we shop a lot.

cheers

Comment 112 lejeczek 2011-01-18 10:23:40 UTC

yes, further problems with these Realteks,
connectivity under constant & heavy traffic is a major problem!
no jumbo frames support!? max mtu seems to be 7200

Comment 113 William H. Haller 2011-01-25 15:48:31 UTC

I'm also getting many link up messages - 1 per minute. My realtek is on a PCI-E card. I tried the pcie_aspm=off option, but that didn't change anything. I dropped the tx queue length down to 100 and that has run the longest (without the unrecoverable link drop/up or multiple link up messages for around 400MB so far). There is also an onboard 8168B that is serving the LAN side that stays working. It is the outward facing card that has the problems.

Max MTU is indeed 7200, but that is jumbo frame size - just not jumbo frame to the extent that some vendors support.
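
To compare settings between a working and a failing port, the transmit queue length and MTU discussed above can be read back from sysfs. A minimal sketch, assuming the usual /sys/class/net/<interface>/ layout with its mtu and tx_queue_len files; read_net_setting and link_settings are hypothetical helper names, and the sysfs_root parameter exists only so the code can be pointed at a test directory.

```python
import os


def read_net_setting(iface, setting, sysfs_root="/sys/class/net"):
    """Read one integer attribute of a network interface from sysfs."""
    path = os.path.join(sysfs_root, iface, setting)
    with open(path) as handle:
        return int(handle.read().strip())


def link_settings(iface, sysfs_root="/sys/class/net"):
    """Collect the MTU and transmit queue length of an interface."""
    return {
        "mtu": read_net_setting(iface, "mtu", sysfs_root),
        "tx_queue_len": read_net_setting(iface, "tx_queue_len", sysfs_root),
    }
```

For example, link_settings("eth1") on the affected box would show whether the reduced queue length is still in effect after a link drop.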

The TX queue length problem and link not coming back up is very new - within the last couple kernels. Unfortunately I don't currently have older than the .63 installed, but will try that soon. Before there would be occasional drop/up sequences but they didn't happen often and always came back on their own. The driver fails at least on kernel-2.6.34.7-63.fc13.x86_64 and kernel-2.6.34.7-66.fc13.x86_64.

WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xf5/0x197()
Jan 24 17:43:10 gabriel kernel: Hardware name: NC683AAR-ABA m9500y
Jan 24 17:43:10 gabriel kernel: NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Jan 24 17:43:10 gabriel kernel: Modules linked in: fuse autofs4 nfs fscache nfs_acl auth_rpcgss lockd sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 nf_conntrack_ftp ipt_LOG xt_mac xt_iprange$
Jan 24 17:43:10 gabriel kernel: Pid: 0, comm: swapper Tainted: P           2.6.34.7-66.fc13.x86_64 #1
Jan 24 17:43:10 gabriel kernel: Call Trace:
Jan 24 17:43:10 gabriel kernel: <IRQ>  [<ffffffff8104d11f>] warn_slowpath_common+0x7c/0x94
Jan 24 17:43:10 gabriel kernel: [<ffffffff8104d18e>] warn_slowpath_fmt+0x41/0x43
Jan 24 17:43:10 gabriel kernel: [<ffffffff813b9dff>] ? netif_tx_lock+0x44/0x6d
Jan 24 17:43:10 gabriel kernel: [<ffffffff813b9f1d>] dev_watchdog+0xf5/0x197
Jan 24 17:43:10 gabriel kernel: [<ffffffff8106b22d>] ? sched_clock_local+0x1c/0x82
Jan 24 17:43:10 gabriel kernel: [<ffffffff81062754>] ? __queue_work+0x3a/0x42
Jan 24 17:43:10 gabriel kernel: [<ffffffff810594b6>] run_timer_softirq+0x1bf/0x263
Jan 24 17:43:10 gabriel kernel: [<ffffffff8106e3c7>] ? ktime_get+0x65/0xbe
Jan 24 17:43:10 gabriel kernel: [<ffffffff8105327a>] __do_softirq+0xe2/0x1a4
Jan 24 17:43:10 gabriel kernel: [<ffffffff810726ac>] ? tick_program_event+0x2a/0x2c
Jan 24 17:43:10 gabriel kernel: [<ffffffff8100ab5c>] call_softirq+0x1c/0x30
Jan 24 17:43:10 gabriel kernel: [<ffffffff8100c342>] do_softirq+0x46/0x83
Jan 24 17:43:10 gabriel kernel: [<ffffffff810530ee>] irq_exit+0x3b/0x7d
Jan 24 17:43:10 gabriel kernel: [<ffffffff814526b0>] smp_apic_timer_interrupt+0x8d/0x9b
Jan 24 17:43:10 gabriel kernel: [<ffffffff8100a613>] apic_timer_interrupt+0x13/0x20
Jan 24 17:43:10 gabriel kernel: <EOI>  [<ffffffff8102a615>] ? native_safe_halt+0xb/0xd
Jan 24 17:43:10 gabriel kernel: [<ffffffff8101164d>] default_idle+0x36/0x53
Jan 24 17:43:10 gabriel kernel: [<ffffffff81011765>] c1e_idle+0xfb/0x102
Jan 24 17:43:10 gabriel kernel: [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
Jan 24 17:43:10 gabriel kernel: [<ffffffff81445659>] start_secondary+0x253/0x294
Jan 24 17:43:10 gabriel kernel: ---[ end trace 4d99175274e5070b ]---

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
        Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
        Flags: bus master, fast devsel, latency 0, IRQ 30
        I/O ports at ce00 [size=256]
        Memory at fdcff000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at fdb00000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [84] Vendor Specific Information: Len=4c <?>
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [12c] Virtual Channel
        Capabilities: [148] Device Serial Number 1f-01-00-00-10-ec-81-68
        Capabilities: [154] Power Budgeting <?>
        Kernel driver in use: r8169
        Kernel modules: r8169

I went out and bought another PCI-E card, but unfortunately it has a chip set that doesn't have an available Linux driver in Fedora. There are some on the net but I'm not enthused going that route. I have another that I know is supported on backorder that should work but any resolution to this in the meantime would be greatly appreciated.

Comment 114 William H. Haller 2011-01-31 15:36:12 UTC

A work around that appears to be succeeding for me is disabling the cpuspeed service completely. I did this Saturday morning and haven't had a tx lockup since. I was noticing that the lockup tended to occur when I was disabling the cpuspeed service automatically at backup time so the system could quickly back up and when it was disabled automatically during the day for normal operations. The crash time frequently coincides with the actual cpuspeed switch from slow to fast.

Since removing these actions from cron and letting it run full speed, all has been well. Admittedly it has been under two days, but that is better than I had been seeing. It could also be reduced weekend network load (600 MB so far), so at this point this information may be anecdotal. Thought it was worth mentioning in case anyone else could see if it made a difference for them.

This same sequence is being run on other systems as well which have the 8169, but all of those are on the main board. This is the only one on the PCI-E bus and only the PCI-E bus NIC seems affected even on this box the majority of the time. I think I have only seen one instance where the onboard ports locked up on this box and once on an older i386 box. The i386 box was running a PCI based NIC. The one time it locked up on this box may have been a quirk thrown by the lockup on the PCI-E card - unknown.

Comment 115 Douglas E. Warner 2011-02-11 17:43:55 UTC

My Atom 330 CPU doesn't support CPU frequency scaling and still exhibits this problem (kernel-2.6.35.10-74.fc14.x86_64).  I've set the pcie_aspm=off option and am waiting to see if I continue to have problems.

Comment 116 v.plessky 2011-04-02 15:06:33 UTC

I have seen similar crashes with the Realtek adapter, but on old kernels.

With kernel 2.6.37 and newer I don't see such crashes.
So it is worth testing the latest Fedora 15 builds (2.6.38.x kernel).

Comment 117 Andrey Motoshkov 2011-05-11 11:21:43 UTC

Hi. Same behaviour - different HW.
# ethtool -i eth0
driver: tg3
version: 3.110
firmware-version: 5755m-v3.29
bus-info: 0000:09:00.0

How to reproduce: I was trying to upload (TX) a ~3G file from my laptop (tried both cifs mount and scp). Download (RX) was not affected.

May 11 14:01:20 dragonfly kernel: [ 1731.712169] ------------[ cut here ]------------
May 11 14:01:20 dragonfly kernel: [ 1731.712186] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xf3/0x167()
May 11 14:01:20 dragonfly kernel: [ 1731.712193] Hardware name: Latitude D630
May 11 14:01:20 dragonfly kernel: [ 1731.712199] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
May 11 14:01:20 dragonfly kernel: [ 1731.712205] Modules linked in: nls_utf8 hidp fuse rfcomm sco bnep l2cap coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput arc4 snd_hda_codec_idt ecb nvidia(P) iwl3945 snd_hda_intel iwlcore snd_hda_codec snd_hwdep snd_seq snd_seq_device btusb mac80211 snd_pcm cfg80211 bluetooth snd_timer snd dell_laptop dell_wmi rfkill tg3 i2c_i801 joydev i2c_core dcdbas soundcore snd_page_alloc wmi microcode firewire_ohci firewire_core crc_itu_t yenta_socket video output [last unloaded: scsi_wait_scan]
May 11 14:01:20 dragonfly kernel: [ 1731.712331] Pid: 0, comm: swapper Tainted: P            2.6.35.13-91.fc14.x86_64 #1
May 11 14:01:20 dragonfly kernel: [ 1731.712337] Call Trace:
May 11 14:01:20 dragonfly kernel: [ 1731.712342]  <IRQ>  [<ffffffff8104dcf1>] warn_slowpath_common+0x85/0x9d
May 11 14:01:20 dragonfly kernel: [ 1731.712363]  [<ffffffff8104ddac>] warn_slowpath_fmt+0x46/0x48
May 11 14:01:20 dragonfly kernel: [ 1731.712373]  [<ffffffff813d65aa>] ? netif_tx_lock+0x44/0x6d
May 11 14:01:20 dragonfly kernel: [ 1731.712382]  [<ffffffff813d6714>] dev_watchdog+0xf3/0x167
May 11 14:01:20 dragonfly kernel: [ 1731.712393]  [<ffffffff8146ac57>] ? _raw_spin_unlock_irqrestore+0x17/0x19
May 11 14:01:20 dragonfly kernel: [ 1731.712404]  [<ffffffff8106305d>] ? __queue_work+0x3a/0x43
May 11 14:01:20 dragonfly kernel: [ 1731.712415]  [<ffffffff8105a1f4>] run_timer_softirq+0x1d6/0x2a3
May 11 14:01:20 dragonfly kernel: [ 1731.712426]  [<ffffffff81071b0c>] ? clockevents_program_event+0x8e/0x90
May 11 14:01:20 dragonfly kernel: [ 1731.712435]  [<ffffffff813d6621>] ? dev_watchdog+0x0/0x167
May 11 14:01:20 dragonfly kernel: [ 1731.712446]  [<ffffffff81053dd9>] __do_softirq+0xf0/0x1bf
May 11 14:01:20 dragonfly kernel: [ 1731.712456]  [<ffffffff8100ca3a>] ? timer_interrupt+0x1e/0x25
May 11 14:01:20 dragonfly kernel: [ 1731.712466]  [<ffffffff8100abdc>] call_softirq+0x1c/0x30
May 11 14:01:20 dragonfly kernel: [ 1731.712475]  [<ffffffff8100c338>] do_softirq+0x46/0x82
May 11 14:01:20 dragonfly kernel: [ 1731.712484]  [<ffffffff81053f65>] irq_exit+0x49/0x8b
May 11 14:01:20 dragonfly kernel: [ 1731.712493]  [<ffffffff81470b85>] do_IRQ+0x9d/0xb4
May 11 14:01:20 dragonfly kernel: [ 1731.712503]  [<ffffffff8146b093>] ret_from_intr+0x0/0x11
May 11 14:01:20 dragonfly kernel: [ 1731.712508]  <EOI>  [<ffffffff81290020>] ? raw_local_irq_enable+0x10/0x12
May 11 14:01:20 dragonfly kernel: [ 1731.712528]  [<ffffffff8106ba54>] ? sched_clock_idle_wakeup_event+0x17/0x1b
May 11 14:01:20 dragonfly kernel: [ 1731.712538]  [<ffffffff81290e8c>] acpi_idle_enter_bm+0x228/0x260
May 11 14:01:20 dragonfly kernel: [ 1731.712549]  [<ffffffff81394d5d>] cpuidle_idle_call+0x8b/0xe9
May 11 14:01:20 dragonfly kernel: [ 1731.712560]  [<ffffffff81008325>] cpu_idle+0xaa/0xcc
May 11 14:01:20 dragonfly kernel: [ 1731.712571]  [<ffffffff81452866>] rest_init+0x8a/0x8c
May 11 14:01:20 dragonfly kernel: [ 1731.712582]  [<ffffffff81ba1c49>] start_kernel+0x40b/0x416
May 11 14:01:20 dragonfly kernel: [ 1731.712593]  [<ffffffff81ba12c6>] x86_64_start_reservations+0xb1/0xb5
May 11 14:01:20 dragonfly kernel: [ 1731.712603]  [<ffffffff81ba13c2>] x86_64_start_kernel+0xf8/0x107
May 11 14:01:20 dragonfly kernel: [ 1731.712610] ---[ end trace 7fff863281528ef4 ]---

Tried rmmod & modprobe - nothing. Only a reboot helped in my case.
Also tested the new driver v3.116j from Broadcom - the same.
Workaround:
After adding "pcie_aspm=off" to the kernel boot options, it works perfectly with the standard kernel driver.

Comment 118 Andrey Motoshkov 2011-05-11 14:45:02 UTC

A sad correction to my previous comment.
pcie_aspm=off only helped me to "cross" the 3G line.
I tried to upload a 23G file (cifs mount) and my network died at the 19.7G point with the same:
May 11 15:16:16 dragonfly kernel: [ 4454.720153] ------------[ cut here ]------------
May 11 15:16:16 dragonfly kernel: [ 4454.720171] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xf3/0x167()
May 11 15:16:16 dragonfly kernel: [ 4454.720178] Hardware name: Latitude D630
May 11 15:16:16 dragonfly kernel: [ 4454.720184] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
May 11 15:16:16 dragonfly kernel: [ 4454.720190] Modules linked in: cifs nls_utf8 hidp fuse rfcomm sco bnep l2cap coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec arc4 ecb snd_hwdep nvidia(P) snd_seq iwl3945 snd_seq_device snd_pcm iwlcore mac80211 snd_timer dell_wmi snd btusb cfg80211 dell_laptop tg3 bluetooth i2c_i801 soundcore snd_page_alloc rfkill wmi dcdbas i2c_core joydev microcode firewire_ohci firewire_core crc_itu_t yenta_socket video output [last unloaded: scsi_wait_scan]
May 11 15:16:16 dragonfly kernel: [ 4454.720319] Pid: 0, comm: swapper Tainted: P            2.6.35.13-91.fc14.x86_64 #1
May 11 15:16:16 dragonfly kernel: [ 4454.720325] Call Trace:
May 11 15:16:16 dragonfly kernel: [ 4454.720330]  <IRQ>  [<ffffffff8104dcf1>] warn_slowpath_common+0x85/0x9d
May 11 15:16:16 dragonfly kernel: [ 4454.720350]  [<ffffffff8104ddac>] warn_slowpath_fmt+0x46/0x48
May 11 15:16:16 dragonfly kernel: [ 4454.720360]  [<ffffffff813d65aa>] ? netif_tx_lock+0x44/0x6d
May 11 15:16:16 dragonfly kernel: [ 4454.720368]  [<ffffffff813d6714>] dev_watchdog+0xf3/0x167
May 11 15:16:16 dragonfly kernel: [ 4454.720380]  [<ffffffff8146ac57>] ? _raw_spin_unlock_irqrestore+0x17/0x19
May 11 15:16:16 dragonfly kernel: [ 4454.720391]  [<ffffffff8106305d>] ? __queue_work+0x3a/0x43
May 11 15:16:16 dragonfly kernel: [ 4454.720402]  [<ffffffff8105a1f4>] run_timer_softirq+0x1d6/0x2a3
May 11 15:16:16 dragonfly kernel: [ 4454.720411]  [<ffffffff813d6621>] ? dev_watchdog+0x0/0x167
May 11 15:16:16 dragonfly kernel: [ 4454.720421]  [<ffffffff810221ba>] ? apic_write+0x16/0x18
May 11 15:16:16 dragonfly kernel: [ 4454.720431]  [<ffffffff81053dd9>] __do_softirq+0xf0/0x1bf
May 11 15:16:16 dragonfly kernel: [ 4454.720442]  [<ffffffff81072b8c>] ? tick_dev_program_event+0x36/0xf4
May 11 15:16:16 dragonfly kernel: [ 4454.720451]  [<ffffffff81072c74>] ? tick_program_event+0x2a/0x2c
May 11 15:16:16 dragonfly kernel: [ 4454.720461]  [<ffffffff8100abdc>] call_softirq+0x1c/0x30
May 11 15:16:16 dragonfly kernel: [ 4454.720471]  [<ffffffff8100c338>] do_softirq+0x46/0x82
May 11 15:16:16 dragonfly kernel: [ 4454.720480]  [<ffffffff81053f65>] irq_exit+0x49/0x8b
May 11 15:16:16 dragonfly kernel: [ 4454.720489]  [<ffffffff81470c1a>] smp_apic_timer_interrupt+0x7e/0x8c
May 11 15:16:16 dragonfly kernel: [ 4454.720498]  [<ffffffff8100a693>] apic_timer_interrupt+0x13/0x20
May 11 15:16:16 dragonfly kernel: [ 4454.720504]  <EOI>  [<ffffffff81394d75>] ? cpuidle_idle_call+0xa3/0xe9
May 11 15:16:16 dragonfly kernel: [ 4454.720521]  [<ffffffff81394d5d>] ? cpuidle_idle_call+0x8b/0xe9
May 11 15:16:16 dragonfly kernel: [ 4454.720532]  [<ffffffff81008325>] cpu_idle+0xaa/0xcc
May 11 15:16:16 dragonfly kernel: [ 4454.720543]  [<ffffffff81463a4d>] start_secondary+0x24d/0x28e
May 11 15:16:16 dragonfly kernel: [ 4454.720551] ---[ end trace af2f2eed990b4dcb ]---

So is this still relevant to this bug, or should I post my trace in a separate one?

Comment 119 Bug Zapper 2011-06-02 17:23:02 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 120 Curtis Doty 2011-06-18 19:44:32 UTC

Is this the same issue as in the Fedora 14 e1000e driver on a Supermicro board with an Opteron 6128? If so, let's keep this bug alive by moving it to Fedora 14.

> [319374.704049] ------------[ cut here ]------------
> [319374.704059] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xf3/0x167()
> [319374.704063] Hardware name: H8SGL
> [319374.704066] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> [319374.704072] Modules linked in: ext2 i2c_piix4 amd64_edac_mod i2c_core e1000e k10temp edac_core edac_mce_amd serio_raw microcode pata_acpi ata_generic pata_atiixp [last unloaded: scsi_wait_scan]
> [319374.704086] Pid: 0, comm: swapper Not tainted 2.6.35.13-91.fc14.x86_64 #1
> [319374.704088] Call Trace:
> [319374.704091]  <IRQ>  [<ffffffff8104dcf1>] warn_slowpath_common+0x85/0x9d
> [319374.704102]  [<ffffffff8104ddac>] warn_slowpath_fmt+0x46/0x48
> [319374.704106]  [<ffffffff813d65aa>] ? netif_tx_lock+0x44/0x6d
> [319374.704109]  [<ffffffff813d6714>] dev_watchdog+0xf3/0x167
> [319374.704115]  [<ffffffff8146ac57>] ? _raw_spin_unlock_irqrestore+0x17/0x19
> [319374.704120]  [<ffffffff8106305d>] ? __queue_work+0x3a/0x43
> [319374.704126]  [<ffffffff8105a1f4>] run_timer_softirq+0x1d6/0x2a3
> [319374.704130]  [<ffffffff8101058b>] ? native_sched_clock+0x35/0x37
> [319374.704133]  [<ffffffff813d6621>] ? dev_watchdog+0x0/0x167
> [319374.704138]  [<ffffffff81053dd9>] __do_softirq+0xf0/0x1bf
> [319374.704143]  [<ffffffff81010207>] ? paravirt_read_tsc+0x9/0xd
> [319374.704147]  [<ffffffff8106b7dd>] ? sched_clock_local+0x12/0x75
> [319374.704151]  [<ffffffff8100abdc>] call_softirq+0x1c/0x30
> [319374.704154]  [<ffffffff8100c338>] do_softirq+0x46/0x82
> [319374.704158]  [<ffffffff81053f65>] irq_exit+0x49/0x8b
> [319374.704162]  [<ffffffff81470c1a>] smp_apic_timer_interrupt+0x7e/0x8c
> [319374.704166]  [<ffffffff8100a693>] apic_timer_interrupt+0x13/0x20
> [319374.704168]  <EOI>  [<ffffffff8102b7dd>] ? native_safe_halt+0xb/0xd
> [319374.704175]  [<ffffffff81010f13>] ? need_resched+0x23/0x2d
> [319374.704179]  [<ffffffff8101103a>] default_idle+0x34/0x4f
> [319374.704183]  [<ffffffff81008325>] cpu_idle+0xaa/0xcc
> [319374.704189]  [<ffffffff81463a4d>] start_secondary+0x24d/0x28e
> [319374.704192] ---[ end trace 75cb8a9a138ee284 ]---
> [319374.704220] e1000e 0000:03:00.0: eth0: Reset adapter

Also FYI, I ran the upstream driver e1000e-1.2.20 for a while, and the problem didn't manifest.

However, I've since disabled PowerNow in the BIOS and set pcie_aspm=off in the kernel. No lockups yet...
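
Since traces from several drivers (tg3, e1000e) have now been posted here, it may help to pull the interface, driver, and queue out of each "NETDEV WATCHDOG" line when comparing logs. This is a rough sketch based only on the message format seen in the traces in this bug; the function name watchdog_events is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Matches lines such as:
#   NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def watchdog_events(lines):
    """Return (interface, driver, queue) for every watchdog timeout line."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            events.append(
                (match["iface"], match["driver"], int(match["queue"]))
            )
    return events


if __name__ == "__main__":
    # Assumption: the log lives at /var/log/messages, as on Fedora of this era.
    with open("/var/log/messages", errors="replace") as log:
        for iface, driver, queue in watchdog_events(log):
            print(f"{iface} ({driver}): queue {queue} timed out")
```

A matching message only shows that the transmit watchdog fired; it does not say whether ASPM is the cause for a given driver.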

Comment 121 Bug Zapper 2011-06-27 14:33:05 UTC

Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
