Bug 547517

Summary: r8169 transmit queue 0 timed out
Product: [Fedora] Fedora
Reporter: Mace Moneta <moneta.mace>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 12
CC: dougsland, gansalmon, itamar, kernel-maint, mhlavink, moneta.mace, unknown32, vadim.v.panov
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-07-11 08:27:10 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:

Description Mace Moneta 2009-12-14 15:40:53 EST
Description of problem:

While copying some large files from machine to machine on the same gig-E LAN, the copy paused and dmesg reported:

WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0xf3/0x164() (Not tainted)
Hardware name: C2SEA
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: fuse w83627ehf hwmon_vid coretemp cpufreq_ondemand acpi_cpufreq freq_table ipv6 kvm_intel kvm uinput usblp snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore ppdev r8169 snd_page_alloc iTCO_wdt iTCO_vendor_support i2c_i801 parport_pc parport mii raid0 raid1 firewire_ohci pata_acpi ata_generic dm_multipath firewire_core crc_itu_t pata_it8213 i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.31.6-166.fc12.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8138e831>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff8138e99b>] dev_watchdog+0xf3/0x164
 [<ffffffff8106eb3b>] ? getnstimeofday+0x5b/0xaf
 [<ffffffff81064068>] ? __queue_work+0x3a/0x43
 [<ffffffff8106c562>] ? sched_clock_cpu+0x16e/0x176
 [<ffffffff8105bec4>] run_timer_softirq+0x19f/0x21c
 [<ffffffff8106e8b3>] ? clocksource_read+0xf/0x11
 [<ffffffff8102566a>] ? apic_write+0x16/0x18
 [<ffffffff81057614>] __do_softirq+0xdd/0x1ad
 [<ffffffff81012eac>] call_softirq+0x1c/0x30
 [<ffffffff810143fb>] do_softirq+0x47/0x8d
 [<ffffffff81057326>] irq_exit+0x44/0x86
 [<ffffffff8141ed92>] smp_apic_timer_interrupt+0x86/0x94
 [<ffffffff81012873>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff812679dd>] ? acpi_idle_enter_bm+0x281/0x2b5
 [<ffffffff812679d6>] ? acpi_idle_enter_bm+0x27a/0x2b5
 [<ffffffff81353b7f>] ? cpuidle_idle_call+0x99/0xce
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff8141489e>] ? start_secondary+0x1f3/0x234
---[ end trace f81f94a8bc7ef390 ]---
r8169: eth0: link up

After which the file copy resumed.

Version-Release number of selected component (if applicable):

kernel-2.6.31.6-166.fc12.x86_64

How reproducible:

Unknown

Steps to Reproduce:
1. Copy large files over gig-E network
Actual results:

Brief network outage

Expected results:

No error

Additional info: 

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Jumbo frames in use (7200), copy performed via rsync

Both machines have the same hardware/software configuration.  There was no error on the destination (receiving) machine.
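
For anyone checking whether their logs show the same symptom, here is a minimal sketch that scans log text for the watchdog line reported above. The function name find_tx_timeouts and the regular expression are illustrative assumptions based only on the message format in this report; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the single watchdog line quoted in this report
# (an assumption: other kernels may format the message differently).
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return (interface, driver, queue) for each watchdog timeout line."""
    return [
        (m.group("iface"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(log_text)
    ]


sample = (
    "NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out\n"
    "r8169: eth0: link up\n"
)
print(find_tx_timeouts(sample))  # [('eth0', 'r8169', 0)]
```

Feeding it saved dmesg output gives one tuple per timeout, which makes it easier to count how often the outage occurs during a transfer.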
Comment 1 Mace Moneta 2009-12-14 18:00:32 EST
After restarting the transfer a few times when the outage lasted a little too long, I tried limiting the rsync bandwidth to about 5Mb/s below its unrestricted rate (--bwlimit=35000).

That appears to prevent the problem from recurring.  Does the driver need to reserve some bandwidth, or throttle transmission when the TX queue depth exceeds a threshold?
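
To put the --bwlimit=35000 value in context against the gig-E link, here is a rough conversion sketch. It assumes rsync treats an unsuffixed --bwlimit value as units of 1024 bytes per second, as its man page describes; the helper name bwlimit_to_mbit is hypothetical.

```python
def bwlimit_to_mbit(bwlimit_kib_per_s):
    """Convert an rsync --bwlimit value to megabits per second.

    Assumes the value is in units of 1024 bytes per second.
    """
    bytes_per_s = bwlimit_kib_per_s * 1024
    return bytes_per_s * 8 / 1_000_000


limit = bwlimit_to_mbit(35000)
print(f"{limit:.2f} Mbit/s")             # 286.72 Mbit/s
print(f"{limit / 1000:.1%} of 1 Gbit/s")  # 28.7% of 1 Gbit/s
```

Under that assumption the capped transfer runs well below the nominal line rate, so the timeouts were probably not caused by saturating the link itself.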
Comment 2 unknown32 2010-03-11 00:23:54 EST
I am also having this issue.
But it happens both when transferring files from a computer to a SAN and vice versa. I also get it when I am just using the internet. This computer has run Ubuntu 9.04 64-bit and CentOS 5.4, where this is not an issue. The problem occurs on kernel 2.6.32.9-70.fc12.x86_64, and it also happened when I tried rawhide.
Comment 3 vadim 2010-05-25 07:25:56 EDT
I am also having this issue.
It happens when transferring a large amount of data.
The r8169 module has been fixed since 2.6.34-rc4.
I used a backported driver from 2.6.34-rc7 on FC13 2.6.33.4-95; the kernel works fine, but the connection itself doesn't.
I also tried the driver from 2.6.30 on FC13, and the same bug appeared.
Unfortunately, the fix from 2.6.34-rc4 only removes the kernel error; the network connection still freezes after transferring some data.
In other words, this is not one bug but two:
1st bug: invalid reset_task processing in r8169, which causes the kernel error
2nd bug: something much deeper, which either makes overflow conditions occur more frequently or handles overflow incorrectly
Comment 4 vadim 2010-05-26 05:41:32 EDT
The partial fix and workaround at https://bugzilla.kernel.org/show_bug.cgi?id=12411 may help.
Comment 5 Mace Moneta 2010-07-11 08:27:10 EDT
This bug appears to be a duplicate of Bug 538920.  Closing.

*** This bug has been marked as a duplicate of bug 538920 ***