Bug 751637

Summary: NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Product: [Fedora] Fedora
Reporter: Christopher Murtagh <christopher.murtagh>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 15
CC: fredericg_99, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-11-07 17:46:39 UTC

Description Christopher Murtagh 2011-11-06 18:21:41 UTC
Description of problem:

After some time, the network stops working. The message in the summary is written to dmesg, along with the following trace:

[ 7346.905279] ------------[ cut here ]------------
[ 7346.905290] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x150()
[ 7346.905293] Hardware name: System Product Name
[ 7346.905296] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
[ 7346.905298] Modules linked in: bnep bluetooth fuse ppdev parport_pc lp parport sunrpc cpufreq_ondemand acpi_cpufreq mperf xt_physdev xt_recent ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ipt_MASQUERADE xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip6table_filter hwmon_vid ip6_tables coretemp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device r8169 snd_pcm snd_timer snd soundcore iTCO_wdt mii iTCO_vendor_support i2c_i801 snd_page_alloc eeepc_wmi asus_wmi microcode sparse_keymap rfkill serio_raw ipv6 raid1 nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded: scsi_wait_scan]
[ 7346.905350] Pid: 0, comm: swapper Not tainted 2.6.40.8-4.fc15.x86_64 #1
[ 7346.905352] Call Trace:
[ 7346.905354]  <IRQ>  [<ffffffff81054c2e>] warn_slowpath_common+0x83/0x9b
[ 7346.905366]  [<ffffffff81054ce9>] warn_slowpath_fmt+0x46/0x48
[ 7346.905371]  [<ffffffff813f1e21>] ? netif_tx_lock+0x4a/0x7c
[ 7346.905377]  [<ffffffff813f1f97>] dev_watchdog+0xf0/0x150
[ 7346.905382]  [<ffffffff81061d45>] run_timer_softirq+0x19b/0x280
[ 7346.905387]  [<ffffffff8100e975>] ? paravirt_read_tsc+0x9/0xd
[ 7346.905392]  [<ffffffff813f1ea7>] ? netif_tx_unlock+0x54/0x54
[ 7346.905396]  [<ffffffff8105a8fb>] __do_softirq+0xc9/0x1b5
[ 7346.905400]  [<ffffffff8100e975>] ? paravirt_read_tsc+0x9/0xd
[ 7346.905405]  [<ffffffff8149025c>] call_softirq+0x1c/0x30
[ 7346.905408]  [<ffffffff8100abb9>] do_softirq+0x46/0x81
[ 7346.905412]  [<ffffffff8105abdd>] irq_exit+0x57/0xb1
[ 7346.905416]  [<ffffffff81490b71>] smp_apic_timer_interrupt+0x7c/0x8a
[ 7346.905421]  [<ffffffff8148fa13>] apic_timer_interrupt+0x13/0x20
[ 7346.905423]  <EOI>  [<ffffffff8100e975>] ? paravirt_read_tsc+0x9/0xd
[ 7346.905432]  [<ffffffff812848a0>] ? intel_idle+0xd8/0x100
[ 7346.905435]  [<ffffffff81284882>] ? intel_idle+0xba/0x100
[ 7346.905441]  [<ffffffff813b0169>] cpuidle_idle_call+0xd7/0x168
[ 7346.905446]  [<ffffffff81008307>] cpu_idle+0xa5/0xdf
[ 7346.905452]  [<ffffffff81467c8e>] rest_init+0x72/0x74
[ 7346.905457]  [<ffffffff81b66b8b>] start_kernel+0x3ca/0x3d5
[ 7346.905461]  [<ffffffff81b662c4>] x86_64_start_reservations+0xaf/0xb3
[ 7346.905465]  [<ffffffff81b66140>] ? early_idt_handlers+0x140/0x140
[ 7346.905469]  [<ffffffff81b663ca>] x86_64_start_kernel+0x102/0x111
[ 7346.905472] ---[ end trace f493fe877676f3e1 ]---


Network connectivity is restored with 'service network restart'. 
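
For anyone who wants to spot this timeout in a saved dmesg dump without reading it by hand, here is a minimal sketch, assuming Python 3 is available. The function name `find_tx_timeouts` and the regular expression are illustrative assumptions, not part of any kernel or Fedora tool; the sample line is copied from the trace above.

```python
import re

# Matches kernel log lines such as:
# [ 7346.905296] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<iface>\S+) "
    r"\((?P<driver>[^)]+)\): transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return one dict per NETDEV WATCHDOG timeout line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            events.append({
                "uptime_s": float(match.group("ts")),  # seconds since boot
                "iface": match.group("iface"),
                "driver": match.group("driver"),
                "queue": int(match.group("queue")),
            })
    return events


# Sample input taken from the report; a real run would read `dmesg` output.
SAMPLE = (
    "[ 7346.905293] Hardware name: System Product Name\n"
    "[ 7346.905296] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out\n"
)

for event in find_tx_timeouts(SAMPLE):
    print(event)
```

A cron job or monitoring check could feed the output of `dmesg` into this function and alert (or restart the network service) when the list is non-empty; that wiring is left out here because it depends on the local setup.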


Version-Release number of selected component (if applicable):

2.6.40.x

How reproducible:

Consistently, every time I boot back into a 2.6.40 kernel.

Steps to Reproduce:
1. Boot into 2.6.40
2. Wait some amount of time (minutes or hours)
3. There is no step 3!
 
I work around this bug by reverting to the last 2.6.3x kernel I have (kernel-2.6.38.8-35.fc15.x86_64), where everything is stable.

I can provide more detailed hardware info if necessary.

IPv6 networking was enabled, but I'll disable it for now to see if this has any effect (when coming here to post this bug, I saw in other old/closed bugs that IPv6 might be a potential issue).

Comment 1 Christopher Murtagh 2011-11-07 03:10:44 UTC
Ok, after disabling IPv6 (and ip6tables), I still had the same problem several hours later:

[39816.520107] irq 17: nobody cared (try booting with the "irqpoll" option)
[39816.520114] Pid: 0, comm: swapper Tainted: G        W   2.6.40.8-4.fc15.x86_64 #1
[39816.520116] Call Trace:
[39816.520119]  <IRQ>  [<ffffffff810af848>] __report_bad_irq+0x38/0xc3
[39816.520131]  [<ffffffff810afadf>] note_interrupt+0x173/0x1f0
[39816.520135]  [<ffffffff810ae116>] handle_irq_event_percpu+0x15d/0x1a5
[39816.520138]  [<ffffffff810ae196>] handle_irq_event+0x38/0x56
[39816.520145]  [<ffffffff810754dc>] ? sched_clock_cpu+0x42/0xc6
[39816.520148]  [<ffffffff810b0278>] handle_fasteoi_irq+0x77/0x9b
[39816.520153]  [<ffffffff8100ab6d>] handle_irq+0x88/0x8e
[39816.520157]  [<ffffffff81490a9d>] do_IRQ+0x4d/0xa5
[39816.520162]  [<ffffffff81488cd3>] common_interrupt+0x13/0x13
[39816.520164]  <EOI>  [<ffffffff8100e975>] ? paravirt_read_tsc+0x9/0xd
[39816.520174]  [<ffffffff812848a0>] ? intel_idle+0xd8/0x100
[39816.520177]  [<ffffffff81284882>] ? intel_idle+0xba/0x100
[39816.520183]  [<ffffffff813b0169>] cpuidle_idle_call+0xd7/0x168
[39816.520188]  [<ffffffff81008307>] cpu_idle+0xa5/0xdf
[39816.520193]  [<ffffffff81467c8e>] rest_init+0x72/0x74
[39816.520198]  [<ffffffff81b66b8b>] start_kernel+0x3ca/0x3d5
[39816.520203]  [<ffffffff81b662c4>] x86_64_start_reservations+0xaf/0xb3
[39816.520207]  [<ffffffff81b66140>] ? early_idt_handlers+0x140/0x140
[39816.520210]  [<ffffffff81b663ca>] x86_64_start_kernel+0x102/0x111
[39816.520213] handlers:
[39816.520223] [<ffffffffa01cc292>] rtl8169_interrupt
[39816.520226] Disabling IRQ #17
[39840.635863] r8169 0000:04:01.0: eth1: link up
[39870.591319] r8169 0000:04:01.0: eth1: link up

So, it looks like reverting back to 2.6.38 is what I'm going to have to do (I can't afford to have this particular machine go down like that).
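
The "nobody cared" report above names both the IRQ that was disabled and the handler registered on it. As a rough sketch, assuming Python 3, those details can be pulled out of a log dump as follows. `summarize_irq_report` and the patterns are hypothetical names chosen for illustration, and the sample lines are copied from the log in this comment.

```python
import re

TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")          # leading "[39816.520107] "
NOBODY_CARED_RE = re.compile(r"irq (\d+): nobody cared")
DISABLED_RE = re.compile(r"Disabling IRQ #(\d+)")
HANDLER_RE = re.compile(r"^\[<[0-9a-f]+>\]\s+(\S+)$")       # "[<addr>] symbol"


def summarize_irq_report(log_text):
    """Collect unhandled IRQs, disabled IRQs and the listed handler names."""
    unhandled, disabled, handlers = [], [], []
    in_handlers = False
    for line in log_text.splitlines():
        body = TIMESTAMP_RE.sub("", line).strip()
        if in_handlers:
            match = HANDLER_RE.match(body)
            if match:
                handlers.append(match.group(1))
                continue
            in_handlers = False  # handler list ended; fall through
        if body == "handlers:":
            in_handlers = True
            continue
        match = NOBODY_CARED_RE.search(body)
        if match:
            unhandled.append(int(match.group(1)))
        match = DISABLED_RE.search(body)
        if match:
            disabled.append(int(match.group(1)))
    return {"unhandled": unhandled, "disabled": disabled, "handlers": handlers}


# Sample input taken from the log in this comment.
SAMPLE = """[39816.520107] irq 17: nobody cared (try booting with the "irqpoll" option)
[39816.520213] handlers:
[39816.520223] [<ffffffffa01cc292>] rtl8169_interrupt
[39816.520226] Disabling IRQ #17
"""

print(summarize_irq_report(SAMPLE))
```

On the log in this comment, the summary ties IRQ 17 to `rtl8169_interrupt`, which is consistent with the r8169 driver being involved in both failures.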

Comment 2 Dave Jones 2011-11-07 17:46:39 UTC

*** This bug has been marked as a duplicate of bug 715137 ***