625776 – e1000e crashes with Intel 82574L

Summary: e1000e crashes with Intel 82574L

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	17
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-08-20 12:21 UTC by Brian
Modified:	2012-12-06 21:35 UTC
CC List:	18 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-09-05 13:38:14 UTC
Type:	---
Embargoed:
Dependent Products:

Description Brian 2010-08-20 12:21:32 UTC

Description of problem:
I don't think there was actually heavy traffic going on.


Version-Release number of selected component (if applicable):
kernel-2.6.35.2-9.fc13.x86_64   (compiled from fc15 version from koji)

How reproducible:
Just run the server.

Steps to Reproduce:
1. Just run traffic.

Actual results:
eth1 drops and won't come back

Expected results:
bliss (eth1 stays up)

Additional info:

I could attach the kerneloops from /var/cache/abrt/ as a tarball, if that helps. Here is the relevant syslog output:

Aug 20 01:38:57 hq2 kernel: ------------[ cut here ]------------
Aug 20 01:38:57 hq2 kernel: WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xfb/0x169()
Aug 20 01:38:57 hq2 kernel: Hardware name: H8SGL
Aug 20 01:38:57 hq2 kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
Aug 20 01:38:57 hq2 kernel: Modules linked in: ebtable_nat ebtables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY ipt_LOG iptable_nat nf_nat iptable_mangle nfnetlink fuse sunrpc tun cpufreq_ondemand powernow_k8 freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm uinput amd64_edac_mod i2c_piix4 edac_core e1000e i2c_core k10temp microcode edac_mce_amd btrfs zlib_deflate libcrc32c raid1 pata_acpi ata_generic usb_storage sata_promise pata_atiixp sata_sil24 [last unloaded: scsi_wait_scan]
Aug 20 01:38:57 hq2 kernel: Pid: 0, comm: swapper Not tainted 2.6.35.2-9.fc13.x86_64 #1
Aug 20 01:38:57 hq2 kernel: Call Trace:
Aug 20 01:38:57 hq2 kernel: <IRQ>  [<ffffffff81050f5d>] warn_slowpath_common+0x85/0x9d
Aug 20 01:38:57 hq2 kernel: [<ffffffff81051018>] warn_slowpath_fmt+0x46/0x48
Aug 20 01:38:57 hq2 kernel: [<ffffffff813ffef5>] dev_watchdog+0xfb/0x169
Aug 20 01:38:57 hq2 kernel: [<ffffffff8105dae5>] run_timer_softirq+0x23b/0x328
Aug 20 01:38:57 hq2 kernel: [<ffffffff8105da4b>] ? run_timer_softirq+0x1a1/0x328
Aug 20 01:38:57 hq2 kernel: [<ffffffff813ffdfa>] ? dev_watchdog+0x0/0x169
Aug 20 01:38:57 hq2 kernel: [<ffffffff81047cbf>] ? get_parent_ip+0x11/0x41
Aug 20 01:38:57 hq2 kernel: [<ffffffff81057663>] __do_softirq+0x101/0x1df
Aug 20 01:38:57 hq2 kernel: [<ffffffff810786f5>] ? tick_program_event+0x2a/0x2c
Aug 20 01:38:57 hq2 kernel: [<ffffffff8100ab9c>] call_softirq+0x1c/0x30
Aug 20 01:38:57 hq2 kernel: [<ffffffff8100c22b>] do_softirq+0x4b/0xa3
Aug 20 01:38:57 hq2 kernel: [<ffffffff81057224>] irq_exit+0x4a/0x8c
Aug 20 01:38:57 hq2 kernel: [<ffffffff814a23b8>] smp_apic_timer_interrupt+0x8d/0x9b
Aug 20 01:38:57 hq2 kernel: [<ffffffff8100a653>] apic_timer_interrupt+0x13/0x20
Aug 20 01:38:57 hq2 kernel: <EOI>  [<ffffffff810115ba>] ? default_idle+0x36/0x5d
Aug 20 01:38:57 hq2 kernel: [<ffffffff8102c8b9>] ? native_safe_halt+0xb/0xd
Aug 20 01:38:57 hq2 kernel: [<ffffffff8107c4e8>] ? trace_hardirqs_on+0xd/0xf
Aug 20 01:38:57 hq2 kernel: [<ffffffff810115bf>] default_idle+0x3b/0x5d
Aug 20 01:38:57 hq2 kernel: [<ffffffff81008c01>] cpu_idle+0xaf/0xe9
Aug 20 01:38:57 hq2 kernel: [<ffffffff8148184b>] rest_init+0xcf/0xd6
Aug 20 01:38:57 hq2 kernel: [<ffffffff8148177c>] ? rest_init+0x0/0xd6
Aug 20 01:38:57 hq2 kernel: [<ffffffff81d78ece>] start_kernel+0x447/0x452
Aug 20 01:38:57 hq2 kernel: [<ffffffff81d782c8>] x86_64_start_reservations+0xb3/0xb7
Aug 20 01:38:57 hq2 kernel: [<ffffffff81d783c4>] x86_64_start_kernel+0xf8/0x107
Aug 20 01:38:57 hq2 kernel: ---[ end trace ab918d46f26f7859 ]---
Aug 20 01:38:57 hq2 kernel: e1000e 0000:02:00.0: eth1: Reset adapter
Aug 20 01:40:01 hq2 abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Aug 20 01:40:01 hq2 abrtd: Directory 'kerneloops-1282282801-1' creation detected
Aug 20 01:40:01 hq2 abrtd: New crash /var/cache/abrt/kerneloops-1282282801-1, processing
Aug 20 01:40:01 hq2 abrtd: RunApp('/var/cache/abrt/kerneloops-1282282801-1','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')
Aug 20 01:44:41 hq2 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Aug 20 01:55:35 hq2 kernel: IPv6 over IPv4 tunneling driver
Aug 20 01:55:35 hq2 kernel: sit0: Disabled Privacy Extensions
Aug 20 01:55:35 hq2 kernel: lo: Disabled Privacy Extensions
Aug 20 01:55:36 hq2 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Aug 20 01:55:37 hq2 kernel: e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
Aug 20 01:55:37 hq2 kernel: e1000e 0000:03:00.0: eth0: 10/100 speed: disabling TSO
Aug 20 01:55:37 hq2 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Aug 20 01:55:40 hq2 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Aug 20 01:57:27 hq2 kernel: Kernel logging (proc) stopped.




Aug 20 02:04:03 hq2 kernel: fuse init (API version 7.14)
Aug 20 02:05:44 hq2 kernel: ------------[ cut here ]------------
Aug 20 02:05:44 hq2 kernel: WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xfb/0x169()
Aug 20 02:05:44 hq2 kernel: Hardware name: H8SGL
Aug 20 02:05:44 hq2 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Aug 20 02:05:44 hq2 kernel: Modules linked in: fuse sunrpc tun cpufreq_ondemand powernow_k8 freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm uinput i2c_piix4 amd64_edac_mod edac_core i2c_core edac_mce_amd e1000e k10temp microcode btrfs zlib_deflate libcrc32c raid1 pata_acpi ata_generic usb_storage sata_promise pata_atiixp sata_sil24 [last unloaded: scsi_wait_scan]
Aug 20 02:05:44 hq2 kernel: Pid: 0, comm: swapper Not tainted 2.6.35.2-9.fc13.x86_64 #1
Aug 20 02:05:44 hq2 kernel: Call Trace:
Aug 20 02:05:44 hq2 kernel: <IRQ>  [<ffffffff81050f5d>] warn_slowpath_common+0x85/0x9d
Aug 20 02:05:44 hq2 kernel: [<ffffffff81051018>] warn_slowpath_fmt+0x46/0x48
Aug 20 02:05:44 hq2 kernel: [<ffffffff813ffef5>] dev_watchdog+0xfb/0x169
Aug 20 02:05:44 hq2 kernel: [<ffffffff8105dae5>] run_timer_softirq+0x23b/0x328
Aug 20 02:05:44 hq2 kernel: [<ffffffff8105da4b>] ? run_timer_softirq+0x1a1/0x328
Aug 20 02:05:44 hq2 kernel: [<ffffffff813ffdfa>] ? dev_watchdog+0x0/0x169
Aug 20 02:05:44 hq2 kernel: [<ffffffff81047cbf>] ? get_parent_ip+0x11/0x41
Aug 20 02:05:44 hq2 kernel: [<ffffffff81057663>] __do_softirq+0x101/0x1df
Aug 20 02:05:44 hq2 kernel: [<ffffffff810786f5>] ? tick_program_event+0x2a/0x2c
Aug 20 02:05:44 hq2 kernel: [<ffffffff8100ab9c>] call_softirq+0x1c/0x30
Aug 20 02:05:44 hq2 kernel: [<ffffffff8100c22b>] do_softirq+0x4b/0xa3
Aug 20 02:05:44 hq2 kernel: [<ffffffff81057224>] irq_exit+0x4a/0x8c
Aug 20 02:05:44 hq2 kernel: [<ffffffff814a23b8>] smp_apic_timer_interrupt+0x8d/0x9b
Aug 20 02:05:44 hq2 kernel: [<ffffffff8100a653>] apic_timer_interrupt+0x13/0x20
Aug 20 02:05:44 hq2 kernel: <EOI>  [<ffffffff810115ba>] ? default_idle+0x36/0x5d
Aug 20 02:05:44 hq2 kernel: [<ffffffff8102c8b9>] ? native_safe_halt+0xb/0xd
Aug 20 02:05:44 hq2 kernel: [<ffffffff8107c4e8>] ? trace_hardirqs_on+0xd/0xf
Aug 20 02:05:44 hq2 kernel: [<ffffffff810115bf>] default_idle+0x3b/0x5d
Aug 20 02:05:44 hq2 kernel: [<ffffffff81008c01>] cpu_idle+0xaf/0xe9
Aug 20 02:05:44 hq2 kernel: [<ffffffff8148184b>] rest_init+0xcf/0xd6
Aug 20 02:05:44 hq2 kernel: [<ffffffff8148177c>] ? rest_init+0x0/0xd6
Aug 20 02:05:44 hq2 kernel: [<ffffffff81d78ece>] start_kernel+0x447/0x452
Aug 20 02:05:44 hq2 kernel: [<ffffffff81d782c8>] x86_64_start_reservations+0xb3/0xb7
Aug 20 02:05:44 hq2 kernel: [<ffffffff81d783c4>] x86_64_start_kernel+0xf8/0x107
Aug 20 02:05:44 hq2 kernel: ---[ end trace 72c1332cb7a95f2a ]---
Aug 20 02:05:44 hq2 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
Aug 20 02:07:17 hq2 kernel: usb 1-1: USB disconnect, address 2
Aug 20 02:07:17 hq2 kernel: usb 1-1.2: USB disconnect, address 3
Aug 20 02:07:17 hq2 kernel: usb 1-1.3: USB disconnect, address 4
Aug 20 02:10:30 hq2 kernel: IPv6 over IPv4 tunneling driver
Aug 20 02:10:30 hq2 kernel: sit0: Disabled Privacy Extensions
Aug 20 02:10:30 hq2 kernel: Kernel logging (proc) stopped.
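Logs like the ones above are easier to compare across machines once the watchdog timeouts and the adapter resets that follow them are pulled out. A minimal Python sketch, assuming syslog lines worded exactly as in these excerpts (the regular expressions are written against this report, not against any documented log format):

```python
import re
from typing import Iterable, List, Tuple

# Patterns written against the syslog excerpts in this report (assumption:
# other kernel versions may word these messages differently).
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \([^)]+\): transmit queue \d+ timed out"
)
RESET_RE = re.compile(r"e1000e \S+: (?P<iface>\S+): Reset adapter")


def tx_timeout_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (event, interface) pairs, in log order, for TX timeouts and resets."""
    events: List[Tuple[str, str]] = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            events.append(("tx_timeout", m.group("iface")))
            continue
        m = RESET_RE.search(line)
        if m:
            events.append(("reset", m.group("iface")))
    return events
```

Run over the two excerpts above, this should yield a timeout followed by a reset for eth1, then the same pair for eth0.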

Comment 1 Brian 2010-09-02 04:27:49 UTC

So far, so good with:
     2.6.36-0.11.rc2.git5.fc13.x86_64

Comment 2 Brian 2010-12-16 17:58:09 UTC

Back again with kernel-2.6.37-0.rc5.git2.1  (downloaded from koji, unmodified)

Trying to rmmod the e1000e module and then modprobe it back in doesn't bring the NIC back and produces this error:
  kernel: [163373.663114] e1000e: probe of 0000:03:00.0 failed with error -2



backtrace:

[162536.704103] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x111/0x185()
[162536.704110] Hardware name: H8SGL
[162536.704116] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[162536.704122] Modules linked in: ip6table_filter ebtable_nat ebtables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY ipt_LOG iptable_nat nf_nat iptable_mangle nfnetlink
fuse tun sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 kvm_amd kvm uinput amd64_edac_mod edac_core e1000e edac_mce_amd i2c_piix4 i2c_core k10temp ghes hed joydev microcode btrfs zlib_deflate libcrc32c raid1 pata_acpi ata_generic sata_promise pata_atiixp sata_sil24 uas usb_storage [last unloaded: scsi_wait_scan]
[162536.704368] Pid: 0, comm: kworker/0:0 Not tainted 2.6.37-0.rc5.git2.1.fc13.x86_64 #1
[162536.704375] Call Trace:
[162536.704380]  <IRQ>  [<ffffffff81052e29>] warn_slowpath_common+0x85/0x9d
[162536.704406]  [<ffffffff81052ee4>] warn_slowpath_fmt+0x46/0x48
[162536.704418]  [<ffffffff81419f70>] dev_watchdog+0x111/0x185
[162536.704430]  [<ffffffff8105fdcf>] run_timer_softirq+0x246/0x33f
[162536.704441]  [<ffffffff8105fd29>] ? run_timer_softirq+0x1a0/0x33f
[162536.704452]  [<ffffffff81419e5f>] ? dev_watchdog+0x0/0x185
[162536.704463]  [<ffffffff81059746>] __do_softirq+0x101/0x20c
[162536.704474]  [<ffffffff81011aa6>] ? native_sched_clock+0x2d/0x5f
[162536.704484]  [<ffffffff8100bc1c>] call_softirq+0x1c/0x30
[162536.704493]  [<ffffffff8100d29f>] do_softirq+0x4b/0xa3
[162536.704502]  [<ffffffff81059480>] irq_exit+0x57/0x9b
[162536.704514]  [<ffffffff814bf93c>] smp_apic_timer_interrupt+0x8d/0x9b
[162536.704523]  [<ffffffff8100b6d3>] apic_timer_interrupt+0x13/0x20
[162536.704529]  <EOI>  [<ffffffff81012686>] ? default_idle+0x3e/0x65
[162536.704549]  [<ffffffff8102d159>] ? native_safe_halt+0xb/0xd
[162536.704560]  [<ffffffff810816ee>] ? trace_hardirqs_on+0xd/0xf
[162536.704571]  [<ffffffff8101268b>] default_idle+0x43/0x65
[162536.704581]  [<ffffffff81009c81>] cpu_idle+0xbe/0x132
[162536.704592]  [<ffffffff814af9ea>] start_secondary+0x242/0x244
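The "error -2" printed when modprobe re-probes the card is the driver probe routine's return value, which follows the kernel's negative-errno convention. A small Python helper to translate such codes; it uses the errno numbering of the machine it runs on (so run it on Linux), and the report does not say which step inside the e1000e probe returned the error:

```python
import errno
import os


def describe_probe_error(code: int) -> str:
    """Turn a kernel-style negative return code (e.g. -2) into errno name and message."""
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(num)})"


print(describe_probe_error(-2))  # on Linux: -2 = -ENOENT (No such file or directory)
```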

Comment 3 Peter Hanecak 2011-01-24 21:28:08 UTC

I get the same or a very similar error with every kernel after 2.6.33.8-149.fc13.i686. This one is from 2.6.34.7-66.fc13.i686:

Jan 24 22:11:03 server kernel: ------------[ cut here ]------------
Jan 24 22:11:03 server kernel: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xc6/0x154()
Jan 24 22:11:03 server kernel: Hardware name: A9830IMS
Jan 24 22:11:03 server kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Jan 24 22:11:03 server kernel: Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs f71882fg sunrpc authenc esp4 ah4 xfrm4_mode_transport cpufreq_ondemand acpi_cpufreq deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic cbc aes_i586 aes_generic xcbc rmd160 sha512_generic sha256_generic crypto_null af_key ipt_LOG xt_owner ip6t_REJECT ip6table_filter ip6_tables ipv6 iTCO_wdt iTCO_vendor_support e1000e i2c_i801 joydev serio_raw raid1 i915 drm_kms_helper drm i2c_algo_bit i2c_core usb_storage video output [last unloaded: scsi_wait_scan]
Jan 24 22:11:03 server kernel: Pid: 0, comm: swapper Not tainted 2.6.34.7-66.fc13.i686 #1
Jan 24 22:11:03 server kernel: Call Trace:
Jan 24 22:11:03 server kernel: [<c04387ae>] warn_slowpath_common+0x6a/0x81
Jan 24 22:11:03 server kernel: [<c0714184>] ? dev_watchdog+0xc6/0x154
Jan 24 22:11:03 server kernel: [<c0438803>] warn_slowpath_fmt+0x29/0x2c
Jan 24 22:11:03 server kernel: [<c0714184>] dev_watchdog+0xc6/0x154
Jan 24 22:11:03 server kernel: [<c0450069>] ? hrtimer_forward+0x114/0x128
Jan 24 22:11:03 server kernel: [<c0418424>] ? lapic_next_event+0x1b/0x1f
Jan 24 22:11:03 server kernel: [<c0442cea>] run_timer_softirq+0x167/0x1e9
Jan 24 22:11:03 server kernel: [<c045930f>] ? tick_dev_program_event+0x33/0x113
Jan 24 22:11:03 server kernel: [<c07140be>] ? dev_watchdog+0x0/0x154
Jan 24 22:11:03 server kernel: [<c043dbe7>] __do_softirq+0xab/0x14d
Jan 24 22:11:03 server kernel: [<c043dcbf>] do_softirq+0x36/0x41
Jan 24 22:11:03 server kernel: [<c043dddd>] irq_exit+0x2e/0x61
Jan 24 22:11:03 server kernel: [<c0418df5>] smp_apic_timer_interrupt+0x73/0x81
Jan 24 22:11:03 server kernel: [<c07902e5>] apic_timer_interrupt+0x31/0x38
Jan 24 22:11:03 server kernel: [<c040940e>] ? mwait_idle+0x61/0x6c
Jan 24 22:11:03 server kernel: [<c0402614>] cpu_idle+0x96/0xb2
Jan 24 22:11:03 server kernel: [<c077c877>] rest_init+0x67/0x69
Jan 24 22:11:03 server kernel: [<c09d4987>] start_kernel+0x350/0x355
Jan 24 22:11:03 server kernel: [<c09d40c9>] i386_start_kernel+0xc9/0xd0
Jan 24 22:11:03 server kernel: ---[ end trace 5c56a6c10bf4f628 ]---

When booted into that old kernel (2.6.33.8-149.fc13.i686), the PC is OK. With newer kernels, traffic on eth0 stops 2-3 minutes after boot.

One other thing I know: ifconfig reports A LOT of collisions (millions ...).
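The collision count that ifconfig shows should be the same per-device counter that Linux exposes under /sys/class/net/<iface>/statistics/. A sketch that samples it twice, so you can see whether it is still climbing when traffic stops; the sysfs_root argument is a hypothetical convenience so the functions can be pointed at a test directory:

```python
import time
from pathlib import Path


def read_collisions(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the interface's collisions counter from sysfs."""
    path = Path(sysfs_root) / iface / "statistics" / "collisions"
    return int(path.read_text().strip())


def collisions_during(iface: str, seconds: float,
                      sysfs_root: str = "/sys/class/net") -> int:
    """Sample the counter `seconds` apart and return how much it grew."""
    start = read_collisions(iface, sysfs_root)
    time.sleep(seconds)
    return read_collisions(iface, sysfs_root) - start
```

For example, collisions_during("eth0", 10) reports how many collisions were counted on eth0 over ten seconds.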

Comment 4 Peter Hanecak 2011-01-24 21:29:51 UTC

Additional info:

My board has two NICs. The same problem happens with either of them.

# lspci -vvv|less
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Micro-Star International Co., Ltd. Device 9830
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at feae0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at dc80 [size=32]
        Region 3: Memory at feadc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=3 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: e1000e
        Kernel modules: e1000e

02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Micro-Star International Co., Ltd. Device 9830
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at ec80 [size=32]
        Region 3: Memory at febdc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=3 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: e1000e
        Kernel modules: e1000e

...
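In the lspci output above, the two ports differ in their AER correctable-error status (CESta): 01:00.0 shows only NonFatalErr+, while 02:00.0 also shows RxErr+, BadTLP+, BadDLLP+ and Timeout+. Both ports report "ASPM L0s Enabled" in LnkCtl, which is relevant to the pcie_aspm=off workaround discussed later in this bug. A small parser for pulling those fields out of lspci -vvv text, written only against the format shown here:

```python
import re
from typing import List, Optional

# Matches e.g. "LnkCtl: ASPM L0s Enabled; RCB 64 bytes ..." (format as in this report).
ASPM_RE = re.compile(r"LnkCtl:\s+ASPM ([^;]+);")


def set_flags(line: str) -> List[str]:
    """Names of flags marked '+' on an lspci status line such as 'CESta: RxErr+ BadTLP- ...'."""
    _, _, body = line.partition(":")
    return [tok[:-1] for tok in body.split() if tok.endswith("+")]


def aspm_setting(text: str) -> Optional[str]:
    """The ASPM part of the first LnkCtl line, e.g. 'L0s Enabled', or None if absent."""
    m = ASPM_RE.search(text)
    return m.group(1) if m else None
```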

Comment 5 CrystalCowboy 2011-02-08 21:20:30 UTC

Experiencing problems under Fedora 14
kernel 2.6.35.10-74.fc14.x86_64

Network chip is 82574L

Running lots of services (ntp, dhcp, nfs, rsync), but the interface dies even when traffic is light. I have rebooted a dozen times today.

WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xf3/0x167()
Hardware name: empty
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: fuse deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device ftdi_sio snd_pcm amd64_edac_mod k10temp microcode usbserial joydev i2c_piix4 serio_raw edac_core edac_mce_amd snd_timer e1000e igb snd soundcore snd_page_alloc dca raid1 pata_acpi ata_generic firewire_ohci firewire_core crc_itu_t pata_atiixp mpt2sas scsi_transport_sas raid_class radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.35.10-74.fc14.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104d999>] warn_slowpath_common+0x85/0x9d
 [<ffffffff8104da54>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff813d5906>] ? netif_tx_lock+0x44/0x6d
 [<ffffffff813d5a70>] dev_watchdog+0xf3/0x167
 [<ffffffff81469c5f>] ? _raw_spin_unlock_irqrestore+0x17/0x19
 [<ffffffff81062bed>] ? __queue_work+0x3a/0x43
 [<ffffffff81059e28>] run_timer_softirq+0x1d6/0x2a3
 [<ffffffff813d597d>] ? dev_watchdog+0x0/0x167
 [<ffffffff81021ff6>] ? apic_write+0x16/0x18
 [<ffffffff81053a39>] __do_softirq+0xdd/0x199
 [<ffffffff810726e4>] ? tick_dev_program_event+0x36/0xf4
 [<ffffffff810727cc>] ? tick_program_event+0x2a/0x2c
 [<ffffffff8100abdc>] call_softirq+0x1c/0x30
 [<ffffffff8100c338>] do_softirq+0x46/0x82
 [<ffffffff81053b99>] irq_exit+0x3b/0x7d
 [<ffffffff8146fc1a>] smp_apic_timer_interrupt+0x7e/0x8c
 [<ffffffff8100a693>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8102b7dd>] ? native_safe_halt+0xb/0xd
 [<ffffffff81010f03>] ? need_resched+0x23/0x2d
 [<ffffffff8101102a>] default_idle+0x34/0x4f
 [<ffffffff81008325>] cpu_idle+0xaa/0xcc
 [<ffffffff81451906>] rest_init+0x8a/0x8c
 [<ffffffff81ba1c49>] start_kernel+0x40b/0x416
 [<ffffffff81ba12c6>] x86_64_start_reservations+0xb1/0xb5
 [<ffffffff81ba13c2>] x86_64_start_kernel+0xf8/0x107

Comment 6 Seth Bardash 2011-02-18 14:44:08 UTC

Installed Fedora 13 and Fedora 14 x86_64 on a dual-CPU AMD 6136 system based on the ASUS KGPE-D16, which has two Intel 82574L NICs. Same problems as reported above. Then installed updated kernels from Fedora updates: no change. Installed Intel's latest driver from source: no change. Switched to openSUSE 11.1 (kernel 2.6.27.56-0.1), which seems to work fine; the network connections have been reliable since last week. Somewhere between 2.6.27.56 and the Fedora 13 kernel, a change in this driver introduced a bug that causes the system not only to lose the devices but also to require a reset and/or a reboot to find them again. Since several different motherboards are listed above, it looks like something changed in this driver that causes this issue. The errors I see all have this in common:

net/sched/sch_generic.c:258 dev_watchdog
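For readers comparing traces: this warning comes from the networking core's TX watchdog (dev_watchdog in net/sched/sch_generic.c; the line number, 255, 256 or 258 in this bug, differs between kernel versions). It fires when a transmit queue has been stopped for longer than the device's watchdog timeout, after which the driver's TX-timeout handler is called; that handler is presumably what produces the "Reset adapter" lines in these logs. A simplified Python model of the check, not kernel code: it ignores locking, jiffies wraparound, and the device present/running/carrier checks the real function also makes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxQueue:
    stopped: bool     # queue is stopped (e.g. the TX ring is full)
    trans_start: int  # tick of the last transmit progress on this queue


def timed_out_queues(queues: List[TxQueue], now: int, watchdog_timeo: int) -> List[int]:
    """Indices of queues a watchdog tick at `now` would report as 'timed out'."""
    return [
        i for i, q in enumerate(queues)
        if q.stopped and now > q.trans_start + watchdog_timeo
    ]
```

A queue that is merely busy (not stopped) never trips the check, which fits the reports above that the timeout appears even when traffic is light: what matters is that the queue stayed stopped.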

Comment 7 Bug Zapper 2011-06-01 11:02:59 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2011-06-28 14:21:09 UTC

Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 9 Jan "Yenya" Kasprzak 2011-12-02 16:34:10 UTC

Reopening: this bug is still present in F16. I have reproduced it on both ports of my server. 

# lspci|grep Ether
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

[115886.720052] ------------[ cut here ]------------
[115886.720389] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x150()
[115886.721022] Hardware name: H8SGL
[115886.721332] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[115886.721654] Modules linked in: fuse btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs reiserfs e1000e lockd nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables joydev microcode k10temp amd64_edac_mod edac_core edac_mce_amd sp5100_tco i2c_piix4 i2c_core sunrpc raid10 ata_generic pata_acpi raid1 pata_atiixp [last unloaded: e1000e]
[115886.723708] Pid: 0, comm: kworker/0:0 Not tainted 3.1.2-1.fc16.x86_64 #1
[115886.724041] Call Trace:
[115886.724348]  <IRQ>  [<ffffffff81057a1e>] warn_slowpath_common+0x83/0x9b
[115886.724682]  [<ffffffff81057ad9>] warn_slowpath_fmt+0x46/0x48
[115886.725027]  [<ffffffff813f0955>] ? netif_tx_lock+0x4a/0x7c
[115886.725350]  [<ffffffff813f0acb>] dev_watchdog+0xf0/0x150
[115886.725672]  [<ffffffff81064b19>] run_timer_softirq+0x19b/0x280
[115886.725995]  [<ffffffff81014fec>] ? sched_clock+0x9/0xd
[115886.726327]  [<ffffffff813f09db>] ? netif_tx_unlock+0x54/0x54
[115886.726649]  [<ffffffff8105d67b>] __do_softirq+0xc9/0x1b5
[115886.726969]  [<ffffffff81014b35>] ? paravirt_read_tsc+0x9/0xd
[115886.727272]  [<ffffffff81014fec>] ? sched_clock+0x9/0xd
[115886.727567]  [<ffffffff814bfb6c>] call_softirq+0x1c/0x30
[115886.727860]  [<ffffffff81010b45>] do_softirq+0x46/0x81
[115886.728166]  [<ffffffff8105d943>] irq_exit+0x57/0xb1
[115886.728458]  [<ffffffff814c04e1>] smp_apic_timer_interrupt+0x7c/0x8a
[115886.728776]  [<ffffffff814be3de>] apic_timer_interrupt+0x6e/0x80
[115886.729103]  <EOI>  [<ffffffff8102f2f1>] ? native_safe_halt+0xb/0xd
[115886.729433]  [<ffffffff81015b7e>] default_idle+0x4e/0x86
[115886.729754]  [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
[115886.730086]  [<ffffffff814a54d9>] start_secondary+0x23f/0x241
[115886.730406] ---[ end trace 0992468ae984f23f ]---

I tried rmmod e1000e followed by modprobe e1000e, but the driver could no longer see one of the ports. I had to reboot the machine to get the network connection back.

Comment 10 Peter Hanecak 2011-12-02 17:38:17 UTC

I have a Fedora 14 machine which is still running on kernel-2.6.33.8-149.fc13.i686 due to this bug. All Fedora 14 kernels so far were affected by this issue.

But, it seems like kernel-2.6.35.14-103.fc14.i686 and kernel-2.6.35.14-106.fc14.i686 might NOT be affected. I've tried them and the bug did not occur. But it was only for a few hours and with almost no work being done on the box.

So, I'll test more and give more details.

Comment 11 Jan "Yenya" Kasprzak 2011-12-02 21:04:52 UTC

On my system the bug shows up only after several days of relatively heavy traffic (usually less than a week), so I would not draw conclusions from a few hours of uptime.
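The caution about short tests can be put in numbers. Assuming, purely for illustration, that failures arrive independently at a constant average rate (a Poisson process), the probability of seeing no failure during a test window of t hours is exp(-t / MTBF). The MTBF figures below are rough readings of the reports in this bug, not measurements:

```python
import math


def p_no_failure(test_hours: float, mtbf_hours: float) -> float:
    """Probability of zero failures in test_hours under a Poisson failure model."""
    if test_hours < 0 or mtbf_hours <= 0:
        raise ValueError("need test_hours >= 0 and mtbf_hours > 0")
    return math.exp(-test_hours / mtbf_hours)


# An affected kernel failing about every 5 days (assumed) passes a 4-hour test ~97% of the time.
print(round(p_no_failure(4, 5 * 24), 2))        # 0.97
# 40 clean days when failures had been weekly (comment 14) would be very unlikely by chance.
print(round(p_no_failure(40 * 24, 7 * 24), 4))  # 0.0033
```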

Anyway, I have found a similar upstream report:
http://sourceforge.net/tracker/index.php?func=detail&aid=3404265&group_id=42302&atid=447449
They suggest using the IntMode=1 parameter on platforms with poor or missing MSI-X support. I will try it. My mainboard is a Supermicro H8SGL.
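Module options such as IntMode are normally set through an "options" line in a file under /etc/modprobe.d/. A sketch that builds that line, assuming the e1000e IntMode values 0 = legacy INTx, 1 = MSI and 2 = MSI-X (check modinfo e1000e on your own kernel, since the available parameters vary by driver version); the file name /etc/modprobe.d/e1000e.conf is only a suggestion:

```python
# Assumed mapping of e1000e IntMode values; verify with `modinfo e1000e`.
INT_MODES = {"legacy": 0, "msi": 1, "msix": 2}


def e1000e_intmode_line(mode: str) -> str:
    """Return a modprobe.d 'options' line selecting the e1000e interrupt mode."""
    try:
        value = INT_MODES[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(INT_MODES)}"
        ) from None
    return f"options e1000e IntMode={value}"
```

e1000e_intmode_line("msi") gives "options e1000e IntMode=1", the setting tried in comment 12. The option only takes effect the next time the module is loaded; if e1000e is loaded from the initramfs, that image may also need to be rebuilt.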

Comment 12 Peter Hanecak 2011-12-06 14:20:48 UTC

(In reply to comment #11)

kernel-2.6.35.14-106.fc14.i686 also failed after a few hours.

Then I tried that kernel with IntMode=1, and the error occurred again, also after a few hours.

Comment 13 Harm Elzinga 2012-01-29 09:09:50 UTC

We have a system with two 82574L network connections as well. The system works well, but under heavy traffic the following error occurs once in a while:

kernel: [551299.157900] e1000e 0000:04:00.0: em1: Reset adapter
kernel: [551301.935449] e1000e: em1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

Sometimes it happens 20 times an hour, sometimes once a day. When it occurs, the machine is unreachable (as you would expect) and starts "working" again after a couple of seconds. I believe this might be the same bug.

The machine is running kernel:
config-2.6.41.9-1.fc15.x86_64

Is the driver supplied with FC15 / FC16 the latest from Intel?
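To measure how often the adapter resets, the dmesg-style lines above can be bucketed by hour. A sketch, assuming the bracketed timestamp is seconds since boot (the usual dmesg format) and the message wording shown in this comment:

```python
import re
from collections import Counter
from typing import Iterable

# "[551299.157900] e1000e 0000:04:00.0: em1: Reset adapter" -> (seconds since boot, interface)
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\] e1000e \S+: (\S+): Reset adapter")


def resets_per_hour(lines: Iterable[str]) -> Counter:
    """Count adapter resets per (interface, hour since boot)."""
    counts: Counter = Counter()
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            hour = int(float(m.group(1)) // 3600)
            counts[(m.group(2), hour)] += 1
    return counts
```

Feeding it the output of dmesg gives the per-hour reset counts for each interface; the "NIC Link is Up" lines are ignored.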

Comment 14 Yoshinobu Ushida 2012-02-13 11:57:23 UTC

I have a Fedora 14 machine with two 82574L NICs. This problem also occurred on my machine.

I then added the kernel parameter pcie_aspm=off.

Before adding it, the interface went down once a week. Since adding the kernel parameter, my machine has been running for 40 days.
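Whether the parameter actually reached the running kernel can be checked in /proc/cmdline. A small parser, as a sketch; it only inspects the command line and does not show whether ASPM is really disabled on the link (the LnkCtl line in lspci -vvv shows that):

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Value of name=value on a kernel command line; '' for a bare flag, None if absent.

    If the parameter appears more than once, this sketch returns the last occurrence.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else ""
    return found
```

For example: with open("/proc/cmdline") as f: print(kernel_param(f.read(), "pcie_aspm")) prints off when the parameter is set.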

Comment 15 Harm Elzinga 2012-02-20 09:08:53 UTC

I tried the pcie_aspm=off kernel parameter in the grub configuration and rebooted.
The log states it is off, but the problem is still there. :(

I also tried resetting the switch it is connected to, but that didn't help either, and it is the only machine having this problem.

Comment 16 Dave Jones 2012-03-22 16:42:38 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 19 Benjamin S. Scarlet 2012-05-06 09:24:23 UTC

I'm seeing this problem on a Supermicro H8DI3+-F. Something about my setup and load makes it occur within hours, not days. For me, it still occurs on kernel-3.3.4-1.fc16, but passing pcie_aspm=off on the kernel command line seems to make it go away.

Comment 20 Josh Boyer 2012-05-07 15:15:45 UTC

There's an upstream conversation for this here:

http://thread.gmane.org/gmane.linux.kernel/1233566

Toward the end of the thread, some patches were posted. We'll keep an eye on them.

Comment 21 Kyle Brantley 2012-05-27 02:18:07 UTC

It looks like this is also impacting RHEL (CentOS).

http://lists.soekris.com/pipermail/soekris-tech/2011-October/017822.html

The piece of hardware mentioned in that thread (the Soekris net6501) has four 82574 NICs, and is headless. PXE is the primary method of installation, so this prevents it from being installed at all. Even if installed via an alternative method, then I wind up with four useless NICs.

Comment 22 Josh Boyer 2012-06-05 20:19:23 UTC

(In reply to comment #20)
> There's an upstream conversation for this here:
> 
> http://thread.gmane.org/gmane.linux.kernel/1233566
> 
> toward the end of the thread, there are some patches posted.  We'll keep an
> eye on them.

The main patch went into 3.5-rc1 with commit d4a4206ebbaf48b55803a7eb34e330530d83a889. The current rawhide kernel includes that commit if anyone wants to test.

(In reply to comment #21)
> It looks like this is also impacting RHEL (CentOS).

RHEL customers should report issues via the proper RHEL channels.

Comment 23 Josh Boyer 2012-09-05 13:38:14 UTC

The fix mentioned in comment #22 went into 3.4.4 as well.  If this is still being seen with the most recent kernel, please reopen and attach the backtrace.

Comment 24 Brian 2012-12-06 21:35:30 UTC

I totally forgot I opened this bug . . . but Google brought me back here.

I put pcie_aspm=off in the kernel boot line, but it doesn't appear to have fixed the problem completely. Now I'm getting port "hiccups" even though the switches aren't missing a beat.

There seems to be no rhyme or reason to how often the ports disable or to why.  Also, abrt isn't showing any crash files and I'm not seeing any dumps noted in /var/log/messages.

I'm open to suggestions on how to aggregate more meaningful info.

Running stock F17
Linux hq2 3.5.2-1.fc17.x86_64 #1 SMP Wed Aug 15 16:09:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[~]# grep -A2 "Reset adapter" /var/log/messages
--
Dec  5 10:17:22 hq2 kernel: [9040718.750991] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  5 10:17:22 hq2 kernel: [9040718.771643] br0: port 1(eth0) entered disabled state
Dec  5 10:17:23 hq2 kernel: [9040720.578149] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  5 10:18:39 hq2 kernel: [9040795.766370] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  5 10:18:39 hq2 kernel: [9040795.783101] br0: port 1(eth0) entered disabled state
Dec  5 10:18:40 hq2 kernel: [9040797.514533] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  5 16:09:56 hq2 kernel: [9061874.776157] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  5 16:09:56 hq2 kernel: [9061874.802737] br0: port 1(eth0) entered disabled state
Dec  5 16:09:57 hq2 kernel: [9061876.590246] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  6 09:21:33 hq2 kernel: [9123777.723927] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  6 09:21:33 hq2 kernel: [9123777.743226] br0: port 1(eth0) entered disabled state
Dec  6 09:21:34 hq2 kernel: [9123779.525124] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  6 09:23:45 hq2 kernel: [9123909.720529] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  6 09:23:45 hq2 kernel: [9123909.738376] br0: port 1(eth0) entered disabled state
Dec  6 09:23:46 hq2 kernel: [9123911.550694] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  6 09:30:22 hq2 kernel: [9124306.750407] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  6 09:30:22 hq2 kernel: [9124306.777550] br0: port 1(eth0) entered disabled state
Dec  6 09:30:23 hq2 kernel: [9124308.449522] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  6 09:37:14 hq2 kernel: [9124718.789657] e1000e 0000:03:00.0: eth0: Reset adapter
Dec  6 09:37:14 hq2 kernel: [9124718.808761] br0: port 1(eth0) entered disabled state
Dec  6 09:37:15 hq2 kernel: [9124720.547813] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec  6 15:56:33 hq2 kernel: [9147479.974278] e1000e 0000:04:00.0: eth1: Reset adapter
Dec  6 15:56:33 hq2 kernel: [9147479.991302] br1: port 1(eth1) entered disabled state
Dec  6 15:56:36 hq2 kernel: [9147483.203583] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
