Description of problem: I don't think there was actually heavy traffic going on. Version-Release number of selected component (if applicable): kernel-2.6.35.2-9.fc13.x86_64 (compiled from fc15 version from koji) How reproducible: Just run the server. Steps to Reproduce: 1. Just run traffic. 2. 3. Actual results: eth1 drops and won't come back Expected results: bliss Additional info: I could attach the kerneloops from /var/cache/abrt/ -- as a tarball? Aug 20 01:38:57 hq2 kernel: ------------[ cut here ]------------ Aug 20 01:38:57 hq2 kernel: WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xfb/0x169() Aug 20 01:38:57 hq2 kernel: Hardware name: H8SGL Aug 20 01:38:57 hq2 kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out Aug 20 01:38:57 hq2 kernel: Modules linked in: ebtable_nat ebtables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY ipt_LOG iptable_nat nf_nat iptable_mangle nfnetlink fuse sunrpc tun cpufreq_ondemand powernow_k8 freq_table m Aug 20 01:38:57 hq2 kernel: perf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm uinput amd64_edac_mod i2c_piix4 edac_core e1000e i2c_core k10temp microcode edac_mce_amd btrfs zlib_deflate libcrc32c 
raid1 pata_acpi ata_generic usb_storage sata_promise pata_atiixp sata_sil24 [last unloaded: scsi_wait_scan] Aug 20 01:38:57 hq2 kernel: Pid: 0, comm: swapper Not tainted 2.6.35.2-9.fc13.x86_64 #1 Aug 20 01:38:57 hq2 kernel: Call Trace: Aug 20 01:38:57 hq2 kernel: <IRQ> [<ffffffff81050f5d>] warn_slowpath_common+0x85/0x9d Aug 20 01:38:57 hq2 kernel: [<ffffffff81051018>] warn_slowpath_fmt+0x46/0x48 Aug 20 01:38:57 hq2 kernel: [<ffffffff813ffef5>] dev_watchdog+0xfb/0x169 Aug 20 01:38:57 hq2 kernel: [<ffffffff8105dae5>] run_timer_softirq+0x23b/0x328 Aug 20 01:38:57 hq2 kernel: [<ffffffff8105da4b>] ? run_timer_softirq+0x1a1/0x328 Aug 20 01:38:57 hq2 kernel: [<ffffffff813ffdfa>] ? dev_watchdog+0x0/0x169 Aug 20 01:38:57 hq2 kernel: [<ffffffff81047cbf>] ? get_parent_ip+0x11/0x41 Aug 20 01:38:57 hq2 kernel: [<ffffffff81057663>] __do_softirq+0x101/0x1df Aug 20 01:38:57 hq2 kernel: [<ffffffff810786f5>] ? tick_program_event+0x2a/0x2c Aug 20 01:38:57 hq2 kernel: [<ffffffff8100ab9c>] call_softirq+0x1c/0x30 Aug 20 01:38:57 hq2 kernel: [<ffffffff8100c22b>] do_softirq+0x4b/0xa3 Aug 20 01:38:57 hq2 kernel: [<ffffffff81057224>] irq_exit+0x4a/0x8c Aug 20 01:38:57 hq2 kernel: [<ffffffff814a23b8>] smp_apic_timer_interrupt+0x8d/0x9b Aug 20 01:38:57 hq2 kernel: [<ffffffff8100a653>] apic_timer_interrupt+0x13/0x20 Aug 20 01:38:57 hq2 kernel: <EOI> [<ffffffff810115ba>] ? default_idle+0x36/0x5d Aug 20 01:38:57 hq2 kernel: [<ffffffff8102c8b9>] ? native_safe_halt+0xb/0xd Aug 20 01:38:57 hq2 kernel: [<ffffffff8107c4e8>] ? trace_hardirqs_on+0xd/0xf Aug 20 01:38:57 hq2 kernel: [<ffffffff810115bf>] default_idle+0x3b/0x5d Aug 20 01:38:57 hq2 kernel: [<ffffffff81008c01>] cpu_idle+0xaf/0xe9 Aug 20 01:38:57 hq2 kernel: [<ffffffff8148184b>] rest_init+0xcf/0xd6 Aug 20 01:38:57 hq2 kernel: [<ffffffff8148177c>] ? 
rest_init+0x0/0xd6 Aug 20 01:38:57 hq2 kernel: [<ffffffff81d78ece>] start_kernel+0x447/0x452 Aug 20 01:38:57 hq2 kernel: [<ffffffff81d782c8>] x86_64_start_reservations+0xb3/0xb7 Aug 20 01:38:57 hq2 kernel: [<ffffffff81d783c4>] x86_64_start_kernel+0xf8/0x107 Aug 20 01:38:57 hq2 kernel: ---[ end trace ab918d46f26f7859 ]--- Aug 20 01:38:57 hq2 kernel: e1000e 0000:02:00.0: eth1: Reset adapter Aug 20 01:40:01 hq2 abrt: Kerneloops: Reported 1 kernel oopses to Abrt Aug 20 01:40:01 hq2 abrtd: Directory 'kerneloops-1282282801-1' creation detected Aug 20 01:40:01 hq2 abrtd: New crash /var/cache/abrt/kerneloops-1282282801-1, processing Aug 20 01:40:01 hq2 abrtd: RunApp('/var/cache/abrt/kerneloops-1282282801-1','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .') Aug 20 01:44:41 hq2 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready Aug 20 01:55:35 hq2 kernel: IPv6 over IPv4 tunneling driver Aug 20 01:55:35 hq2 kernel: sit0: Disabled Privacy Extensions Aug 20 01:55:35 hq2 kernel: lo: Disabled Privacy Extensions Aug 20 01:55:36 hq2 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready Aug 20 01:55:37 hq2 kernel: e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None Aug 20 01:55:37 hq2 kernel: e1000e 0000:03:00.0: eth0: 10/100 speed: disabling TSO Aug 20 01:55:37 hq2 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Aug 20 01:55:40 hq2 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready Aug 20 01:57:27 hq2 kernel: Kernel logging (proc) stopped. 
Aug 20 02:04:03 hq2 kernel: fuse init (API version 7.14) Aug 20 02:05:44 hq2 kernel: ------------[ cut here ]------------ Aug 20 02:05:44 hq2 kernel: WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xfb/0x169() Aug 20 02:05:44 hq2 kernel: Hardware name: H8SGL Aug 20 02:05:44 hq2 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out Aug 20 02:05:44 hq2 kernel: Modules linked in: fuse sunrpc tun cpufreq_ondemand powernow_k8 freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm uinput i2c_piix4 amd64_edac_mod edac_core i2c_core edac_mce_amd e1000e k10temp microcode btrfs zlib_deflate libcrc32c raid1 pata_acpi ata_generic usb_storage sata_promise pata_atiixp sata_sil24 [last unloaded: scsi_wait_scan] Aug 20 02:05:44 hq2 kernel: Pid: 0, comm: swapper Not tainted 2.6.35.2-9.fc13.x86_64 #1 Aug 20 02:05:44 hq2 kernel: Call Trace: Aug 20 02:05:44 hq2 kernel: <IRQ> [<ffffffff81050f5d>] warn_slowpath_common+0x85/0x9d Aug 20 02:05:44 hq2 kernel: [<ffffffff81051018>] warn_slowpath_fmt+0x46/0x48 Aug 20 02:05:44 hq2 kernel: [<ffffffff813ffef5>] dev_watchdog+0xfb/0x169 Aug 20 02:05:44 hq2 kernel: [<ffffffff8105dae5>] run_timer_softirq+0x23b/0x328 Aug 20 02:05:44 hq2 kernel: [<ffffffff8105da4b>] ? run_timer_softirq+0x1a1/0x328 Aug 20 02:05:44 hq2 kernel: [<ffffffff813ffdfa>] ? dev_watchdog+0x0/0x169 Aug 20 02:05:44 hq2 kernel: [<ffffffff81047cbf>] ? get_parent_ip+0x11/0x41 Aug 20 02:05:44 hq2 kernel: [<ffffffff81057663>] __do_softirq+0x101/0x1df Aug 20 02:05:44 hq2 kernel: [<ffffffff810786f5>] ? 
tick_program_event+0x2a/0x2c Aug 20 02:05:44 hq2 kernel: [<ffffffff8100ab9c>] call_softirq+0x1c/0x30 Aug 20 02:05:44 hq2 kernel: [<ffffffff8100c22b>] do_softirq+0x4b/0xa3 Aug 20 02:05:44 hq2 kernel: [<ffffffff81057224>] irq_exit+0x4a/0x8c Aug 20 02:05:44 hq2 kernel: [<ffffffff814a23b8>] smp_apic_timer_interrupt+0x8d/0x9b Aug 20 02:05:44 hq2 kernel: [<ffffffff8100a653>] apic_timer_interrupt+0x13/0x20 Aug 20 02:05:44 hq2 kernel: <EOI> [<ffffffff810115ba>] ? default_idle+0x36/0x5d Aug 20 02:05:44 hq2 kernel: [<ffffffff8102c8b9>] ? native_safe_halt+0xb/0xd Aug 20 02:05:44 hq2 kernel: [<ffffffff8107c4e8>] ? trace_hardirqs_on+0xd/0xf Aug 20 02:05:44 hq2 kernel: [<ffffffff810115bf>] default_idle+0x3b/0x5d Aug 20 02:05:44 hq2 kernel: [<ffffffff81008c01>] cpu_idle+0xaf/0xe9 Aug 20 02:05:44 hq2 kernel: [<ffffffff8148184b>] rest_init+0xcf/0xd6 Aug 20 02:05:44 hq2 kernel: [<ffffffff8148177c>] ? rest_init+0x0/0xd6 Aug 20 02:05:44 hq2 kernel: [<ffffffff81d78ece>] start_kernel+0x447/0x452 Aug 20 02:05:44 hq2 kernel: [<ffffffff81d782c8>] x86_64_start_reservations+0xb3/0xb7 Aug 20 02:05:44 hq2 kernel: [<ffffffff81d783c4>] x86_64_start_kernel+0xf8/0x107 Aug 20 02:05:44 hq2 kernel: ---[ end trace 72c1332cb7a95f2a ]--- Aug 20 02:05:44 hq2 kernel: e1000e 0000:03:00.0: eth0: Reset adapter Aug 20 02:07:17 hq2 kernel: usb 1-1: USB disconnect, address 2 Aug 20 02:07:17 hq2 kernel: usb 1-1.2: USB disconnect, address 3 Aug 20 02:07:17 hq2 kernel: usb 1-1.3: USB disconnect, address 4 Aug 20 02:10:30 hq2 kernel: IPv6 over IPv4 tunneling driver Aug 20 02:10:30 hq2 kernel: sit0: Disabled Privacy Extensions Aug 20 02:10:30 hq2 kernel: Kernel logging (proc) stopped.
So far, so good with: 2.6.36-0.11.rc2.git5.fc13.x86_64
Back again with kernel-2.6.37-0.rc5.git2.1 (downloaded from koji, unmodified) Trying to rmmod the e1000e module and then modprobe it back in doesn't bring the NIC back and produces this error: kernel: [163373.663114] e1000e: probe of 0000:03:00.0 failed with error -2 backtrace: [162536.704103] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x111/0x185() [162536.704110] Hardware name: H8SGL [162536.704116] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out [162536.704122] Modules linked in: ip6table_filter ebtable_nat ebtables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY ipt_LOG iptable_nat nf_nat iptable_mangle nfnetlink fuse tun sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 kvm_amd kvm uinput amd64_edac_mod edac_core e1000e edac_mce_amd i2c_piix4 i2c_core k10temp ghes hed joydev microcode btrfs zlib_deflate libcrc32c raid1 pata_acpi ata_generic sata_promise pata_atiixp sata_sil24 uas usb_storage [last unloaded: scsi_wait_scan] [162536.704368] Pid: 0, comm: kworker/0:0 Not tainted 2.6.37-0.rc5.git2.1.fc13.x86_64 #1 [162536.704375] Call Trace: [162536.704380] <IRQ> [<ffffffff81052e29>] warn_slowpath_common+0x85/0x9d 
[162536.704406] [<ffffffff81052ee4>] warn_slowpath_fmt+0x46/0x48 [162536.704418] [<ffffffff81419f70>] dev_watchdog+0x111/0x185 [162536.704430] [<ffffffff8105fdcf>] run_timer_softirq+0x246/0x33f [162536.704441] [<ffffffff8105fd29>] ? run_timer_softirq+0x1a0/0x33f [162536.704452] [<ffffffff81419e5f>] ? dev_watchdog+0x0/0x185 [162536.704463] [<ffffffff81059746>] __do_softirq+0x101/0x20c [162536.704474] [<ffffffff81011aa6>] ? native_sched_clock+0x2d/0x5f [162536.704484] [<ffffffff8100bc1c>] call_softirq+0x1c/0x30 [162536.704493] [<ffffffff8100d29f>] do_softirq+0x4b/0xa3 [162536.704502] [<ffffffff81059480>] irq_exit+0x57/0x9b [162536.704514] [<ffffffff814bf93c>] smp_apic_timer_interrupt+0x8d/0x9b [162536.704523] [<ffffffff8100b6d3>] apic_timer_interrupt+0x13/0x20 [162536.704529] <EOI> [<ffffffff81012686>] ? default_idle+0x3e/0x65 [162536.704549] [<ffffffff8102d159>] ? native_safe_halt+0xb/0xd [162536.704560] [<ffffffff810816ee>] ? trace_hardirqs_on+0xd/0xf [162536.704571] [<ffffffff8101268b>] default_idle+0x43/0x65 [162536.704581] [<ffffffff81009c81>] cpu_idle+0xbe/0x132 [162536.704592] [<ffffffff814af9ea>] start_secondary+0x242/0x244
I get the same or a very similar warning with any kernel after 2.6.33.8-149.fc13.i686. This one is from 2.6.34.7-66.fc13.i686: Jan 24 22:11:03 server kernel: ------------[ cut here ]------------ Jan 24 22:11:03 server kernel: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xc6/0x154() Jan 24 22:11:03 server kernel: Hardware name: A9830IMS Jan 24 22:11:03 server kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out Jan 24 22:11:03 server kernel: Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs f71882fg sunrpc authenc esp4 ah4 xfrm4_mode_transport cpufreq_ondemand acpi_cpufreq deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic cbc aes_i586 aes_generic xcbc rmd160 sha512_generic sha256_generic crypto_null af_key ipt_LOG xt_owner ip6t_REJECT ip6table_filter ip6_tables ipv6 iTCO_wdt iTCO_vendor_support e1000e i2c_i801 joydev serio_raw raid1 i915 drm_kms_helper drm i2c_algo_bit i2c_core usb_storage video output [last unloaded: scsi_wait_scan] Jan 24 22:11:03 server kernel: Pid: 0, comm: swapper Not tainted 2.6.34.7-66.fc13.i686 #1 Jan 24 22:11:03 server kernel: Call Trace: Jan 24 22:11:03 server kernel: [<c04387ae>] warn_slowpath_common+0x6a/0x81 Jan 24 22:11:03 server kernel: [<c0714184>] ? dev_watchdog+0xc6/0x154 Jan 24 22:11:03 server kernel: [<c0438803>] warn_slowpath_fmt+0x29/0x2c Jan 24 22:11:03 server kernel: [<c0714184>] dev_watchdog+0xc6/0x154 Jan 24 22:11:03 server kernel: [<c0450069>] ? hrtimer_forward+0x114/0x128 Jan 24 22:11:03 server kernel: [<c0418424>] ? lapic_next_event+0x1b/0x1f Jan 24 22:11:03 server kernel: [<c0442cea>] run_timer_softirq+0x167/0x1e9 Jan 24 22:11:03 server kernel: [<c045930f>] ? tick_dev_program_event+0x33/0x113 Jan 24 22:11:03 server kernel: [<c07140be>] ?
dev_watchdog+0x0/0x154 Jan 24 22:11:03 server kernel: [<c043dbe7>] __do_softirq+0xab/0x14d Jan 24 22:11:03 server kernel: [<c043dcbf>] do_softirq+0x36/0x41 Jan 24 22:11:03 server kernel: [<c043dddd>] irq_exit+0x2e/0x61 Jan 24 22:11:03 server kernel: [<c0418df5>] smp_apic_timer_interrupt+0x73/0x81 Jan 24 22:11:03 server kernel: [<c07902e5>] apic_timer_interrupt+0x31/0x38 Jan 24 22:11:03 server kernel: [<c040940e>] ? mwait_idle+0x61/0x6c Jan 24 22:11:03 server kernel: [<c0402614>] cpu_idle+0x96/0xb2 Jan 24 22:11:03 server kernel: [<c077c877>] rest_init+0x67/0x69 Jan 24 22:11:03 server kernel: [<c09d4987>] start_kernel+0x350/0x355 Jan 24 22:11:03 server kernel: [<c09d40c9>] i386_start_kernel+0xc9/0xd0 Jan 24 22:11:03 server kernel: ---[ end trace 5c56a6c10bf4f628 ]--- When booted into that old kernel (2.6.33.8-149.fc13.i686) the PC is OK. With newer kernels, traffic on eth0 stops 2-3 minutes after boot. The only other information I have is that ifconfig reports A LOT of collisions (millions ...).
Additional info: My board have two NICs. Same problem is happening with any of those. # lspci -vvv|less 01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection Subsystem: Micro-Star International Co., Ltd. Device 9830 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at feae0000 (32-bit, non-prefetchable) [size=128K] Region 2: I/O ports at dc80 [size=32] Region 3: Memory at feadc000 (32-bit, non-prefetchable) [size=16K] Capabilities: [c8] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [e0] Express (v1) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- Capabilities: [a0] MSI-X: Enable+ Count=3 Masked- Vector table: BAR=3 offset=00000000 PBA: BAR=3 offset=00002000 Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- 
MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn- Kernel driver in use: e1000e Kernel modules: e1000e 02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection Subsystem: Micro-Star International Co., Ltd. Device 9830 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 17 Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K] Region 2: I/O ports at ec80 [size=32] Region 3: Memory at febdc000 (32-bit, non-prefetchable) [size=16K] Capabilities: [c8] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [e0] Express (v1) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- Capabilities: [a0] MSI-X: Enable+ Count=3 Masked- Vector 
table: BAR=3 offset=00000000 PBA: BAR=3 offset=00002000 Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn- Kernel driver in use: e1000e Kernel modules: e1000e ...
Experiencing problems under Fedora 14, kernel 2.6.35.10-74.fc14.x86_64. Network chip is 82574L. Running lots of stuff (ntp, dhcp, nfs, rsync), but the interface dies even if traffic is light. Have rebooted a dozen times today. WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xf3/0x167() Hardware name: empty NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out Modules linked in: fuse deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device ftdi_sio snd_pcm amd64_edac_mod k10temp microcode usbserial joydev i2c_piix4 serio_raw edac_core edac_mce_amd snd_timer e1000e igb snd soundcore snd_page_alloc dca raid1 pata_acpi ata_generic firewire_ohci firewire_core crc_itu_t pata_atiixp mpt2sas scsi_transport_sas raid_class radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.35.10-74.fc14.x86_64 #1 Call Trace: <IRQ> [<ffffffff8104d999>] warn_slowpath_common+0x85/0x9d [<ffffffff8104da54>] warn_slowpath_fmt+0x46/0x48 [<ffffffff813d5906>] ? netif_tx_lock+0x44/0x6d [<ffffffff813d5a70>] dev_watchdog+0xf3/0x167 [<ffffffff81469c5f>] ? _raw_spin_unlock_irqrestore+0x17/0x19 [<ffffffff81062bed>] ? __queue_work+0x3a/0x43 [<ffffffff81059e28>] run_timer_softirq+0x1d6/0x2a3 [<ffffffff813d597d>] ? dev_watchdog+0x0/0x167 [<ffffffff81021ff6>] ? apic_write+0x16/0x18 [<ffffffff81053a39>] __do_softirq+0xdd/0x199 [<ffffffff810726e4>] ?
tick_dev_program_event+0x36/0xf4 [<ffffffff810727cc>] ? tick_program_event+0x2a/0x2c [<ffffffff8100abdc>] call_softirq+0x1c/0x30 [<ffffffff8100c338>] do_softirq+0x46/0x82 [<ffffffff81053b99>] irq_exit+0x3b/0x7d [<ffffffff8146fc1a>] smp_apic_timer_interrupt+0x7e/0x8c [<ffffffff8100a693>] apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff8102b7dd>] ? native_safe_halt+0xb/0xd [<ffffffff81010f03>] ? need_resched+0x23/0x2d [<ffffffff8101102a>] default_idle+0x34/0x4f [<ffffffff81008325>] cpu_idle+0xaa/0xcc [<ffffffff81451906>] rest_init+0x8a/0x8c [<ffffffff81ba1c49>] start_kernel+0x40b/0x416 [<ffffffff81ba12c6>] x86_64_start_reservations+0xb1/0xb5 [<ffffffff81ba13c2>] x86_64_start_kernel+0xf8/0x107
Installed Fedora 13 and Fedora 14 x86_64 on a dual-CPU AMD 6136 system based on the ASUS KGPE-D16, which has two Intel 82574Ls. Same problems as reported above. Then installed updated kernels from Fedora updates. No change. Installed Intel's latest driver from source. No change. Went to openSUSE 11.1 (kernel 2.6.27.56-0.1), which seems to work fine; the network connections have been reliable since last week. Somewhere between 2.6.27.56 and the Fedora 13 kernel, a change to this driver introduced a bug that causes the system to not only lose the devices but also require a reset and/or a reboot to find them again. Since several different motherboards are listed above, something in this driver has changed to cause this issue. The errors I see all have this in common: net/sched/sch_generic.c:258 dev_watchdog
This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
Reopening: this bug is still present in F16. I have reproduced it on both ports of my server. # lspci|grep Ether 01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection 02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection [115886.720052] ------------[ cut here ]------------ [115886.720389] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x150() [115886.721022] Hardware name: H8SGL [115886.721332] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out [115886.721654] Modules linked in: fuse btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs reiserfs e1000e lockd nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables joydev microcode k10temp amd64_edac_mod edac_core edac_mce_amd sp5100_tco i2c_piix4 i2c_core sunrpc raid10 ata_generic pata_acpi raid1 pata_atiixp [last unloaded: e1000e] [115886.723708] Pid: 0, comm: kworker/0:0 Not tainted 3.1.2-1.fc16.x86_64 #1 [115886.724041] Call Trace: [115886.724348] <IRQ> [<ffffffff81057a1e>] warn_slowpath_common+0x83/0x9b [115886.724682] [<ffffffff81057ad9>] warn_slowpath_fmt+0x46/0x48 [115886.725027] [<ffffffff813f0955>] ? netif_tx_lock+0x4a/0x7c [115886.725350] [<ffffffff813f0acb>] dev_watchdog+0xf0/0x150 [115886.725672] [<ffffffff81064b19>] run_timer_softirq+0x19b/0x280 [115886.725995] [<ffffffff81014fec>] ? sched_clock+0x9/0xd [115886.726327] [<ffffffff813f09db>] ? netif_tx_unlock+0x54/0x54 [115886.726649] [<ffffffff8105d67b>] __do_softirq+0xc9/0x1b5 [115886.726969] [<ffffffff81014b35>] ? paravirt_read_tsc+0x9/0xd [115886.727272] [<ffffffff81014fec>] ? 
sched_clock+0x9/0xd [115886.727567] [<ffffffff814bfb6c>] call_softirq+0x1c/0x30 [115886.727860] [<ffffffff81010b45>] do_softirq+0x46/0x81 [115886.728166] [<ffffffff8105d943>] irq_exit+0x57/0xb1 [115886.728458] [<ffffffff814c04e1>] smp_apic_timer_interrupt+0x7c/0x8a [115886.728776] [<ffffffff814be3de>] apic_timer_interrupt+0x6e/0x80 [115886.729103] <EOI> [<ffffffff8102f2f1>] ? native_safe_halt+0xb/0xd [115886.729433] [<ffffffff81015b7e>] default_idle+0x4e/0x86 [115886.729754] [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8 [115886.730086] [<ffffffff814a54d9>] start_secondary+0x23f/0x241 [115886.730406] ---[ end trace 0992468ae984f23f ]--- I have tried rmmod e1000e, and modprobe e1000e, but it could not see one of the ports anymore. I had to reboot the machine in order to get the network connection.
I have a Fedora 14 machine which is still running on kernel-2.6.33.8-149.fc13.i686 due to this bug. All Fedora 14 kernels so far were affected by this issue. But, it seems like kernel-2.6.35.14-103.fc14.i686 and kernel-2.6.35.14-106.fc14.i686 might NOT be affected. I've tried them and the bug did not occur. But it was only for a few hours and with almost no work being done on the box. So, I'll test more and give more details.
On my system the bug manifests itself only after several days of relatively heavy traffic (but usually under a week), so I would not dare to draw conclusions after a few hours of uptime. Anyway, I have found a similar upstream report: http://sourceforge.net/tracker/index.php?func=detail&aid=3404265&group_id=42302&atid=447449 They suggest using the IntMode=1 parameter on platforms with poor/missing MSI-X support. I will try it. My mainboard is a Supermicro H8SGL.
(In reply to comment #11) kernel-2.6.35.14-106.fc14.i686 failed too, after a few hours. Then I tried that kernel with IntMode=1 and ... the error occurred again, also after a few hours.
We have a system with two 82574L network connections as well. The system works well, but under heavy traffic the following error occurs once in a while:
kernel: [551299.157900] e1000e 0000:04:00.0: em1: Reset adapter
kernel: [551301.935449] e1000e: em1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Sometimes 20 times in an hour, sometimes once a day. When it occurs the machine is not reachable (as you would expect) and starts "working" again after a couple of seconds. I believe this might be the same bug. The machine is running kernel: config-2.6.41.9-1.fc15.x86_64 Is the driver supplied with FC15 / FC16 the latest from Intel?
I have a Fedora 14 machine (two 82574L NICs). This problem also occurred on my machine. I then added the kernel parameter pcie_aspm=off. Before adding it, the interface went down once a week. Since adding it, my machine has been running for 40 days.
I tried the pcie_aspm=off kernel parameter in the grub configuration and rebooted. The log states it is off, but the problem is still there. :( I also tried resetting the switch it is connected to, but that didn't help either, and it is the only machine having this problem.
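For anyone who wants to confirm that ASPM is really off for the NICs rather than relying on the boot log alone, here is a minimal Python sketch. It assumes you feed it the text of `lspci -vvv` (in the same form as the output posted earlier in this report, where the link state shows up as `LnkCtl: ASPM L0s Enabled;`) and the contents of `/proc/cmdline`. The function names and the sample strings are illustrative only, not part of any existing tool.

```python
import re

# Matches either a PCI device header ("01:00.0 Ethernet controller:")
# or the ASPM field of a LnkCtl line ("LnkCtl: ASPM L0s Enabled;").
_TOKEN = re.compile(
    r"\b(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (?P<cls>[^:\n]+):"
    r"|LnkCtl:\s+ASPM (?P<aspm>[^;\n]+);"
)


def aspm_by_device(lspci_text, device_class="Ethernet controller"):
    """Map each matching device's PCI address to its LnkCtl ASPM state."""
    states = {}
    current = None
    for m in _TOKEN.finditer(lspci_text):
        if m.group("addr"):
            # New device header: track it only if it is the class we want.
            current = m.group("addr") if m.group("cls") == device_class else None
        elif current is not None:
            # First LnkCtl entry seen after the tracked device header.
            states.setdefault(current, m.group("aspm").strip())
    return states


def cmdline_disables_aspm(cmdline):
    """True if pcie_aspm=off appears as a separate boot parameter."""
    return "pcie_aspm=off" in cmdline.split()


# Shortened sample in the shape of the lspci output quoted in this report.
sample = (
    "01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection "
    "LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ "
    "02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection "
    "LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+"
)

if __name__ == "__main__":
    print(aspm_by_device(sample))
    print(cmdline_disables_aspm("ro root=/dev/sda1 pcie_aspm=off quiet"))
```

On a live system the two inputs would come from running `lspci -vvv` as root and reading `/proc/cmdline`. If the boot parameter is present but a NIC still reports an enabled ASPM state, that would be worth adding to this report.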
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
I'm seeing this problem on a Supermicro H8DI3+-F. Something about my setup and load makes it occur in a matter of hours, not days. For me, it still occurs on kernel-3.3.4-1.fc16, but passing the kernel pcie_aspm=off seems to make it go away.
There's an upstream conversation for this here: http://thread.gmane.org/gmane.linux.kernel/1233566 toward the end of the thread, there are some patches posted. We'll keep an eye on them.
It looks like this is also impacting RHEL (CentOS). http://lists.soekris.com/pipermail/soekris-tech/2011-October/017822.html The piece of hardware mentioned in that thread (the Soekris net6501) has four 82574 NICs, and is headless. PXE is the primary method of installation, so this prevents it from being installed at all. Even if installed via an alternative method, then I wind up with four useless NICs.
(In reply to comment #20)
> There's an upstream conversation for this here:
>
> http://thread.gmane.org/gmane.linux.kernel/1233566
>
> toward the end of the thread, there are some patches posted. We'll keep an
> eye on them.
The main patch went into 3.5-rc1 with commit d4a4206ebbaf48b55803a7eb34e330530d83a889. The current rawhide kernel has that commit in it if anyone wants to test.
(In reply to comment #21)
> It looks like this is also impacting RHEL (CentOS).
RHEL customers should report issues via the proper RHEL channels.
The fix mentioned in comment #22 went into 3.4.4 as well. If this is still being seen with the most recent kernel, please reopen and attach the backtrace.
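As a rough aid for retesting, the sketch below compares a kernel release string (as printed by `uname -r`) against 3.4.4, the stable release named above as carrying the fix. It is only a version comparison under that assumption: it cannot tell whether a distribution kernel has the patch backported or dropped, and the function names are made up for illustration.

```python
import re

# First stable release said in this report to contain the fix.
FIXED_IN = (3, 4, 4)


def kernel_version(release):
    """Parse the leading major.minor[.patch] of a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. "3.5-rc1") is treated as 0.
    return tuple(int(part or 0) for part in m.groups())


def has_upstream_fix(release):
    """True if the release is at or past FIXED_IN by version number alone."""
    return kernel_version(release) >= FIXED_IN


if __name__ == "__main__":
    for release in (
        "2.6.35.2-9.fc13.x86_64",
        "3.3.4-1.fc16.x86_64",
        "3.5.2-1.fc17.x86_64",
    ):
        print(release, has_upstream_fix(release))
```

A `True` result only means the kernel is new enough to be worth retesting; it does not show that the resets have stopped.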
I totally forgot I opened this bug . . . but Google brought me back here. I put pcie_aspm=off in the kernel boot line, but it hasn't appeared to fix the problem completely. Now I'm getting port "hiccups" even though the switches aren't missing a beat. There seems to be no rhyme or reason to how often the ports disable or to why. Also, abrt isn't showing any crash files and I'm not seeing any dumps noted in /var/log/messages. I'm open to suggestions on how to aggregate more meaningful info.
Running stock F17
Linux hq2 3.5.2-1.fc17.x86_64 #1 SMP Wed Aug 15 16:09:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[~]# grep -A2 "Reset adapter" /var/log/messages
--
Dec 5 10:17:22 hq2 kernel: [9040718.750991] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 5 10:17:22 hq2 kernel: [9040718.771643] br0: port 1(eth0) entered disabled state
Dec 5 10:17:23 hq2 kernel: [9040720.578149] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 5 10:18:39 hq2 kernel: [9040795.766370] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 5 10:18:39 hq2 kernel: [9040795.783101] br0: port 1(eth0) entered disabled state
Dec 5 10:18:40 hq2 kernel: [9040797.514533] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 5 16:09:56 hq2 kernel: [9061874.776157] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 5 16:09:56 hq2 kernel: [9061874.802737] br0: port 1(eth0) entered disabled state
Dec 5 16:09:57 hq2 kernel: [9061876.590246] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 6 09:21:33 hq2 kernel: [9123777.723927] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 6 09:21:33 hq2 kernel: [9123777.743226] br0: port 1(eth0) entered disabled state
Dec 6 09:21:34 hq2 kernel: [9123779.525124] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 6 09:23:45 hq2 kernel: [9123909.720529] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 6 09:23:45 hq2 kernel: [9123909.738376] br0: port 1(eth0) entered disabled state
Dec 6 09:23:46 hq2 kernel: [9123911.550694] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 6 09:30:22 hq2 kernel: [9124306.750407] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 6 09:30:22 hq2 kernel: [9124306.777550] br0: port 1(eth0) entered disabled state
Dec 6 09:30:23 hq2 kernel: [9124308.449522] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 6 09:37:14 hq2 kernel: [9124718.789657] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 6 09:37:14 hq2 kernel: [9124718.808761] br0: port 1(eth0) entered disabled state
Dec 6 09:37:15 hq2 kernel: [9124720.547813] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
--
Dec 6 15:56:33 hq2 kernel: [9147479.974278] e1000e 0000:04:00.0: eth1: Reset adapter
Dec 6 15:56:33 hq2 kernel: [9147479.991302] br1: port 1(eth1) entered disabled state
Dec 6 15:56:36 hq2 kernel: [9147483.203583] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
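One possible way to gather more structured information from entries like these is to count the `Reset adapter` events per interface and keep the kernel uptime stamp of each, so that gaps between resets can be compared. The Python sketch below assumes log lines in the format quoted in this report (the uptime stamp in brackets is optional, since the older logs above lack it). The function names are hypothetical and the sample is a short excerpt, not the full log.

```python
import re
from collections import Counter

# "[uptime] e1000e <pci address>: <iface>: Reset adapter"; the bracketed
# uptime stamp is optional because older logs in this report omit it.
RESET = re.compile(
    r"(?:\[\s*(?P<uptime>\d+\.\d+)\] )?"
    r"e1000e (?P<pci>[0-9a-f:.]+): (?P<iface>\w+): Reset adapter"
)


def reset_events(log_text):
    """Return (interface, pci_address, uptime_seconds_or_None) per reset."""
    events = []
    for m in RESET.finditer(log_text):
        uptime = float(m.group("uptime")) if m.group("uptime") else None
        events.append((m.group("iface"), m.group("pci"), uptime))
    return events


def resets_per_interface(log_text):
    """Count reset events for each interface name."""
    return Counter(iface for iface, _pci, _uptime in reset_events(log_text))


# Excerpt of the entries quoted in this report.
sample = """\
Dec 5 10:17:22 hq2 kernel: [9040718.750991] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 5 10:17:22 hq2 kernel: [9040718.771643] br0: port 1(eth0) entered disabled state
Dec 5 10:18:39 hq2 kernel: [9040795.766370] e1000e 0000:03:00.0: eth0: Reset adapter
Dec 6 15:56:33 hq2 kernel: [9147479.974278] e1000e 0000:04:00.0: eth1: Reset adapter
"""

if __name__ == "__main__":
    print(resets_per_interface(sample))
    for event in reset_events(sample):
        print(event)
```

To run it against a real log, read `/var/log/messages` into a string and pass that in instead of `sample`. Attaching the per-interface counts together with the link speed of each port may make it easier to see whether the resets follow one port or one link partner.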