Bug 222556
Summary: | kernel-2.6.19-1.2895, -1.2911 lose network connectivity with forcedeth driver | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | John Thacker <johnthacker> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6 | CC: | andy, bugs, davide.rossetti, goodyca48, jp, mail, mal, nphilipp, peeter.as, pjs1, rabe, vanicekp, vfiend, wtogami, zing |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 2.6.19-1-2911.6.3 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-03-10 18:22:28 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
John Thacker
2007-01-14 04:45:48 UTC
I have this exact same problem with the 2.6.19 kernel that was just pushed to FC6 updates, and likewise, the old 2.6.18 kernel works fine. I've only noticed the network connection dying under heavy bittorrent activity, though. An ifdown/ifup gets it working again, but it soon dies again. I can't see anything obvious in the logs, but I'm not entirely sure how to troubleshoot this, so if you need any more information, just ask. I'm also using x86_64; I have an MSI K9N NEO-F board with onboard "nVidia Corporation MCP55 Ethernet" according to system-config-network.

I have the same problem. After ~2 hours of work with the computer, the network connection dies even if I only use instant messenger. I have an ASUS M2N-E motherboard that also has onboard "nVidia Corporation MCP55 Ethernet". The connection comes up again if I restart it as root in system-config-network. I use the x86 version of Fedora and got the kernel from updates.

Ah, I also have the same onboard nVidia Corporation MCP55 Ethernet. x86_64 here. So it seems like a problem in the forcedeth driver, I guess.

I am seeing something similar with the e1000 driver with an Intel 82545EM Gigabit Ethernet Controller. In my case, the machine hangs during the boot process after the ntpd boot script appears. No problems with the previous kernel.

I also noticed that NFS will die while compiling across the network. Both Pirut and Pup die while retrieving package information. I have 3 machines with mostly different hardware. One has the same motherboard as in comment 2. My internet is on a 10/100Mbps network and the NFS is on a 1Gbps net. That is, I have 2 ethernet devices in each machine. Since both are affected, I don't believe the problem is in a device-specific module. I updated using pup today excluding the 2895 kernel and have been running fine using the 2869 kernel. So I am convinced it's specific to the 2895 kernel and not some other package or library bug. Once the NFS dies, I can't ping in either direction.
Trying to use the service command to stop and start netfs, nfs and network does not clear the problem. The only solution is to reboot both machines.

I am seeing the same behavior. Under heavy load, the ethernet dies. I have an onboard nVidia Corporation MCP55 Ethernet. The problem appears with the 2.6.19-1.2895.fc6 kernel. I have no problems when I use the previous 2.6.18-1.2869.fc6 kernel.

Two or three times (= rarely) the monitor (only the monitor) has gone into suspend mode when I tried to open system-config-network to restart the connection after it died. The monitor didn't come out of suspend mode in this case and only resetting the computer helps.

I have similar problems: when bittorrent picks up speed, the network dies. I currently have an nVidia Corporation MCP55 Ethernet controller. It's on the motherboard, which is an MSI K9N Neo F.

Another nVidia MCP55 with dying ethernet under load. Restarting the interface cures the problem for a few moments, but nothing else does.

See also bug 223672 "forcedeth 0.56 -> 0.57 breaks Marvell 88E1116". The nVidia MCP55 network problems with the new kernel would appear to be the same as 223672.

I have network hangs with my Asus M2N-E which has the nVIDIA MCP55 with the Marvell 88E1116.

*** Bug 223672 has been marked as a duplicate of this bug. ***

Moved here from 223672 ...

*** Bug 228201 has been marked as a duplicate of this bug. ***

This may be fixed in the next update. Please test kernel version 1_2911 when it becomes available.

I've grabbed 2.6.19-1.2911.fc6 and found that this somehow breaks my DSL connection (PPPoe), so I couldn't test the forcedeth problem until now. I found the following in /var/log/messages:

Feb 11 23:20:40 wombat adsl-connect: ADSL connection lost; attempting re-connection.
Feb 11 23:20:45 wombat pppd[18544]: pppd 2.4.4 started by root, uid 0
Feb 11 23:20:45 wombat pppoe[18545]: ioctl(SIOCGIFHWADDR): Session 0: No such device
Feb 11 23:20:45 wombat pppd[18544]: Failed to set PPP kernel option flags: Inappropriate ioctl for device
Feb 11 23:20:45 wombat pppd[18544]: Using interface ppp0
Feb 11 23:20:45 wombat pppd[18544]: Connect: ppp0 <--> /dev/pts/2
Feb 11 23:20:45 wombat pppd[18544]: Modem hangup
Feb 11 23:20:45 wombat pppd[18544]: Connection terminated.
Feb 11 23:20:45 wombat pppd[18544]: Exit.

... and so forth. Do you want a new BZ for it?

(In reply to comment #15)
> I've grabbed 2.6.19-1.2911.fc6 and found that this somehow breaks my DSL
> connection (PPPoe), couldn't test the forcedeth problem due to this until now. I
> found the following in /var/log/messages:
>
> <SNIP>
> ... and so forth. Do you want a new BZ for it?

Nils, did 2.6.19-1_2895 work for you, or did you just upgrade from 2.6.18?

I'm trying to see if this is a new problem.

If you can, try 1_2910 as well.

(In reply to comment #16)
> Nils, did 2.6.19-1_2895 work for you, or did you just upgrade from 2.6.18?
>
> I'm trying to see if this is a new problem.
>
> If you can, try 1_2910 as well.

1.2895 worked for me PPP-wise (but had the described forcedeth problems). I can try 1.2910 once I'm at home.

(In reply to comment #17)

Okay, then also try 2895 again as well. Sometimes strange things happen to network interfaces; mine got renamed to a very strange value after I changed some things. If you get the same error (on any kernel) with pppoe, do an 'ifconfig -a' and see if your network interface names look normal.

(In reply to comment #16)
> If you can, try 1_2910 as well.

I'm running 1.2910 right now and it works fine PPP-wise, but when I just tried to rsync a DVD image (~3.4GB) the network stalled after 1.5GB.

"the network": the LAN interface using the forcedeth driver

(In reply to comment #20)
> "the network": the LAN interface using the forcedeth driver

You mean the pppoe connection doesn't use the forcedeth interface?

(In reply to comment #21)
> (In reply to comment #20)
> > "the network": the LAN interface using the forcedeth driver
>
> You mean the pppoe connection doesn't use the forcedeth interface?

No, my PPPoe connection goes over an el-cheapo RTL8139 card; this is where I saw the PPP/ioctl problem with 1.2911. My local network goes over the on-board NIC of an ASUS M2N-SLI Deluxe which uses forcedeth. This is the one where I saw the interface stalling after a certain amount of data was transferred.

(In reply to comment #22)

Okay, can you see if the forcedeth problems are fixed in 2911?

And if the pppoe on the rtl8139 works in 2910? I tested 8139 and it worked for me... Can you tell whether it's using the 8139cp or 8139too driver?

Fedora loads both on my system and you have to look at the boot messages to see which one is driving the adapter (some status messages appear after the "right" driver loads.)

(In reply to comment #23)
> (In reply to comment #22)
>
> Okay, can you see if the forcedeth problems are fixed in 2911?

Unfortunately, they are not. Rsyncing the DVD stalled after about 2.2GB.

> And if the pppoe on the rtl8139 works in 2910? I tested 8139 and it worked for
> me... Can you tell whether it's using the 8139cp or 8139too driver?

It's using 8139too, but apparently PPP works with 1.2911 now, so I guess this was a one-off.

> Fedora loads both on my system and you have to look at the boot messages
> to see which one is driving the adapter (some status messages appear
> after the "right" driver loads.)

This suggests that 8139cp isn't:

Feb 13 08:53:44 wombat kernel: 8139cp 0000:01:06.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip
Feb 13 08:53:44 wombat kernel: 8139cp 0000:01:06.0: Try the "8139too" driver instead.
I have tested 1.2911 on a Sun Ultra 20 M2 (NFORCE-MCP55). The problem is the same (as described in duplicate bug 223672). The card stops working after about 10 minutes. An rmmod and insmod of forcedeth makes the card work again for a few minutes.

Installed the new kernel (1.2911) from updates and the network connection still disappeared after about 1 hour of surfing and downloading two big files. (Making the 2.6.18 kernel default again...)

2.6.19-1.2911.fc6: same symptoms.

Chuck, out of curiosity I tried kernel-2.6.20-1.2914.fc6 and saw the PPP problem again. FYI, I dug a little further and found that the 8139 interface (over which PPPoe goes) was registered as eth0:

Feb 16 20:53:14 wombat kernel: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
Feb 16 20:53:14 wombat kernel: 8139cp 0000:01:06.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip
Feb 16 20:53:14 wombat kernel: 8139cp 0000:01:06.0: Try the "8139too" driver instead.
Feb 16 20:53:14 wombat kernel: 8139too Fast Ethernet driver 0.9.28
Feb 16 20:53:14 wombat kernel: eth0: RealTek RTL8139 at 0xffffc20000028000, 00:30:84:40:1d:b2, IRQ 16
Feb 16 20:53:15 wombat kernel: forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.59.
Feb 16 20:53:15 wombat kernel: forcedeth: using HIGHDMA
Feb 16 20:53:15 wombat kernel: eth1: forcedeth.c: subsystem: 01043:8239 bound to 0000:00:08.0
Feb 16 20:53:15 wombat kernel: forcedeth: using HIGHDMA
Feb 16 20:53:15 wombat kernel: eth1: forcedeth.c: subsystem: 01043:8239 bound to 0000:00:09.0

So instead of binding the forcedeth NICs to eth0, eth1 and the 8139too one to eth2, it first bound 8139too to eth0 and then _both_ forcedeth ones to eth1. Something smells fishy here. At that point, I didn't think it was worth trying out whether the forcedeth interfaces would or wouldn't stall when rsyncing, as the kernel seemingly wasn't sure which one was which anyway (yes, eth0, eth1, eth2 are aliased to their respective drivers in modprobe.conf).
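The observation that both forcedeth devices reported the same interface name can be checked mechanically. Below is a minimal Python sketch, assuming the kernel log uses exactly the "ethN: forcedeth.c: subsystem: ... bound to <PCI address>" format quoted in this comment. The function names are hypothetical and not part of any Fedora tool.

```python
import re
from collections import defaultdict

# Assumed log format, taken from the forcedeth lines quoted in this report:
#   "eth1: forcedeth.c: subsystem: 01043:8239 bound to 0000:00:08.0"
BOUND = re.compile(r"(eth\d+): forcedeth\.c: subsystem: \S+ bound to (\S+)")


def devices_per_interface(log_text):
    """Map each interface name to the PCI addresses that reported binding to it."""
    seen = defaultdict(list)
    for name, pci_address in BOUND.findall(log_text):
        seen[name].append(pci_address)
    return dict(seen)


def ambiguous_interfaces(log_text):
    """Return the interface names reported by more than one PCI device."""
    per_interface = devices_per_interface(log_text)
    return sorted(name for name, devs in per_interface.items() if len(devs) > 1)
```

Fed the two "bound to" lines from the log above, `ambiguous_interfaces` would return `["eth1"]`, which matches what was noticed by eye. Whether the duplicate name reflects a real conflict or only a logging quirk before the interfaces are renamed is not something this sketch can decide.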
Tell me if I should try it nevertheless.

Well I don't have a clue what is happening there.

Do the ifup-* files in /etc/sysconfig/networking/devices look right?

Reference my original comment 5. I now have 2911 installed on all 3 of my machines here. One machine does not use the forcedeth module. The other has an older Nvidia force2/400 chipset. Both of those seem to work just fine. I grabbed FC7-test1 via bittorrent and have been seeding at 20KB/s almost continuously for 27 hours now and still going. The third machine, however, has an Asus MN2/E with the Nvidia Nforce MCP570 ultra chipset. Once upgraded to 2911, an NFS transfer dies after about 7GB moved. A similar test using sftp died more quickly. So it appears to be a somewhat random but consistent failure. If I stop netfs and nfs, down eth0 and re-start everything, I get the network back, but NFS reports "permission denied" when I try to re-mount. I see similar problems with both the 32 and 64 bit versions of 2911.

People who are having this problem: try adding
pci=nomsi
to the kernel command line.

Edit /etc/grub.conf so the line for kernel 2911 looks similar to below, then reboot.

kernel /vmlinuz-2.6.19-1.2911.fc6 ro root=LABEL=/ pci=nomsi rhgb quiet

(In reply to comment #29)
> Well I don't have a clue what is happening there.
>
> Do the ifup-* files in /etc/sysconfig/networking/devices look right?

Well, ifcfg-eth2 was missing even though it was referenced as the eth device for ppp0. I've added this and will retry with pci=nomsi (an explanation of what this does would be nice).

(In reply to comment #31)
> People who are having this problem: try adding
> pci=nomsi
> to the kernel command line.

It seems to work so far, I've just downloaded the mentioned ISO file twice without problems. NB: with the added ifcfg-eth2, ppp0 came up fine (this one boot I tried it). One question, will this workaround find its way into the driver, i.e. are later kernels supposed to work without pci=nomsi (whatever this does -- I guess there's a reason it's not the default ;-)?

I'm currently trying the pci=nomsi too - all OK so far ...

pci=nomsi seems to work for me as well.

pci=nomsi with the 2911 kernel works for me too, thanks.

pci=nomsi disables PCI message signaled interrupts globally, which might cause some problems for people with exotic hardware that needs MSI.

Editing /etc/modprobe.conf and adding this line should also work:

options forcedeth msi=0

*** Bug 229111 has been marked as a duplicate of this bug. ***

Similar problem but different chip and driver: tg3.

It's a NexCom blade system, with eth0 connected to a 3Com GBit switch and mounting an NFS server (called theboss) filesystem on it. The NFS connection goes up and down; downtime may last several minutes.

eth[01] are tg3 (Broadcom dual ethernet chip mounted onto the motherboard)
eth2 is e100
2 real CPU, 4 virtual CPU (Hyperthreading is Enabled)

This is a 2.6.19-1.2895 compiled for FC5. I see it also on a standard FC5 2.6.18-1.2257 update kernel.
# uname -a
Linux rack9 2.6.19-1.2895 #1 SMP Thu Feb 1 20:25:06 CET 2007 i686 i686 i386 GNU/Linux

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2400.303
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 4802.68

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2400.303
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 4799.51

# dmesg
tg3.c:v3.69 (November 15, 2006)
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 20
eth0: Tigon3 [partno(BCM95704A41) rev 2003 PHY(serdes)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:10:f3:07:b8:de
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[0] TSOcap[1]
eth0: dma_rwctrl[769f4000] dma_mask[64-bit]
ACPI: PCI Interrupt 0000:03:01.1[B] -> GSI 25 (level, low) -> IRQ 21
eth1: Tigon3 [partno(BCM95704A41) rev 2003 PHY(serdes)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:10:f3:07:b8:df
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[0] TSOcap[1]
eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
SCSI subsystem initialized
e100: Intel(R) PRO/100 Network Driver, 3.5.17-k2-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ACPI: PCI Interrupt 0000:04:08.0[A] -> GSI 20 (level, low) -> IRQ 22
e100: eth2: e100_probe: addr 0xfc040000, irq 22, MAC addr 00:10:F3:07:B8:E0
libata version 2.00 loaded.
Fusion MPT base driver 3.04.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT FC Host driver 3.04.02
ACPI: PCI Interrupt 0000:02:03.0[A] -> GSI 49 (level, low) -> IRQ 23
mptbase: Initiating ioc0 bringup
....
audit(1171367036.418:24): avc: denied { getattr } for pid=2777 comm="hald" name="/" dev=ocfs2_dlmfs ino=6122 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=dir
eth1: no IPv6 routers present
eth2: no IPv6 routers present
eth0: no IPv6 routers present
SELinux: initialized (dev 0:1c, type nfs), uses genfs_contexts
audit(1171367066.343:25): avc: denied { getattr } for pid=2777 comm="hald" name="/" dev=ocfs2_dlmfs ino=6122 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=dir
SELinux: initialized (dev 0:1d, type nfs), uses genfs_contexts
nfs: server theboss not responding, timed out
audit(1171382461.317:26): avc: denied { append } for pid=2006 comm="syslogd" name="tty12" dev=tmpfs ino=819 scontext=system_u:system_r:syslogd_t:s0 tcontext=system_u:object_r:tty_device_t:s0 tclass=chr_file
inode_doinit_with_dentry: getxattr returned 5 for dev=sdb1 ino=3131396
inode_doinit_with_dentry: getxattr returned 5 for dev=sdb1 ino=3131397
audit(1171422324.509:27): avc: denied { append } for pid=2006 comm="syslogd" name="tty12" dev=tmpfs ino=819 scontext=system_u:system_r:syslogd_t:s0 tcontext=system_u:object_r:tty_device_t:s0 tclass=chr_file
audit(1171439203.366:28): avc: denied { getattr } for pid=2777 comm="hald" name="/" dev=ocfs2_dlmfs ino=6122 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=dir
audit(1171468969.439:29): avc: denied { append } for pid=2006 comm="syslogd" name="tty12" dev=tmpfs ino=819 scontext=system_u:system_r:syslogd_t:s0 tcontext=system_u:object_r:tty_device_t:s0 tclass=chr_file
inode_doinit_with_dentry: getxattr returned 5 for dev=sdb1 ino=3131396
inode_doinit_with_dentry: getxattr returned 5 for dev=sdb1 ino=3131397
audit(1171540355.151:30): avc: denied { append } for pid=2006 comm="syslogd" name="tty12" dev=tmpfs ino=819 scontext=system_u:system_r:syslogd_t:s0 tcontext=system_u:object_r:tty_device_t:s0 tclass=chr_file
nfs: server theboss not responding, timed out
nfs: server theboss not responding, timed out
SELinux: initialized (dev 0:1c, type nfs), uses genfs_contexts
audit(1171554606.439:31): avc: denied { getattr } for pid=2777 comm="hald" name="/" dev=ocfs2_dlmfs ino=6122 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=dir
nfs: server theboss not responding, timed out
nfs: server theboss not responding, timed out
nfs: server theboss not responding, timed out
nfs: server theboss not responding, timed out
nfs: server theboss not responding, timed out

# lspci
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
00:00.1 Class ff00: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01)
00:04.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface D PCI-to-PCI Bridge (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:03.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter (rev 81)
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 03)
03:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 03)
04:00.0 VGA compatible controller: Chips and Technologies F69030 (rev 61)
04:08.0 Ethernet controller: Intel Corporation 82801BA/BAM/CA/CAM Ethernet Controller (rev 42)

(In reply to comment #37)
> Editing /etc/modprobe.conf and adding this line should also work:
>
> options forcedeth msi=0

This does not work for me (but setting the kernel option pci=nomsi does).

(In reply to comment #40)
> (In reply to comment #37)
> > Editing /etc/modprobe.conf and adding this line should also work:
> >
> > options forcedeth msi=0
>
> This does not work for me (but setting the kernel option pci=nomsi does).

Hm. The driver also has an "msix" option. Chuck, does "pci=nomsi" also influence MSIX (whatever that is ;-), i.e. would "msix=0" also be needed?

(In reply to comment #41)
> Hm. The driver also has an "msix" option. Chuck, does "pci=nomsi" also influence
> MSIX (whatever that is ;-), i.e. would "msix=0" also be needed?

Try it and see. :)

MSIX and MSI are independent, so they might both be needed.

Hmm, even with msi=1 and msix=1 it stalled on the second download of the ISO file. Mind that I didn't try it twice with pci=nomsi on the kernel cmdline.

Never mind comment #43, I got the module parameter logic backwards. In fact, with msi=0 and msix=1 it stalled as before, but with msi=0 and msix=0 I was able to download the same 3.4GB image four times in a row.

pci=nomsi seems to solve it for me.

This should be fixed in the latest kernel, 2911.6.3, available at:

http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/6/

Please test, making sure you remove pci=nomsi and/or msi=0/msix=0 options.

The pci=nomsi option fixes bug 229111
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229111

(In reply to comment #46)
> This should be fixed in the latest kernel, 2911.6.3, available at:

It works for me.
Kernel 2911.6.3 works for me too (Ultra 20 M2).

2911.6.3 works for me as well.

I'm not planning to close the bug until the update moves out of testing, though.

I just got 2911.6.5 from the stable updates repository and, even after removing the pci=nomsi argument, it works fine.

This can probably be closed now, then.

Works for me too. Thank you!
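Since the request was to test the fixed kernel with the workarounds removed, it may help to confirm that none are still in place. Below is a minimal Python sketch, assuming the workarounds were applied exactly as described in this report: `pci=nomsi` on the kernel command line, and `msi=0` or `msix=0` on an `options forcedeth` line in /etc/modprobe.conf. The function name is hypothetical.

```python
def leftover_msi_workarounds(cmdline, modprobe_conf):
    """Return the MSI workaround settings that are still present.

    cmdline:       text of the kernel command line (e.g. the contents of
                   /proc/cmdline, or the kernel line from /etc/grub.conf)
    modprobe_conf: text of /etc/modprobe.conf
    """
    found = []

    # Kernel command line workaround: a standalone "pci=nomsi" token.
    if "pci=nomsi" in cmdline.split():
        found.append("pci=nomsi")

    # Module option workaround: "options forcedeth msi=0" and/or "msix=0".
    for line in modprobe_conf.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if fields[:2] == ["options", "forcedeth"]:
            found.extend(f for f in fields[2:] if f in ("msi=0", "msix=0"))

    return found
```

For example, passing the grub.conf kernel line quoted earlier together with a modprobe.conf containing `options forcedeth msi=0 msix=0` would return `["pci=nomsi", "msi=0", "msix=0"]`. An empty list suggests the test ran without the workarounds. The sketch only inspects the text it is given, so the caller must read the files on the machine under test.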