| Summary: | Heavy network utilization crashes Fedora 15 host, driver r8169 | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | szt <tszalay> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 15 | CC: | andy, anjo9292, antonio, corey.yeatman, didier.belhomme, gansalmon, itamar, joerg, jonathan, kernel-maint, madhu.chinakonda, marco.hartgring, p.zandbergen, thomas |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-06 13:26:34 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
szt
2011-06-03 22:03:48 UTC
Same with kernel 2.6.38.7-30.fc15.x86_64.

I was having this problem; about a dozen re-occurrences. I had jumbo frames set on the interface; I changed the MTU back to 1500 and haven't had another event.

(In reply to comment #2)
> I was having this problem; about a dozen re-occurrences. I had jumbo frames
> set on the interface; changed the MTU back to 1500 and haven't had another
> event.

I never used an MTU bigger than 1500 bytes. With kernel 2.6.38.8-32.fc15.x86_64, when I attempted to read one large file from an NFS-mounted drive, the computer suddenly rebooted with no explanation in /var/log/messages.

I had the exact same problem. It completely went away after switching to the r8168 driver, version 8.024.00, from the Realtek site.

I had exactly the same problem as Pim, with the same cure: installing the r8168 driver in place of the r8169 driver that came with the kernel solved the problem. I use kernel 2.6.35.13-92.fc14.x86_64 from Fedora 14. In my case, the computer rebooted when opening a vncviewer session.

Does anyone know how to report the problem to the kernel team (at Red Hat or directly to kernel.org)?

Regards all,
Didier (happy again, being able to connect at 1000 Mbps!)

(In reply to comment #4)
> I had the exact same problem. It completely went away after switching to the
> r8168 driver, version 8.024.00, from the Realtek site.

Thanks, the genuine Realtek driver works like a charm.

(In reply to comment #5)
Didier,

> I had exactly the same problem as Pim, with the same cure: installing the
> r8168 driver in place of the r8169 driver that came with the kernel solved the problem.
> I use kernel 2.6.35.13-92.fc14.x86_64 from Fedora 14.

I use or manage 3 other F14 boxes with the same network card, all with the 2.6.35.13-92.fc14.x86_64 kernel. All work fine. (Two desktops with different Gigabyte mainboards, and a Lenovo ThinkPad Edge, all with integrated Realtek RTL8111/8168B cards.)

> Does anyone know how to report the problem to the kernel team (at Red Hat or
> directly to kernel.org)?

I'm quite sure they received the opening ticket (and probably all the comments) via email. Check "Email sent to:" at the top of this webpage.

Tamás

The model reported depends on the command. While lspci reports RTL8111/8168B:

```
$ lspci | grep -i eth
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
```

the dmesg output on my machine reports RTL8168c/8111c:

```
$ grep -i 8169 /var/log/dmesg
[ 11.644904] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 11.644925] r8169 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 11.644981] r8169 0000:03:00.0: setting latency timer to 64
[ 11.645061] r8169 0000:03:00.0: irq 44 for MSI/MSI-X
[ 11.645233] r8169 0000:03:00.0: eth0: RTL8168c/8111c at 0xffffc90006ade000, 00:30:48:b0:96:f0, XID 1c4000c0 IRQ 44
```

(In reply to comment #8)
> The model reported depends on the command.

Additional information for the original bug report: the r8169 kernel module reports the Ethernet card version as RTL8168d/8111d.

More information about the setup:

- I've only seen this problem when connecting the system to a gigabit switch (never when connected at 100 Mbps).
- The computer is a laptop from asmobile (ASUS OEM), model Z97V.
- The card reported by `lspci | grep -i eth` is:

  ```
  06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
  ```

- The output of `grep -i 8168 /var/log/dmesg` is:

  ```
  [ 0.148168] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
  [ 3.032044] r8168 Gigabit Ethernet driver 8.024.00-NAPI loaded
  [ 3.032073] r8168 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
  [ 3.032096] r8168 0000:06:00.0: setting latency timer to 64
  [ 3.032229] r8168 0000:06:00.0: irq 45 for MSI/MSI-X
  [ 3.033127] eth%d: RTL8168B/8111B at 0xffffc90011296000, 00:22:15:ff:8e:3a, IRQ 45
  [ 3.058457] r8168: This product is covered by one or more of the following patents: US5,307,459, US5,434,872, US5,732,094, US6,570,884, US6,115,776, and US6,327,625.
  [ 3.058460] eth0: Identified chip type is 'RTL8168C/8111C'.
  [ 3.058462] r8168 Copyright (C) 2011 Realtek NIC software team <nicfae>
  ```

With this driver, the system is running well at 1000 Mbps.

Regards,
Didier.

Allow me to correct and extend the statements from my comment #4.

My symptoms were different: heavy network traffic would just hang my box, without any syslog information, not even when running in text console mode. I have a watchdog running, using iTCO_wdt, but it would not reset the system.

Problems started with kernel 2.6.38 and up on Fedora 14, both with vanilla kernels and Fedora rawhide kernels. Things were fine up to 2.6.37. Sometimes running ttcp between this and another host would reproduce the problem within a minute, but not always. In real life, the box would hang when a PVR using my Fedora box as an NFS server started an HD recording.

As I found that the Realtek driver solved the problem for these experimental kernels on Fedora 14, I expected it would be necessary for the standard Fedora 15 kernel too, and indeed it was.

My system is based on an MSI MS-7522 (X58 Platinum SLI) with dual onboard RTL8111/8168 NICs.
- The r8168 driver identifies the NICs as RTL8168B/8111B.
- The r8169 driver identifies the NICs as RTL8168c/8111c.
- lspci identifies the NICs as RTL8111/8168B rev 02 (10ec:8168 sub 1462:7522).

Same issue here using the 2.6.38.8-35.fc15.x86_64 kernel. As mentioned by others above, switching from the r8169 driver to the r8168 driver from the Realtek website has solved this issue (for now). The issue only seems to be triggered when copying large amounts of data (in my case, large .mts video files). The whole machine locked up; no kernel panics or messages were seen before it happened.

I see a number of similar reports on bugzilla.kernel.org:

- https://bugzilla.kernel.org/show_bug.cgi?id=29282
- https://bugzilla.kernel.org/show_bug.cgi?id=32962
- https://bugzilla.kernel.org/show_bug.cgi?id=34172

The last one points to a possible fix by Hayes Wang.

Happened to me too with 2.6.40.4-5.fc15.x86_64 on a Gigabyte EX58-UD5. I upgraded from fc14 to fc15 and wanted to resize RAID partitions, but had big problems with existing LVM PVs being unable to shrink. So I tore everything down, repartitioned, remade the PVs, VGs and LVs, updated the UUIDs in GRUB and fstab, and edited the initramfs to include the new mdadm.conf. After that, no matter what I did, the new root LV would not remount at /sysroot, which it certainly should have done!

So, back to the beginning: I reinstalled from a live USB to get a bootable system, then used the fc15 live USB to restore system files and data (while keeping the new fstab, mdadm.conf and LVM backup). Once the temporary system was made, I used the fc15 live USB to restore 100 GB of files over NFS. No problems. I then booted into the newly restored fc15 (kernel 2.6.40.4-5) so it could do its job while restoring large archives from the backup server.
Then, a few seconds later, the screen fills with kernel messages:

```
Sep 22 14:11:27 odin kernel: [ 978.857918] r8169 0000:08:00.0: eth0: link up
Sep 22 14:11:27 odin kernel: [ 978.864923] r8169 0000:08:00.0: eth0: link up
Sep 22 14:11:27 odin kernel: [ 978.868872] r8169 0000:08:00.0: eth0: link up
Sep 22 14:11:27 odin kernel: [ 978.877846] r8169 0000:08:00.0: eth0: link up
Sep 22 14:11:27 odin kernel: [ 978.890801] r8169 0000:08:00.0: eth0: link up
Sep 22 14:11:27 odin kernel: [ 978.902770] r8169 0000:08:00.0: eth0: link up
...
```

The system locks up, with the drive light still on. I wait to see if it'll come back. Then I get "watchdog detected hard lockup" errors... I have to do a hard reset.

After this happened a few times, I found this thread and this link: http://code.google.com/p/r8168/updates/list

I downloaded r8168-8.025.00.tar.bz2, followed the instructions and the issue reports there, and managed to compile and install r8168.ko even though it was an Ubuntu flavour:

```
wget http://r8168.googlecode.com/files/r8168-8.025.00.tar.bz2
tar -jxf r8168-8.025.00.tar.bz2
cd r8168-8.025.00
./autorun.sh
```

I continued to restore files, and so far the machine hasn't locked up, even though the RAID arrays are resyncing in the background. I suggest escalating this bug to severe status. For me it was a showstopper, and I hope the info here will help others.

I have the same issue as comment #11. No logs, no messages; the system just freezes. But: I have a second NIC (Intel e1000) in my system, which is used for network traffic. The Realtek NIC is not active, and no cables are connected; the driver is just loaded. The situation could be solved by putting r8169 on the blacklist. So something in the r8169 driver must be harming the system even though the NIC is not used.

I have upgraded to Fedora 16, and am again using a standard Fedora kernel with the standard Fedora Realtek driver. The system freezes seem to have gone; now just the NFS service freezes under heavy load. `service nfs-service restart` does not seem to actually restart anything.
All other network traffic continues. I'm not sure whether these NFS freezes are related to the Realtek drivers.
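The workarounds reported in this thread (dropping jumbo frames back to a 1500-byte MTU, replacing r8169 with Realtek's r8168, and blacklisting r8169 on a machine where the Realtek NIC is unused) can be checked and applied with standard tools. Below is a minimal POSIX shell sketch under these assumptions: the function names `jumbo_ifaces`, `nic_driver` and `blacklist_module` and the file name `blacklist-r8169.conf` are hypothetical; `jumbo_ifaces` parses the one-line-per-interface output of iproute2's `ip -o link show`; `nic_driver` reads the `/sys/class/net/<iface>/device/driver` symlink; and `blacklist_module` relies on the `blacklist` keyword of modprobe.d, which stops the module from being loaded automatically via its aliases but does not block an explicit `modprobe r8169`.

```shell
# Sketch only: helper and file names are hypothetical, not part of any package.

# Print interfaces whose MTU exceeds the standard Ethernet 1500 bytes (jumbo frames).
# Reads `ip -o link show` output on stdin, where field 2 is "<iface>:"; loopback is skipped.
jumbo_ifaces() {
    awk '$2 != "lo:" {
        for (i = 1; i < NF; i++)
            if ($i == "mtu" && $(i + 1) > 1500) {
                name = $2
                sub(/:$/, "", name)
                print name
            }
    }'
}

# Print the kernel driver bound to a network interface (e.g. r8169 or r8168).
# $1 = interface name; $2 = sysfs root, defaulting to /sys (overridable for testing).
nic_driver() {
    drv_link=$(readlink "${2:-/sys}/class/net/$1/device/driver") || return 1
    basename "$drv_link"
}

# Write a modprobe.d fragment so the module is no longer loaded automatically via its aliases.
# $1 = module name; $2 = target directory, defaulting to /etc/modprobe.d (needs root).
blacklist_module() {
    printf 'blacklist %s\n' "$1" > "${2:-/etc/modprobe.d}/blacklist-$1.conf"
}
```

For example, `ip -o link show | jumbo_ifaces` lists interfaces still using jumbo frames, which can then be reset as root with `ip link set dev eth0 mtu 1500` (eth0 is a placeholder), and `nic_driver eth0` should print `r8168` once Realtek's driver has taken over. A blacklist written by `blacklist_module r8169` takes effect at the next boot; if r8169 is also included in the initramfs, the initramfs may need to be rebuilt as well (for example with `dracut -f`).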