Red Hat Bugzilla – Bug 280151
forcedeth driver causes kernel panic in nv_tx_done call
Last modified: 2014-06-29 18:59:13 EDT
Description of problem:
I'm trying out the RHEL beta on my "home server" (router/proxy/mail/NAS), which is
basically a desktop PC built on an Asus K8N4-E Deluxe motherboard (nForce 4-4x
chipset). I'm using the internal gigabit ethernet interface (forcedeth driver) for
high-speed internal communications (the server delivers NFS through it). The
problem is that the server hangs under network load.
Version-Release number of selected component (if applicable):
I also tried kernel-2.6.18-36.el5.jwltest.41.x86_64 with the same results
Steps to Reproduce:
1. Start experimenting with iperf. Eventually the bug will appear. So-called
"bidirectional" transfer tests together with MTU tweaking between tests seem to
trigger it fastest. Though, even if the router is left by itself, eventually
regular (http, mail, torrent) traffic can trigger this.
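The reproduction loop described above could be scripted roughly as follows. This is only a sketch under assumptions: the peer address, the interface name default, the MTU values, and the test duration are all hypothetical (the report does not say which were used), and it assumes iperf 2, where `-d` runs a simultaneous bidirectional test, with `iperf -s` already running on the peer. It only prints the commands; running them needs root.

```python
def build_repro_commands(server, iface="eth0", mtus=(1500, 9000), seconds=60):
    """Build the command sequence: change the MTU, then run a bidirectional test.

    All defaults are illustrative assumptions, not values from the report.
    """
    commands = []
    for mtu in mtus:
        # Tweak the MTU between tests, as the report describes.
        commands.append(["ip", "link", "set", "dev", iface, "mtu", str(mtu)])
        # iperf 2 client, dual (bidirectional) test for a fixed duration.
        commands.append(["iperf", "-c", server, "-d", "-t", str(seconds)])
    return commands


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address standing in for the peer.
    for command in build_repro_commands("192.0.2.10"):
        print(" ".join(command))
```

Printing rather than executing keeps the sketch safe to run on any machine; the commands can be pasted into a root shell on the affected box.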
Kernel panic, system hangs. It is instant and leaves no traces in
The system seems to have some problems with ACPI: Linux can't find any devices
on the USB bus or the SATA hard drives, though it detects the USB controller and
both SATA controllers (from nvidia and silicon image) just fine.
Therefore, I'm using the following kernel options: acpi=off nolapic. APIC is
turned off in the BIOS; when it's turned on but the acpi=off parameter is passed,
something really messy happens. The nolapic parameter doesn't really change
anything; these network problems happen whether it's used or not.
Contents of /proc/interrupts:
  0:  312995458  XT-PIC  timer
  1:          8  XT-PIC  i8042
  2:          0  XT-PIC  cascade
  3:          0  XT-PIC  ohci_hcd:usb1
  5:     145359  XT-PIC  sata_nv
  7:  173362769  XT-PIC  eth0
  8:          0  XT-PIC  rtc
 11:          0  XT-PIC  ehci_hcd:usb2, sata_nv
 12:  114783321  XT-PIC  eth1
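For anyone comparing their own box against this listing, here is a small sketch that parses the snapshot above and reports which IRQ lines are shared and which are busiest. The parser is an assumption-laden helper written for this single-CPU, XT-PIC layout; the function names are made up for illustration, and multi-CPU output (several count columns) would need a different split.

```python
# Snapshot copied from the report (single count column, XT-PIC mode).
SNAPSHOT = """\
0: 312995458 XT-PIC timer
1: 8 XT-PIC i8042
2: 0 XT-PIC cascade
3: 0 XT-PIC ohci_hcd:usb1
5: 145359 XT-PIC sata_nv
7: 173362769 XT-PIC eth0
8: 0 XT-PIC rtc
11: 0 XT-PIC ehci_hcd:usb2, sata_nv
12: 114783321 XT-PIC eth1
"""


def parse_interrupts(text):
    """Return {irq: (count, controller, [devices])} for numeric IRQ rows."""
    table = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        # Skip headers and non-numeric rows such as NMI or ERR.
        if not sep or not irq.strip().isdigit():
            continue
        count, controller, devices = rest.split(None, 2)
        names = [name.strip() for name in devices.split(",")]
        table[int(irq)] = (int(count), controller, names)
    return table


def shared_irqs(table):
    """IRQ lines with more than one device attached."""
    return {irq: devs for irq, (_, _, devs) in table.items() if len(devs) > 1}


def busiest(table, n=3):
    """The n IRQ numbers with the highest interrupt counts."""
    return sorted(table, key=lambda irq: table[irq][0], reverse=True)[:n]


if __name__ == "__main__":
    table = parse_interrupts(SNAPSHOT)
    print("shared:", shared_irqs(table))
    print("busiest:", busiest(table))
```

On this snapshot the only shared line is IRQ 11 (ehci_hcd:usb2 and sata_nv); eth0 on IRQ 7 and eth1 on IRQ 12 each have a line to themselves and rank right after the timer by count.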
After googling for problems similar to this, I came to the conclusion that it
could be an interrupt-related problem. I got advice to use the "options forcedeth
max_interrupt_work=16" option. I tried it, and it greatly reduced the
probability of the kernel panic: the system no longer seems to hang while
routing at all, but experiments with iperf (major network load) can still
hang it. Therefore, it's not a solution.
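To make the effect of that knob more concrete, here is a toy model of a bounded interrupt handler. It is not the driver's code: it assumes, without confirmation from this report, that the handler keeps draining batches of events in a loop and gives up once the loop counter exceeds max_interrupt_work, leaving the rest for a later pass. The function name and the batch representation are made up for illustration.

```python
from itertools import count


def service_interrupt(batches, max_interrupt_work):
    """Toy model (assumption, not forcedeth source) of a work-limited handler.

    `batches` is a list of event counts still pending. Returns
    (events_handled, iteration_at_bailout), where the second value is None
    if everything was drained within the budget.
    """
    handled = 0
    for i in count():
        if not batches:
            return handled, None      # nothing left, normal exit
        handled += batches.pop(0)     # process one batch of events
        if i > max_interrupt_work:
            return handled, i         # budget exceeded, defer the rest


if __name__ == "__main__":
    for budget in (5, 16):
        pending = [1] * 10
        print(budget, service_interrupt(pending, budget), "left:", len(pending))
```

Under this model a larger budget lets one interrupt drain a longer burst before giving up, which would fit the observation that the hang became rarer but did not disappear under heavy iperf load.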
As for the real kernel trace, well.. Since it's not in logs, I can't capture it
nicely. The best I could manage was making a photo of the screen with my cellphone..
Created attachment 188451
Trace one, full view
Created attachment 188461
Upper part of first trace
Created attachment 188471
Middle part of first trace
Created attachment 188481
Lower part of first trace
Created attachment 188491
Trace two - this one with max_interrupt_work=16
I got another report that looks quite similar to this. That showed me that it's
dying in skb_over_panic(). Did you happen to see any lines that began with this:
The call stack should look like this:
There is a patch upstream that brings back the use of the optimized data path
for do_nic_poll since it was left out of the original work. This might be
interesting to try, but I'm not sure it will matter too much.
Created attachment 229201
Upstream patch that would be interesting to try.
Author: Ayaz Abdulla <firstname.lastname@example.org>
Date: Fri Mar 23 05:49:37 2007 -0500
forcedeth: fix nic poll
The nic poll routine was missing the call to the optimized irq routine.
This patch adds the missing call for the optimized path.
I don't get anything in /var/log/messages, nothing is left there after crash,
and when I connect a monitor to this system I can't see the lines before the
ones I posted.
However, I'll rebuild kernel with this patch and will try it out soon. It looks
Btw, in the last month the system only hung once or twice - with
max_interrupt_work=16 and simple routing tasks. Unfortunately, to achieve that I
had to move the NFS serving task away from it. To test this patch, I'll put the
full load back on this system.
Great! Thank you for trying this patch. I'm not sure it will help, but it
I'm getting some feedback that this patch is helping for RHEL4 -- I'll build
some new test kernels and post a link to them here.
Hmmmmmm it seems I've added this patch already. Can you try a kernel from here:
It should resolve your issue. Thanks!
I feel confident this is a duplicate of bug 245191
Please reopen this if the kernel from comment #12 does not resolve your problem.
*** This bug has been marked as a duplicate of 245191 ***
I tried with kernel-2.6.18-51.el5.jwltest.43.x86_64 from
http://people.redhat.com/linville/kernels/rhel5/ which includes this patch and
it doesn't crash anymore. I got "spurious 8259A interrupt: IRQ7." message once
in dmesg under load (IRQ7 is my eth0 interrupt) and I get TONS of "eth0: too
many iterations (6) in nv_nic_irq." messages, but at least everything seems to
work. I only did synthetic testing with iperf; I'll reopen one of these bugs
in case of any problems under real load - but I guess it's safe to close them now.