Description of problem:
Under high load, xw9400 systems running RHEL 4.6 (and 4.7) are freezing. The systems have the on-board Ethernet controller: nVidia Corporation MCP55 Ethernet (rev a3).

Under the high load, the log continuously shows both messages below:

  eth0: too many iterations (6) in nv_nic_irq.
  APIC error on CPU0: 40(40)    <-- "received illegal vector"

How reproducible:
A simple cp or dd of data to/from mounted NFS filesystems reproduces the issue.

Additional info:
This issue never occurred with RHEL 4.5. Setting "options forcedeth max_interrupt_work=20" in /etc/modprobe.conf seems to have reduced the number of system freezes, but it has not eliminated them.

Later, bz#479408 provided a test kernel with forcedeth updated to the latest upstream version, but the issue still reproduces. So, for the full picture:

- linux-2.6.28.1: no crashes
- linux-2.6.26.2: no crashes
- kernel-2.6.18-128 (RHEL 5.3): kernel half hangs

The workaround is to pass "msi=0 msix=0" to the forcedeth driver. Below are the results:

- a continuous job that normally hangs the driver after 15 min has now run for more than 64 hours (1050 runs in total)
- copying 900 GB of data from and to NFS looks good
- during testing we did not see the APIC errors show up anymore
- the pperf test with and without the options set shows the same transfer rate from and to a machine (+/- 110 MBytes/sec)
- as MSI is mandatory for PCIe, we tested SPECviewperf on the graphics card (these cards are PCIe 16x) and performance is roughly the same with and without the options set
- the customer doing the data copy to NFS is seeing slow responsiveness on the local console, but that is probably related to his home dir being on NFS too
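For clarity, a minimal sketch of the /etc/modprobe.conf lines referred to above. The option names (max_interrupt_work, msi, msix) are the ones quoted in this report; everything else is just ordinary modprobe.conf syntax, not a verified configuration:

  # partial mitigation tried first (reduced, but did not eliminate, the freezes):
  # options forcedeth max_interrupt_work=20

  # workaround that survived the >64 hour / 900 GB NFS copy testing described above:
  options forcedeth msi=0 msix=0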
Created attachment 330247 [details] lspci-t_and_lspci-vvv.txt
Created attachment 330248 [details] dmesg-21-jan-09.log
You can find a vmcore in CAS@megatron, details here:

$ cd /cores/20081229031732/work
/cores/20081229031732/work$ ./crash

Backtrace

      KERNEL: /cores/20081229031732/work/vmlinux
    DUMPFILE: /cores/20081229031732/work/223808-vmcore
        CPUS: 4
        DATE: Thu Dec 11 04:20:53 2008
      UPTIME: 01:32:46
LOAD AVERAGE: 1.61, 1.84, 1.52
       TASKS: 116
    NODENAME: -
     RELEASE: 2.6.9-78.0.8.ELsmp
     VERSION: #1 SMP Wed Nov 5 07:14:58 EST 2008
     MACHINE: x86_64  (2800 Mhz)
      MEMORY: 16 GB
       PANIC: "Oops: 0002 [1] SMP " (check log for details)
         PID: 0
     COMMAND: "swapper"
        TASK: 100010677f0  (1 of 4)  [THREAD_INFO: 102360ea000]
         CPU: 2
       STATE: TASK_RUNNING (SYSRQ)

I took a look at the vmcore, see below:

struct net_device {
  name = "eth0"
  irq = 0x4a,                  <-- 74
  state = 0x6,
  dma = 0x0,
  trans_start = 0x100a88772,
  last_rx = 0x100a88772,
  watchdog_timeo = 0x1388,
  watchdog_timer = {
    expires = 0x1004de698,
  priv = 0x1042bbd5b80,
  tx_queue_len = 0x3e8,

crash> p/x jiffies
$4 = 0x100505b70

jiffies - last_rx     = ffffffffffa7d3fe  (negative? last_rx in the future?)
jiffies - trans_start = ffffffffffa7d3fe  (the same as above)
jiffies - expires     = 0x274d8, and 0x274d8 / watchdog_timeo = 32, so the watchdog has expired 32 times.

struct fe_priv {
  ..
  rx_errors = 0x442,
  rx_over_errors = 0x442,
  recover_error = 0x0,
  pci_dev = 0x10001118100,
  irqmask = 0xff,              <--- masking events: NVREG_IRQ_RX_ERROR, NVREG_IRQ_RX,
                                    NVREG_IRQ_RX_NOBUF, NVREG_IRQ_TX_ERR, NVREG_IRQ_TX_OK,
                                    NVREG_IRQ_TIMER, NVREG_IRQ_LINK, NVREG_IRQ_RX_FORCED,
                                    which is NVREG_IRQMASK_THROUGHPUT | NVREG_IRQ_TIMER,
                                    so it seems ok.
  nic_poll_irq = 0x0,
  tx_flags = 0x80000000,
  tx_ring_size = 0x100,
  tx_stop = 0x0,
  vlangrp = 0x0,
  msi_flags = 0x50,            <--- NV_MSI_CAPABLE|NV_MSI_ENABLED
  msi_x_entry = {{             <--- all zeros, i.e. !NV_MSI_X_ENABLED
    vector = 0x0,
    entry = 0x0
  }, {
    vector = 0x0,
    entry = 0x0
  }, {
    vector = 0x0,
    entry = 0x0
  ...
  }, {
    vector = 0x0,
    entry = 0x0
  }},
  pause_flags = 0x77

The irq_desc is below:

crash> p *(struct irq_desc *) 0xffffffff80507d80
$14 = {
  status = 0x0,                        <--- irq enabled
  handler = 0xffffffff8040a700,        <--- msi_irq_wo_maskbit_type
  action = 0x10236285740,
  depth = 0x0,
  irq_count = 0x0,
  irqs_unhandled = 0x0,
  lock = {
    lock = 0x1,
    magic = 0xdead4ead
  }
}

crash> irqaction 0x10236285740
struct irqaction {
  handler = 0xffffffffa00f453e <nv_nic_irq_optimized>,
  flags = 0x4000000,
  mask = {
    bits = {0x0}
  },
  name = 0x1042bbd5800 "eth0",
  dev_id = 0x1042bbd5800,
  next = 0x0
}

So, dev_watchdog() does:

  if (netif_queue_stopped(dev) &&
      (jiffies - dev->trans_start) > dev->watchdog_timeo) {
          printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n",
                 dev->name);
          dev->tx_timeout(dev);
  ...

but, as I pointed out above, dev->trans_start is ahead of jiffies, so the 'if' is true and the watchdog fires. I can't say why those values are ahead of jiffies.

Flavio
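To make the "trans_start ahead of jiffies" point concrete, here is a small standalone userspace sketch (not the kernel's dev_watchdog(), just plain C using the values read from the vmcore above) showing how the unsigned subtraction wraps around and trivially exceeds watchdog_timeo:

#include <stdio.h>

/* Values taken from the vmcore dump above; on x86_64 unsigned long is
 * 64-bit, matching the 2.6.9 kernel's jiffies arithmetic. */
int main(void)
{
        unsigned long jiffies        = 0x100505b70UL;  /* crash> p/x jiffies */
        unsigned long trans_start    = 0x100a88772UL;  /* dev->trans_start   */
        unsigned long watchdog_timeo = 0x1388UL;       /* 5000 jiffies       */

        /* trans_start > jiffies, so the unsigned subtraction wraps around
         * to a huge value ... */
        unsigned long delta = jiffies - trans_start;

        printf("jiffies - trans_start = %#lx\n", delta);

        /* ... which is always > watchdog_timeo, so dev_watchdog() concludes
         * the transmit timed out and calls dev->tx_timeout(). */
        printf("watchdog fires: %s\n",
               delta > watchdog_timeo ? "yes" : "no");
        return 0;
}

On x86_64 this prints 0xffffffffffa7d3fe, the same "negative" delta seen in the vmcore, so the timeout comparison is always true; the open question remains why trans_start/last_rx are ahead of jiffies in the first place.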
I get similar messages on F10 kernels as well, so if there is a fix, it will need to be post-2.6.27.
Created attachment 337266 [details] Latest backport driver.
Created attachment 337267 [details] Makefile for latest backport driver.
I have attached the latest backport driver with its Makefile. Could you give that a try and see whether you still get a repro with only MSI turned on (MSI-X off)?
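For reference, a rough sketch of how that MSI-only test could be set up, reusing the msi/msix parameter names already used for the workaround earlier in this report; the exact build/install steps for the attached backport driver are not shown and the rebuilt module is assumed to be installed in place of the stock one:

  # unload the running driver and reload with MSI-X disabled, MSI left enabled
  modprobe -r forcedeth
  modprobe forcedeth msix=0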
Flavio, any news?