Bug 482868 - [RHEL4.7] MSI hangs on xw9400 during network stress test with forcedeth driver
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Ivan Vecera
QA Contact: Martin Jenner
Reported: 2009-01-28 11:30 EST by Flavio Leitner
Modified: 2010-10-23 03:19 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2009-12-23 18:26:14 EST


Attachments
lspci-t_and_lspci-vvv.txt (34.98 KB, text/plain)
2009-01-28 11:33 EST, Flavio Leitner
dmesg-21-jan-09.log (119.43 KB, text/plain)
2009-01-28 11:35 EST, Flavio Leitner
Latest backport driver. (249.04 KB, text/plain)
2009-03-30 21:55 EDT, Ayaz
Makefile for latest backport driver. (2.32 KB, text/plain)
2009-03-30 21:55 EDT, Ayaz

Description Flavio Leitner 2009-01-28 11:30:41 EST
Description of problem:

Under high load, xw9400 systems running RHEL 4.6 (and 4.7) freeze. The systems
have the on-board Ethernet controller: nVidia Corporation MCP55 Ethernet
(rev a3).

Under high load, the log continuously shows both of the messages below:
eth0: too many iterations (6) in nv_nic_irq.
APIC error on CPU0: 40(40) <-- "received illegal vector"

How reproducible:
A simple cp or dd of data to/from mounted NFS filesystems reproduces the issue.

Additional info:
This issue never occurred with RHEL 4.5.

Setting "options forcedeth max_interrupt_work=20" in /etc/modprobe.conf seems
to have reduced the number of freezes, but it has not eliminated them.

Later, bz#479408 provided a test kernel with forcedeth updated to the
latest upstream version, but the issue still reproduces.

So, for the full picture:
- linux-2.6.28.1 no crashes
- linux-2.6.26.2 no crashes
- kernel-2.6.18-128 (RHEL5.3) kernel half hangs

The workaround is to pass "msi=0 msix=0" to the forcedeth driver.
The results are below:

- a continuous job that normally hangs the driver after 15 min has now
  run for more than 64 hours (1050 runs in total)

- copying 900 GB of data to and from NFS looks good

- during testing, the APIC errors no longer showed up

- a pperf test with and without the options set shows the same transfer
  rate to and from a machine (about 110 MB/s)

- as MSI is mandatory for PCIe, we tested SPECviewperf on the graphics card
  (these cards are PCIe x16); performance is roughly the same with
  and without the options set

- the customer that is doing the data copy to NFS is seeing slow 
  responsiveness on the local console but that is probably related to 
  his home dir being on NFS too
Comment 1 Flavio Leitner 2009-01-28 11:33:38 EST
Created attachment 330247 [details]
lspci-t_and_lspci-vvv.txt
Comment 2 Flavio Leitner 2009-01-28 11:35:25 EST
Created attachment 330248 [details]
dmesg-21-jan-09.log
Comment 5 Flavio Leitner 2009-01-28 11:50:34 EST
You can find a vmcore in CAS@megatron; details here:

$ cd /cores/20081229031732/work
/cores/20081229031732/work$ ./crash

Backtrace
     KERNEL: /cores/20081229031732/work/vmlinux
   DUMPFILE: /cores/20081229031732/work/223808-vmcore
       CPUS: 4
       DATE: Thu Dec 11 04:20:53 2008
     UPTIME: 01:32:46
LOAD AVERAGE: 1.61, 1.84, 1.52
      TASKS: 116
   NODENAME: -
    RELEASE: 2.6.9-78.0.8.ELsmp
    VERSION: #1 SMP Wed Nov 5 07:14:58 EST 2008
    MACHINE: x86_64  (2800 Mhz)
     MEMORY: 16 GB
      PANIC: "Oops: 0002 [1] SMP " (check log for details)
        PID: 0
    COMMAND: "swapper"
       TASK: 100010677f0  (1 of 4)  [THREAD_INFO: 102360ea000]
        CPU: 2
      STATE: TASK_RUNNING (SYSRQ)

I took a look at the vmcore; see below:

struct net_device {
 name = "eth0"
 irq = 0x4a, <-- 74
 state = 0x6,
 dma = 0x0,
 trans_start = 0x100a88772,
 last_rx = 0x100a88772,
 watchdog_timeo = 0x1388,
 watchdog_timer = {
   expires = 0x1004de698,
 priv = 0x1042bbd5b80,
 tx_queue_len = 0x3e8,

crash> p/x jiffies
$4 = 0x100505b70

jiffies - last_rx     = 0xffffffffffa7d3fe (negative? is last_rx in the future?)
jiffies - trans_start = 0xffffffffffa7d3fe (the same as above)
jiffies - expires     = 0x274d8, and 0x274d8 / watchdog_timeo (0x1388) = 32,
so the watchdog has expired 32 times.

struct fe_priv {
..
   rx_errors = 0x442,
   rx_over_errors = 0x442,
 recover_error = 0x0,
 pci_dev = 0x10001118100,
 irqmask = 0xff, <--------- masking events:
NVREG_IRQ_RX_ERROR, NVREG_IRQ_RX, NVREG_IRQ_RX_NOBUF, NVREG_IRQ_TX_ERR
NVREG_IRQ_TX_OK, NVREG_IRQ_TIMER, NVREG_IRQ_LINK, NVREG_IRQ_RX_FORCED

which is NVREG_IRQMASK_THROUGHPUT | NVREG_IRQ_TIMER, so it seems OK.

nic_poll_irq = 0x0,
 tx_flags = 0x80000000,
 tx_ring_size = 0x100,
 tx_stop = 0x0,
 vlangrp = 0x0,
 msi_flags = 0x50, <--- NV_MSI_CAPABLE|NV_MSI_ENABLED
 msi_x_entry = {{  <-- all zeros, ^--- !NV_MSI_X_ENABLED
     vector = 0x0,
     entry = 0x0
   }, {
     vector = 0x0,
     entry = 0x0
   }, {
     vector = 0x0,
     entry = 0x0
...
   }, {
     vector = 0x0,
     entry = 0x0
   }},
pause_flags = 0x77

The irq_desc is below:
crash> p *(struct irq_desc *) 0xffffffff80507d80
$14 = {
 status = 0x0, <--- irq enabled
 handler = 0xffffffff8040a700, <--- msi_irq_wo_maskbit_type
 action = 0x10236285740,
 depth = 0x0,
 irq_count = 0x0,
 irqs_unhandled = 0x0,
 lock = {
   lock = 0x1,
   magic = 0xdead4ead
 }
}

crash> irqaction 0x10236285740
struct irqaction {
 handler = 0xffffffffa00f453e <nv_nic_irq_optimized>,
 flags = 0x4000000,
 mask = {
   bits = {0x0}
 },
 name = 0x1042bbd5800 "eth0",
 dev_id = 0x1042bbd5800,
 next = 0x0
}

so, dev_watchdog() does:
      if (netif_queue_stopped(dev) &&
         (jiffies - dev->trans_start) > dev->watchdog_timeo) {
               printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n", dev->name);
               dev->tx_timeout(dev);
...
but, as pointed out above, dev->trans_start is ahead of jiffies. Since both are
unsigned long, jiffies - dev->trans_start wraps around to a huge value, so the
'if' is true and the watchdog fires. I can't say why those values are ahead.

Flavio
Comment 19 Andy Gospodarek 2009-03-10 20:38:15 EDT
I get similar messages on F10 kernels as well, so if there is a fix, it will need to be post-2.6.27.
Comment 21 Ayaz 2009-03-30 21:55:28 EDT
Created attachment 337266 [details]
Latest backport driver.
Comment 22 Ayaz 2009-03-30 21:55:55 EDT
Created attachment 337267 [details]
Makefile for latest backport driver.
Comment 23 Ayaz 2009-03-30 21:57:00 EDT
I have attached the latest backport driver with a Makefile. Could you give it a try to see if you get a repro with only MSI turned on (MSI-X off)?
Comment 27 Ivan Vecera 2009-07-21 10:51:46 EDT
Flavio, any news?
