Description of problem:
Under high load, xw9400 systems running RHEL 4.6 (and 4.7) are freezing. The systems have the on-board Ethernet controller: nVidia Corporation MCP55 Ethernet (rev a3).

Under the high load, the log continuously shows both messages below:

  eth0: too many iterations (6) in nv_nic_irq.
  APIC error on CPU0: 40(40)    <-- "received illegal vector"

How reproducible:
A simple cp or dd of data to/from mounted NFS filesystems reproduces the issue.

Additional info:
This issue never occurred with RHEL 4.5. Setting "options forcedeth max_interrupt_work=20" in /etc/modprobe.conf seems to have reduced the number of system freezes, but it has not eliminated them.

Later, bz#479408 provided a test kernel with forcedeth updated to the latest upstream version, but the issue still reproduces. So, for the full picture:

- linux-2.6.28.1: no crashes
- linux-2.6.26.2: no crashes
- kernel-2.6.18-128 (RHEL 5.3): kernel half hangs

The workaround is to pass "msi=0 msix=0" to the forcedeth driver. Below are the results:

- a continuous job that normally hangs the driver after 15 min has now run for more than 64 hours (1050 runs in total)
- copying 900 GB of data from and to NFS looks good
- during testing we did not see the APIC errors show up anymore
- the pperf test with and without the options set shows the same transfer rate from and to a machine (+/- 110 MBytes/sec)
- as MSI is mandatory for PCIe, we tested SPECviewperf on the graphics card (these cards are PCIe 16x) and performance is roughly the same with and without the options set
- the customer doing the data copy to NFS is seeing slow responsiveness on the local console, but that is probably related to his home dir being on NFS too
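For clarity, a minimal sketch of the /etc/modprobe.conf lines referred to above. The option names (max_interrupt_work, msi, msix) are the ones quoted in this report; everything else is just ordinary modprobe.conf syntax, not a verified configuration:

  # partial mitigation tried first (reduced, but did not eliminate, the freezes):
  # options forcedeth max_interrupt_work=20

  # workaround that survived the >64 hour / 900 GB NFS copy testing described above:
  options forcedeth msi=0 msix=0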
Created attachment 330247 [details] lspci-t_and_lspci-vvv.txt
Created attachment 330248 [details] dmesg-21-jan-09.log
You can find a vmcore in CAS@megatron, details here:

$ cd /cores/20081229031732/work
/cores/20081229031732/work$ ./crash

Backtrace

      KERNEL: /cores/20081229031732/work/vmlinux
    DUMPFILE: /cores/20081229031732/work/223808-vmcore
        CPUS: 4
        DATE: Thu Dec 11 04:20:53 2008
      UPTIME: 01:32:46
LOAD AVERAGE: 1.61, 1.84, 1.52
       TASKS: 116
    NODENAME: -
     RELEASE: 2.6.9-78.0.8.ELsmp
     VERSION: #1 SMP Wed Nov 5 07:14:58 EST 2008
     MACHINE: x86_64  (2800 Mhz)
      MEMORY: 16 GB
       PANIC: "Oops: 0002 [1] SMP " (check log for details)
         PID: 0
     COMMAND: "swapper"
        TASK: 100010677f0  (1 of 4)  [THREAD_INFO: 102360ea000]
         CPU: 2
       STATE: TASK_RUNNING (SYSRQ)

I took a look at the vmcore, see below:

struct net_device {
  name = "eth0"
  irq = 0x4a,                  <-- 74
  state = 0x6,
  dma = 0x0,
  trans_start = 0x100a88772,
  last_rx = 0x100a88772,
  watchdog_timeo = 0x1388,
  watchdog_timer = {
    expires = 0x1004de698,
  priv = 0x1042bbd5b80,
  tx_queue_len = 0x3e8,

crash> p/x jiffies
$4 = 0x100505b70

jiffies - last_rx     = ffffffffffa7d3fe  (negative? last_rx in the future?)
jiffies - trans_start = ffffffffffa7d3fe  (the same as above)
jiffies - expires     = 0x274d8, and 0x274d8 / watchdog_timeo = 32, so the watchdog has expired 32 times.

struct fe_priv {
  ..
  rx_errors = 0x442,
  rx_over_errors = 0x442,
  recover_error = 0x0,
  pci_dev = 0x10001118100,
  irqmask = 0xff,              <--- masking events: NVREG_IRQ_RX_ERROR, NVREG_IRQ_RX,
                                    NVREG_IRQ_RX_NOBUF, NVREG_IRQ_TX_ERR, NVREG_IRQ_TX_OK,
                                    NVREG_IRQ_TIMER, NVREG_IRQ_LINK, NVREG_IRQ_RX_FORCED,
                                    which is NVREG_IRQMASK_THROUGHPUT | NVREG_IRQ_TIMER,
                                    so it seems ok.
  nic_poll_irq = 0x0,
  tx_flags = 0x80000000,
  tx_ring_size = 0x100,
  tx_stop = 0x0,
  vlangrp = 0x0,
  msi_flags = 0x50,            <--- NV_MSI_CAPABLE|NV_MSI_ENABLED
  msi_x_entry = {{             <--- all zeros, i.e. !NV_MSI_X_ENABLED
    vector = 0x0,
    entry = 0x0
  }, {
    vector = 0x0,
    entry = 0x0
  }, {
    vector = 0x0,
    entry = 0x0
  ...
  }, {
    vector = 0x0,
    entry = 0x0
  }},
  pause_flags = 0x77

The irq_desc is below:

crash> p *(struct irq_desc *) 0xffffffff80507d80
$14 = {
  status = 0x0,                        <--- irq enabled
  handler = 0xffffffff8040a700,        <--- msi_irq_wo_maskbit_type
  action = 0x10236285740,
  depth = 0x0,
  irq_count = 0x0,
  irqs_unhandled = 0x0,
  lock = {
    lock = 0x1,
    magic = 0xdead4ead
  }
}

crash> irqaction 0x10236285740
struct irqaction {
  handler = 0xffffffffa00f453e <nv_nic_irq_optimized>,
  flags = 0x4000000,
  mask = {
    bits = {0x0}
  },
  name = 0x1042bbd5800 "eth0",
  dev_id = 0x1042bbd5800,
  next = 0x0
}

So, dev_watchdog() does:

  if (netif_queue_stopped(dev) &&
      (jiffies - dev->trans_start) > dev->watchdog_timeo) {
          printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n",
                 dev->name);
          dev->tx_timeout(dev);
  ...

but, as I pointed out above, dev->trans_start is ahead of jiffies, so the 'if' is true and the watchdog fires. I can't say why those values are ahead of jiffies.

Flavio
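To make the "trans_start ahead of jiffies" point concrete, here is a small standalone userspace sketch (not the kernel's dev_watchdog(), just plain C using the values read from the vmcore above) showing how the unsigned subtraction wraps around and trivially exceeds watchdog_timeo:

#include <stdio.h>

/* Values taken from the vmcore dump above; on x86_64 unsigned long is
 * 64-bit, matching the 2.6.9 kernel's jiffies arithmetic. */
int main(void)
{
        unsigned long jiffies        = 0x100505b70UL;  /* crash> p/x jiffies */
        unsigned long trans_start    = 0x100a88772UL;  /* dev->trans_start   */
        unsigned long watchdog_timeo = 0x1388UL;       /* 5000 jiffies       */

        /* trans_start > jiffies, so the unsigned subtraction wraps around
         * to a huge value ... */
        unsigned long delta = jiffies - trans_start;

        printf("jiffies - trans_start = %#lx\n", delta);

        /* ... which is always > watchdog_timeo, so dev_watchdog() concludes
         * the transmit timed out and calls dev->tx_timeout(). */
        printf("watchdog fires: %s\n",
               delta > watchdog_timeo ? "yes" : "no");
        return 0;
}

On x86_64 this prints 0xffffffffffa7d3fe, the same "negative" delta seen in the vmcore, so the timeout comparison is always true; the open question remains why trans_start/last_rx are ahead of jiffies in the first place.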
I get similar messages on F10 kernels as well, so if there is a fix, it will need to be post-2.6.27.
Created attachment 337266 [details] Latest backport driver.
Created attachment 337267 [details] Makefile for latest backport driver.
I have attached the latest backport driver with its Makefile. Could you give that a try and see whether you still get a repro with only MSI turned on (MSI-X off)?
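For reference, a rough sketch of how that MSI-only test could be set up, reusing the msi/msix parameter names already used for the workaround earlier in this report; the exact build/install steps for the attached backport driver are not shown and the rebuilt module is assumed to be installed in place of the stock one:

  # unload the running driver and reload with MSI-X disabled, MSI left enabled
  modprobe -r forcedeth
  modprobe forcedeth msix=0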
Flavio, any news?