Bug 504365 - ixgbe driver does not pass network traffic on system with 16 cores
Summary: ixgbe driver does not pass network traffic on system with 16 cores
Keywords:
Status: CLOSED DUPLICATE of bug 505653
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-05 19:49 UTC by Mark Wagner
Modified: 2014-06-29 23:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-12 19:55:14 UTC
Target Upstream Version:
Embargoed:


Attachments:
ixgbe-correctly-enable-interrupts-when-link-is-down.patch (102 bytes, text/plain)
2009-06-12 19:35 UTC, Andy Gospodarek
ixgbe-correctly-enable-interrupts-when-link-is-down.patch (2.21 KB, patch)
2009-06-12 19:36 UTC, Andy Gospodarek

Description Mark Wagner 2009-06-05 19:49:38 UTC
Description of problem:
When using the -152 kernel, the ixgbe driver will not pass network traffic (send or receive) when the system has 16 cores.  If I turn off hyper-threading and use just 8 cores, the driver appears to function.

Version-Release number of selected component (if applicable):
-152 kernel from dzickus people page

How reproducible:
constantly

Steps to Reproduce:
1. Load the -152 kernel on a system with an ixgbe-based card.
2. Reboot.
3. Try to ping in or out of the interface.
  
Actual results:
ping fails

Expected results:

ping succeeds

Additional info:

With 16 cores we see the following IRQs:

$ awk '/eth0/ {print $NF}' /proc/interrupts
 eth0-TxRx-0
 eth0-TxRx-1
 eth0-TxRx-2
 eth0-TxRx-3
 eth0-TxRx-4
 eth0-TxRx-5
 eth0-TxRx-6
 eth0-TxRx-7
 eth0-TxRx-8
 eth0-TxRx-9
 eth0:lsc


With 8 cores we get:

eth0-TxRx-2
eth0-TxRx-3
eth0-TxRx-4
eth0-TxRx-5
eth0-TxRx-6
eth0-TxRx-7
eth0-tx-0
eth0-tx-1
eth0-tx-2
eth0-tx-3
eth0:lsc
eth0-TxRx-0
eth0-TxRx-1

Note that there are four eth0-tx queues.  Those don't show up in the 16-core
case.
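
The difference between the two listings can be tallied mechanically. The following is a minimal Python sketch, assuming the vector names are taken from the last column of /proc/interrupts as in the awk command above. The function name count_vectors is hypothetical and not part of any existing tool.

```python
from collections import Counter

def count_vectors(names):
    """Tally interrupt vector names by kind (TxRx, tx, lsc, ...)."""
    kinds = Counter()
    for name in names:
        name = name.strip()
        if not name:
            continue
        if ":" in name:
            # e.g. "eth0:lsc" -> "lsc" (link status change vector)
            kind = name.split(":", 1)[1]
        else:
            # e.g. "eth0-TxRx-3" -> "TxRx", "eth0-tx-1" -> "tx"
            parts = name.split("-")
            kind = parts[1] if len(parts) >= 3 else name
        kinds[kind] += 1
    return dict(kinds)

# Vector names copied from the two listings in this report.
sixteen_core = [f"eth0-TxRx-{i}" for i in range(10)] + ["eth0:lsc"]
eight_core = ([f"eth0-TxRx-{i}" for i in range(8)]
              + [f"eth0-tx-{i}" for i in range(4)]
              + ["eth0:lsc"])

print("16 cores:", count_vectors(sixteen_core))
print(" 8 cores:", count_vectors(eight_core))
```

On a live system the same function could be fed the output of the awk command instead of the hard-coded lists.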

Also from /var/log/messages (with 8 cores):
Jun  4 19:19:31 perf22 kernel: ixgbe: 0000:04:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Jun  4 19:19:31 perf22 kernel: ixgbe 0000:04:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b9
Jun  4 19:19:31 perf22 kernel: ixgbe 0000:04:00.0: MAC: 1, PHY: 3, PBA No: e18269-001


With 16 cores I got this in the log:
Jun  4 19:03:40 perf22 kernel: ixgbe: 0000:04:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
Jun  4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b9
Jun  4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: MAC: 1, PHY: 3, PBA No: e18269-001
Jun  4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: Intel(R) 10 Gigabit Network Connection
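
The queue counts the driver reports can be pulled out of these log lines and set against the vectors listed in /proc/interrupts. This is a rough Python sketch; the helper name queue_counts is hypothetical, and the regular expression assumes the exact message wording shown in the log excerpts above.

```python
import re

# Matches the ixgbe_init_interrupt_scheme message quoted in this report.
QUEUE_RE = re.compile(r"Rx Queue count = (\d+), Tx Queue count = (\d+)")

def queue_counts(line):
    """Return (rx_queues, tx_queues) from a log line, or None if absent."""
    m = QUEUE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

log_16 = ("Jun  4 19:03:40 perf22 kernel: ixgbe: 0000:04:00.0: "
          "ixgbe_init_interrupt_scheme: Multiqueue Enabled: "
          "Rx Queue count = 16, Tx Queue count = 16")
log_8 = ("Jun  4 19:19:31 perf22 kernel: ixgbe: 0000:04:00.0: "
         "ixgbe_init_interrupt_scheme: Multiqueue Enabled: "
         "Rx Queue count = 8, Tx Queue count = 8")

print(queue_counts(log_16))
print(queue_counts(log_8))
```

With 16 cores the driver reports 16 Rx and 16 Tx queues, while the interrupt listing shows only ten TxRx vectors. Whether that mismatch is itself the cause of the failure is not established in this report.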

Comment 1 Chris Wright 2009-06-05 19:55:05 UTC
Between 151 and 152 there are only 3 changes to ixgbe:

$ git log --pretty=format:"%h: %s" 2.6.18-151.el5..2.6.18-152.el5 drivers/net/ixgbe
8d25a47: [net] ixgbe: fix MSI-X allocation on 8+ core systems
7ccad56: [net] ixgbe: fix polling saturates CPU
b6d3719: [net] ixgbe: add GRO suppport

I backed only ixgbe out to its 151 state.  So the core kernel had GRO, but ixgbe
would have been using LRO.

1) effectively 151 ixgbe compiled against 152.  Works fine[1][2].
2) Add ixgbe-GRO (build, rmmod, insmod). Works fine[1][2].
3) Add napi fix (build, rmmod, insmod).  Works fine[1][2].
4) Add >8 core fix (build, rmmod, insmod).  Now back to stock 152, and
   back to the original problem that Mark Wagner reported, no traffic.[3][4][5]
5) Back out napi fix (reboot, insmod). Crash[6]. Reboot. Crash[6]. Reboot.
   Crash[6]...give up
6) Back out ixgbe-GRO (so just >8 core fix) (reboot, insmod).  Crash[6].
   Reboot.  Crash[6]...give up

*** switch to 151 kernel, which does NOT have GRO in it ***

7) working fine[1][2][7]
8) add napi fix (ixgbe-GRO not relevant to 151 kernel) (build, rmmod,
   insmod).  working fine[1][2][8]
9) add >8 core fix (build, rmmod, insmod), ping works, netperf crash[4][9],
   Reboot.  no ping traffic[10]. rmmod,insmod,crash[9]
10) backout napi fix (so only >8 core in 151), fresh boot, insmod, Crash[6]

[1] using single MSI interrupt
[2] exhaustive testing == 1 ping attempt and 1 netperf attempt ;-)
[3] first attempt w/ build, rmmod, insmod actually allowed ping to work
    and then crashed immediately on netperf run w/ VT-d BUG().  So rebooted.
[4] using MSI-X, getting 16 TxRx MSI-X interrupts
[5] no traffic passing w/ simple ping
[6] immediately on insmod.  BUG() include/linux/netdevice.h:1068, RIP
    ixgbe_clean_rxonly_many+0xfb
[7] ksoftirqd pegged at 100% cpu even after netperf run ends.
[8] ksoftirqd issue fixed...as expected
[9] BUG at drivers/pci/intel-iommu.c:1521, ixgbe_alloc_rx_buffers->pci_map_page
[10] arp request goes out, arp reply comes back, never makes it up stack
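
The ten steps above can be cross-checked mechanically. The Python sketch below encodes each step as the set of ixgbe patches applied plus whether the run worked, then looks for a patch that is present in every failing run and absent from every working one. The labels "gro", "napi" and "msix" are shorthand introduced here for the GRO support, the napi fix and the >8 core fix; they are not identifiers from the kernel tree. Step 9 is counted as a failure even though ping worked once before the crash.

```python
# (step, kernel, ixgbe patches applied, worked?) as reported in steps 1-10.
RESULTS = [
    (1, "152", set(), True),
    (2, "152", {"gro"}, True),
    (3, "152", {"gro", "napi"}, True),
    (4, "152", {"gro", "napi", "msix"}, False),  # no traffic
    (5, "152", {"gro", "msix"}, False),          # crash on insmod
    (6, "152", {"msix"}, False),                 # crash on insmod
    (7, "151", set(), True),
    (8, "151", {"napi"}, True),
    (9, "151", {"napi", "msix"}, False),         # netperf crash, then no ping
    (10, "151", {"msix"}, False),                # crash on insmod
]

def consistent_culprits(results):
    """Patches applied in every failing run and in no working run."""
    all_patches = set().union(*(patches for _, _, patches, _ in results))
    return sorted(
        patch for patch in all_patches
        if all((patch in patches) != ok for _, _, patches, ok in results)
    )

print(consistent_culprits(RESULTS))
```

This only summarizes the observations above; it says which change correlates with the failures, not why that change fails.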

Comment 2 Andy Gospodarek 2009-06-12 19:35:00 UTC
Created attachment 347652 [details]
ixgbe-correctly-enable-interrupts-when-link-is-down.patch

This patch resolves the issue, but I plan to post it as a part of bug 505653.

Comment 3 Andy Gospodarek 2009-06-12 19:36:57 UTC
Created attachment 347653 [details]
ixgbe-correctly-enable-interrupts-when-link-is-down.patch

Apparently attaching a URL doesn't suck down the file and attach it.  Here's the patch for when that link starts 404'ing.

Comment 4 Andy Gospodarek 2009-06-12 19:55:14 UTC

*** This bug has been marked as a duplicate of bug 505653 ***

