Bug 504365 - ixgbe driver does not pass network traffic on system with 16 cores
Status: CLOSED DUPLICATE of bug 505653
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All Linux
Priority: low  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2009-06-05 15:49 EDT by Mark Wagner
Modified: 2014-06-29 19:01 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-06-12 15:55:14 EDT


Attachments
ixgbe-correctly-enable-interrupts-when-link-is-down.patch (102 bytes, text/plain)
2009-06-12 15:35 EDT, Andy Gospodarek
ixgbe-correctly-enable-interrupts-when-link-is-down.patch (2.21 KB, patch)
2009-06-12 15:36 EDT, Andy Gospodarek

Description Mark Wagner 2009-06-05 15:49:38 EDT
Description of problem:
When using the -152 kernel, the ixgbe driver will not pass network traffic
(send or receive) when the system has 16 cores.  If I turn off hyperthreading and use just 8 cores, the driver appears to function.

Version-Release number of selected component (if applicable):
-152 kernel from dzickus people page

How reproducible:
constantly

Steps to Reproduce:
1. Load the -152 kernel on a system with an ixgbe-based card.
2. Reboot.
3. Try to ping in or out of the interface.
  
Actual results:
ping fails

Expected results:

ping succeeds

Additional info:

With 16 cores we see the following IRQs:

$ awk '/eth0/ {print $NF}' /proc/interrupts
 eth0-TxRx-0
 eth0-TxRx-1
 eth0-TxRx-2
 eth0-TxRx-3
 eth0-TxRx-4
 eth0-TxRx-5
 eth0-TxRx-6
 eth0-TxRx-7
 eth0-TxRx-8
 eth0-TxRx-9
 eth0:lsc


With 8 cores we get:

eth0-TxRx-2
eth0-TxRx-3
eth0-TxRx-4
eth0-TxRx-5
eth0-TxRx-6
eth0-TxRx-7
eth0-tx-0
eth0-tx-1
eth0-tx-2
eth0-tx-3
eth0:lsc
eth0-TxRx-0
eth0-TxRx-1

Note that there are four eth0-tx queues.  Those don't show up in the 16-core
version.
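
The difference between the two listings can be tallied mechanically. The following is a minimal Python sketch, not part of the original report: the vector names are copied from the listings above, while the helper name count_by_kind and the two list names are hypothetical.

```python
from collections import Counter

# Vector names copied from the two /proc/interrupts listings above.
VECTORS_16_CORES = [f"eth0-TxRx-{i}" for i in range(10)] + ["eth0:lsc"]
VECTORS_8_CORES = (
    [f"eth0-TxRx-{i}" for i in range(8)]
    + [f"eth0-tx-{i}" for i in range(4)]
    + ["eth0:lsc"]
)


def count_by_kind(names):
    """Group vector names by kind, e.g. 'TxRx', 'tx' or 'lsc'.

    Hypothetical helper: assumes names look like 'eth0-TxRx-3'
    or 'eth0:lsc', as in the listings above.
    """
    kinds = Counter()
    for name in names:
        if ":" in name:
            # 'eth0:lsc' -> 'lsc'
            kind = name.split(":", 1)[1]
        else:
            # 'eth0-TxRx-3' -> 'TxRx'
            kind = name.split("-")[1]
        kinds[kind] += 1
    return dict(kinds)


if __name__ == "__main__":
    print("16 cores:", count_by_kind(VECTORS_16_CORES))
    print(" 8 cores:", count_by_kind(VECTORS_8_CORES))
```

On the data above this reports ten TxRx vectors and no tx vectors for the 16-core boot, against eight TxRx vectors plus four tx vectors for the 8-core boot.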

Also from /var/log/messages (with 8 cores):
Jun  4 19:19:31 perf22 kernel: ixgbe: 0000:04:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Jun  4 19:19:31 perf22 kernel: ixgbe 0000:04:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b9
Jun  4 19:19:31 perf22 kernel: ixgbe 0000:04:00.0: MAC: 1, PHY: 3, PBA No: e18269-001


With 16 cores I got this in the log:
Jun  4 19:03:40 perf22 kernel: ixgbe: 0000:04:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
Jun  4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b9
Jun  4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: MAC: 1, PHY: 3, PBA No: e18269-001
Jun  4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: Intel(R) 10 Gigabit Network Connection
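
For comparing boots, the queue counts can be pulled out of the ixgbe_init_interrupt_scheme log line with a small script. This is a rough Python sketch that assumes the message format stays exactly as quoted above; the function name queue_counts is hypothetical.

```python
import re

# Matches the tail of the ixgbe_init_interrupt_scheme message quoted above.
QUEUE_RE = re.compile(r"Rx Queue count = (\d+), Tx Queue count = (\d+)")


def queue_counts(line):
    """Return (rx, tx) queue counts from a log line, or None if absent."""
    match = QUEUE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Sample line copied from the 16-core log above.
    sample = (
        "Jun  4 19:03:40 perf22 kernel: ixgbe: 0000:04:00.0: "
        "ixgbe_init_interrupt_scheme: Multiqueue Enabled: "
        "Rx Queue count = 16, Tx Queue count = 16"
    )
    print(queue_counts(sample))
```

Lines that do not carry the queue counts, such as the PCI Express or MAC/PHY lines, yield None and can be skipped.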
Comment 1 Chris Wright 2009-06-05 15:55:05 EDT
Between 151 and 152 (for ixgbe) it's only 3 changes:

$ git log --pretty=format:"%h: %s" 2.6.18-151.el5..2.6.18-152.el5 drivers/net/ixgbe
8d25a47: [net] ixgbe: fix MSI-X allocation on 8+ core systems
7ccad56: [net] ixgbe: fix polling saturates CPU
b6d3719: [net] ixgbe: add GRO suppport

I reverted only ixgbe to its 151 state.  So the core kernel had GRO, but ixgbe
would've been using LRO.

1) effectively 151 ixgbe compiled against 152.  Works fine[1][2].
2) Add ixgbe-GRO (build, rmmod, insmod). Works fine[1][2].
3) Add napi fix (build, rmmod, insmod).  Works fine[1][2].
4) Add >8 core fix (build, rmmod, insmod).  Now back to stock 152, and
   back to the original problem that Mark Wagner reported, no traffic.[3][4][5]
5) Back out napi fix (reboot, insmod). Crash[6]. Reboot. Crash[6]. Reboot.
   Crash[6]...give up
6) Back out ixgbe-GRO (so just >8 core fix) (reboot, insmod).  Crash[6].
   Reboot.  Crash[6]...give up

*** switch to 151 kernel, which does NOT have GRO in it ***

7) working fine[1][2][7]
8) add napi fix (ixgbe-GRO not relevant to 151 kernel) (build, rmmod,
   insmod).  working fine[1][2][8]
9) add >8 core fix (build, rmmod, insmod), ping works, netperf crash[4][9],
   Reboot.  no ping traffic[10]. rmmod,insmod,crash[9]
10) backout napi fix (so only >8 core in 151), fresh boot, insmod, Crash[6]

[1] using single MSI interrupt
[2] exhaustive testing == 1 ping attempt and 1 netperf attempt ;-)
[3] first attempt w/ build, rmmod, insmod actually allowed ping to work
    and then crashed immediately on netperf run w/ VT-d BUG().  So rebooted.
[4] using MSI-X, getting 16 TxRx MSI-X interrupts
[5] no traffic passing w/ simple ping
[6] immediately on insmod.  BUG() include/linux/netdevice.h:1068, RIP
    ixgbe_clean_rxonly_many+0xfb
[7] ksoftirqd pegged at 100% cpu even after netperf run ends.
[8] ksoftirqd issue fixed...as expected
[9] BUG at drivers/pci/intel-iommu.c:1521, ixgbe_alloc_rx_buffers->pci_map_page
[10] arp request goes out, arp reply comes back, never makes it up stack
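
The ten steps above can be tabulated to see which change coincides with every failure. This is a minimal Python sketch, not part of the original comment. The labels "gro", "napi" and "msix" are shorthand introduced here, on the assumption that they correspond to commits b6d3719, 7ccad56 and 8d25a47 respectively, and every crash or no-traffic outcome is counted as a failure.

```python
# (kernel, ixgbe patches applied, worked?) for steps 1-10 above.
# Labels are shorthand (assumed mapping): "gro" = b6d3719,
# "napi" = 7ccad56, "msix" = 8d25a47 (the ">8 core fix").
RESULTS = [
    ("152", set(), True),                     # step 1
    ("152", {"gro"}, True),                   # step 2
    ("152", {"gro", "napi"}, True),           # step 3
    ("152", {"gro", "napi", "msix"}, False),  # step 4: no traffic
    ("152", {"gro", "msix"}, False),          # step 5: crash
    ("152", {"msix"}, False),                 # step 6: crash
    ("151", set(), True),                     # step 7
    ("151", {"napi"}, True),                  # step 8
    ("151", {"napi", "msix"}, False),         # step 9: crash / no traffic
    ("151", {"msix"}, False),                 # step 10: crash
]


def suspects(results):
    """Patches present in every failing run and absent from every working one."""
    failing = [patches for _, patches, ok in results if not ok]
    working = [patches for _, patches, ok in results if ok]
    in_every_failure = set.intersection(*failing)
    in_any_success = set().union(*working)
    return in_every_failure - in_any_success


if __name__ == "__main__":
    print(suspects(RESULTS))
```

On this table the only patch left is the >8 core change, on both the 151 and 152 kernels. That shows which change exposes the problem, not necessarily where the underlying defect lies.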
Comment 2 Andy Gospodarek 2009-06-12 15:35:00 EDT
Created attachment 347652 [details]
ixgbe-correctly-enable-interrupts-when-link-is-down.patch

This patch resolves the issue, but I plan to post it as a part of bug 505653.
Comment 3 Andy Gospodarek 2009-06-12 15:36:57 EDT
Created attachment 347653 [details]
ixgbe-correctly-enable-interrupts-when-link-is-down.patch

Apparently attaching a URL doesn't suck down the file and attach it.  Here's the patch for when that link starts 404'ing.
Comment 4 Andy Gospodarek 2009-06-12 15:55:14 EDT

*** This bug has been marked as a duplicate of bug 505653 ***
