Description of problem:
When using the -152 kernel, the ixgbe driver will not pass network traffic (send or receive) when the system has 16 cores. If I turn off hyperthreading and use just 8 cores, the driver appears to function.

Version-Release number of selected component (if applicable):
-152 kernel from dzickus people page

How reproducible:
constantly

Steps to Reproduce:
1. Load the -152 kernel on a system with an ixgbe-based card.
2. Reboot.
3. Try to ping in or out of the interface.

Actual results:
ping fails

Expected results:
ping succeeds

Additional info:
With 16 cores we see the following IRQs:

$ awk '/eth0/ {print $NF}' /proc/interrupts
eth0-TxRx-0
eth0-TxRx-1
eth0-TxRx-2
eth0-TxRx-3
eth0-TxRx-4
eth0-TxRx-5
eth0-TxRx-6
eth0-TxRx-7
eth0-TxRx-8
eth0-TxRx-9
eth0:lsc

With 8 cores we get:

eth0-TxRx-2
eth0-TxRx-3
eth0-TxRx-4
eth0-TxRx-5
eth0-TxRx-6
eth0-TxRx-7
eth0-tx-0
eth0-tx-1
eth0-tx-2
eth0-tx-3
eth0:lsc
eth0-TxRx-0
eth0-TxRx-1

Note that there are four eth0-tx queues. Those don't show up in the 16-core version.

Also from /var/log/messages (8 cores):

Jun 4 19:19:31 perf22 kernel: ixgbe: 0000:04:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Jun 4 19:19:31 perf22 kernel: ixgbe 0000:04:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b9
Jun 4 19:19:31 perf22 kernel: ixgbe 0000:04:00.0: MAC: 1, PHY: 3, PBA No: e18269-001

When it was 16 cores I got this in the log:

Jun 4 19:03:40 perf22 kernel: ixgbe: 0000:04:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
Jun 4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b9
Jun 4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: MAC: 1, PHY: 3, PBA No: e18269-001
Jun 4 19:03:40 perf22 kernel: ixgbe 0000:04:00.0: Intel(R) 10 Gigabit Network Connection
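For anyone comparing the two vector listings above, here is a minimal Python sketch that tallies the interrupt names by kind. The function name `classify_vectors` and the rx-only pattern are assumptions made for illustration (no rx-only vector appears in this report); the input is just the last column of /proc/interrupts as printed by the awk command.

```python
import re
from collections import Counter

# Vector-name patterns as they appear in the /proc/interrupts listings above.
# The "rx" pattern is an assumption: no rx-only vector appears in the report.
PATTERNS = [
    ("TxRx", re.compile(r"^\w+-TxRx-\d+$")),
    ("tx", re.compile(r"^\w+-tx-\d+$")),
    ("rx", re.compile(r"^\w+-rx-\d+$")),
]


def classify_vectors(names):
    """Count interrupt vector names by kind; anything unmatched is 'other'."""
    counts = Counter()
    for name in names:
        for kind, pattern in PATTERNS:
            if pattern.match(name):
                counts[kind] += 1
                break
        else:
            # e.g. the link status change vector, eth0:lsc
            counts["other"] += 1
    return counts


# Listings copied from the report, in the order they were printed.
SIXTEEN_CORE = """
eth0-TxRx-0 eth0-TxRx-1 eth0-TxRx-2 eth0-TxRx-3 eth0-TxRx-4
eth0-TxRx-5 eth0-TxRx-6 eth0-TxRx-7 eth0-TxRx-8 eth0-TxRx-9 eth0:lsc
""".split()

EIGHT_CORE = """
eth0-TxRx-2 eth0-TxRx-3 eth0-TxRx-4 eth0-TxRx-5 eth0-TxRx-6 eth0-TxRx-7
eth0-tx-0 eth0-tx-1 eth0-tx-2 eth0-tx-3 eth0:lsc eth0-TxRx-0 eth0-TxRx-1
""".split()

if __name__ == "__main__":
    print("16 cores:", dict(classify_vectors(SIXTEEN_CORE)))
    print(" 8 cores:", dict(classify_vectors(EIGHT_CORE)))
```

On a live system the same function could be fed the output of the awk command instead of the hard-coded lists, which makes it easy to spot whether the tx-only vectors are present after a driver reload.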
Between 151 and 152 (for ixgbe) it's only 3 changes:

$ git log --pretty=format:"%h: %s" 2.6.18-151.el5..2.6.18-152.el5 drivers/net/ixgbe
8d25a47: [net] ixgbe: fix MSI-X allocation on 8+ core systems
7ccad56: [net] ixgbe: fix polling saturates CPU
b6d3719: [net] ixgbe: add GRO suppport

I backed only ixgbe back to 151. So core kernel had GRO, but ixgbe would've been using LRO.

1) Effectively 151 ixgbe compiled against 152. Works fine[1][2].
2) Add ixgbe-GRO (build, rmmod, insmod). Works fine[1][2].
3) Add napi fix (build, rmmod, insmod). Works fine[1][2].
4) Add >8 core fix (build, rmmod, insmod). Now back to stock 152, and back to the original problem that Mark Wagner reported, no traffic.[3][4][5]
5) Back out napi fix (reboot, insmod). Crash[6]. Reboot. Crash[6]. Reboot. Crash[6]...give up
6) Back out ixgbe-GRO (so just >8 core fix) (reboot, insmod). Crash[6]. Reboot. Crash[6]...give up

*** switch to 151 kernel, which does NOT have GRO in it ***

7) Working fine[1][2][7]
8) Add napi fix (ixgbe-GRO not relevant to 151 kernel) (build, rmmod, insmod). Working fine[1][2][8]
9) Add >8 core fix (build, rmmod, insmod). Ping works, netperf crash[4][9]. Reboot. No ping traffic[10]. rmmod, insmod, crash[9]
10) Back out napi fix (so only >8 core in 151), fresh boot, insmod. Crash[6]

[1] using single MSI interrupt
[2] exhaustive testing == 1 ping attempt and 1 netperf attempt ;-)
[3] first attempt w/ build, rmmod, insmod actually allowed ping to work and then crashed immediately on netperf run w/ VT-d BUG(). So rebooted.
[4] using MSI-X, getting 16 TxRx MSI-X interrupts
[5] no traffic passing w/ simple ping
[6] immediately on insmod. BUG() include/linux/netdevice.h:1068, RIP ixgbe_clean_rxonly_many+0xfb
[7] ksoftirqd pegged at 100% cpu even after netperf run ends.
[8] ksoftirqd issue fixed...as expected
[9] BUG at drivers/pci/intel-iommu.c:1521, ixgbe_alloc_rx_buffers->pci_map_page
[10] arp request goes out, arp reply comes back, never makes it up stack
Created attachment 347652
ixgbe-correctly-enable-interrupts-when-link-is-down.patch

This patch resolves the issue, but I plan to post it as a part of bug 505653.
Created attachment 347653
ixgbe-correctly-enable-interrupts-when-link-is-down.patch

Apparently attaching a URL doesn't suck down the file and attach it. Here's the patch for when that link starts 404'ing.
*** This bug has been marked as a duplicate of bug 505653 ***