Bug 1401779 - NIC hangs due to corrupt napi lists
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Assigned To: Clark Williams
QA Contact: Ma Yuying
Duplicates: 1401868
Blocks: 1402121

Reported: 2016-12-06 00:08 EST by Jonathan Maxwell
Modified: 2017-05-31 02:15 EDT
CC List: 14 users

Doc Type: Bug Fix
Doc Text:
Cause: The RT kernel does *not* disable interrupts inside driver ISRs (as the stock kernel does), so __napi_schedule_irqoff() was actually being called with IRQs enabled.
Consequence: The napi poll list was corrupted, causing improper NIC operation and a potential kernel hang or panic.
Fix: Change the definition of __napi_schedule_irqoff() to be the same as __napi_schedule(). This forces modifications of the poll list to be protected.
Result: The napi poll_list is no longer corrupted, so NIC drivers operate correctly.
Clones: 1402121
Last Closed: 2017-01-17 13:03:17 EST
Type: Bug


Attachments
Always disable irqs in napi_schedule*() (1.77 KB, patch)
2016-12-06 15:27 EST, Steven Rostedt


External Trackers
Tracker                            ID              Priority  Status        Summary                                           Last Updated
Red Hat Knowledge Base (Solution)  3061081         None      None          None                                              2017-05-31 02:15 EDT
Red Hat Product Errata             RHSA-2017:0113  normal    SHIPPED_LIVE  Important: kernel-rt security and bug fix update  2017-01-17 17:47:44 EST

Description Jonathan Maxwell 2016-12-06 00:08:21 EST
Description of problem:

The customer recently encountered NIC hangs caused by a corrupt napi list. 

The first one was triggered in a 3rd-party SFC (Solarflare) module.

See the vmcore:

On optimus.gsslab.rdu2.redhat.com
$ retrace-server-interact 867734383 crash

crash> mod -t
NAME          TAINTS
sfc_affinity  O
sfc           O
sfc_resource  O
sfc_char      O
onload        O
sfc_aoe       O
crash>

[920510.714050] WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[920510.714051] list_add corruption. prev->next should be next (ffff880c4fa35b90), but was dead000000100100. (prev=ffff880c12cf8a08).

[920510.714084] CPU: 1 PID: 3930 Comm: irq/85-0000:07: Tainted: G           O   ------------   3.10.0-327.rt56.183.el6rt.x86_64 #1
[920510.714085] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
[920510.714086]  0000000000000009 ffff880c11533cf8 ffffffff815f078d ffff880c11533d38
[920510.714087]  ffffffff8105cd12 ffff880c11533d38 ffff880c142a7030 ffff880c4fa35b90
[920510.714088]  ffff880c12cf8a08 ffff880c0f687c00 ffff880c11528000 ffff880c11533d98
[920510.714089] Call Trace:
[920510.714093]  [<ffffffff815f078d>] dump_stack+0x19/0x1c
[920510.714096]  [<ffffffff8105cd12>] warn_slowpath_common+0x82/0xc0
[920510.714098]  [<ffffffff8105ce06>] warn_slowpath_fmt+0x46/0x50
[920510.714100]  [<ffffffff812d371e>] __list_add+0xbe/0xd0
[920510.714103]  [<ffffffff8151837e>] __napi_schedule+0x2e/0x70
[920510.714116]  [<ffffffffa03ff9fd>] efx_farch_msi_interrupt+0x5d/0x90 [sfc]
[920510.714119]  [<ffffffff810fcf5e>] irq_forced_thread_fn+0x2e/0x70
[920510.714120]  [<ffffffff810fe02f>] irq_thread+0x13f/0x1c0
[920510.714122]  [<ffffffff810fcf30>] ? irq_thread_fn+0x50/0x50
[920510.714123]  [<ffffffff810fce00>] ? irq_finalize_oneshot+0xf0/0xf0
[920510.714124]  [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
[920510.714126]  [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
[920510.714128]  [<ffffffff810886fe>] kthread+0xbe/0xd0
[920510.714130]  [<ffffffff81088640>] ? kthreadd+0x1d0/0x1d0
[920510.714132]  [<ffffffff815fc2c8>] ret_from_fork+0x58/0x90
[920510.714133]  [<ffffffff81088640>] ? kthreadd+0x1d0/0x1d0
[920510.714134] ---[ end trace 0000000000000002 ]---
[920510.714139] ------------[ cut here ]------------
[920510.714140] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
[920510.714141] list_del corruption. next->prev should be ffff880c12cf8a08, but was ffff880c142a7030
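The value dead000000100100 in the first warning matches the x86_64 LIST_POISON1 pattern that list_del() writes into a removed entry (0x00100100 plus the 0xdead000000000000 poison delta from include/linux/poison.h). The userspace sketch below is a model, not kernel code: it mirrors the two neighbour checks of the debug __list_add() and replays one hypothetical interleaving in which an add_tail reads `prev = head->prev`, is preempted, and resumes after that tail entry has been deleted. The helper names (`checked_list_add`, `list_del_poison`, `demo_stale_prev`) are hypothetical, and the model assumes a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86_64 poison values as in include/linux/poison.h, where
 * POISON_POINTER_DELTA comes from CONFIG_ILLEGAL_POINTER_VALUE. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x00100100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x00200200ULL + POISON_POINTER_DELTA)

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Hypothetical mirror of the debug __list_add() neighbour checks.
 * Returns a bitmask: bit 0 = "next->prev should be prev" (list_debug.c:29 in
 * the logs), bit 1 = "prev->next should be next" (list_debug.c:33).
 * Unlike the kernel, which WARNs and links anyway, this model skips the link. */
static int checked_list_add(struct list_head *new, struct list_head *prev,
                            struct list_head *next)
{
    int bad = 0;
    if (next->prev != prev) {
        printf("list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
               (void *)prev, (void *)next->prev, (void *)next);
        bad |= 1;
    }
    if (prev->next != next) {
        printf("list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
               (void *)next, (void *)prev->next, (void *)prev);
        bad |= 2;
    }
    if (bad)
        return bad;
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return 0;
}

/* Models list_del(): unlink the entry, then poison its pointers. */
static void list_del_poison(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = (struct list_head *)(uintptr_t)LIST_POISON1;
    entry->prev = (struct list_head *)(uintptr_t)LIST_POISON2;
}

/* Hypothetical interleaving: add_tail captures prev, the tail entry is
 * deleted meanwhile, and the add resumes with the stale prev. Stores the
 * prev->next value the check sees and returns the failure bitmask. */
static int demo_stale_prev(uintptr_t *seen_prev_next)
{
    struct list_head head, a, b, added;
    init_list_head(&head);
    int rc = checked_list_add(&a, head.prev, &head);
    rc |= checked_list_add(&b, head.prev, &head);
    assert(rc == 0);
    (void)rc;

    struct list_head *stale_prev = head.prev; /* == &b */
    list_del_poison(&b);                      /* the concurrent removal */

    *seen_prev_next = (uintptr_t)stale_prev->next;
    return checked_list_add(&added, stale_prev, &head);
}
```

In this model, `demo_stale_prev(&seen)` prints both corruption messages and returns 3, and `seen` holds 0xdead000000100100, the same value reported in the first vmcore. It only shows that an unprotected, interrupted list update can produce this signature; it does not prove which interleaving happened on the customer's systems.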

In the other vmcore there were no 3rd-party drivers and no sfc module. In this one, the corruption was triggered in bnx2x:

On optimus.gsslab.rdu2.redhat.com:
$ retrace-server-interact 137957660 crash

crash> mod -t
no tainted modules
crash> 

[499516.248925] ------------[ cut here ]------------
[499516.254411] WARNING: at lib/list_debug.c:29 __list_add+0x77/0xd0()
[499516.261690] list_add corruption. next->prev should be prev (ffff881f66d8c288), but was ffff881fffc55b90. (next=ffff881fffc55b90).
[499516.275365] Modules linked in: autofs4 nfsv3 nfs_acl nfs fscache lockd sunrpc grace bonding ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm vfat fat iTCO_wdt iTCO_vendor_support microcode pcspkr serio_raw joydev sb_edac edac_core ipmi_si ipmi_msghandler i2c_i801 lpc_ich hpilo hpwdt ioatdma dca sg bnx2x ptp pps_core libcrc32c mdio acpi_power_meter hwmon ext4 jbd2 mbcache sd_mod crc_t10dif crct10dif_common mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core hpsa wmi mgag200 ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod
[499516.337643] CPU: 2 PID: 23296 Comm: irq/142-eth0-fp Not tainted 3.10.0-327.rt56.183.el6rt.x86_64 #1


[499516.382241] Call Trace:
[499516.385203]  [<ffffffff815f078d>] dump_stack+0x19/0x1c
[499516.385207]  [<ffffffff8105cd12>] warn_slowpath_common+0x82/0xc0
[499516.385209]  [<ffffffff8105ce06>] warn_slowpath_fmt+0x46/0x50
[499516.385211]  [<ffffffff812d36d7>] __list_add+0x77/0xd0
[499516.385213]  [<ffffffff812d3660>] ? list_del+0x40/0x40
[499516.385217]  [<ffffffff81518336>] __napi_schedule_irqoff+0x26/0x40
[499516.385232]  [<ffffffffa0354905>] bnx2x_msix_fp_int+0xd5/0x180 [bnx2x]
[499516.385234]  [<ffffffff810fcf5e>] irq_forced_thread_fn+0x2e/0x70
[499516.385236]  [<ffffffff810fe02f>] irq_thread+0x13f/0x1c0
[499516.385238]  [<ffffffff810fcf30>] ? irq_thread_fn+0x50/0x50
[499516.385239]  [<ffffffff810fce00>] ? irq_finalize_oneshot+0xf0/0xf0
[499516.385241]  [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
[499516.385243]  [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
[499516.385247]  [<ffffffff810886fe>] kthread+0xbe/0xd0
[499516.385248]  [<ffffffff81088640>] ? kthreadd+0x1d0/0x1d0
[499516.385251]  [<ffffffff815fc2c8>] ret_from_fork+0x58/0x90
[499516.385253]  [<ffffffff81088640>] ? kthreadd+0x1d0/0x1d0
[499516.385254] ---[ end trace 0000000000000002 ]---

Version-Release number of selected component (if applicable):

3.10.0-327.rt56.183.el6rt.x86_64

How reproducible:

Not reproducible at Red Hat, but it happens sometimes at the customer's site on different machines.

Actual results:

The NIC driver detects that the napi lists are corrupt. Subsequently, all NICs that use napi are broken, and the system has to be restarted to recover.

Expected results:

No corrupt napi lists.

Additional info:

In both cases the system had an InfiniBand card and the bnx2x driver.

I can't find a matching bug, but I am finding out whether this started happening when they upgraded from RHEL 6.5 (vmlinuz-3.10.0-229.rt56.147.el6rt.x86_64) to RHEL 6.7 (vmlinuz-3.10.0-327.rt56.183.el6rt.x86_64). If so, this may be a regression.
Comment 1 Tim Speetjens 2016-12-06 04:54:21 EST
This issue didn't happen on the kernel-rt 3.10.0-229.rt56.147.el6rt package.
Comment 2 Michal Schmidt 2016-12-06 10:08:58 EST
*** Bug 1401868 has been marked as a duplicate of this bug. ***
Comment 3 Michal Schmidt 2016-12-06 10:13:12 EST
Reproducer suggestion that Robert Stonehouse forwarded to bug 1401868:
============
No reproducer script has been shared with Solarflare (although the customer did reproduce the issue in a test environment). We can guess that the workload is an average market data feed: ~200-byte UDP multicast streams at a fairly low data rate, but bursty.

I think for this to be a good test you need to ensure that sfc.ko and bnx2x.ko share interrupts and hence NAPI contexts where possible. Also you probably need bursty traffic to ensure that NAPI scheduling lists are being regularly manipulated.
============
Comment 4 Michal Schmidt 2016-12-06 10:21:10 EST
(In reply to Jonathan Maxwell from comment #0)
> In the other there were no 3rd party drivers or sfc module. In this one it
> triggered in the bnx2x:

OK, this means we should be able to reproduce this more easily.
Comment 5 Clark Williams 2016-12-06 14:28:55 EST
Talked with Steven Rostedt about this, and it looks like the RT series has an issue with __napi_schedule_irqoff(). He's going to post a patch for both us and the upstream kernel(s) that basically turns __napi_schedule_irqoff() into __napi_schedule(). This will ensure that interrupts are off when adding to the tail of the poll_list member of the napi structure.
Comment 6 Steven Rostedt 2016-12-06 15:27 EST
Created attachment 1228673 [details]
Always disable irqs in napi_schedule*()

The function __napi_schedule_irqoff() is called from contexts where interrupts are already disabled. But when PREEMPT_RT_FULL is defined, interrupt handlers not only run as threads but can also be preempted. Some interrupt handlers expect to be called with interrupts disabled, or at least with preemption disabled (if all IRQs are forced into threads), and this breaks calling __napi_schedule_irqoff(), because the per-CPU napi->poll_list is protected by disabling interrupts (and thus preemption). Calling it without interrupts or preemption disabled can corrupt the napi->poll_list.

As bnx2x is not the only driver that uses this, and there may be even more in the future, the best approach is to always disable interrupts even when calling __napi_schedule_irqoff(), when CONFIG_PREEMPT_RT_FULL is enabled.
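Comment 6 argues that, under PREEMPT_RT_FULL, a forced-threaded handler calling __napi_schedule_irqoff() can be preempted in the middle of updating the per-CPU poll list. The deterministic userspace model below is a sketch under stated assumptions, not the attached patch: one CPU, one poll list, a boolean standing in for local_irq_save()/local_irq_restore(), and a second IRQ thread that may only run at an explicit preemption point while that flag is clear. The NICs `nic_a` and `nic_b` and all function names are hypothetical; the NAPI_STATE_SCHED bit and softirq raising are not modeled. The "fixed" variant simply routes through the IRQ-disabling path, as the Doc Text describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list in the style of include/linux/list.h. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct napi_struct: only the list linkage. */
struct napi_model {
    struct list_head poll_list;
};

static struct list_head cpu_poll_list;   /* models one CPU's softnet_data.poll_list */
static bool irqs_off;                    /* models the local_irq_save() state */
static void (*pending_irq_thread)(void); /* another IRQ thread waiting for this CPU */

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Runs the pending IRQ thread, but only while interrupts are "enabled". */
static void preempt_point(void)
{
    if (!irqs_off && pending_irq_thread != NULL) {
        void (*fn)(void) = pending_irq_thread;
        pending_irq_thread = NULL;
        fn();
    }
}

static void irq_save_model(bool *flags) { *flags = irqs_off; irqs_off = true; }

/* Re-enabling interrupts lets a deferred IRQ thread run. */
static void irq_restore_model(bool flags)
{
    irqs_off = flags;
    preempt_point();
}

/* list_add_tail() split so a preemption can land between reading the tail and linking. */
static void list_add_tail_preemptible(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev;
    preempt_point(); /* on RT a threaded handler may be switched out here */
    new->next = head;
    new->prev = prev;
    prev->next = new;
    head->prev = new;
}

/* Models ____napi_schedule(): queue the napi on this CPU's poll list. */
static void napi_enqueue(struct napi_model *n)
{
    list_add_tail_preemptible(&n->poll_list, &cpu_poll_list);
}

/* Models __napi_schedule(): brackets the update with irq save/restore. */
static void napi_schedule_model(struct napi_model *n)
{
    bool flags;
    irq_save_model(&flags);
    napi_enqueue(n);
    irq_restore_model(flags);
}

/* Before the fix: assumes the caller already has interrupts off,
 * which a forced-threaded handler on RT does not guarantee. */
static void napi_schedule_irqoff_before_fix(struct napi_model *n) { napi_enqueue(n); }

/* After the fix (RT only): behaves the same as napi_schedule_model(). */
static void napi_schedule_irqoff_rt_fix(struct napi_model *n) { napi_schedule_model(n); }

static struct napi_model nic_a, nic_b; /* hypothetical queues of two NICs */

static void nic_b_irq_thread(void) { napi_schedule_model(&nic_b); }

/* Counts entries reachable from head; returns -1 if a back pointer is inconsistent. */
static int list_count_checked(const struct list_head *head)
{
    int count = 0;
    const struct list_head *p = head;
    while (p->next != head) {
        if (p->next->prev != p)
            return -1;
        p = p->next;
        count++;
    }
    return head->prev == p ? count : -1;
}

/* NIC A's threaded handler schedules its napi while NIC B's IRQ thread waits
 * for the same CPU. Returns how many napi entries end up on the poll list. */
static int simulate(void (*irqoff_variant)(struct napi_model *))
{
    init_list_head(&cpu_poll_list);
    irqs_off = false; /* forced-threaded handler runs with IRQs enabled */
    pending_irq_thread = nic_b_irq_thread;
    irqoff_variant(&nic_a);
    preempt_point(); /* let NIC B run if it has not yet */
    return list_count_checked(&cpu_poll_list);
}
```

In this model, `simulate(napi_schedule_irqoff_before_fix)` returns 1: NIC A resumes with a stale tail pointer and overwrites the link to NIC B's entry, so NIC B's napi silently drops off the list. `simulate(napi_schedule_irqoff_rt_fix)` returns 2 because NIC B's thread is deferred until interrupts are restored. Making the irqoff variant always disable interrupts, rather than auditing every caller, matches the reasoning above that more drivers may use it in the future.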
Comment 7 Clark Williams 2016-12-06 17:57:10 EST
Do we think that we can get the customer to test this with the latest MRG/R kernel (which will be kernel-rt-3.10.0-514.rt56.208.el6rt, when I finish a brew build) or do we need to give them a hotfix on .183?

I'd prefer the later kernel, just because .183 *was* a hotfix and I don't like doing a hotfix on a hotfix.
Comment 8 Jonathan Maxwell 2016-12-06 18:20:50 EST
(In reply to Clark Williams from comment #7)
> Do we think that we can get the customer to test this with the latest MRG/R
> kernel (which will be kernel-rt-3.10.0-514.rt56.208.el6rt, when I finish a
> brew build) or do we need to give them a hotfix on .183?
> 
> I'd prefer the later kernel, just because .183 *was* a hotfix and I don't
> like doing a hotfix on a hotfix.

Hi, thanks for the super-fast response; much appreciated. Great that we have a patch. This is a production environment: will kernel-rt-3.10.0-514.rt56.208.el6rt with this patch be supported, or will it be considered a test kernel?

I am sure the customer will ask.

Regards

Jon
Comment 9 Tim Speetjens 2016-12-07 04:35:24 EST
(In reply to Clark Williams from comment #7)
> Do we think that we can get the customer to test this with the latest MRG/R
> kernel (which will be kernel-rt-3.10.0-514.rt56.208.el6rt, when I finish a
> brew build) or do we need to give them a hotfix on .183?
> 
> I'd prefer the later kernel, just because .183 *was* a hotfix and I don't
> like doing a hotfix on a hotfix.

Can we also build one that is closer to what they run now?
Comment 10 Beth Uptagrafft 2016-12-07 14:02:28 EST
Yes, we will apply the patch to our most recently released MRG kernel, version 3.10.0-327.rt56.198.

I think it will be a hotfix kernel and will be supported until our next release that includes this patch is available, assuming this fix completely addresses their issue. Clark is building the kernel and we will do some basic smoke testing.
Comment 11 Jonathan Maxwell 2016-12-07 14:49:44 EST
It appears that they will need kernel-rt-3.10.0-327.rt56.183, because they use a 3rd-party SFC driver that requires it. Tim updated the case as follows:

"(1) The most recent kernel that can be used is kernel-rt-3.10.0-327.rt56.183. On higher versions, SolarFlare onload cannot be used.

For this reason, a test/hotfix package should be based on that version."

Therefore, can we please have the patch in kernel-rt-3.10.0-327.rt56.183? If this works, then I guess it needs to become the hotfix kernel until we release the fix officially.

Thanks

Jon
Comment 12 Clark Williams 2016-12-08 10:43:50 EST
Building kernel-rt-3.10.0-327.rt56.183.bz1401779.el6rt now.
Comment 13 Clark Williams 2016-12-08 11:21:31 EST
Build completed and boot tested:

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=527974
Comment 14 Clark Williams 2016-12-08 11:25:33 EST
Build tagged as hotfix
Comment 29 errata-xmlrpc 2017-01-17 13:03:17 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0113.html
