Bug 1292902
Summary: | rt: netpoll: live lock with NAPI polling and busy polling on realtime kernel | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Clark Williams <williams> |
Component: | kernel-rt | Assignee: | Clark Williams <williams> |
kernel-rt sub component: | Misc | QA Contact: | Zhang Kexin <kzhang> |
Status: | CLOSED ERRATA | CC: | bhu, daolivei, kzhang, lgoncalv, zshi |
Severity: | high | Keywords: | ZStream |
Priority: | high | Doc Type: | Bug Fix |
Version: | 7.3 | Type: | Bug |
Target Milestone: | rc | Hardware: | x86_64 |
Target Release: | 7.3 | OS: | Linux |
Clones: | 1293230 | Last Closed: | 2016-11-03 19:38:57 UTC |
Bug Blocks: | 1274397, 1282922, 1293230, 1295884, 1313485 | | |
Description
Clark Williams
2015-12-18 16:55:55 UTC
Created attachment 1107309
netpoll: Always take poll_lock when doing polling
Patch to synchronize NAPI polling and busy-polling to prevent live-lock.
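A rough user-space illustration of what "always take poll_lock" means, assuming a simplified model rather than the actual netpoll code: two pollers, labeled `napi` and `busy_poll` here purely for illustration, drain one shared RX queue, and each poll call blocks on the same lock instead of trying it and backing off. The names `RxQueue`, `poll`, `poller`, and `BUDGET` are hypothetical and do not come from the patch.

```python
import threading

NUM_PACKETS = 100_000
BUDGET = 64  # packets handled per poll call, loosely analogous to a NAPI budget


class RxQueue:
    """Toy RX queue shared by two pollers (hypothetical model, not kernel code)."""

    def __init__(self, n_packets):
        self.poll_lock = threading.Lock()  # stand-in for a per-queue poll lock
        self.head = 0                      # index of the next unprocessed packet
        self.seen = [0] * n_packets        # how many times each packet was processed

    def poll(self, budget=BUDGET):
        """Process up to `budget` packets; always blocks on the lock, never trylock-and-spin."""
        done = 0
        with self.poll_lock:
            while done < budget and self.head < len(self.seen):
                self.seen[self.head] += 1
                self.head += 1
                done += 1
        return done


def poller(queue, counts, name):
    # Keep polling until the queue is drained; every call takes the shared lock.
    while (n := queue.poll()) > 0:
        counts[name] += n


queue = RxQueue(NUM_PACKETS)
counts = {"napi": 0, "busy_poll": 0}
threads = [threading.Thread(target=poller, args=(queue, counts, name)) for name in counts]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts, "total:", sum(counts.values()))
```

Because both paths serialize on a single lock, each packet is handled exactly once, and whichever poller is waiting simply sleeps until the other finishes its budget. The split between the two counters varies from run to run; only the total is deterministic.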
Note: the RT engineering team originally thought this was a problem in the ixgbe driver code, but further BZs revealed that it is a consequence of how RT is implemented, combined with the NAPI polling and busy-polling code in the network driver framework.

Created attachment 1112320
Revert "ixgbevf: Prevent livelock spinning grabbing ixgbevf_qv_lock"
Created attachment 1112321
Revert "ixgbe: Prevent livelock spinning grabbing ixgbe_qv_lock"
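The note above and the two reverted driver patches both concern a poller spinning while another context holds the queue lock. One way to picture why that can livelock on RT is the toy single-CPU model below. It assumes strict priority scheduling, a low-priority task that already holds the lock (standing in for NAPI processing in a threaded, preemptible softirq context, as on PREEMPT_RT), and a higher-priority task on the same CPU that wants the lock (standing in for a busy poller). The function `simulate` and its parameters are invented for illustration; this is a sketch of the general spin-versus-block problem, not the kernel's code or the exact failure path of this bug.

```python
def simulate(mode, work=3, max_ticks=1000):
    """Toy single-CPU model (hypothetical, not kernel code).

    A low-priority task already holds the lock and needs `work` CPU ticks
    to finish its critical section. A high-priority task wants the lock:
      mode="spin":  it retries a trylock every tick and never gives up the CPU.
      mode="block": it sleeps until the lock is released (like an RT sleeping lock).
    Returns the tick at which the high-priority task acquires the lock,
    or None if it never does within `max_ticks`.
    """
    lock_free = False      # the low-priority task starts out holding the lock
    remaining = work       # ticks the owner still needs before unlocking
    high_sleeping = False

    for tick in range(max_ticks):
        # Strict priority: the high-priority task runs whenever it is runnable.
        if not high_sleeping:
            if lock_free:
                return tick            # lock available: acquire it
            if mode == "block":
                high_sleeping = True   # give up the CPU until the lock is released
            # In "spin" mode this tick is burned on a failed trylock.
            continue
        # The high-priority task is asleep, so the lock owner gets the CPU.
        remaining -= 1
        if remaining == 0:
            lock_free = True
            high_sleeping = False      # unlocking wakes the waiter
    return None


print(simulate("spin"))   # None: the owner never runs, so the lock is never released
print(simulate("block"))  # 4
```

In "spin" mode the waiter never leaves the CPU, so the owner never finishes and neither task makes progress. In "block" mode the waiter sleeps, the owner spends its three ticks, and the lock is acquired on the next tick. Real PREEMPT_RT sleeping locks also apply priority inheritance, which this two-task model does not need.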
QE update: reproduced on 3.10.0-327.rt56.204.el7.x86_64 with a test like https://bugzilla.redhat.com/show_bug.cgi?id=1293230#c14

```
[ 1112.876788] INFO: rcu_preempt self-detected stall on CPU { 13} (t=60000 jiffies g=4995 c=4994 q=0)
[ 1112.876789] sending NMI to all CPUs:
[ 1112.876793] NMI backtrace for cpu 0
[ 1112.876796] CPU: 0 PID: 788 Comm: irq/86-0000:07: Not tainted 3.10.0-327.rt56.204.el7.x86_64 #1
[ 1112.876797] Hardware name: HP ProLiant DL388p Gen8, BIOS P70 12/14/2012
[ 1112.876799] task: ffff880416031780 ti: ffff880416040000 task.ti: ffff880416040000
[ 1112.876807] RIP: 0010:[<ffffffff810a9f8f>] [<ffffffff810a9f8f>] migrate_disable+0xf/0xf0
[ 1112.876808] RSP: 0018:ffff880416043b38 EFLAGS: 00000203
[ 1112.876808] RAX: ffff880416043fd8 RBX: ffff88042f613680 RCX: 0000000000000020
[ 1112.876809] RDX: 0000000000000000 RSI: 0000000000000020 RDI: 0000000000000200
[ 1112.876810] RBP: ffff880416043b78 R08: 000000000000003c R09: 0000000000000001
[ 1112.876810] R10: ffff880419a1368e R11: ffff880416efc980 R12: 0000000000013680
[ 1112.876811] R13: 0000000000000200 R14: 0000000000000020 R15: ffff880416031780
[ 1112.876812] FS: 0000000000000000(0000) GS:ffff88042f600000(0000) knlGS:0000000000000000
[ 1112.876813] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1112.876813] CR2: 00000000006eb0f8 CR3: 00000000bb4b9000 CR4: 00000000000407f0
[ 1112.876814] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1112.876815] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1112.876824] Stack:
[ 1112.876827] ffff880416043b78 ffffffff81501dd4 ffff8800bc82dca0 0000000000000200
[ 1112.876828] ffff8804165fb000 ffff8800bc82dcb8 ffff880416efc980 0000000000000001
[ 1112.876830] ffff880416043b98 ffffffff815024b1 ffff880416efc000 ffff8804165fb000
[ 1112.876831] Call Trace:
[ 1112.876836] [<ffffffff81501dd4>] ? __netdev_alloc_frag+0x54/0xe0
[ 1112.876838] [<ffffffff815024b1>] __alloc_rx_skb+0x51/0xb0
[ 1112.876840] [<ffffffff8150252b>] __netdev_alloc_skb+0x1b/0x40
[ 1112.876869] [<ffffffffa04c423f>] __efx_rx_packet+0xff/0x5f0 [sfc]
[ 1112.876877] [<ffffffffa04c49d9>] efx_rx_packet+0x2a9/0x3f0 [sfc]
[ 1112.876884] [<ffffffffa04be90b>] efx_ef10_ev_process+0x3bb/0x6b0 [sfc]
[ 1112.876887] [<ffffffff81512ef9>] ? netif_receive_skb+0x89/0xe0
[ 1112.876893] [<ffffffffa04a8469>] efx_process_channel+0x99/0x1b0 [sfc]
[ 1112.876898] [<ffffffffa04a8760>] efx_poll+0xb0/0x230 [sfc]
[ 1112.876900] [<ffffffff81513f5b>] net_rx_action+0x1fb/0x360
[ 1112.876903] [<ffffffff81077558>] do_current_softirqs+0x1d8/0x3c0
[ 1112.876906] [<ffffffff8110bfc0>] ? irq_thread_fn+0x50/0x50
[ 1112.876908] [<ffffffff810777b4>] local_bh_enable+0x74/0xa0
[ 1112.876909] [<ffffffff8110c001>] irq_forced_thread_fn+0x41/0x70
[ 1112.876911] [<ffffffff8110c49f>] irq_thread+0x12f/0x180
[ 1112.876912] [<ffffffff8110c080>] ? wake_threads_waitq+0x50/0x50
[ 1112.876914] [<ffffffff8110c370>] ? irq_thread_check_affinity+0x30/0x30
[ 1112.876917] [<ffffffff81099e41>] kthread+0xc1/0xd0
[ 1112.876919] [<ffffffff81099d80>] ? kthread_worker_fn+0x170/0x170
[ 1112.876922] [<ffffffff81631558>] ret_from_fork+0x58/0x90
[ 1112.876923] [<ffffffff81099d80>] ? kthread_worker_fn+0x170/0x170
[ 1112.876934] Code: 75 08 48 83 87 88 07 00 00 01 e8 ed b1 ff ff 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 65 48 8b 04 25 78 c0 00 00 <48> 89 e5 41 55 41 54 53 65 48 8b 1c 25 80 c0 00 00 f7 80 44 c0
```

Verified on 3.10.0-415.rt56.298.el7.x86_64: ran the reproducer for several hours and found no problem.

*** Bug 1273264 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2584.html