Bug 1292927 - rcuc starvation leads to rcu stall
Status: VERIFIED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.3
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: 7.3
Assigned To: Scott Wood
QA Contact: Jianlin Shi
Keywords: ZStream
Depends On:
Blocks: 1420851 1442258 1449577 1293229 1295885
Reported: 2015-12-18 13:56 EST by Daniel Bristot de Oliveira
Modified: 2017-12-04 02:03 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1293229
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
rt: rcu: Boost rcuc if it has 2 jiffies before splatting (3.18 KB, patch)
2015-12-18 14:14 EST, Clark Williams
rcu: Boost rcuc if it has 4 jiffies before splatting (3.25 KB, patch)
2016-01-06 21:43 EST, Luis Claudio R. Goncalves
softirq: Perform softirqs in local_bh_enable() for a limited amount of time (3.64 KB, patch)
2016-01-06 21:44 EST, Luis Claudio R. Goncalves
Server side script for testing the NAPI-POLL sfc issue (1.29 KB, text/plain)
2017-04-19 11:46 EDT, Luis Claudio R. Goncalves
Client side script for testing the NAPI-poll sfc issue (1.64 KB, text/plain)
2017-04-19 11:49 EDT, Luis Claudio R. Goncalves

Description Daniel Bristot de Oliveira 2015-12-18 13:56:40 EST
Description of problem:

The kernel emits RCU stall messages if an RT task with a priority higher than or
equal to FIFO:2 runs for more than 60 seconds.

This occurs because the rcuc/ threads, which run with priority FIFO:2, are
starved.

How reproducible:
Always

Steps to Reproduce:
1. Set the rcu_nocbs=$CPU kernel parameter for a given CPU. For instance, for CPU 3:

   Edit the file /etc/sysconfig/grub, and add the "rcu_nocbs=3" parameter in the
   GRUB_CMDLINE_LINUX= option.

   Apply grub's config using the following command:
   # grub2-mkconfig > /etc/grub2.cfg

2. Reboot the system

3. Move all rcu threads to a housekeeping CPU. For example, to CPU 0:
   # for i in `pgrep rcu[^c]` ; do taskset -pc 0 $i > /dev/null ; done

4. Run hackbench (from the rt-tests package) on the CPU configured as rcu_nocbs,
   with a FIFO priority >= the rcuc/CPU thread's priority. For example, on CPU 3:

   # taskset -c 3 chrt -f 10 hackbench -g 2 -f 2 -s 1000 -l 10000000000 > /dev/null 2>&1
   
5. Wait for 61 seconds, kill hackbench, and run dmesg to see RCU stall messages.
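
Step 5 can be checked without reading dmesg by eye. The following is a minimal sketch, not part of the original reproducer; the function name count_rcu_stalls is made up for illustration, and the pattern only assumes the two message forms quoted in this bug.

```shell
#!/bin/sh
# Count RCU stall reports in kernel log text read from stdin.
# The pattern matches both report forms that appear in this bug:
#   "INFO: rcu_preempt self-detected stall on CPU ..."
#   "INFO: rcu_preempt detected stalls on CPUs/tasks: ..."
# count_rcu_stalls is a hypothetical helper name, not an existing tool.
count_rcu_stalls() {
    # grep -c prints 0 and returns non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c -E 'INFO: rcu_[a-z_]+ (self-detected stall|detected stalls) on CPU' || true
}

# Example use on a live system (needs permission to read the kernel log):
#   dmesg | count_rcu_stalls
```

A non-zero count after the 61 second wait would correspond to the "Actual results" below; zero would correspond to the "Expected results".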

Actual results:
   RCU stall messages in the CPU that is running hackbench.

Expected results:
   No RCU stall messages.

Additional info:
  Workaround: Increase the priority of the rcuc threads.
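
A rough sketch of how the workaround could be applied to a single CPU, assuming the rcuc thread is named rcuc/<cpu> as in the steps above. The function name boost_rcuc and the DRY_RUN and RCUC_PID variables are hypothetical knobs added for illustration only, and priority 11 in the usage line is just an example value above the FIFO 10 used by the reproducer.

```shell
#!/bin/sh
# Raise the SCHED_FIFO priority of the rcuc thread for one CPU.
# Usage: boost_rcuc CPU PRIO
# DRY_RUN=1 prints the chrt command instead of running it (hypothetical knob).
# RCUC_PID overrides the PID lookup, mainly for dry runs (hypothetical knob).
boost_rcuc() {
    cpu=$1
    prio=$2
    if [ -n "${RCUC_PID:-}" ]; then
        pid=$RCUC_PID
    else
        # -x asks pgrep for an exact match on the thread name.
        pid=$(pgrep -x "rcuc/$cpu") || return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "chrt -f -p $prio $pid"
    else
        # Needs root; changes the scheduling policy/priority of an existing task.
        chrt -f -p "$prio" "$pid"
    fi
}

# Example (as root), for the CPU used in the reproducer:
#   boost_rcuc 3 11
```

Note that an rcuc thread boosted this way can preempt the RT workload on that CPU, so the value has to be chosen with the application's latency requirements in mind.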
Comment 2 Clark Williams 2015-12-18 14:14 EST
Created attachment 1107358 [details]
rt: rcu: Boost rcuc if it has 2 jiffies before splatting

Patch to prevent RCU starvation on RT kernel
Comment 5 Luis Claudio R. Goncalves 2016-01-06 21:43 EST
Created attachment 1112322 [details]
rcu: Boost rcuc if it has 4 jiffies before splatting
Comment 6 Luis Claudio R. Goncalves 2016-01-06 21:44 EST
Created attachment 1112323 [details]
softirq: Perform softirqs in local_bh_enable() for a limited amount of time
Comment 8 Jianlin Shi 2016-09-29 04:11:54 EDT
RCU stall messages still appear when running the reproducer on 3.10.0-510.


[root@ibm-x3650m5-03 ~]# uname -a
Linux ibm-x3650m5-03.rhts.eng.pek2.redhat.com 3.10.0-510.rt56.415.el7.x86_64 #1 SMP PREEMPT RT Wed Sep 21 16:48:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@ibm-x3650m5-03 ~]# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-510.rt56.415.el7.x86_64 root=/dev/mapper/rhel_ibm--x3650m5--03-root ro crashkernel=auto rd.lvm.lv=rhel_ibm-x3650m5-03/root rd.lvm.lv=rhel_ibm-x3650m5-03/swap console=ttyS0,115200n81 LANG=en_US.UTF-8 rcu_nocbs=3
[root@ibm-x3650m5-03 ~]# for i in `pgrep rcu[^c]` ; do taskset -pc 0 $i > /dev/null ; done
[root@ibm-x3650m5-03 ~]# taskset -c 3 chrt -f 10 hackbench -g 2 -f 2 -s 1000 -l 10000000000 > /dev/null 2&>1


ibm-x3650m5-03 login: [  201.042244] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=60000 jiffies g=3269 c=3268 q=5479)
[  201.042245] Task dump for CPU 3:
[  201.042246] hackbench       R  running task        0  3208   3207 0x00000080
[  201.042260]  ffff88086cc7c080 00000000d580a3e4 ffff88046fac3d90 ffffffff810bc1a6
[  201.042261]  0000000000000003 ffffffff81a01f00 ffff88046fac3da8 ffffffff810bffd9
[  201.042262]  0000000000000004 ffff88046fac3dd8 ffffffff8112fdc0 ffff88046fad20e0
[  201.042262] Call Trace:
[  201.042269]  <IRQ>  [<ffffffff810bc1a6>] sched_show_task+0xb6/0x120
[  201.042271]  [<ffffffff810bffd9>] dump_cpu_task+0x39/0x70
[  201.042274]  [<ffffffff8112fdc0>] rcu_dump_cpu_stacks+0x90/0xd0
[  201.042276]  [<ffffffff8113467a>] rcu_check_callbacks+0x49a/0x840
[  201.042279]  [<ffffffff8108f0e2>] update_process_times+0x42/0x70
[  201.042281]  [<ffffffff810e98d5>] tick_sched_handle.isra.18+0x25/0x60
[  201.042282]  [<ffffffff810e9aa4>] tick_sched_timer+0x44/0x70
[  201.042285]  [<ffffffff810ab605>] __run_hrtimer+0x85/0x270
[  201.042286]  [<ffffffff810e9a60>] ? tick_sched_do_timer+0x50/0x50
[  201.042288]  [<ffffffff810ac460>] hrtimer_interrupt+0x120/0x2a0
[  201.042291]  [<ffffffff810437f7>] local_apic_timer_interrupt+0x37/0x60
[  201.042295]  [<ffffffff8169292f>] smp_apic_timer_interrupt+0x3f/0x60
[  201.042297]  [<ffffffff8169109d>] apic_timer_interrupt+0x6d/0x80
[  201.042300]  <EOI>  [<ffffffff810a3320>] ? task_work_run+0xe0/0xe0
[  201.042303]  [<ffffffff812a5d59>] ? sock_has_perm+0x49/0xc0
[  201.042305]  [<ffffffff812a5eb3>] selinux_socket_recvmsg+0x23/0x30
[  201.042307]  [<ffffffff812a2fb6>] security_socket_recvmsg+0x16/0x20
[  201.042310]  [<ffffffff8154accc>] sock_aio_read.part.7+0xdc/0x160
[  201.042312]  [<ffffffff8154ad71>] sock_aio_read+0x21/0x30
[  201.042314]  [<ffffffff811f2b9d>] do_sync_read+0x8d/0xd0
[  201.042316]  [<ffffffff811f340d>] vfs_read+0x14d/0x170
[  201.042317]  [<ffffffff811f3f2f>] SyS_read+0x7f/0xe0
[  201.042318]  [<ffffffff81690409>] system_call_fastpath+0x16/0x1b


BTW, I also tried 3.10.0-327.10.1.rt56.211.el7_2.x86_64, and no RCU stall messages appeared.
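
For comparing stall reports across kernels, the numbers in the header line can be pulled out with a small sketch like the one below. The function name parse_rcu_stall is made up. The field meanings in the comments follow my reading of the kernel's RCU stall warning documentation and should be treated as an assumption, not as something confirmed in this bug.

```shell
#!/bin/sh
# Print the t, g, c and q values from RCU stall header lines on stdin.
# Assumed meanings (from the kernel's stall warning documentation):
#   t = jiffies since the stalled grace period started
#   g = current grace-period number, c = last completed grace-period number
#   q = RCU callbacks queued across all CPUs
# Handles both layouts seen in this bug: fields separated by spaces
# (self-detected form) or by commas ("detected by N" form).
# parse_rcu_stall is a hypothetical helper name.
parse_rcu_stall() {
    sed -n -E 's/.*[( ]t=([0-9]+) jiffies,? g=([0-9]+),? c=([0-9]+),? q=([0-9]+)\).*/t=\1 g=\2 c=\3 q=\4/p'
}

# Example use on a live system:
#   dmesg | parse_rcu_stall
```

For the report above this gives t=60000 g=3269 c=3268 q=5479, that is, one grace period (g - c = 1) that has not completed for 60000 jiffies.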
Comment 9 Clark Williams 2016-09-29 11:05:25 EDT
(In reply to Jianlin Shi from comment #8)
> RCU stall messages still appear when run reproducer on 3.10.0-510.
> 

Looks like I was missing a section of the patch which modified kernel/rcutree.c (the else clause). I've updated the patch and when my current -511 build finishes, I'll kick off a new build (also -511 based) which should fix this issue. 

I'll move to Modified when the build is done.
Comment 21 Luis Claudio R. Goncalves 2017-04-19 11:46 EDT
Created attachment 1272681 [details]
Server side script for testing the NAPI-POLL sfc issue

This script should run at x3650m2-01.farm.hsv.redhat.com (RT test box in the Huntsville lab).

From a console session run:

./SERVER

This test requires netperf2, already in the box. There is a note in the script showing how to get netperf2 code if necessary.
Comment 22 Luis Claudio R. Goncalves 2017-04-19 11:49 EDT
Created attachment 1272682 [details]
Client side script for testing the NAPI-poll sfc issue

This script must run at rhelrt-17.farm.hsv.redhat.com (RT test box at the Huntsville lab).

From an ssh session, run:

./CLIENT_17

Go to the console session and wait for the error.
Comment 26 Scott Wood 2017-06-20 23:58:35 EDT
Setting back to MODIFIED as there are patches in the tree to fix the originally reported issue, which I cannot reproduce in the current tree.  The netdev stall is not RT-specific and not related to the previous issues in this BZ.  If there are any other scenarios that lead to RCU stalls in the current tree, please file a new BZ with specific reproduction instructions.
Comment 29 Jianlin Shi 2017-12-04 02:03:02 EST
With the steps in the description, I get this error message on 327:

[  213.303023] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=60000 jiffies g=15011 c=15010 q=0)
[  213.303025] sending NMI to all CPUs:
[  213.303038] NMI backtrace for cpu 3
[  213.303041] CPU: 3 PID: 9176 Comm: hackbench Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303042] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303045] task: ffff8803fad52f00 ti: ffff8803fad1c000 task.ti: ffff8803fad1c000
[  213.303054] RIP: 0010:[<ffffffff81041f8f>]  [<ffffffff81041f8f>] flat_send_IPI_mask+0x8f/0xd0
[  213.303055] RSP: 0000:ffff88041fd83d98  EFLAGS: 00010046
[  213.303056] RAX: 0000000000000000 RBX: 0000000000000c00 RCX: 0000000000000000
[  213.303057] RDX: 0000000000000c00 RSI: 0000000000000002 RDI: 0000000000000082
[  213.303058] RBP: ffff88041fd83db8 R08: 0000000000022e08 R09: 0000000000000000
[  213.303058] R10: ffffffff81d55de8 R11: 3a73555043206c6c R12: 000000000000000f
[  213.303059] R13: 0000000000000004 R14: ffff88041fd8fb80 R15: 0000000000000000
[  213.303061] FS:  00007f6122f92740(0000) GS:ffff88041fd80000(0000) knlGS:0000000000000000
[  213.303062] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303063] CR2: 00007fff9dbb6fa0 CR3: 00000003f996a000 CR4: 00000000000006e0
[  213.303068] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303073] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303073] Stack:
[  213.303076]  0000000000000082 0000000200000004 0000000000000001 0000000000000003
[  213.303078]  ffff88041fd83de0 ffffffff8103c7c1 ffffffff81974e80 0000000000000003
[  213.303079]  ffffffff81974e80 ffff88041fd83e40 ffffffff811170f9 0000000000000082
[  213.303079] Call Trace:
[  213.303083]  <IRQ>
[  213.303088]  [<ffffffff8103c7c1>] arch_trigger_all_cpu_backtrace+0x181/0x190
[  213.303092]  [<ffffffff811170f9>] rcu_check_callbacks+0x399/0x770
[  213.303097]  [<ffffffff81081f22>] update_process_times+0x42/0x60
[  213.303101]  [<ffffffff810cdbb5>] tick_sched_handle.isra.19+0x25/0x60
[  213.303103]  [<ffffffff810cdd94>] tick_sched_timer+0x44/0x70
[  213.303107]  [<ffffffff8109e175>] __run_hrtimer+0x85/0x270
[  213.303109]  [<ffffffff810cdd50>] ? tick_sched_do_timer+0x60/0x60
[  213.303112]  [<ffffffff8109eff0>] hrtimer_interrupt+0x120/0x2a0
[  213.303114]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303118]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303121]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303122]  <EOI>
[  213.303125]  [<ffffffff81631aed>] ? sysret_audit+0x17/0x21
[  213.303137] Code: 25 00 a3 5a ff 80 e6 10 75 f2 44 89 e2 c1 e2 18 89 14 25 10 a3 5a ff 89 f2 09 da 80 cf 04 83 fe 02 0f 44 d3 89 14 25 00 a3 5a ff <48> 83 3d 31 72 90 00 00 74 12 57 9d 0f 1f 44 00 00 48 83 c4 10
[  213.303139] NMI backtrace for cpu 0
[  213.303142] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303143] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303144] task: ffffffff81936440 ti: ffffffff81920000 task.ti: ffffffff81920000
[  213.303150] RIP: 0010:[<ffffffff81041adc>]  [<ffffffff81041adc>] native_apic_mem_write+0xc/0x10
[  213.303150] RSP: 0018:ffff88041fc03ea8  EFLAGS: 00010006
[  213.303151] RAX: ffffffff81948600 RBX: ffff88041fc0d800 RCX: 0000000000000020
[  213.303152] RDX: 0000000225c17d03 RSI: 000000000000ec71 RDI: 0000000000000380
[  213.303153] RBP: ffff88041fc03ea8 R08: 00000000000988ee R09: 0000000000000000
[  213.303153] R10: 0000000000000004 R11: 0000000000000005 R12: 00000000000ec6aa
[  213.303154] R13: 0000000000000000 R14: ffff88041fc0ee60 R15: ffff88041fc0f900
[  213.303155] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
[  213.303156] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303157] CR2: 00007f77a1a35416 CR3: 00000003fad3e000 CR4: 00000000000006f0
[  213.303162] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303166] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303167] Stack:
[  213.303169]  ffff88041fc03eb8 ffffffff81039e9d ffff88041fc03ee0 ffffffff810cbbcb
[  213.303170]  00000031a9e8be00 0000000000000000 0000000000000004 ffff88041fc03ef0
[  213.303172]  ffffffff810cd724 ffff88041fc03f78 ffffffff8109f040 00000031a9d9d9a9
[  213.303172] Call Trace:
[  213.303174]  <IRQ>
[  213.303177]  [<ffffffff81039e9d>] lapic_next_event+0x1d/0x30
[  213.303179]  [<ffffffff810cbbcb>] clockevents_program_event+0x6b/0xf0
[  213.303181]  [<ffffffff810cd724>] tick_program_event+0x24/0x30
[  213.303184]  [<ffffffff8109f040>] hrtimer_interrupt+0x170/0x2a0
[  213.303187]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303189]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303192]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303193]  <EOI>
[  213.303195]  [<ffffffff8104a086>] ? native_safe_halt+0x6/0x10
[  213.303201]  [<ffffffff8100c7ed>] default_idle+0x2d/0x130
[  213.303204]  [<ffffffff8100d40e>] arch_cpu_idle+0x2e/0x40
[  213.303207]  [<ffffffff810c350f>] cpu_startup_entry+0x2af/0x340
[  213.303211]  [<ffffffff81613b64>] rest_init+0x84/0x90
[  213.303216]  [<ffffffff81aa0037>] start_kernel+0x417/0x438
[  213.303218]  [<ffffffff81a9fa29>] ? repair_env_string+0x5c/0x5c
[  213.303220]  [<ffffffff81a9f120>] ? early_idt_handlers+0x120/0x120
[  213.303222]  [<ffffffff81a9f5e2>] x86_64_start_reservations+0x2a/0x2c
[  213.303224]  [<ffffffff81a9f734>] x86_64_start_kernel+0x150/0x173
[  213.303236] Code: 1f 44 00 00 55 ba 00 00 08 00 48 89 e5 e8 cd fd ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 55 89 ff 48 89 e5 89 b7 00 a0 5a ff <5d> c3 66 90 55 89 ff 8b 87 00 a0 5a ff 48 89 e5 5d c3 66 90 55
[  213.303239] NMI backtrace for cpu 1
[  213.303241] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303242] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303243] task: ffff8803fc59de00 ti: ffff8803fc5c8000 task.ti: ffff8803fc5c8000
[  213.303249] RIP: 0010:[<ffffffff8104af3c>]  [<ffffffff8104af3c>] pvclock_clocksource_read+0xcc/0xe0
[  213.303250] RSP: 0018:ffff88041fc83e08  EFLAGS: 00000083
[  213.303250] RAX: 000010f7d4d6c891 RBX: 0000000000000000 RCX: 0000000000000000
[  213.303251] RDX: 000000000002508c RSI: 0000000080000431 RDI: ffff88041ff7b040
[  213.303252] RBP: ffff88041fc83e20 R08: 000000000002508c R09: 0000000000000001
[  213.303252] R10: 0000000000000000 R11: 0000000000000000 R12: 000010f7d4d6c891
[  213.303253] R13: 0000000000000000 R14: 0000000000067f54 R15: ffff88041fc8f900
[  213.303254] FS:  0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
[  213.303255] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303256] CR2: 00007f55a6aa8000 CR3: 00000003fad3e000 CR4: 00000000000006e0
[  213.303261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303266] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303266] Stack:
[  213.303268]  ffffffff81049beb ffff8803fc5c8000 ffff8803fc5cbfd8 ffff88041fc83e48
[  213.303270]  ffffffff81049beb ffffffff810b798f ffffffff81948e80 00000000000000d4
[  213.303271]  ffff88041fc83e58 ffffffff81049c29 ffff88041fc83e88 ffffffff810c559c
[  213.303272] Call Trace:
[  213.303273]  <IRQ>
[  213.303275]  [<ffffffff81049beb>] ? kvm_clock_read+0x3b/0x70
[  213.303276]  [<ffffffff81049beb>] kvm_clock_read+0x3b/0x70
[  213.303281]  [<ffffffff810b798f>] ? cpupri_set+0x9f/0x100
[  213.303283]  [<ffffffff81049c29>] kvm_clock_get_cycles+0x9/0x10
[  213.303285]  [<ffffffff810c559c>] ktime_get+0x4c/0xd0
[  213.303288]  [<ffffffff810cdd6f>] tick_sched_timer+0x1f/0x70
[  213.303291]  [<ffffffff8109e175>] __run_hrtimer+0x85/0x270
[  213.303293]  [<ffffffff810cdd50>] ? tick_sched_do_timer+0x60/0x60
[  213.303295]  [<ffffffff8109eff0>] hrtimer_interrupt+0x120/0x2a0
[  213.303299]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303301]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303303]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303304]  <EOI>
[  213.303306]  [<ffffffff8104a086>] ? native_safe_halt+0x6/0x10
[  213.303308]  [<ffffffff8100c7ed>] default_idle+0x2d/0x130
[  213.303311]  [<ffffffff8100d40e>] arch_cpu_idle+0x2e/0x40
[  213.303313]  [<ffffffff810c350f>] cpu_startup_entry+0x2af/0x340
[  213.303315]  [<ffffffff81038648>] start_secondary+0x1b8/0x230
[  213.303327] Code: 08 5b 41 5c 5d c3 89 da 48 89 45 e8 83 e2 fd 88 57 1d e8 c8 fe ff ff 48 8b 45 e8 eb af 49 39 c4 72 db f0 4c 0f b1 25 6c 87 c2 00 <4c> 39 e0 74 cd eb eb 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55
[  213.303332] NMI backtrace for cpu 2
[  213.303336] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303338] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303340] task: ffff8803fc59e9c0 ti: ffff8803fc5cc000 task.ti: ffff8803fc5cc000
[  213.303350] RIP: 0010:[<ffffffff81041adc>]  [<ffffffff81041adc>] native_apic_mem_write+0xc/0x10
[  213.303352] RSP: 0018:ffff88041fd03ea8  EFLAGS: 00010002
[  213.303353] RAX: ffffffff81948600 RBX: ffff88041fd0d800 RCX: 0000000000000020
[  213.303354] RDX: 0000000225c17d03 RSI: 000000000000ec51 RDI: 0000000000000380
[  213.303355] RBP: ffff88041fd03ea8 R08: 00000000000203ba R09: 0000000000000000
[  213.303356] R10: 0000000000000004 R11: 0000000000000005 R12: 00000000000ec4aa
[  213.303357] R13: 0000000000000000 R14: ffff88041fd0ee60 R15: ffff8803fa0cf100
[  213.303358] FS:  0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
[  213.303359] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303360] CR2: 00007f74a2ad1720 CR3: 0000000035a78000 CR4: 00000000000006e0
[  213.303366] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303375] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303376] Stack:
[  213.303380]  ffff88041fd03eb8 ffffffff81039e9d ffff88041fd03ee0 ffffffff810cbbcb
[  213.303381]  00000031a9e8be00 0000000000000000 0000000000000004 ffff88041fd03ef0
[  213.303384]  ffffffff810cd724 ffff88041fd03f78 ffffffff8109f040 00000031a9d9ee45
[  213.303384] Call Trace:
[  213.303387]  <IRQ>
[  213.303393]  [<ffffffff81039e9d>] lapic_next_event+0x1d/0x30
[  213.303396]  [<ffffffff810cbbcb>] clockevents_program_event+0x6b/0xf0
[  213.303399]  [<ffffffff810cd724>] tick_program_event+0x24/0x30
[  213.303405]  [<ffffffff8109f040>] hrtimer_interrupt+0x170/0x2a0
[  213.303408]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303411]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303415]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303417]  <EOI>
[  213.303420]  [<ffffffff8104a086>] ? native_safe_halt+0x6/0x10
[  213.303425]  [<ffffffff8100c7ed>] default_idle+0x2d/0x130
[  213.303428]  [<ffffffff8100d40e>] arch_cpu_idle+0x2e/0x40
[  213.303432]  [<ffffffff810c350f>] cpu_startup_entry+0x2af/0x340
[  213.303435]  [<ffffffff81038648>] start_secondary+0x1b8/0x230
[  213.303453] Code: 1f 44 00 00 55 ba 00 00 08 00 48 89 e5 e8 cd fd ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 55 89 ff 48 89 e5 89 b7 00 a0 5a ff <5d> c3 66 90 55 89 ff 8b 87 00 a0 5a ff 48 89 e5 5d c3 66 90 55
[  213.304035] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=60002 jiffies, g=15011, c=15010, q=0)



error message on 3.10.0-799:

[  148.976021] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=60000 jiffies g=7891 c=7890 q=830)
[  148.976022] Task dump for CPU 3:
[  148.976025] hackbench       R  running task        0  1683   1681 0x00000080
[  148.976026] Call Trace:
[  148.976037]  <IRQ>  [<ffffffffbd0c25f6>] sched_show_task+0xb6/0x120
[  148.976039]  [<ffffffffbd0c6839>] dump_cpu_task+0x39/0x70
[  148.976044]  [<ffffffffbd13efe0>] rcu_dump_cpu_stacks+0x90/0xd0
[  148.976046]  [<ffffffffbd143ca6>] rcu_check_callbacks+0x476/0x860
[  148.976051]  [<ffffffffbd095e8b>] update_process_times+0x4b/0x80
[  148.976054]  [<ffffffffbd0f57a0>] tick_sched_handle+0x30/0x70
[  148.976055]  [<ffffffffbd0f5bc9>] tick_sched_timer+0x39/0x80
[  148.976058]  [<ffffffffbd0b1bd4>] __run_hrtimer+0xc4/0x2c0
[  148.976060]  [<ffffffffbd0f5b90>] ? tick_sched_do_timer+0x50/0x50
[  148.976061]  [<ffffffffbd0b2b00>] hrtimer_interrupt+0x130/0x350
[  148.976066]  [<ffffffffbd049565>] local_apic_timer_interrupt+0x35/0x60
[  148.976070]  [<ffffffffbd6f09dd>] smp_apic_timer_interrupt+0x3d/0x50
[  148.976072]  [<ffffffffbd6ef0dd>] apic_timer_interrupt+0x6d/0x80
[  148.976076]  <EOI>  [<ffffffffbd67581f>] ? unix_stream_recvmsg+0x3f/0x70
[  148.976079]  [<ffffffffbd6711a0>] ? unix_state_double_unlock+0x60/0x60
[  148.976082]  [<ffffffffbd599766>] ? sock_aio_read.part.10+0x146/0x160
[  148.976084]  [<ffffffffbd5997a1>] ? sock_aio_read+0x21/0x30
[  148.976087]  [<ffffffffbd20646d>] ? do_sync_read+0x8d/0xd0
[  148.976089]  [<ffffffffbd206f25>] ? vfs_read+0x145/0x170
[  148.976091]  [<ffffffffbd207d3f>] ? SyS_read+0x7f/0xe0
[  148.976093]  [<ffffffffbd6ee489>] ? system_call_fastpath+0x16/0x1b

The original issue has been fixed on 3.10.0-799; setting to VERIFIED.