Bug 1292927 - rcuc starvation leads to rcu stall
Summary: rcuc starvation leads to rcu stall
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.3
Assignee: Crystal Wood
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks: 1293229 1295885 1420851 1442258 1449577
Reported: 2015-12-18 18:56 UTC by Daniel Bristot de Oliveira
Modified: 2019-09-12 09:37 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1293229 (view as bug list)
Environment:
Last Closed: 2018-04-10 09:07:09 UTC
Target Upstream Version:
Embargoed:


Attachments
rt: rcu: Boost rcuc if it has 2 jiffies before splatting (3.18 KB, patch)
2015-12-18 19:14 UTC, Clark Williams
rcu: Boost rcuc if it has 4 jiffies before splatting (3.25 KB, patch)
2016-01-07 02:43 UTC, Luis Claudio R. Goncalves
softirq: Perform softirqs in local_bh_enable() for a limited amount of time (3.64 KB, patch)
2016-01-07 02:44 UTC, Luis Claudio R. Goncalves
Server side script for testing the NAPI-POLL sfc issue (1.29 KB, text/plain)
2017-04-19 15:46 UTC, Luis Claudio R. Goncalves
Client side script for testing the NAPI-poll sfc issue (1.64 KB, text/plain)
2017-04-19 15:49 UTC, Luis Claudio R. Goncalves


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:0676 0 None None None 2018-04-10 09:09:31 UTC

Description Daniel Bristot de Oliveira 2015-12-18 18:56:40 UTC
Description of problem:

The kernel emits RCU stall messages if an RT task with priority higher than or
equal to FIFO:2 runs for more than 60 seconds.

This occurs due to starvation of the rcuc/ threads, which run with priority
FIFO:2.
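
To check this priority relationship on a given system, something like the following could list the rcuc/ threads with their scheduling class and RT priority. This is only a sketch, assuming procps `ps` with the `cls` and `rtprio` output fields; the function name `filter_rcuc` is made up for illustration.

```shell
# Hypothetical helper: keep only the rcuc/N rows from the output of
# "ps -eo pid,cls,rtprio,comm" (read from stdin).
filter_rcuc() {
    awk '$4 ~ /^rcuc\// { printf "%s %s %s %s\n", $1, $2, $3, $4 }'
}

# Usage on a live system:
#   ps -eo pid,cls,rtprio,comm | filter_rcuc
```

Any RT task pinned to the same CPU with an RTPRIO at or above the value shown for rcuc/N would be a candidate for starving that thread.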

How reproducible:
Always

Steps to Reproduce:
1. Set the rcu_nocbs=$CPU kernel parameter for a given CPU. For instance, CPU 3:

   Edit the file /etc/sysconfig/grub, and add the "rcu_nocbs=3" parameter in the
   GRUB_CMDLINE_LINUX= option.

   Apply grub's config using the following command:
   # grub2-mkconfig > /etc/grub2.cfg

2. Reboot the system

3. Move all rcu threads (except rcuc/) to a housekeeping CPU. For example, to CPU 0:
   # for i in `pgrep 'rcu[^c]'` ; do taskset -pc 0 $i > /dev/null ; done

4. Run hackbench (from the rt-tests package) on the CPU configured as rcu_nocbs,
   with a FIFO prio >= the rcuc/CPU thread's prio. For example, on CPU 3:

   # taskset -c 3 chrt -f 10 hackbench -g 2 -f 2 -s 1000 -l 10000000000 > /dev/null 2>&1
   
5. Wait for 61 seconds, kill hackbench, and run dmesg to see RCU stall messages.
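
The check in step 5 can be done mechanically. A minimal sketch, assuming the stall reports carry the "self-detected stall on CPU" or "detected stalls on CPUs" wording seen in the kernel logs in this bug; `count_rcu_stalls` is a hypothetical helper name:

```shell
# Hypothetical helper: count RCU stall reports in kernel log text read from stdin.
# "|| true" keeps the exit status at 0 when grep finds no match (it still prints 0).
count_rcu_stalls() {
    grep -c -E 'INFO: rcu_[a-z]+ (self-detected stall on CPU|detected stalls on CPUs)' || true
}

# Usage on a live system:
#   dmesg | count_rcu_stalls
```

A non-zero count after the hackbench run would indicate the problem is present.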

Actual results:
   RCU stall messages in the CPU that is running hackbench.

Expected results:
   No RCU stall messages.

Additional info:
  Workaround: Increase rcuc threads priority.
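
As a sketch of this workaround, the rcuc/ threads could be given a FIFO priority above that of the competing RT task using `chrt`. The priority 11 in the usage line is only an example (one above the `chrt -f 10` used in the reproducer), `boost_rcuc` is a hypothetical helper name, and the change would not persist across reboots. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Hypothetical helper: set every rcuc/N thread to SCHED_FIFO at priority $1.
boost_rcuc() {
    prio=$1
    for pid in $(pgrep '^rcuc/'); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Dry run: only show what would be executed.
            echo "chrt -f -p $prio $pid"
        else
            chrt -f -p "$prio" "$pid"
        fi
    done
}

# Usage (as root):
#   boost_rcuc 11
```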

Comment 2 Clark Williams 2015-12-18 19:14:14 UTC
Created attachment 1107358 [details]
rt: rcu: Boost rcuc if it has 2 jiffies before splatting

Patch to prevent RCU starvation on RT kernel

Comment 5 Luis Claudio R. Goncalves 2016-01-07 02:43:29 UTC
Created attachment 1112322 [details]
rcu: Boost rcuc if it has 4 jiffies before splatting

Comment 6 Luis Claudio R. Goncalves 2016-01-07 02:44:31 UTC
Created attachment 1112323 [details]
softirq: Perform softirqs in local_bh_enable() for a limited amount of time

Comment 8 Jianlin Shi 2016-09-29 08:11:54 UTC
RCU stall messages still appear when running the reproducer on 3.10.0-510.


[root@ibm-x3650m5-03 ~]# uname -a
Linux ibm-x3650m5-03.rhts.eng.pek2.redhat.com 3.10.0-510.rt56.415.el7.x86_64 #1 SMP PREEMPT RT Wed Sep 21 16:48:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@ibm-x3650m5-03 ~]# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-510.rt56.415.el7.x86_64 root=/dev/mapper/rhel_ibm--x3650m5--03-root ro crashkernel=auto rd.lvm.lv=rhel_ibm-x3650m5-03/root rd.lvm.lv=rhel_ibm-x3650m5-03/swap console=ttyS0,115200n81 LANG=en_US.UTF-8 rcu_nocbs=3
[root@ibm-x3650m5-03 ~]# for i in `pgrep rcu[^c]` ; do taskset -pc 0 $i > /dev/null ; done
[root@ibm-x3650m5-03 ~]# taskset -c 3 chrt -f 10 hackbench -g 2 -f 2 -s 1000 -l 10000000000 > /dev/null 2&>1


ibm-x3650m5-03 login: [  201.042244] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=60000 jiffies g=3269 c=3268 q=5479)
[  201.042245] Task dump for CPU 3:
[  201.042246] hackbench       R  running task        0  3208   3207 0x00000080
[  201.042260]  ffff88086cc7c080 00000000d580a3e4 ffff88046fac3d90 ffffffff810bc1a6
[  201.042261]  0000000000000003 ffffffff81a01f00 ffff88046fac3da8 ffffffff810bffd9
[  201.042262]  0000000000000004 ffff88046fac3dd8 ffffffff8112fdc0 ffff88046fad20e0
[  201.042262] Call Trace:
[  201.042269]  <IRQ>  [<ffffffff810bc1a6>] sched_show_task+0xb6/0x120
[  201.042271]  [<ffffffff810bffd9>] dump_cpu_task+0x39/0x70
[  201.042274]  [<ffffffff8112fdc0>] rcu_dump_cpu_stacks+0x90/0xd0
[  201.042276]  [<ffffffff8113467a>] rcu_check_callbacks+0x49a/0x840
[  201.042279]  [<ffffffff8108f0e2>] update_process_times+0x42/0x70
[  201.042281]  [<ffffffff810e98d5>] tick_sched_handle.isra.18+0x25/0x60
[  201.042282]  [<ffffffff810e9aa4>] tick_sched_timer+0x44/0x70
[  201.042285]  [<ffffffff810ab605>] __run_hrtimer+0x85/0x270
[  201.042286]  [<ffffffff810e9a60>] ? tick_sched_do_timer+0x50/0x50
[  201.042288]  [<ffffffff810ac460>] hrtimer_interrupt+0x120/0x2a0
[  201.042291]  [<ffffffff810437f7>] local_apic_timer_interrupt+0x37/0x60
[  201.042295]  [<ffffffff8169292f>] smp_apic_timer_interrupt+0x3f/0x60
[  201.042297]  [<ffffffff8169109d>] apic_timer_interrupt+0x6d/0x80
[  201.042300]  <EOI>  [<ffffffff810a3320>] ? task_work_run+0xe0/0xe0
[  201.042303]  [<ffffffff812a5d59>] ? sock_has_perm+0x49/0xc0
[  201.042305]  [<ffffffff812a5eb3>] selinux_socket_recvmsg+0x23/0x30
[  201.042307]  [<ffffffff812a2fb6>] security_socket_recvmsg+0x16/0x20
[  201.042310]  [<ffffffff8154accc>] sock_aio_read.part.7+0xdc/0x160
[  201.042312]  [<ffffffff8154ad71>] sock_aio_read+0x21/0x30
[  201.042314]  [<ffffffff811f2b9d>] do_sync_read+0x8d/0xd0
[  201.042316]  [<ffffffff811f340d>] vfs_read+0x14d/0x170
[  201.042317]  [<ffffffff811f3f2f>] SyS_read+0x7f/0xe0
[  201.042318]  [<ffffffff81690409>] system_call_fastpath+0x16/0x1b


BTW, I also tried 3.10.0-327.10.1.rt56.211.el7_2.x86_64; no RCU stall messages appeared there.

Comment 9 Clark Williams 2016-09-29 15:05:25 UTC
(In reply to Jianlin Shi from comment #8)
> RCU stall messages still appear when running the reproducer on 3.10.0-510.
> 

Looks like I was missing a section of the patch which modified kernel/rcutree.c (the else clause). I've updated the patch and when my current -511 build finishes, I'll kick off a new build (also -511 based) which should fix this issue. 

I'll move to Modified when the build is done.

Comment 21 Luis Claudio R. Goncalves 2017-04-19 15:46:54 UTC
Created attachment 1272681 [details]
Server side script for testing the NAPI-POLL sfc issue

This script should run at x3650m2-01.farm.hsv.redhat.com (RT test box in the Huntsville lab).

From a console session run:

./SERVER

This test requires netperf2, already in the box. There is a note in the script showing how to get netperf2 code if necessary.

Comment 22 Luis Claudio R. Goncalves 2017-04-19 15:49:39 UTC
Created attachment 1272682 [details]
Client side script for testing the NAPI-poll sfc issue

This script must run on rhelrt-17.farm.hsv.redhat.com (RT test box in the Huntsville lab).

From an ssh session, run:

./CLIENT_17

Go to the console session and wait for the error.

Comment 26 Crystal Wood 2017-06-21 03:58:35 UTC
Setting back to MODIFIED as there are patches in the tree to fix the originally reported issue, which I cannot reproduce in the current tree.  The netdev stall is not RT-specific and not related to the previous issues in this BZ.  If there are any other scenarios that lead to RCU stalls in the current tree, please file a new BZ with specific reproduction instructions.

Comment 29 Jianlin Shi 2017-12-04 07:03:02 UTC
With the steps in the description, I get this error message on 327:

[  213.303023] INFO: rcu_preempt self-detected stall on CPU { 3}(t=60000 jiffies g=15011 c=15010 q=0)
[  213.303025] sending NMI to all CPUs:
[  213.303038] NMI backtrace for cpu 3
[  213.303041] CPU: 3 PID: 9176 Comm: hackbench Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303042] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303045] task: ffff8803fad52f00 ti: ffff8803fad1c000 task.ti: ffff8803fad1c000
[  213.303054] RIP: 0010:[<ffffffff81041f8f>]  [<ffffffff81041f8f>] flat_send_IPI_mask+0x8f/0xd0
[  213.303055] RSP: 0000:ffff88041fd83d98  EFLAGS: 00010046
[  213.303056] RAX: 0000000000000000 RBX: 0000000000000c00 RCX: 0000000000000000
[  213.303057] RDX: 0000000000000c00 RSI: 0000000000000002 RDI: 0000000000000082
[  213.303058] RBP: ffff88041fd83db8 R08: 0000000000022e08 R09: 0000000000000000
[  213.303058] R10: ffffffff81d55de8 R11: 3a73555043206c6c R12: 000000000000000f
[  213.303059] R13: 0000000000000004 R14: ffff88041fd8fb80 R15: 0000000000000000
[  213.303061] FS:  00007f6122f92740(0000) GS:ffff88041fd80000(0000) knlGS:0000000000000000
[  213.303062] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303063] CR2: 00007fff9dbb6fa0 CR3: 00000003f996a000 CR4: 00000000000006e0
[  213.303068] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303073] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303073] Stack:
[  213.303076]  0000000000000082 0000000200000004 0000000000000001 0000000000000003
[  213.303078]  ffff88041fd83de0 ffffffff8103c7c1 ffffffff81974e80 0000000000000003
[  213.303079]  ffffffff81974e80 ffff88041fd83e40 ffffffff811170f9 0000000000000082
[  213.303079] Call Trace:
[  213.303083]  <IRQ>
[  213.303088]  [<ffffffff8103c7c1>] arch_trigger_all_cpu_backtrace+0x181/0x190
[  213.303092]  [<ffffffff811170f9>] rcu_check_callbacks+0x399/0x770
[  213.303097]  [<ffffffff81081f22>] update_process_times+0x42/0x60
[  213.303101]  [<ffffffff810cdbb5>] tick_sched_handle.isra.19+0x25/0x60
[  213.303103]  [<ffffffff810cdd94>] tick_sched_timer+0x44/0x70
[  213.303107]  [<ffffffff8109e175>] __run_hrtimer+0x85/0x270
[  213.303109]  [<ffffffff810cdd50>] ? tick_sched_do_timer+0x60/0x60
[  213.303112]  [<ffffffff8109eff0>] hrtimer_interrupt+0x120/0x2a0
[  213.303114]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303118]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303121]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303122]  <EOI>
[  213.303125]  [<ffffffff81631aed>] ? sysret_audit+0x17/0x21
[  213.303137] Code: 25 00 a3 5a ff 80 e6 10 75 f2 44 89 e2 c1 e2 18 89 14 25 10 a3 5a ff 89 f2 09 da 80 cf 04 83 fe 02 0f 44 d3 89 14 25 00 a3 5a ff <48> 83 3d 31 72 90 00 00 74 12 57 9d 0f 1f 44 00 00 48 83 c4 10
[  213.303139] NMI backtrace for cpu 0
[  213.303142] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303143] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303144] task: ffffffff81936440 ti: ffffffff81920000 task.ti: ffffffff81920000
[  213.303150] RIP: 0010:[<ffffffff81041adc>]  [<ffffffff81041adc>] native_apic_mem_write+0xc/0x10
[  213.303150] RSP: 0018:ffff88041fc03ea8  EFLAGS: 00010006
[  213.303151] RAX: ffffffff81948600 RBX: ffff88041fc0d800 RCX: 0000000000000020
[  213.303152] RDX: 0000000225c17d03 RSI: 000000000000ec71 RDI: 0000000000000380
[  213.303153] RBP: ffff88041fc03ea8 R08: 00000000000988ee R09: 0000000000000000
[  213.303153] R10: 0000000000000004 R11: 0000000000000005 R12: 00000000000ec6aa
[  213.303154] R13: 0000000000000000 R14: ffff88041fc0ee60 R15: ffff88041fc0f900
[  213.303155] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
[  213.303156] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303157] CR2: 00007f77a1a35416 CR3: 00000003fad3e000 CR4: 00000000000006f0
[  213.303162] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303166] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303167] Stack:
[  213.303169]  ffff88041fc03eb8 ffffffff81039e9d ffff88041fc03ee0 ffffffff810cbbcb
[  213.303170]  00000031a9e8be00 0000000000000000 0000000000000004 ffff88041fc03ef0
[  213.303172]  ffffffff810cd724 ffff88041fc03f78 ffffffff8109f040 00000031a9d9d9a9
[  213.303172] Call Trace:
[  213.303174]  <IRQ>
[  213.303177]  [<ffffffff81039e9d>] lapic_next_event+0x1d/0x30
[  213.303179]  [<ffffffff810cbbcb>] clockevents_program_event+0x6b/0xf0
[  213.303181]  [<ffffffff810cd724>] tick_program_event+0x24/0x30
[  213.303184]  [<ffffffff8109f040>] hrtimer_interrupt+0x170/0x2a0
[  213.303187]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303189]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303192]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303193]  <EOI>
[  213.303195]  [<ffffffff8104a086>] ? native_safe_halt+0x6/0x10
[  213.303201]  [<ffffffff8100c7ed>] default_idle+0x2d/0x130
[  213.303204]  [<ffffffff8100d40e>] arch_cpu_idle+0x2e/0x40
[  213.303207]  [<ffffffff810c350f>] cpu_startup_entry+0x2af/0x340
[  213.303211]  [<ffffffff81613b64>] rest_init+0x84/0x90
[  213.303216]  [<ffffffff81aa0037>] start_kernel+0x417/0x438
[  213.303218]  [<ffffffff81a9fa29>] ? repair_env_string+0x5c/0x5c
[  213.303220]  [<ffffffff81a9f120>] ? early_idt_handlers+0x120/0x120
[  213.303222]  [<ffffffff81a9f5e2>] x86_64_start_reservations+0x2a/0x2c
[  213.303224]  [<ffffffff81a9f734>] x86_64_start_kernel+0x150/0x173
[  213.303236] Code: 1f 44 00 00 55 ba 00 00 08 00 48 89 e5 e8 cd fd ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 55 89 ff 48 89 e5 89 b7 00 a0 5a ff <5d> c3 66 90 55 89 ff 8b 87 00 a0 5a ff 48 89 e5 5d c3 66 90 55
[  213.303239] NMI backtrace for cpu 1
[  213.303241] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303242] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303243] task: ffff8803fc59de00 ti: ffff8803fc5c8000 task.ti: ffff8803fc5c8000
[  213.303249] RIP: 0010:[<ffffffff8104af3c>]  [<ffffffff8104af3c>] pvclock_clocksource_read+0xcc/0xe0
[  213.303250] RSP: 0018:ffff88041fc83e08  EFLAGS: 00000083
[  213.303250] RAX: 000010f7d4d6c891 RBX: 0000000000000000 RCX: 0000000000000000
[  213.303251] RDX: 000000000002508c RSI: 0000000080000431 RDI: ffff88041ff7b040
[  213.303252] RBP: ffff88041fc83e20 R08: 000000000002508c R09: 0000000000000001
[  213.303252] R10: 0000000000000000 R11: 0000000000000000 R12: 000010f7d4d6c891
[  213.303253] R13: 0000000000000000 R14: 0000000000067f54 R15: ffff88041fc8f900
[  213.303254] FS:  0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
[  213.303255] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303256] CR2: 00007f55a6aa8000 CR3: 00000003fad3e000 CR4: 00000000000006e0
[  213.303261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303266] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303266] Stack:
[  213.303268]  ffffffff81049beb ffff8803fc5c8000 ffff8803fc5cbfd8 ffff88041fc83e48
[  213.303270]  ffffffff81049beb ffffffff810b798f ffffffff81948e80 00000000000000d4
[  213.303271]  ffff88041fc83e58 ffffffff81049c29 ffff88041fc83e88 ffffffff810c559c
[  213.303272] Call Trace:
[  213.303273]  <IRQ>
[  213.303275]  [<ffffffff81049beb>] ? kvm_clock_read+0x3b/0x70
[  213.303276]  [<ffffffff81049beb>] kvm_clock_read+0x3b/0x70
[  213.303281]  [<ffffffff810b798f>] ? cpupri_set+0x9f/0x100
[  213.303283]  [<ffffffff81049c29>] kvm_clock_get_cycles+0x9/0x10
[  213.303285]  [<ffffffff810c559c>] ktime_get+0x4c/0xd0
[  213.303288]  [<ffffffff810cdd6f>] tick_sched_timer+0x1f/0x70
[  213.303291]  [<ffffffff8109e175>] __run_hrtimer+0x85/0x270
[  213.303293]  [<ffffffff810cdd50>] ? tick_sched_do_timer+0x60/0x60
[  213.303295]  [<ffffffff8109eff0>] hrtimer_interrupt+0x120/0x2a0
[  213.303299]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303301]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303303]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303304]  <EOI>
[  213.303306]  [<ffffffff8104a086>] ? native_safe_halt+0x6/0x10
[  213.303308]  [<ffffffff8100c7ed>] default_idle+0x2d/0x130
[  213.303311]  [<ffffffff8100d40e>] arch_cpu_idle+0x2e/0x40
[  213.303313]  [<ffffffff810c350f>] cpu_startup_entry+0x2af/0x340
[  213.303315]  [<ffffffff81038648>] start_secondary+0x1b8/0x230
[  213.303327] Code: 08 5b 41 5c 5d c3 89 da 48 89 45 e8 83 e2 fd 88 57 1d e8 c8 fe ff ff 48 8b 45 e8 eb af 49 39 c4 72 db f0 4c 0f b1 25 6c 87 c2 00 <4c> 39 e0 74 cd eb eb 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55
[  213.303332] NMI backtrace for cpu 2
[  213.303336] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-327.4.5.rt56.206.el7_2.x86_64 #1
[  213.303338] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  213.303340] task: ffff8803fc59e9c0 ti: ffff8803fc5cc000 task.ti: ffff8803fc5cc000
[  213.303350] RIP: 0010:[<ffffffff81041adc>]  [<ffffffff81041adc>] native_apic_mem_write+0xc/0x10
[  213.303352] RSP: 0018:ffff88041fd03ea8  EFLAGS: 00010002
[  213.303353] RAX: ffffffff81948600 RBX: ffff88041fd0d800 RCX: 0000000000000020
[  213.303354] RDX: 0000000225c17d03 RSI: 000000000000ec51 RDI: 0000000000000380
[  213.303355] RBP: ffff88041fd03ea8 R08: 00000000000203ba R09: 0000000000000000
[  213.303356] R10: 0000000000000004 R11: 0000000000000005 R12: 00000000000ec4aa
[  213.303357] R13: 0000000000000000 R14: ffff88041fd0ee60 R15: ffff8803fa0cf100
[  213.303358] FS:  0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
[  213.303359] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  213.303360] CR2: 00007f74a2ad1720 CR3: 0000000035a78000 CR4: 00000000000006e0
[  213.303366] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  213.303375] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  213.303376] Stack:
[  213.303380]  ffff88041fd03eb8 ffffffff81039e9d ffff88041fd03ee0 ffffffff810cbbcb
[  213.303381]  00000031a9e8be00 0000000000000000 0000000000000004 ffff88041fd03ef0
[  213.303384]  ffffffff810cd724 ffff88041fd03f78 ffffffff8109f040 00000031a9d9ee45
[  213.303384] Call Trace:
[  213.303387]  <IRQ>
[  213.303393]  [<ffffffff81039e9d>] lapic_next_event+0x1d/0x30
[  213.303396]  [<ffffffff810cbbcb>] clockevents_program_event+0x6b/0xf0
[  213.303399]  [<ffffffff810cd724>] tick_program_event+0x24/0x30
[  213.303405]  [<ffffffff8109f040>] hrtimer_interrupt+0x170/0x2a0
[  213.303408]  [<ffffffff8103a757>] local_apic_timer_interrupt+0x37/0x60
[  213.303411]  [<ffffffff81633d6f>] smp_apic_timer_interrupt+0x3f/0x60
[  213.303415]  [<ffffffff8163265d>] apic_timer_interrupt+0x6d/0x80
[  213.303417]  <EOI>
[  213.303420]  [<ffffffff8104a086>] ? native_safe_halt+0x6/0x10
[  213.303425]  [<ffffffff8100c7ed>] default_idle+0x2d/0x130
[  213.303428]  [<ffffffff8100d40e>] arch_cpu_idle+0x2e/0x40
[  213.303432]  [<ffffffff810c350f>] cpu_startup_entry+0x2af/0x340
[  213.303435]  [<ffffffff81038648>] start_secondary+0x1b8/0x230
[  213.303453] Code: 1f 44 00 00 55 ba 00 00 08 00 48 89 e5 e8 cd fd ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 55 89 ff 48 89 e5 89 b7 00 a0 5a ff <5d> c3 66 90 55 89 ff 8b 87 00 a0 5a ff 48 89 e5 5d c3 66 90 55
[  213.304035] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=60002 jiffies, g=15011, c=15010, q=0)



error message on 3.10.0-799:

[  148.976021] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=60000 jiffies g=7891 c=7890 q=830)
[  148.976022] Task dump for CPU 3:
[  148.976025] hackbench       R  running task        0  1683   1681 0x00000080
[  148.976026] Call Trace:
[  148.976037]  <IRQ>  [<ffffffffbd0c25f6>] sched_show_task+0xb6/0x120
[  148.976039]  [<ffffffffbd0c6839>] dump_cpu_task+0x39/0x70
[  148.976044]  [<ffffffffbd13efe0>] rcu_dump_cpu_stacks+0x90/0xd0
[  148.976046]  [<ffffffffbd143ca6>] rcu_check_callbacks+0x476/0x860
[  148.976051]  [<ffffffffbd095e8b>] update_process_times+0x4b/0x80
[  148.976054]  [<ffffffffbd0f57a0>] tick_sched_handle+0x30/0x70
[  148.976055]  [<ffffffffbd0f5bc9>] tick_sched_timer+0x39/0x80
[  148.976058]  [<ffffffffbd0b1bd4>] __run_hrtimer+0xc4/0x2c0
[  148.976060]  [<ffffffffbd0f5b90>] ? tick_sched_do_timer+0x50/0x50
[  148.976061]  [<ffffffffbd0b2b00>] hrtimer_interrupt+0x130/0x350
[  148.976066]  [<ffffffffbd049565>] local_apic_timer_interrupt+0x35/0x60
[  148.976070]  [<ffffffffbd6f09dd>] smp_apic_timer_interrupt+0x3d/0x50
[  148.976072]  [<ffffffffbd6ef0dd>] apic_timer_interrupt+0x6d/0x80
[  148.976076]  <EOI>  [<ffffffffbd67581f>] ? unix_stream_recvmsg+0x3f/0x70
[  148.976079]  [<ffffffffbd6711a0>] ? unix_state_double_unlock+0x60/0x60
[  148.976082]  [<ffffffffbd599766>] ? sock_aio_read.part.10+0x146/0x160
[  148.976084]  [<ffffffffbd5997a1>] ? sock_aio_read+0x21/0x30
[  148.976087]  [<ffffffffbd20646d>] ? do_sync_read+0x8d/0xd0
[  148.976089]  [<ffffffffbd206f25>] ? vfs_read+0x145/0x170
[  148.976091]  [<ffffffffbd207d3f>] ? SyS_read+0x7f/0xe0
[  148.976093]  [<ffffffffbd6ee489>] ? system_call_fastpath+0x16/0x1b

The original issue has been fixed on 3.10.0-799; setting to VERIFIED.

Comment 32 errata-xmlrpc 2018-04-10 09:07:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0676

