Bug 1296668

Summary: INFO: rcu_sched self-detected stall on CPU[11405.392812] INFO: rcu_sched self-detected stall on CPU
Product: Red Hat Enterprise Linux 7 Reporter: Bill Peck <bpeck>
Component: kernel-aarch64 Assignee: Oleg Nesterov <onestero>
kernel-aarch64 sub component: Process management QA Contact: Jeff Bastian <jbastian>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: medium CC: jbastian, jfeeney
Version: 7.3   
Target Milestone: rc   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-10 21:26:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Bill Peck 2016-01-07 19:29:13 UTC
Description of problem:
The Beaker test /kernel/Biscayne/ltp-lite causes:

INFO: rcu_sched self-detected stall on CPU[11405.392812] INFO: rcu_sched self-detected stall on CPU 

Version-Release number of selected component (if applicable):
  kernel-4.4.0-0.rc5.22.el7


How reproducible:
Every time on AMD Seattle systems. Does not happen on Mustang or HP McDivitt.

Actual results:

[11225.895492] Task dump for CPU 1: 
[11225.898709] float_power     R  running task        0  9919   9846 0x00000202 
[11225.905751] Call Trace: 
[11225.908188] [<fffffe0000091930>] ret_from_fork+0x0/0x50 
[11405.391812] INFO: rcu_sched self-detected stall on CPU[11405.392812] INFO: rcu_sched self-detected stall on CPU 
[11405.392815] 	1-...: (10486090 ticks this GP) idle=5a1/140000000000001/0 softirq=145365/145365 fqs=3423851  
[11405.392816] 	 (t=10500232 jiffies g=91545 c=91544 q=229437) 
[11405.392818] Task dump for CPU 0: 
[11405.392819] float_power     R  running task        0  9915   9846 0x00000202 
[11405.392822] Call Trace: 
[11405.392825] [<fffffe0000091930>] ret_from_fork+0x0/0x50 
[11405.392826] Task dump for CPU 1: 
[11405.392826] float_power     R  running task        0  9919   9846 0x00000202 
[11405.392828] Call Trace: 
[11405.392830] [<fffffe0000096ed4>] dump_backtrace+0x0/0x17c 
[11405.392833] [<fffffe0000097074>] show_stack+0x24/0x2c 
[11405.392835] [<fffffe00000f19c0>] sched_show_task+0xa0/0xf4 
[11405.392837] [<fffffe00000f3dac>] dump_cpu_task+0x48/0x54 
[11405.392838] [<fffffe000011bf74>] rcu_dump_cpu_stacks+0xa4/0xf4 
[11405.392840] [<fffffe000011feb4>] rcu_check_callbacks+0x4fc/0x8f4 
[11405.392842] [<fffffe000012527c>] update_process_times+0x44/0x74 
[11405.392844] [<fffffe0000134e38>] tick_sched_handle.isra.15+0x3c/0x7c 
[11405.392846] [<fffffe0000134ec4>] tick_sched_timer+0x4c/0x84 
[11405.392848] [<fffffe00001259d4>] __hrtimer_run_queues+0x13c/0x248 
[11405.392850] [<fffffe00001262e0>] hrtimer_interrupt+0xa0/0x1d4 
[11405.392853] [<fffffe00005c7a90>] arch_timer_handler_phys+0x3c/0x48 
[11405.392855] [<fffffe0000115978>] handle_percpu_devid_irq+0x94/0x124 
[11405.392857] [<fffffe0000110dd0>] generic_handle_irq+0x34/0x4c 
[11405.392859] [<fffffe0000111158>] __handle_domain_irq+0x6c/0xc4 
[11405.392860] [<fffffe00000904a4>] gic_handle_irq+0x64/0xb8 
[11405.392862] Exception stack(0xfffffe035d893bb0 to 0xfffffe035d893cd0) 
[11405.392863] 3ba0:                                   fffffe035471bd7c 0000000000000000 
[11405.392865] 3bc0: fffffe035d893d00 fffffe00001089f8 00000000a0000145 fffffe035471bd7c 
[11405.392867] 3be0: 0000000000000000 0000000000000000 fffffe03fe0e5a80 fffffe03fe0e5a90 
[11405.392868] 3c00: fffffe03fe0c5a80 0000000000000000 0000000000000000 0000000000000000 
[11405.392870] 3c20: 00000000000000de 0000000000000028 000003ffb13c7af4 0000000032ab0ec0 
[11405.392871] 3c40: 000000000000000c 000000000088e922 0000000021287924 0000000807cd621a 
[11405.392873] 3c60: fffffe000009699c 000003ffb11fd028 000003ffe652c9f0 fffffe035471bd7c 
[11405.392875] 3c80: 0000000000000000 0000000000004022 0000000000000000 0000000000000000 
[11405.392876] 3ca0: fffffe035471bd7c 0000000000000000 fffffe035471bd60 fffffe0000782000 
[11405.392877] 3cc0: fffffe035d890000 fffffe035d893d00 
[11405.392879] [<fffffe00000914e8>] el1_irq+0x68/0xc0 
[11405.392882] [<fffffe0000753cd4>] rwsem_down_write_failed+0x98/0x2f8 
[11405.392884] [<fffffe0000753490>] down_write+0x60/0x64 
[11405.392886] [<fffffe00001d8c94>] vm_mmap_pgoff+0x88/0xe8 
[11405.392889] [<fffffe00001f0a50>] SyS_mmap_pgoff+0x190/0x214 
[11405.392891] [<fffffe00000969f0>] sys_mmap+0x54/0x68 
[11405.392893] [<fffffe0000091a0c>] __sys_trace_return+0x0/0x4 
 
[11405.658723]  
[11405.660383] 	0-...: (10483669 ticks this GP) idle=2a7/140000000000001/0 softirq=144874/144874 fqs=3423936  
[11405.670027] 	 (t=10500510 jiffies g=91545 c=91544 q=229437) 

Additional info:
Beaker links are in a follow-up comment.
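
For triage, the per-CPU stall records in the log above can be pulled out of a saved console log with a short script. The following is only a sketch: the function name `parse_stalls` and both regular expressions are illustrative assumptions derived from the two line shapes visible in this report ("N-...: (X ticks this GP) ... fqs=Y" and "(t=... jiffies g=... c=... q=...)"). Other kernel versions may print these lines differently.

```python
import re

# Per-CPU stall record as it appears in this report, e.g.
#   "1-...: (10486090 ticks this GP) idle=... softirq=... fqs=3423851"
# (pattern is an assumption based on this log only)
CPU_RE = re.compile(
    r"(?P<cpu>\d+)-\S*: \((?P<ticks>\d+) ticks this GP\)"
    r".*?fqs=(?P<fqs>\d+)"
)

# Grace-period summary that follows the per-CPU record, e.g.
#   "(t=10500232 jiffies g=91545 c=91544 q=229437)"
GP_RE = re.compile(
    r"\(t=(?P<t>\d+) jiffies g=(?P<g>\d+) c=(?P<c>\d+) q=(?P<q>\d+)\)"
)


def parse_stalls(log_text):
    """Return one dict per stall record found in log_text.

    Each dict holds the integer fields cpu, ticks and fqs, plus
    t, g, c and q when a grace-period summary line follows the record.
    """
    stalls = []
    current = None  # most recent record still waiting for its summary line
    for line in log_text.splitlines():
        m = CPU_RE.search(line)
        if m:
            current = {k: int(v) for k, v in m.groupdict().items()}
            stalls.append(current)
            continue
        m = GP_RE.search(line)
        if m and current is not None:
            current.update({k: int(v) for k, v in m.groupdict().items()})
            current = None
    return stalls


if __name__ == "__main__":
    import sys

    # Usage: python parse_stalls.py console.log
    with open(sys.argv[1], errors="replace") as fh:
        for stall in parse_stalls(fh.read()):
            print(stall)
```

The `t=` value is in jiffies. Converting it to seconds requires the kernel's CONFIG_HZ setting, which this report does not state, so the sketch leaves the value unconverted.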

Comment 3 Bill Peck 2016-01-07 19:40:08 UTC
passes on: amd-seattle-05.lab.eng.rdu.redhat.com
repeatedly fails on: amd-seattle-06.khw.lab.eng.bos.redhat.com

Comment 4 Bill Peck 2016-01-13 14:48:06 UTC
This appears to be fixed in 4.4.0-0.23.el7.

Comment 5 John Feeney 2016-02-10 20:49:44 UTC
Per comment #4, I am moving this to ON_QA so it can be closed.

Comment 6 Jeff Bastian 2016-02-10 21:25:47 UTC
ltp-lite ran on amd-seattle-05.khw.lab.eng.bos.redhat.com with the 4.5.0-0.rc3.27.el7 kernel with no rcu_sched stalls.  I'll mark this verified and close it.  (We can re-open it if the problem comes back.)
  https://beaker.engineering.redhat.com/jobs/1219825