Bug 1031362

Summary: kernel softlockup while executing the command ppc64_cpu --smt=on
Product: [Fedora] Fedora
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 20
CC: gansalmon, itamar, jkachuck, jonathan, kernel-maint, madhu.chinakonda, wgomerin
Target Milestone: ---
Flags: jforbes: needinfo?, bugproxy: needinfo?, bugproxy: needinfo?
Target Release: ---
Hardware: ppc64
OS: All
Doc Type: Bug Fix
Last Closed: 2014-03-17 18:44:48 UTC Type: ---
Attachments:
  dmesg output (flags: none)

Description IBM Bug Proxy 2013-11-17 08:50:39 UTC
== Comment: #0 - IRANNA D. ANKAD <iranna.ankad.com> - 2013-10-28 10:46:01 ==
I issued the following three commands, and the kernel emitted many softlockup traces, causing the last command to hang.

[root@jupiterioc-lp3 ~]# ppc64_cpu --smt
SMT is on
[root@jupiterioc-lp3 ~]# ppc64_cpu --smt=off
[root@jupiterioc-lp3 ~]# ppc64_cpu --smt=on

<...... command hangs.....>

FYI, here is a snippet of the call traces. For more details, I am attaching fresh dmesg output.


Fedora release 20 (Heisenbug)
Kernel 3.11.0-300.fc20.ppc64p7 on an ppc64 (hvc0)

jupiterioc-lp3 login: [ 4548.388596] INFO: rcu_sched self-detected stall on CPU { 1}  (t=384082 jiffies g=2224 c=2223 q=13748)
[ 4548.388653] CPU: 1 PID: 2103 Comm: ppc64_cpu Not tainted 3.11.0-300.fc20.ppc64p7 #1
[ 4548.388660] Call Trace:
[ 4548.388669] [c000000bbdb82a00] [c000000000014ba0] .show_stack+0x130/0x200 (unreliable)
[ 4548.388680] [c000000bbdb82ad0] [c00000000083e19c] .dump_stack+0x88/0xb4
[ 4548.388689] [c000000bbdb82b50] [c000000000168a38] .rcu_check_callbacks+0x418/0x8d0
[ 4548.388698] [c000000bbdb82c90] [c0000000000abea8] .update_process_times+0x58/0xb0
[ 4548.388706] [c000000bbdb82d20] [c000000000114ab0] .tick_sched_handle.isra.16+0x40/0xd0
[ 4548.388714] [c000000bbdb82db0] [c000000000114ba4] .tick_sched_timer+0x64/0xa0
[ 4548.388722] [c000000bbdb82e50] [c0000000000cd094] .__run_hrtimer+0xb4/0x2a0
[ 4548.388730] [c000000bbdb82ef0] [c0000000000ce048] .hrtimer_interrupt+0x148/0x330
[ 4548.388738] [c000000bbdb83000] [c00000000001e8a0] .timer_interrupt+0x120/0x2e0
[ 4548.388746] [c000000bbdb830b0] [c000000000002554] decrementer_common+0x154/0x180
[ 4548.388757] --- Exception: 901 at .__bitmap_weight+0x44/0x100
[ 4548.388757]     LR = .build_sched_domains+0xc3c/0xdb0
[ 4548.388775] [c000000bbdb833a0] [c000000bbdb83450] 0xc000000bbdb83450 (unreliable)
[ 4548.388783] [c000000bbdb83450] [c0000000000e196c] .build_sched_domains+0xc3c/0xdb0
[ 4548.388791] [c000000bbdb835a0] [c0000000000e1dc0] .partition_sched_domains+0x260/0x3f0
[ 4548.388799] [c000000bbdb83680] [c000000000139864] .cpuset_update_active_cpus+0x24/0x60
[ 4548.388807] [c000000bbdb836f0] [c0000000000e1ff8] .cpuset_cpu_active+0xa8/0xd0
[ 4548.388815] [c000000bbdb83770] [c000000000833dac] .notifier_call_chain+0x8c/0x100
[ 4548.388823] [c000000bbdb83810] [c0000000000983f0] .cpu_notify+0x40/0xa0
[ 4548.388830] [c000000bbdb83890] [c000000000098694] ._cpu_up+0x204/0x210
[ 4548.388837] [c000000bbdb83950] [c0000000000987ec] .cpu_up+0x14c/0x1d0
[ 4548.388846] [c000000bbdb839e0] [c0000000006bbb74] .cpu_subsys_online+0x54/0xc0
[ 4548.388854] [c000000bbdb83a80] [c0000000004f99d8] .device_online+0xb8/0x120
[ 4548.388861] [c000000bbdb83b10] [c0000000004f9af4] .store_online+0xb4/0xf0
[ 4548.388868] [c000000bbdb83bb0] [c0000000004f57c4] .dev_attr_store+0x64/0xa0
[ 4548.388876] [c000000bbdb83c40] [c0000000002e2404] .sysfs_write_file+0xf4/0x1d0
[ 4548.388885] [c000000bbdb83cf0] [c000000000242b58] .vfs_write+0xe8/0x260
[ 4548.388892] [c000000bbdb83d90] [c000000000243854] .SyS_write+0x64/0xe0
[ 4548.388900] [c000000bbdb83e30] [c000000000009dd4] syscall_exit+0x0/0x98

Comment 1 IBM Bug Proxy 2013-11-17 08:50:56 UTC
Created attachment 825102 [details]
dmesg output

Comment 2 IBM Bug Proxy 2013-11-27 06:30:37 UTC
------- Comment From iranna.ankad.com 2013-11-27 06:27 EDT-------
(In reply to comment #9)
> Iranna,
>
> Can you please provide machine access?
>
> -Bharani

Hello Bharani,
The original system (a P7+ Jupiter) is busy running some priority tests for the next week, so I tried to recreate this issue on another P7 Jupiter system with the latest F20 Beta kernel. I could not recreate it. I also confirmed that this scenario works fine on P8 with F20 Beta. So, for now, I am OK with closing this bug. I will reopen it if I notice the problem again. Thanks!

FYI
[root@als0153 ~]# ppc64_cpu --smt=off
[root@als0153 ~]# ppc64_cpu --smt=on
[root@als0153 ~]# ppc64_cpu --smt=off
[root@als0153 ~]# ppc64_cpu --smt=on
[root@als0153 ~]# uname -a
Linux als0153.austin.ibm.com 3.11.6-301.fc20.ppc64p7 #1 SMP Mon Oct 21 18:49:17 MST 2013 ppc64 ppc64 ppc64 GNU/Linux
[root@als0153 ~]#

Comment 3 Justin M. Forbes 2014-02-24 14:02:47 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.13.4-200.fc20. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 4 Justin M. Forbes 2014-03-17 18:44:48 UTC
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for several weeks and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 20, please feel free to reopen the bug and provide the additional information requested.