Bug 471614

Summary: [4.7.z] Possible Dead Lock in SysV IPC Messages Queue Part
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.7.z
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Qian Cai <qcai>
Assignee: Danny Feng <dfeng>
QA Contact: Martin Jenner <mjenner>
CC: nhorman, tgraf
Doc Type: Bug Fix
Last Closed: 2012-06-20 16:01:24 UTC

Description Qian Cai 2008-11-14 17:14:21 UTC
Description of problem:
Possible kernel hangs were observed while testing the RHEL 4.7.z kernel.
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=5110083

SysRq : Show CPUs
CPU0:
000001002a6a5e68 ffffffff80414220 0000000000000077 0000000000000000
       0000000000000000 ffffffff8023e754 ffffffff80414220 ffffffff8023e76b
       0000000000000000 ffffffff8023e8bf
Call Trace:<ffffffff8023e754>{showacpu+45} <ffffffff8023e76b>{sysrq_handle_showcpus+9}
       <ffffffff8023e8bf>{__handle_sysrq+115} <ffffffff801b3c8d>{write_sysrq_trigger+43}
       <ffffffff8017bdde>{vfs_write+207} <ffffffff8017bec6>{sys_write+69}
       <ffffffff801102f6>{system_call+126}
CPU1:
0000010037c3bf68 0000000000000000 0000010020c6bf58 0000000000000001
       0000007fbffff7c8 ffffffff8023e754 0000000000000000 ffffffff8011d1d2
       000000000051d990 ffffffff80110bf5
Call Trace:<IRQ> <ffffffff8023e754>{showacpu+45} <ffffffff8011d1d2>{smp_call_function_interrupt+64}
       <ffffffff80110bf5>{call_function_interrupt+133}  <EOI>

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4665804

SysRq : Show CPUs
CPU1:
40076f38 023447a4 00000077 00000000 0220e107 022fae5c 00000001 0220e11a
       0220e250 022f9a34 022fae6d 00000246 00000006 00000002 405ec200 00000002
       40076fac 0218cc10 02336477 023364c0 0215bc7e 40076fac f6de9000 405ec200
Call Trace:
 [<0220e107>] showacpu+0x27/0x33
 [<0220e11a>] sysrq_handle_showcpus+0x7/0x19
 [<0220e250>] __handle_sysrq+0x62/0xd9
 [<0218cc10>] write_sysrq_trigger+0x37/0x3e
 [<0215bc7e>] vfs_write+0xb6/0xe2
 [<0215bd48>] sys_write+0x3c/0x62
Badness in smp_call_function at arch/i386/kernel/smp.c:577
 [<02116b62>] smp_call_function+0x50/0xc9
 [<02105fd4>] show_trace+0x1d/0x6b
 [<02106095>] show_stack+0x73/0x79
 [<0220e12a>] sysrq_handle_showcpus+0x17/0x19
 [<0220e250>] __handle_sysrq+0x62/0xd9
 [<0218cc10>] write_sysrq_trigger+0x37/0x3e
 [<0215bc7e>] vfs_write+0xb6/0xe2
 [<0215bd48>] sys_write+0x3c/0x62
CPU0:
0239df84 0239d000 00000000 023d6120 0220e107 022fae5c 00000000 02116c69
       0239d000 00000000 fffecd6f 0239d000 00000000 00000000 00000000 023d6120
       004a7007 00000000 0232007b 0000007b fffffffb 021040e8 00000060 00000246
Call Trace:
 [<0220e107>] showacpu+0x27/0x33
 [<02116c69>] smp_call_function_interrupt+0x3a/0x79
 [<021040a0>] cpu_idle+0x26/0x3b
 [<0239e786>] start_kernel+0x199/0x19d
CPU2:
039e2f5c 039e2000 00000000 00000000 0220e107 022fae5c 00000002 02116c69
       039e2000 00000000 fffecd6f 039e2000 00000000 00000000 00000000 00000000
       00000000 00000000 41e9007b 0000007b fffffffb 021040e8 00000060 00000246
Call Trace:
 [<0220e107>] showacpu+0x27/0x33
 [<02116c69>] smp_call_function_interrupt+0x3a/0x79
 [<021040e8>] mwait_idle+0x33/0x42
 [<021040a0>] cpu_idle+0x26/0x3b
CPU3:
039e3f5c 039e3000 00000000 00000000 0220e107 022fae5c 00000003 02116c69
       039e3000 00000000 fffecd6f 039e3000 00000000 00000000 00000000 00000000
       00000000 00000000 41e9007b 0000007b fffffffb 021040e8 00000060 00000246
Call Trace:
 [<0220e107>] showacpu+0x27/0x33
 [<02116c69>] smp_call_function_interrupt+0x3a/0x79
 [<021040e8>] mwait_idle+0x33/0x42
 [<021040a0>] cpu_idle+0x26/0x3b

From Vitaly Mayatskikh,

Both test sets (ktst_msg and audit/syscalls) use kernel IPC. In both cases one
CPU core freezes on a spinlock with interrupts disabled. I suspect we have a
deadlock in the SysV IPC message queue code.
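
For readers unfamiliar with the code path under suspicion, the following is a minimal userspace sketch of the kind of SysV message queue traffic such tests generate. It is not the ktst_msg or audit/syscalls test code and is not a confirmed reproducer for this hang; the helper name msgq_round_trip, the struct demo_msg layout, and the 64-byte payload are all assumptions made for illustration. Only the standard msgget, msgsnd, msgrcv, and msgctl calls are used.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message layout: mtype must be > 0 for msgsnd(). */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/*
 * Illustrative helper (not from the bug report): a child process sends
 * n_messages through a private SysV queue while the parent receives them.
 * Returns the number of messages received, or -1 if setup failed.
 */
int msgq_round_trip(int n_messages)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    if (pid == 0) {
        /* Child: sender. Blocks in msgsnd() whenever the queue is full. */
        struct demo_msg out;
        out.mtype = 1;
        memset(out.mtext, 'x', sizeof out.mtext);
        for (int i = 0; i < n_messages; i++) {
            if (msgsnd(qid, &out, sizeof out.mtext, 0) < 0)
                _exit(1);
        }
        _exit(0);
    }

    /* Parent: receiver. Blocks in msgrcv() whenever the queue is empty. */
    int received = 0;
    struct demo_msg in;
    while (received < n_messages) {
        if (msgrcv(qid, &in, sizeof in.mtext, 1, 0) < 0)
            break;
        received++;
    }

    int status;
    waitpid(pid, &status, 0);
    msgctl(qid, IPC_RMID, NULL); /* always remove the queue */
    return received;
}
```

Calling msgq_round_trip(1000) from a small main() on a multi-CPU machine keeps a sender and a receiver contending on the same queue, which is the sort of load under which a lock-ordering problem in the queue code would be expected to show up, if one exists.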


Version-Release number of selected component (if applicable):
kernel-2.6.9-78.0.8.EL

How reproducible:
Possibly always; several similar hangs have been seen before:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4665804
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4636061
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4979277

Comment 1 Danny Feng 2010-03-11 07:55:32 UTC
I finally got time to look into this issue. It looks like all the hangs happened on AMD CPUs?
I guess this is a bug related to the MWAIT feature. Would you mind helping me test the following patch?


Index: linux-2.6.9/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.9.orig/arch/i386/kernel/process.c
+++ linux-2.6.9/arch/i386/kernel/process.c
@@ -187,7 +187,8 @@ static void mwait_idle(void)

 void __init select_idle_routine(const struct cpuinfo_x86 *c)
 {
-       if (cpu_has(c, X86_FEATURE_MWAIT)) {
+       if (cpu_has(c, X86_FEATURE_MWAIT) &&
+           (c->x86_vendor == X86_VENDOR_INTEL)) {
                printk("monitor/mwait feature present.\n");
                /*
                 * Skip, if setup has overridden idle.
Index: linux-2.6.9/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.9.orig/arch/x86_64/kernel/process.c
+++ linux-2.6.9/arch/x86_64/kernel/process.c
@@ -174,7 +174,8 @@ static void mwait_idle(void)
 void __init select_idle_routine(const struct cpuinfo_x86 *c)
 {
        static int printed;
-       if (cpu_has(c, X86_FEATURE_MWAIT)) {
+       if (cpu_has(c, X86_FEATURE_MWAIT) &&
+           (c->x86_vendor == X86_VENDOR_INTEL)) {
                /*
                 * Skip, if setup has overridden idle.
                 * One CPU supports mwait => All CPUs supports mwait
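
The effect of the patch above can be sketched in plain userspace C. This is a simplified model, not kernel code: struct cpu_desc, enum cpu_vendor, enum idle_routine, and both function names are hypothetical stand-ins for cpuinfo_x86, the X86_VENDOR_* constants, and select_idle_routine(). The sketch also leaves out the "setup has overridden idle" check that the real function performs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Hypothetical, reduced view of struct cpuinfo_x86. */
struct cpu_desc {
    enum cpu_vendor vendor;
    bool has_mwait; /* models cpu_has(c, X86_FEATURE_MWAIT) */
};

enum idle_routine { IDLE_DEFAULT, IDLE_MWAIT };

/* Before the patch: any CPU advertising MWAIT gets the mwait idle loop. */
enum idle_routine select_idle_before(const struct cpu_desc *c)
{
    return c->has_mwait ? IDLE_MWAIT : IDLE_DEFAULT;
}

/* After the patch: the mwait idle loop is chosen only on Intel CPUs. */
enum idle_routine select_idle_after(const struct cpu_desc *c)
{
    if (c->has_mwait && c->vendor == VENDOR_INTEL)
        return IDLE_MWAIT;
    return IDLE_DEFAULT;
}
```

Under this model an AMD CPU that advertises MWAIT moves from IDLE_MWAIT to IDLE_DEFAULT once the patch is applied, while Intel CPUs are unaffected. Whether that change actually avoids the hang is exactly what the patch was meant to test.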

Comment 2 Vitaly Mayatskikh 2010-03-15 14:40:12 UTC
> I guess this is a bug from MWAIT feature.

What kind of bug is it?

Comment 3 Jiri Pallich 2012-06-20 16:01:24 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.