Bug 509305

Summary: BUG: soft lockup when Running libhugetlbfs Testsuite
Product: Red Hat Enterprise Linux 5 Reporter: Qian Cai <qcai>
Component: kernel Assignee: Danny Feng <dfeng>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.3 CC: czhang
Target Milestone: rc   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-10-31 09:22:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Qian Cai 2009-07-02 05:39:21 UTC
Description of problem:
When running the libhugetlbfs test suite on a Fujitsu PrimeQuest ia64 machine, a soft lockup is seen in the sysrq-t output.

# make check
...
madvise_reserve.sh (256M: 64):	FAIL (assumed) kernel bug
fadvise_reserve.sh (256M: 64):	FAIL (assumed) kernel bug
set shmmax limit to 67108864
shm-perms (256M: 64):	Bad configuration: Page size is too large for configured SEGMENT_SIZE
private (256M: 64):
<hanging ...>

# echo t >/proc/sysrq-trigger
...
private       R  running task       0 16087  16042                     (NOTLB)
bash          R  running task       0 16089   4611                     (NOTLB)
BUG: soft lockup - CPU#0 stuck for 10s! [bash:16089]
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api vfat fat dm_multipath scsi_dh button parport_pc lp parport tg3 e100 libphy mii e1000 sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc shpchp mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 16089, CPU 0, comm:                 bash
psr : 0000121008526010 ifs : 8000000000000003 ip  : [<a00000010064ea71>]    Not tainted
ip is at _spin_unlock_irqrestore+0x31/0x60
unat: 0000000000000000 pfs : 000000000000050d rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 000000000059aa59
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001003ca7b0 b6  : a0000001000378c0 b7  : a0000001002de0a0
f6  : 1003e00000994ac46142a f7  : 1003e0000000000000190
f8  : 1003e00000994ac46129a f9  : 1003e0000000000000001
f10 : 0fffdccccccccc8c00000 f11 : 1003e0000000000000000
r1  : a000000100c12950 r2  : 000000000000ffff r3  : 00000000000003ff
r8  : 0000000000000400 r9  : a000000100a70ab0 r10 : a000000100a70ab0
r11 : 0000000000000000 r12 : e000000125857e20 r13 : e000000125850000
r14 : 0000000000004000 r15 : a000000100a72ab0 r16 : ffffffffffff6ce0
r17 : 00000000000003f9 r18 : a000000100a72ab0 r19 : 0000000000007fff
r20 : a00000010085d280 r21 : a000000100a12fa8 r22 : e000000125850208
r23 : a000000100a2ae40 r24 : a00000010095a1e0 r25 : a000000100a12c38
r26 : a0000001007e0138 r27 : a00000010085cf80 r28 : a000000100a13008
r29 : a000000100aedc68 r30 : 0000000000000000 r31 : a000000100a2b1d4

Call Trace:
 [<a000000100013ba0>] show_stack+0x40/0xa0
                                sp=e000000125857a80 bsp=e0000001258515b0
 [<a0000001000144a0>] show_regs+0x840/0x880
                                sp=e000000125857c50 bsp=e000000125851558
 [<a0000001000ed770>] softlockup_tick+0x2f0/0x3a0
                                sp=e000000125857c50 bsp=e000000125851510
 [<a0000001000963f0>] run_local_timers+0x30/0x60
                                sp=e000000125857c50 bsp=e0000001258514f0
 [<a0000001000964a0>] update_process_times+0x80/0x100
                                sp=e000000125857c50 bsp=e0000001258514b8
 [<a0000001000376e0>] timer_interrupt+0x180/0x360
                                sp=e000000125857c50 bsp=e000000125851478
 [<a0000001000eddd0>] handle_IRQ_event+0x130/0x260
                                sp=e000000125857c50 bsp=e000000125851438
 [<a0000001000ee030>] __do_IRQ+0x130/0x420
                                sp=e000000125857c50 bsp=e0000001258513e8
 [<a000000100011820>] ia64_handle_irq+0x160/0x200
                                sp=e000000125857c50 bsp=e0000001258513b8
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280
                                sp=e000000125857c50 bsp=e0000001258513b8
 [<a00000010064ea70>] _spin_unlock_irqrestore+0x30/0x60
                                sp=e000000125857e20 bsp=e0000001258513a0
 [<a0000001003ca7b0>] __handle_sysrq+0x2d0/0x300
                                sp=e000000125857e20 bsp=e000000125851350
 [<a000000100202810>] write_sysrq_trigger+0xb0/0xe0
                                sp=e000000125857e20 bsp=e000000125851320
 [<a000000100170f60>] vfs_write+0x200/0x3a0
                                sp=e000000125857e20 bsp=e0000001258512d0
 [<a000000100171ab0>] sys_write+0x70/0xe0
                                sp=e000000125857e20 bsp=e000000125851258
 [<a00000010000bdd0>] __ia64_trace_syscall+0xd0/0x110
                                sp=e000000125857e30 bsp=e000000125851258
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000125858000 bsp=e000000125851258

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.el5
kernel-2.6.18-128.1.14.el5
kernel-2.6.18-128.1.16.el5
kernel-2.6.18-156.el5 (it took a few attempts to reproduce it here though)

How reproducible:
always

Steps to Reproduce:
1. download the latest libhugetlbfs package from,
   https://sourceforge.net/projects/libhugetlbfs/files/libhugetlbfs/libhugetlbfs-2.5.tar.gz

2. setup hugepages,

   # echo 20 > /proc/sys/vm/nr_hugepages
   # cat /proc/meminfo (should see hugepages are there)
   # mkdir /dev/hugetlbfs
   # mount -t hugetlbfs hugetlbfs /dev/hugetlbfs

3. run the testsuite,

   # cd libhugetlbfs-2.5
   # make
   # make check

4. after a while the test suite hangs while running this test,

   private (256M: 64):

5. issue sysrq-t at this point,

   # echo t >/proc/sysrq-trigger
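
   When scanning the resulting kernel log for the watchdog's signature
   line, a small matcher helps; a minimal sketch (the find_soft_lockups
   helper and sample line are ours, for illustration only):

```python
import re

# Minimal sketch: find soft-lockup reports in captured kernel log text.
# find_soft_lockups() is our own helper, for illustration only.

LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")

def find_soft_lockups(log_text):
    """Return (cpu, seconds, comm, pid) for each soft-lockup line."""
    hits = []
    for m in LOCKUP_RE.finditer(log_text):
        cpu, secs, comm, pid = m.groups()
        hits.append((int(cpu), int(secs), comm, int(pid)))
    return hits

# Sample taken from the trace in this report.
sample = "BUG: soft lockup - CPU#0 stuck for 10s! [bash:16089]\n"
print(find_soft_lockups(sample))
# → [(0, 10, 'bash', 16089)]
```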
  
Actual results:
Soft lockup.

Expected results:
No soft lockup.

Additional info:
This has not been seen on other architectures.

Comment 1 Danny Feng 2010-04-27 07:49:15 UTC
Just curious about the status of RHEL 6: could you please try libhugetlbfs for RHEL 6 on the same machine?

Comment 2 Danny Feng 2010-06-08 11:46:07 UTC
I'm trying to reproduce it..

with latest rhel5 kernel: 2.6.18-192.el5
libhugetlbfs: 2.8

Yes, the system alerted me with a soft lockup, but this is a false positive, because the kernel isn't hung there...

Can we downgrade the priority and severity, since the soft lockup warning is a false positive alert?

Comment 3 Qian Cai 2010-06-28 04:46:31 UTC
(In reply to comment #1)
> Just curious about the status of rhel6, could you please take a try with
> libhugeltbfs for rhel6 on the same machine?    
IA-64 is not supported in RHEL 6.

Comment 5 Caspar Zhang 2010-11-05 09:21:52 UTC
Seen a similar failure on the same machine with RHEL5.6-20101019.0

The test did not hang in the private test, but in the heap-overflow test. See below.


shm-perms (256M: 64):	Bad configuration: Page size is too large for configured SEGMENT_SIZE
private (256M: 64):	PASS
fork-cow (256M: 64):	PASS
direct (256M: 64):	PASS
malloc (256M: 64):	PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (256M: 64):	PASS
malloc_manysmall (256M: 64):	PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (256M: 64):	PASS
heapshrink (256M: 64):	PASS
LD_PRELOAD=libheapshrink.so heapshrink (256M: 64):	PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes heapshrink (256M: 64):	PASS
LD_PRELOAD=libhugetlbfs.so libheapshrink.so HUGETLB_MORECORE=yes heapshrink (256M: 64):	PASS
LD_PRELOAD=libheapshrink.so HUGETLB_MORECORE_SHRINK=es HUGETLB_MORECORE=yes heapshrink (256M: 64):	PASS (inconclusive)
LD_PRELOAD=libhugetlbfs.so libheapshrink.so HUGETLB_MORECORE_SHRINK=yes HUGETLB_MORECORE=yes heapshrink (256M: 64):	FAIL	Heap did not shrink
HUGETLB_VERBOSE=1 HUGETLB_MORECORE=yes heap-overflow (256M: 64):	VM: killing process heap-overflow
SysRq : Show State

<hanging>
                                                       sibling
  task                 PC          pid father child younger older
init          S a00000010067a1b0     0     1      0     2               (NOTLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e000000102dcf9c0 bsp=e000000102dc9358
 [<a00000010067a1b0>] schedule_timeout+0x110/0x180
                                sp=e000000102dcfa50 bsp=e000000102dc9328
 [<a0000001001b0090>] do_select+0x2d0/0x7e0
                                sp=e000000102dcfa80 bsp=e000000102dc9250
 [<a0000001001b0990>] core_sys_select+0x3f0/0x560
                                sp=e000000102dcfcd0 bsp=e000000102dc91f0
 [<a0000001001b18b0>] sys_select+0x410/0x480
                                sp=e000000102dcfe00 bsp=e000000102dc9160
 [<a00000010000be40>] ia64_ret_from_syscall+0x0/0x40
                                sp=e000000102dcfe30 bsp=e000000102dc9160
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000102dd0000 bsp=e000000102dc9160
migration/0   S a000000100073dc0     0     2      1             3       (L-TLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e000020006bffcc0 bsp=e000020006bf9170
 [<a000000100073dc0>] migration_thread+0x4e0/0x660
                                sp=e000020006bffd50 bsp=e000020006bf9110
 [<a0000001000b3990>] kthread+0x230/0x2c0
                                sp=e000020006bffd50 bsp=e000020006bf90c8
 [<a000000100012210>] kernel_thread_helper+0x30/0x60
                                sp=e000020006bffe30 bsp=e000020006bf90a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e000020006bffe30 bsp=e000020006bf90a0
ksoftirqd/0   S a00000010008c9d0     0     3      1             4     2 (L-TLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e000000102defcc0 bsp=e000000102de9138
 [<a00000010008c9d0>] ksoftirqd+0xd0/0x280
                                sp=e000000102defd50 bsp=e000000102de9110
 [<a0000001000b3990>] kthread+0x230/0x2c0
                                sp=e000000102defd50 bsp=e000000102de90c8
 [<a000000100012210>] kernel_thread_helper+0x30/0x60
                                sp=e000000102defe30 bsp=e000000102de90a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e000000102defe30 bsp=e000000102de90a0
watchdog/0    S a0000001000f1d80     0     4      1             5     3 (L-TLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e000000102df7cb0 bsp=e000000102df1128
 [<a0000001000f1d80>] watchdog+0xe0/0x120
                                sp=e000000102df7d40 bsp=e000000102df1110
 [<a0000001000b3990>] kthread+0x230/0x2c0
                                sp=e000000102df7d50 bsp=e000000102df10c8
 [<a000000100012210>] kernel_thread_helper+0x30/0x60
                                sp=e000000102df7e30 bsp=e000000102df10a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e000000102df7e30 bsp=e000000102df10a0
migration/1   S a000000100073dc0     0     5      1             6     4 (L-TLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e000000102dffcc0 bsp=e000000102df9170
 [<a000000100073dc0>] migration_thread+0x4e0/0x660
                                sp=e000000102dffd50 bsp=e000000102df9110
 [<a0000001000b3990>] kthread+0x230/0x2c0
                                sp=e000000102dffd50 bsp=e000000102df90c8
 [<a000000100012210>] kernel_thread_helper+0x30/0x60
                                sp=e000000102dffe30 bsp=e000000102df90a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e000000102dffe30 bsp=e000000102df90a0
ksoftirqd/1   S a00000010008c9d0     0     6      1             7     5 (L-TLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e000000102e07cc0 bsp=e000000102e01138
 [<a00000010008c9d0>] ksoftirqd+0xd0/0x280
                                sp=e000000102e07d50 bsp=e000000102e01110
 [<a0000001000b3990>] kthread+0x230/0x2c0
                                sp=e000000102e07d50 bsp=e000000102e010c8
 [<a000000100012210>] kernel_thread_helper+0x30/0x60
                                sp=e000000102e07e30 bsp=e000000102e010a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e000000102e07e30 bsp=e000000102e010a0
watchdog/1    S a0000001000f1d80     0     7      1             8     6 (L-TLB)

<snip>

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e00000011c067d70 bsp=e00000011c061410
 [<a000000100084610>] do_wait+0x1e70/0x2320
                                sp=e00000011c067e00 bsp=e00000011c061350
 [<a000000100084b20>] sys_wait4+0x60/0x80
                                sp=e00000011c067e30 bsp=e00000011c0612f8
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e00000011c067e30 bsp=e00000011c0612f8
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00000011c068000 bsp=e00000011c0612f8
sshd          S a00000010067a0e0     0 17704   3974 17706               (NOTLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e0000001328979c0 bsp=e000000132891358
 [<a00000010067a0e0>] schedule_timeout+0x40/0x180
                                sp=e000000132897a50 bsp=e000000132891328
 [<a0000001001b0090>] do_select+0x2d0/0x7e0
                                sp=e000000132897a80 bsp=e000000132891250
 [<a0000001001b0990>] core_sys_select+0x3f0/0x560
                                sp=e000000132897cd0 bsp=e0000001328911f0
 [<a0000001001b15c0>] sys_select+0x120/0x480
                                sp=e000000132897e00 bsp=e000000132891160
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e000000132897e30 bsp=e000000132891160
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000132898000 bsp=e000000132891160
bash          R  running task       0 17706  17704                     (NOTLB)

Call Trace:
 [<e000000132890000>] 0xe000000132890000
                                sp=e0000001335efd10 bsp=e0000001335e8f68
make          S a000000100084610     0 17851   4520 17874               (NOTLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e00000012c01fd70 bsp=e00000012c019338
 [<a000000100084610>] do_wait+0x1e70/0x2320
                                sp=e00000012c01fe00 bsp=e00000012c019280
 [<a000000100084b20>] sys_wait4+0x60/0x80
                                sp=e00000012c01fe30 bsp=e00000012c019228
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e00000012c01fe30 bsp=e00000012c019228
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00000012c020000 bsp=e00000012c019228
sh            S a000000100084610     0 17874  17851 17875               (NOTLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e0000001310e7d70 bsp=e0000001310e1238
 [<a000000100084610>] do_wait+0x1e70/0x2320
                                sp=e0000001310e7e00 bsp=e0000001310e1178
 [<a000000100084b20>] sys_wait4+0x60/0x80
                                sp=e0000001310e7e30 bsp=e0000001310e1120
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e0000001310e7e30 bsp=e0000001310e1120
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000001310e8000 bsp=e0000001310e1120
python        S a000000100084610     0 17875  17874 17941               (NOTLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e00002000edc7d70 bsp=e00002000edc12d0
 [<a000000100084610>] do_wait+0x1e70/0x2320
                                sp=e00002000edc7e00 bsp=e00002000edc1218
 [<a000000100084b20>] sys_wait4+0x60/0x80
                                sp=e00002000edc7e30 bsp=e00002000edc11b8
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e00002000edc7e30 bsp=e00002000edc11b8
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00002000edc8000 bsp=e00002000edc11b8
heap-overflow S a000000100084610     0 17941  17875 17942               (NOTLB)

Call Trace:
 [<a000000100678b90>] schedule+0x1e10/0x2100
                                sp=e00002000c93fd70 bsp=e00002000c939228
 [<a000000100084610>] do_wait+0x1e70/0x2320
                                sp=e00002000c93fe00 bsp=e00002000c939168
 [<a000000100084b20>] sys_wait4+0x60/0x80
                                sp=e00002000c93fe30 bsp=e00002000c939110
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e00002000c93fe30 bsp=e00002000c939110
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00002000c940000 bsp=e00002000c939110
heap-overflow R  running task       0 17942  17941                     (NOTLB)

Call Trace:
 [<200000000039aeb0>] 0x200000000039aeb0
                                sp=e00000013246fda0 bsp=e0000001324690a0

<hanging>

Comment 6 Danny Feng 2010-11-05 11:05:53 UTC
(In reply to comment #5)
> Seen a similar failure on the same machine with RHEL5.6-20101019.0
> 
> The test not hung in private test, but in heap-overflow test. See below.
> 
> 

Please note this bug is about the soft lockup only. A userspace test hanging there is not a kernel issue. Thanks.

Or did you notice the kernel hanging with the test case?

> <snip>

Comment 7 Caspar Zhang 2010-11-05 12:03:00 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Seen a similar failure on the same machine with RHEL5.6-20101019.0
> > The test not hung in private test, but in heap-overflow test. See below.
> Please note this bug is for softlockup only. A userspace test hang there is not
> kernel issue. Thanks.
> 
> Or did you notice the kernel is hanging with the testcase?

Got it. The kernel did not seem to be hung; Ctrl+C could interrupt the test case.

Thanks.