Bug 428713 - [RHEL5 U2] BUG: spinlock lockup on CPU#1
Summary: [RHEL5 U2] BUG: spinlock lockup on CPU#1
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: ppc64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Peter Zijlstra
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks: KernelPrio5.3
TreeView+ depends on / blocked
 
Reported: 2008-01-14 19:29 UTC by Jeff Burke
Modified: 2014-08-11 05:40 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-09-04 16:03:36 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
add some more info to the lockup print (494 bytes, patch)
2008-01-14 20:31 UTC, Peter Zijlstra
no flags Details | Diff

Description Jeff Burke 2008-01-14 19:29:39 UTC
Description of problem:
 While running the KernelTier1 tests on ibm-qs20-02.lab.boston.redhat.com
The scrashme test caused a BUG:

Version-Release number of selected component (if applicable):
2.6.18-68.el5

How reproducible:
Unknown

Steps to Reproduce:
1. Install RHEL5.1 on ibm-qs20-02.lab.boston.redhat.com
2. Install the 2.6.18-68.el5 kernel
3. Reboot then run the scrashme test
  
Actual results:
BUG: spinlock lockup on CPU#1, scrashme/11099, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C00000001C483630] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24
[C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: spinlock lockup on CPU#3, scrashme/11036, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C000000028F87200] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24
[C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140
[C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60
[C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120
[C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74
[C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3]
[C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828
[C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448
[C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: spinlock lockup on CPU#2, syslogd/1609, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C00000003EE7B620] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24
[C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0
[C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8
[C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0
[C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc
[C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: soft lockup - CPU#3 stuck for 10s! [scrashme:11036]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c000000028f87020 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88000488  XER: 00000000
TASK = c000000022d39080[11036] 'scrashme' THREAD: c000000028f84000 CPU: 3
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24
[C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140
[C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60
[C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120
[C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74
[C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3]
[C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828
[C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448
[C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#2 stuck for 10s! [syslogd:1609]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c00000003ee7b440 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88004488  XER: 00000000
TASK = c00000002041c0e0[1609] 'syslogd' THREAD: c00000003ee78000 CPU: 2
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24
[C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0
[C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8
[C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0
[C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc
[C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#1 stuck for 10s! [scrashme:11099]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c00000001c483450 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88000488  XER: 00000000
TASK = c000000001f47d60[11099] 'scrashme' THREAD: c00000001c480000 CPU: 1
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24
[C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#0 stuck for 10s! [scrashme:11053]
NIP: C0000000000E4E10 LR: C0000000000E4E0C CTR: C000000000071348
REGS: c00000000ffff750 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 24000482  XER: 00000000
TASK = c00000000cdbaaf0[11053] 'scrashme' THREAD: c000000036778000 CPU: 0
NIP [C0000000000E4E10] .kmem_cache_zalloc+0x11c/0x178
LR [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178
Call Trace:
[C00000000FFFF9D0] [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178 (unreliable)
[C00000000FFFFA70] [C0000000000A70CC] .__audit_signal_info+0x178/0x2ac
[C00000000FFFFB00] [C0000000000792A4] .check_kill_permission+0x8c/0x184
[C00000000FFFFB90] [C00000000007A990] .group_send_sig_info+0x2c/0xa0
[C00000000FFFFC40] [C00000000007AA54] .send_group_sig_info+0x50/0x88
[C00000000FFFFCE0] [C000000000071374] .it_real_fn+0x2c/0x74
[C00000000FFFFD70] [C000000000089308] .hrtimer_run_queues+0x174/0x22c
[C00000000FFFFE40] [C0000000000773BC] .run_timer_softirq+0x38/0x210
[C00000000FFFFEF0] [C000000000072BE0] .__do_softirq+0xa8/0x168
[C00000000FFFFF90] [C000000000027944] .call_do_softirq+0x14/0x24
[C00000003677B580] [C00000000000CD38] .do_softirq+0x70/0x98
[C00000003677B610] [C000000000072B20] .irq_exit+0x5c/0x74
[C00000003677B690] [C000000000024CAC] .timer_interrupt+0x448/0x478
[C00000003677B770] [C0000000000034F4] decrementer_common+0xf4/0x100
--- Exception: 901 at .sync_sb_inodes+0x9c/0x308
    LR = .sync_sb_inodes+0x2bc/0x308
[C00000003677BB10] [C00000000011E9C4] .sync_inodes_sb+0xd0/0x100
[C00000003677BBF0] [C00000000011EA88] .__sync_inodes+0x94/0x120
[C00000003677BC90] [C00000000011EB38] .sync_inodes+0x24/0x60
[C00000003677BD20] [C0000000000F1598] .do_sync+0x38/0xc0
[C00000003677BDB0] [C0000000000F1634] .sys_sync+0x14/0x28
[C00000003677BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
7d295214 e8690028 48000024 381d053c 7d200028 31290001 7d20012d 40a2fff4 
7fa3eb78 7f84e378 4bfff4a5 7fe10164 <7c651b78> 7f84e378 7f66db78 7fa3eb78 

Expected results:
This not complete with xero failures

Additional info:
So far this is the only system we have seen this on. This is a ppc system.
This seems to be hardware specific

Comment 1 Jeff Burke 2008-01-14 19:33:41 UTC
This happened while testing the kernel-debug-2.6.18-68.el5 variant.

Comment 2 Peter Zijlstra 2008-01-14 20:31:30 UTC
Created attachment 291630 [details]
add some more info to the lockup print

Could you give it a run with this added?

Comment 3 Don Zickus 2008-01-14 21:01:29 UTC
Would it be easier if we reserved you a machine with the test on it instead?

Not sure how much free cycles Jeff has.

Comment 4 Peter Zijlstra 2008-01-15 22:23:37 UTC
Ok, I grabbed that machine and compiled a CVS ppc64-debug kernel with a little
extra spinlock debug sauce. completed scrashme successfully.

Of course my kernel never got P tainted,... no idea where that came from.

Comment 5 Don Zickus 2008-01-16 01:42:26 UTC
The P tainted comes from a previous test that purposely loads a bad kernel
module stub to check that P tainted appears.


Comment 6 Peter Zijlstra 2008-01-17 13:39:32 UTC
I can't seem to reproduce, I ran scrashme for hours on that machine (until my 1d
reservation ran out), but it never triggered.

Comment 7 Linda Wang 2008-02-06 21:21:45 UTC
peter, can you try with the regular kernel just to see if you can
reproduce it?


Note You need to log in before you can comment on or make changes to this bug.