Bug 428713

Summary: [RHEL5 U2] BUG: spinlock lockup on CPU#1
Product: Red Hat Enterprise Linux 5 Reporter: Jeff Burke <jburke>
Component: kernelAssignee: Peter Zijlstra <pzijlstr>
Status: CLOSED DEFERRED QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: high    
Version: 5.2CC: dzickus, esandeen, jbacik, lwang, mgahagan
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
URL: http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1604756
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-09-04 16:03:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 448732    
Attachments:
Description Flags
add some more info to the lockup print none

Description Jeff Burke 2008-01-14 19:29:39 UTC
Description of problem:
 While running the KernelTier1 tests on ibm-qs20-02.lab.boston.redhat.com
The scrashme test caused a BUG:

Version-Release number of selected component (if applicable):
2.6.18-68.el5

How reproducible:
Unknown

Steps to Reproduce:
1. Install RHEL5.1 on ibm-qs20-02.lab.boston.redhat.com
2. Install the 2.6.18-68.el5 kernel
3. Reboot then run the scrashme test
  
Actual results:
BUG: spinlock lockup on CPU#1, scrashme/11099, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C00000001C483630] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24
[C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: spinlock lockup on CPU#3, scrashme/11036, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C000000028F87200] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24
[C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140
[C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60
[C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120
[C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74
[C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3]
[C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828
[C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448
[C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: spinlock lockup on CPU#2, syslogd/1609, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C00000003EE7B620] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24
[C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0
[C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8
[C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0
[C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc
[C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: soft lockup - CPU#3 stuck for 10s! [scrashme:11036]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c000000028f87020 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88000488  XER: 00000000
TASK = c000000022d39080[11036] 'scrashme' THREAD: c000000028f84000 CPU: 3
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24
[C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140
[C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60
[C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120
[C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74
[C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3]
[C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828
[C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448
[C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#2 stuck for 10s! [syslogd:1609]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c00000003ee7b440 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88004488  XER: 00000000
TASK = c00000002041c0e0[1609] 'syslogd' THREAD: c00000003ee78000 CPU: 2
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24
[C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0
[C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8
[C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0
[C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc
[C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#1 stuck for 10s! [scrashme:11099]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c00000001c483450 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88000488  XER: 00000000
TASK = c000000001f47d60[11099] 'scrashme' THREAD: c00000001c480000 CPU: 1
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24
[C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#0 stuck for 10s! [scrashme:11053]
NIP: C0000000000E4E10 LR: C0000000000E4E0C CTR: C000000000071348
REGS: c00000000ffff750 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 24000482  XER: 00000000
TASK = c00000000cdbaaf0[11053] 'scrashme' THREAD: c000000036778000 CPU: 0
NIP [C0000000000E4E10] .kmem_cache_zalloc+0x11c/0x178
LR [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178
Call Trace:
[C00000000FFFF9D0] [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178 (unreliable)
[C00000000FFFFA70] [C0000000000A70CC] .__audit_signal_info+0x178/0x2ac
[C00000000FFFFB00] [C0000000000792A4] .check_kill_permission+0x8c/0x184
[C00000000FFFFB90] [C00000000007A990] .group_send_sig_info+0x2c/0xa0
[C00000000FFFFC40] [C00000000007AA54] .send_group_sig_info+0x50/0x88
[C00000000FFFFCE0] [C000000000071374] .it_real_fn+0x2c/0x74
[C00000000FFFFD70] [C000000000089308] .hrtimer_run_queues+0x174/0x22c
[C00000000FFFFE40] [C0000000000773BC] .run_timer_softirq+0x38/0x210
[C00000000FFFFEF0] [C000000000072BE0] .__do_softirq+0xa8/0x168
[C00000000FFFFF90] [C000000000027944] .call_do_softirq+0x14/0x24
[C00000003677B580] [C00000000000CD38] .do_softirq+0x70/0x98
[C00000003677B610] [C000000000072B20] .irq_exit+0x5c/0x74
[C00000003677B690] [C000000000024CAC] .timer_interrupt+0x448/0x478
[C00000003677B770] [C0000000000034F4] decrementer_common+0xf4/0x100
--- Exception: 901 at .sync_sb_inodes+0x9c/0x308
    LR = .sync_sb_inodes+0x2bc/0x308
[C00000003677BB10] [C00000000011E9C4] .sync_inodes_sb+0xd0/0x100
[C00000003677BBF0] [C00000000011EA88] .__sync_inodes+0x94/0x120
[C00000003677BC90] [C00000000011EB38] .sync_inodes+0x24/0x60
[C00000003677BD20] [C0000000000F1598] .do_sync+0x38/0xc0
[C00000003677BDB0] [C0000000000F1634] .sys_sync+0x14/0x28
[C00000003677BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
7d295214 e8690028 48000024 381d053c 7d200028 31290001 7d20012d 40a2fff4 
7fa3eb78 7f84e378 4bfff4a5 7fe10164 <7c651b78> 7f84e378 7f66db78 7fa3eb78 

Expected results:
This not complete with xero failures

Additional info:
So far this is the only system we have seen this on. This is a ppc system.
This seems to be hardware specific

Comment 1 Jeff Burke 2008-01-14 19:33:41 UTC
This happened while testing the kernel-debug-2.6.18-68.el5 variant.

Comment 2 Peter Zijlstra 2008-01-14 20:31:30 UTC
Created attachment 291630 [details]
add some more info to the lockup print

Could you give it a run with this added?

Comment 3 Don Zickus 2008-01-14 21:01:29 UTC
Would it be easier if we reserved you a machine with the test on it instead?

Not sure how much free cycles Jeff has.

Comment 4 Peter Zijlstra 2008-01-15 22:23:37 UTC
Ok, I grabbed that machine and compiled a CVS ppc64-debug kernel with a little
extra spinlock debug sauce. completed scrashme successfully.

Of course my kernel never got P tainted,... no idea where that came from.

Comment 5 Don Zickus 2008-01-16 01:42:26 UTC
The P tainted comes from a previous test that purposely loads a bad kernel
module stub to check that P tainted appears.


Comment 6 Peter Zijlstra 2008-01-17 13:39:32 UTC
I can't seem to reproduce, I ran scrashme for hours on that machine (until my 1d
reservation ran out), but it never triggered.

Comment 7 Linda Wang 2008-02-06 21:21:45 UTC
peter, can you try with the regular kernel just to see if you can
reproduce it?