Bug 428713 - [RHEL5 U2] BUG: spinlock lockup on CPU#1
[RHEL5 U2] BUG: spinlock lockup on CPU#1
Status: CLOSED DEFERRED
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.2
ppc64 Linux
high Severity high
: rc
: ---
Assigned To: Peter Zijlstra
Martin Jenner
http://rhts.lab.boston.redhat.com/cgi...
:
Depends On:
Blocks: KernelPrio5.3
  Show dependency treegraph
 
Reported: 2008-01-14 14:29 EST by Jeff Burke
Modified: 2014-08-11 01:40 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-09-04 12:03:36 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
add some more info to the lockup print (494 bytes, patch)
2008-01-14 15:31 EST, Peter Zijlstra
no flags Details | Diff

  None (edit)
Description Jeff Burke 2008-01-14 14:29:39 EST
Description of problem:
 While running the KernelTier1 tests on ibm-qs20-02.lab.boston.redhat.com
The scrashme test caused a BUG:

Version-Release number of selected component (if applicable):
2.6.18-68.el5

How reproducible:
Unknown

Steps to Reproduce:
1. Install RHEL5.1 on ibm-qs20-02.lab.boston.redhat.com
2. Install the 2.6.18-68.el5 kernel
3. Reboot then run the scrashme test
  
Actual results:
BUG: spinlock lockup on CPU#1, scrashme/11099, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C00000001C483630] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24
[C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: spinlock lockup on CPU#3, scrashme/11036, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C000000028F87200] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24
[C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140
[C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60
[C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120
[C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74
[C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3]
[C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828
[C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448
[C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: spinlock lockup on CPU#2, syslogd/1609, c0000000004a0bc0 (Tainted: P     )
Call Trace:
[C00000003EE7B620] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable)
[C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
[C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24
[C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0
[C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8
[C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0
[C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc
[C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40
BUG: soft lockup - CPU#3 stuck for 10s! [scrashme:11036]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c000000028f87020 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88000488  XER: 00000000
TASK = c000000022d39080[11036] 'scrashme' THREAD: c000000028f84000 CPU: 3
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24
[C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140
[C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60
[C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120
[C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74
[C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3]
[C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828
[C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448
[C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#2 stuck for 10s! [syslogd:1609]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c00000003ee7b440 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88004488  XER: 00000000
TASK = c00000002041c0e0[1609] 'syslogd' THREAD: c00000003ee78000 CPU: 2
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24
[C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0
[C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8
[C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0
[C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc
[C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#1 stuck for 10s! [scrashme:11099]
NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000
REGS: c00000001c483450 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 88000488  XER: 00000000
TASK = c000000001f47d60[11099] 'scrashme' THREAD: c00000001c480000 CPU: 1
NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184
LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184
Call Trace:
[C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable)
[C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24
[C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8
[C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8
[C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448
[C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104
[C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3]
[C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130
[C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200
[C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c
[C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 
48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 
BUG: soft lockup - CPU#0 stuck for 10s! [scrashme:11053]
NIP: C0000000000E4E10 LR: C0000000000E4E0C CTR: C000000000071348
REGS: c00000000ffff750 TRAP: 0901   Tainted: P       (2.6.18-68.el5debug)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 24000482  XER: 00000000
TASK = c00000000cdbaaf0[11053] 'scrashme' THREAD: c000000036778000 CPU: 0
NIP [C0000000000E4E10] .kmem_cache_zalloc+0x11c/0x178
LR [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178
Call Trace:
[C00000000FFFF9D0] [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178 (unreliable)
[C00000000FFFFA70] [C0000000000A70CC] .__audit_signal_info+0x178/0x2ac
[C00000000FFFFB00] [C0000000000792A4] .check_kill_permission+0x8c/0x184
[C00000000FFFFB90] [C00000000007A990] .group_send_sig_info+0x2c/0xa0
[C00000000FFFFC40] [C00000000007AA54] .send_group_sig_info+0x50/0x88
[C00000000FFFFCE0] [C000000000071374] .it_real_fn+0x2c/0x74
[C00000000FFFFD70] [C000000000089308] .hrtimer_run_queues+0x174/0x22c
[C00000000FFFFE40] [C0000000000773BC] .run_timer_softirq+0x38/0x210
[C00000000FFFFEF0] [C000000000072BE0] .__do_softirq+0xa8/0x168
[C00000000FFFFF90] [C000000000027944] .call_do_softirq+0x14/0x24
[C00000003677B580] [C00000000000CD38] .do_softirq+0x70/0x98
[C00000003677B610] [C000000000072B20] .irq_exit+0x5c/0x74
[C00000003677B690] [C000000000024CAC] .timer_interrupt+0x448/0x478
[C00000003677B770] [C0000000000034F4] decrementer_common+0xf4/0x100
--- Exception: 901 at .sync_sb_inodes+0x9c/0x308
    LR = .sync_sb_inodes+0x2bc/0x308
[C00000003677BB10] [C00000000011E9C4] .sync_inodes_sb+0xd0/0x100
[C00000003677BBF0] [C00000000011EA88] .__sync_inodes+0x94/0x120
[C00000003677BC90] [C00000000011EB38] .sync_inodes+0x24/0x60
[C00000003677BD20] [C0000000000F1598] .do_sync+0x38/0xc0
[C00000003677BDB0] [C0000000000F1634] .sys_sync+0x14/0x28
[C00000003677BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
7d295214 e8690028 48000024 381d053c 7d200028 31290001 7d20012d 40a2fff4 
7fa3eb78 7f84e378 4bfff4a5 7fe10164 <7c651b78> 7f84e378 7f66db78 7fa3eb78 

Expected results:
This not complete with xero failures

Additional info:
So far this is the only system we have seen this on. This is a ppc system.
This seems to be hardware specific
Comment 1 Jeff Burke 2008-01-14 14:33:41 EST
This happened while testing the kernel-debug-2.6.18-68.el5 variant.
Comment 2 Peter Zijlstra 2008-01-14 15:31:30 EST
Created attachment 291630 [details]
add some more info to the lockup print

Could you give it a run with this added?
Comment 3 Don Zickus 2008-01-14 16:01:29 EST
Would it be easier if we reserved you a machine with the test on it instead?

Not sure how much free cycles Jeff has.
Comment 4 Peter Zijlstra 2008-01-15 17:23:37 EST
Ok, I grabbed that machine and compiled a CVS ppc64-debug kernel with a little
extra spinlock debug sauce. completed scrashme successfully.

Of course my kernel never got P tainted,... no idea where that came from.
Comment 5 Don Zickus 2008-01-15 20:42:26 EST
The P tainted comes from a previous test that purposely loads a bad kernel
module stub to check that P tainted appears.
Comment 6 Peter Zijlstra 2008-01-17 08:39:32 EST
I can't seem to reproduce, I ran scrashme for hours on that machine (until my 1d
reservation ran out), but it never triggered.
Comment 7 Linda Wang 2008-02-06 16:21:45 EST
peter, can you try with the regular kernel just to see if you can
reproduce it?

Note You need to log in before you can comment on or make changes to this bug.