Description of problem: While running the KernelTier1 tests on ibm-qs20-02.lab.boston.redhat.com The scrashme test caused a BUG: Version-Release number of selected component (if applicable): 2.6.18-68.el5 How reproducible: Unknown Steps to Reproduce: 1. Install RHEL5.1 on ibm-qs20-02.lab.boston.redhat.com 2. Install the 2.6.18-68.el5 kernel 3. Reboot then run the scrashme test Actual results: BUG: spinlock lockup on CPU#1, scrashme/11099, c0000000004a0bc0 (Tainted: P ) Call Trace: [C00000001C483630] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable) [C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 [C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24 [C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8 [C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8 [C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448 [C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104 [C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3] [C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130 [C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200 [C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c [C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40 BUG: spinlock lockup on CPU#3, scrashme/11036, c0000000004a0bc0 (Tainted: P ) Call Trace: [C000000028F87200] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable) [C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 [C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24 [C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8 [C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140 [C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60 [C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120 [C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74 [C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3] [C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828 [C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448 [C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104 [C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3] [C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130 [C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200 [C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c [C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40 BUG: spinlock lockup on CPU#2, syslogd/1609, c0000000004a0bc0 (Tainted: P ) Call Trace: [C00000003EE7B620] [C000000000010B48] .show_stack+0x68/0x1b0 (unreliable) [C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 [C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24 [C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8 [C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8 [C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448 [C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0 [C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8 [C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0 [C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc [C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40 BUG: soft lockup - CPU#3 stuck for 10s! [scrashme:11036] NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000 REGS: c000000028f87020 TRAP: 0901 Tainted: P (2.6.18-68.el5debug) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 88000488 XER: 00000000 TASK = c000000022d39080[11036] 'scrashme' THREAD: c000000028f84000 CPU: 3 NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184 LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 Call Trace: [C000000028F872A0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable) [C000000028F87340] [C000000000355520] ._spin_lock+0x10/0x24 [C000000028F873C0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8 [C000000028F87460] [C0000000000BED10] .__set_page_dirty_nobuffers+0x11c/0x140 [C000000028F874F0] [C0000000000F1178] .mark_buffer_dirty+0x4c/0x60 [C000000028F87570] [C0000000000F1230] .__block_commit_write+0xa4/0x120 [C000000028F87630] [C0000000000F12E4] .generic_commit_write+0x38/0x74 [C000000028F876C0] [D000000000148348] .ext3_ordered_commit_write+0xac/0x114 [ext3] [C000000028F87770] [C0000000000B81E8] .generic_file_buffered_write+0x5b0/0x828 [C000000028F87920] [C0000000000B8804] .__generic_file_aio_write_nolock+0x3a4/0x448 [C000000028F87A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104 [C000000028F87AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3] [C000000028F87B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130 [C000000028F87CF0] [C0000000000EE620] .vfs_write+0x118/0x200 [C000000028F87D90] [C0000000000EED90] .sys_write+0x4c/0x8c [C000000028F87E30] [C0000000000086A4] syscall_exit+0x0/0x40 Instruction dump: 40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 BUG: soft lockup - CPU#2 stuck for 10s! [syslogd:1609] NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000 REGS: c00000003ee7b440 TRAP: 0901 Tainted: P (2.6.18-68.el5debug) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 88004488 XER: 00000000 TASK = c00000002041c0e0[1609] 'syslogd' THREAD: c00000003ee78000 CPU: 2 NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184 LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 Call Trace: [C00000003EE7B6C0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable) [C00000003EE7B760] [C000000000355520] ._spin_lock+0x10/0x24 [C00000003EE7B7E0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8 [C00000003EE7B880] [C0000000001121F0] .file_update_time+0xc0/0xe8 [C00000003EE7B910] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448 [C00000003EE7BA10] [C0000000000B8A24] .__generic_file_write_nolock+0x78/0xb0 [C00000003EE7BB80] [C0000000000B8AC0] .generic_file_writev+0x64/0xf8 [C00000003EE7BC30] [C00000000012A7B0] .compat_do_readv_writev+0x240/0x3f0 [C00000003EE7BD90] [C00000000012A9E8] .compat_sys_writev+0x88/0xbc [C00000003EE7BE30] [C0000000000086A4] syscall_exit+0x0/0x40 Instruction dump: 40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 BUG: soft lockup - CPU#1 stuck for 10s! [scrashme:11099] NIP: C0000000001BC33C LR: C0000000001BC3AC CTR: 0000000000000000 REGS: c00000001c483450 TRAP: 0901 Tainted: P (2.6.18-68.el5debug) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 88000488 XER: 00000000 TASK = c000000001f47d60[11099] 'scrashme' THREAD: c00000001c480000 CPU: 1 NIP [C0000000001BC33C] ._raw_spin_lock+0xc8/0x184 LR [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 Call Trace: [C00000001C4836D0] [C0000000001BC3AC] ._raw_spin_lock+0x138/0x184 (unreliable) [C00000001C483770] [C000000000355520] ._spin_lock+0x10/0x24 [C00000001C4837F0] [C00000000011EC84] .__mark_inode_dirty+0x110/0x1c8 [C00000001C483890] [C0000000001121F0] .file_update_time+0xc0/0xe8 [C00000001C483920] [C0000000000B8728] .__generic_file_aio_write_nolock+0x2c8/0x448 [C00000001C483A20] [C0000000000B8920] .generic_file_aio_write+0x78/0x104 [C00000001C483AE0] [D000000000142F10] .ext3_file_write+0x2c/0xd4 [ext3] [C00000001C483B70] [C0000000000ED89C] .do_sync_write+0xd4/0x130 [C00000001C483CF0] [C0000000000EE620] .vfs_write+0x118/0x200 [C00000001C483D90] [C0000000000EED90] .sys_write+0x4c/0x8c [C00000001C483E30] [C0000000000086A4] syscall_exit+0x0/0x40 Instruction dump: 40a2fff0 4c00012c 2fa00000 39400001 41be009c e91e8050 39200000 39600000 48000030 996d01c9 800d0008 7ce0f828 <2c070000> 40820010 7c00f92d 40a2fff0 BUG: soft lockup - CPU#0 stuck for 10s! [scrashme:11053] NIP: C0000000000E4E10 LR: C0000000000E4E0C CTR: C000000000071348 REGS: c00000000ffff750 TRAP: 0901 Tainted: P (2.6.18-68.el5debug) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 24000482 XER: 00000000 TASK = c00000000cdbaaf0[11053] 'scrashme' THREAD: c000000036778000 CPU: 0 NIP [C0000000000E4E10] .kmem_cache_zalloc+0x11c/0x178 LR [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178 Call Trace: [C00000000FFFF9D0] [C0000000000E4E0C] .kmem_cache_zalloc+0x118/0x178 (unreliable) [C00000000FFFFA70] [C0000000000A70CC] .__audit_signal_info+0x178/0x2ac [C00000000FFFFB00] [C0000000000792A4] .check_kill_permission+0x8c/0x184 [C00000000FFFFB90] [C00000000007A990] .group_send_sig_info+0x2c/0xa0 [C00000000FFFFC40] [C00000000007AA54] .send_group_sig_info+0x50/0x88 [C00000000FFFFCE0] [C000000000071374] .it_real_fn+0x2c/0x74 [C00000000FFFFD70] [C000000000089308] .hrtimer_run_queues+0x174/0x22c [C00000000FFFFE40] [C0000000000773BC] .run_timer_softirq+0x38/0x210 [C00000000FFFFEF0] [C000000000072BE0] .__do_softirq+0xa8/0x168 [C00000000FFFFF90] [C000000000027944] .call_do_softirq+0x14/0x24 [C00000003677B580] [C00000000000CD38] .do_softirq+0x70/0x98 [C00000003677B610] [C000000000072B20] .irq_exit+0x5c/0x74 [C00000003677B690] [C000000000024CAC] .timer_interrupt+0x448/0x478 [C00000003677B770] [C0000000000034F4] decrementer_common+0xf4/0x100 --- Exception: 901 at .sync_sb_inodes+0x9c/0x308 LR = .sync_sb_inodes+0x2bc/0x308 [C00000003677BB10] [C00000000011E9C4] .sync_inodes_sb+0xd0/0x100 [C00000003677BBF0] [C00000000011EA88] .__sync_inodes+0x94/0x120 [C00000003677BC90] [C00000000011EB38] .sync_inodes+0x24/0x60 [C00000003677BD20] [C0000000000F1598] .do_sync+0x38/0xc0 [C00000003677BDB0] [C0000000000F1634] .sys_sync+0x14/0x28 [C00000003677BE30] [C0000000000086A4] syscall_exit+0x0/0x40 Instruction dump: 7d295214 e8690028 48000024 381d053c 7d200028 31290001 7d20012d 40a2fff4 7fa3eb78 7f84e378 4bfff4a5 7fe10164 <7c651b78> 7f84e378 7f66db78 7fa3eb78 Expected results: This not complete with xero failures Additional info: So far this is the only system we have seen this on. This is a ppc system. This seems to be hardware specific
This happened while testing the kernel-debug-2.6.18-68.el5 variant.
Created attachment 291630 [details] add some more info to the lockup print Could you give it a run with this added?
Would it be easier if we reserved you a machine with the test on it instead? Not sure how much free cycles Jeff has.
Ok, I grabbed that machine and compiled a CVS ppc64-debug kernel with a little extra spinlock debug sauce. completed scrashme successfully. Of course my kernel never got P tainted,... no idea where that came from.
The P tainted comes from a previous test that purposely loads a bad kernel module stub to check that P tainted appears.
I can't seem to reproduce, I ran scrashme for hours on that machine (until my 1d reservation ran out), but it never triggered.
peter, can you try with the regular kernel just to see if you can reproduce it?