Bug 247470 - soft cpu lockups with heavy cpu + io
Summary: soft cpu lockups with heavy cpu + io
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-07-09 14:59 UTC by Ra P.
Modified: 2008-01-09 15:59 UTC

Doc Type: Bug Fix
Last Closed: 2008-01-09 15:59:36 UTC

Description Ra P. 2007-07-09 14:59:46 UTC
Running shred on a partition (unmounted, of course) gives me a ton of lockups.

Well, not a ton, but three so far today.
The box hangs for a minute or so then continues. The disk whirrs away happily.

dmesg says:

BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c0403281>] mwait_idle_with_hints+0x3b/0x3f
 [<c04033d6>] cpu_idle+0xa3/0xc4
 =======================
BUG: soft lockup detected on CPU#0!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<f88c6788>] scsi_request_fn+0x2ce/0x31d [scsi_mod]
 [<c04d9e72>] blk_run_queue+0x37/0x63
 [<f88c5421>] scsi_next_command+0x25/0x2f [scsi_mod]
 [<f88c55e1>] scsi_end_request+0xa1/0xab [scsi_mod]
 [<f88c5791>] scsi_io_completion+0x163/0x328 [scsi_mod]
 [<f88525ca>] sd_rw_intr+0x218/0x242 [sd_mod]
 [<f903a39e>] rm_check_pci_config_space+0x1ce/0x5a0 [nvidia]
 [<f88c13db>] scsi_finish_command+0x81/0x88 [scsi_mod]
 [<f92b1798>] nv_verify_pci_config+0x47/0x91 [nvidia]
 [<c04da574>] blk_done_softirq+0x49/0x54
 [<c042b2e5>] __do_softirq+0x5d/0xba
 [<c04071b7>] do_softirq+0x59/0xb1
 [<c042b1c7>] ksoftirqd+0x0/0xc1
 [<c042b226>] ksoftirqd+0x5f/0xc1
 [<c0436da8>] kthread+0xb0/0xd8
 [<c0436cf8>] kthread+0x0/0xd8
 [<c0405b3f>] kernel_thread_helper+0x7/0x10
 =======================
Clocksource tsc unstable (delta = 60925666023 ns)
Time: acpi_pm clocksource has been installed.

*****

BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c042de8b>] process_timeout+0x0/0x5
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c042e748>] lock_timer_base+0x19/0x35
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c06009e6>] schedule_timeout+0x6b/0x8d
 [<c042de8b>] process_timeout+0x0/0x5
 [<c042007b>] find_busiest_group+0x207/0x4c5
 [<c042dcee>] run_timer_softirq+0x10a/0x17b
 [<c042de8b>] process_timeout+0x0/0x5
 [<c042a588>] it_real_fn+0x12/0x16
 [<c042b2e5>] __do_softirq+0x5d/0xba
 [<c04071b7>] do_softirq+0x59/0xb1
 [<c042b1c7>] ksoftirqd+0x0/0xc1
 [<c042b226>] ksoftirqd+0x5f/0xc1
 [<c0436da8>] kthread+0xb0/0xd8
 [<c0436cf8>] kthread+0x0/0xd8
 [<c0405b3f>] kernel_thread_helper+0x7/0x10
 =======================
BUG: soft lockup detected on CPU#0!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c060007b>] __sched_text_start+0x5eb/0x89e
 [<c047672b>] fget_light+0x36/0x66
 [<c048007d>] do_sys_poll+0x173/0x327
 [<c0480ba3>] __pollwait+0x0/0xac
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c042229f>] default_wake_function+0x0/0xc
 [<c0422295>] try_to_wake_up+0x3aa/0x3b4
 [<c0598781>] sock_aio_write+0xf6/0x102
 [<c0420a23>] __wake_up+0x32/0x43
 [<c05f8655>] unix_ioctl+0x8b/0x93
 [<c0598b47>] sock_ioctl+0x19f/0x1be
 [<c0450fcd>] audit_syscall_exit+0x294/0x2b0
 [<c0450d0f>] audit_syscall_entry+0x10d/0x137
 [<c0480265>] sys_poll+0x34/0x37
 [<c0404f70>] syscall_call+0x7/0xb
 =======================

*****

BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c0403281>] mwait_idle_with_hints+0x3b/0x3f
 [<c04033d6>] cpu_idle+0xa3/0xc4
 =======================
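
For anyone triaging reports like the ones above, here is a minimal sketch (not a tool used in this bug) that summarises pasted dmesg text. It rests on one assumption about how to read these traces: the frames from softlockup_tick down to apic_timer_interrupt are the timer interrupt that reported the lockup, and the frame printed right after apic_timer_interrupt is the code that was interrupted. Traces like these can contain stale stack entries, so treat the result as a hint. The name summarize_lockups and the dictionary layout are made up for illustration.

```python
import re

# Header printed by the soft lockup detector, e.g.
# "BUG: soft lockup detected on CPU#1!"
HEADER_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")

# One stack frame, e.g.
# " [<f88c6788>] scsi_request_fn+0x2ce/0x31d [scsi_mod]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

# Assumption: the frame following this one is the interrupted code.
IRQ_ENTRY = "apic_timer_interrupt"


def summarize_lockups(dmesg_text):
    """Return one dict per soft lockup report found in dmesg_text."""
    reports = []
    current = None
    take_next = False
    for line in dmesg_text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current = {
                "cpu": int(header.group(1)),
                "interrupted": None,   # symbol right after IRQ_ENTRY
                "modules": set(),      # modules named anywhere in the trace
            }
            reports.append(current)
            take_next = False
            continue
        frame = FRAME_RE.search(line)
        if frame is None or current is None:
            continue
        symbol, module = frame.groups()
        if module:
            current["modules"].add(module)
        if take_next and current["interrupted"] is None:
            current["interrupted"] = symbol
        take_next = symbol == IRQ_ENTRY
    return reports
```

Feeding it the whole dmesg output and printing each report's cpu, interrupted symbol and modules gives a quick view of whether the lockups cluster in idle code, the SCSI path, or a particular module.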

Comment 1 Ra P. 2007-07-09 15:01:00 UTC
This is not a cutting-edge kernel: 2.6.21-1.3228.fc7, with the nvidia binary driver loaded.

Not sure if that is at all relevant though.

Comment 2 Ra P. 2007-07-10 08:06:33 UTC
and again..

BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c041fab5>] __wake_up_common+0x32/0x55
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c0601895>] _spin_unlock_irqrestore+0x8/0x9
 [<c05f9679>] unix_write_space+0x47/0x71
 [<c059c620>] sock_wfree+0x21/0x36
 [<c059de78>] __kfree_skb+0xb5/0x113
 [<c059f345>] memcpy_toiovec+0x27/0x4a
 [<c05f804a>] unix_stream_recvmsg+0x336/0x4aa
 [<c0598889>] sock_aio_read+0xfc/0x108
 [<c047555c>] do_sync_read+0xc7/0x10a
 [<c0436e71>] autoremove_wake_function+0x0/0x35
 [<c059868b>] sock_aio_write+0x0/0x102
 [<c0475dfd>] vfs_read+0xba/0x152
 [<c047623f>] sys_read+0x41/0x67
 [<c0404f70>] syscall_call+0x7/0xb
 =======================


Comment 3 Ra P. 2007-07-10 11:43:10 UTC
and again:

BUG: soft lockup detected on CPU#0!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c042007b>] find_busiest_group+0x207/0x4c5
 [<c0404f36>] system_call+0x16/0x32
 =======================


Comment 4 Ra P. 2007-07-10 12:41:18 UTC
once more!

BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c0601895>] _spin_unlock_irqrestore+0x8/0x9
 [<f92b67d2>] os_release_sema+0x3f/0x6c [nvidia]
 [<f902fe2a>] _nv002658rm+0x12/0x1c [nvidia]
 [<f900f058>] _nv003624rm+0xa4/0xd0 [nvidia]
 [<f903a7ad>] _nv001996rm+0x3d/0x770 [nvidia]
 [<f903aaf3>] _nv001996rm+0x383/0x770 [nvidia]
 [<f92b62e5>] os_pci_read_dword+0x2b/0x34 [nvidia]
 [<f903a624>] rm_check_pci_config_space+0x454/0x5a0 [nvidia]
 [<f90383cc>] rm_ioctl+0x1c/0x24 [nvidia]
 [<c04e7070>] copy_from_user+0x3a/0x66
 [<f92b3663>] nv_kern_ioctl+0x2f4/0x365 [nvidia]
 [<f92b3709>] nv_kern_unlocked_ioctl+0x18/0x1d [nvidia]
 [<f92b36f1>] nv_kern_unlocked_ioctl+0x0/0x1d [nvidia]
 [<c047f713>] do_ioctl+0x1f/0x62
 [<c047f99a>] vfs_ioctl+0x244/0x256
 [<c047f9f8>] sys_ioctl+0x4c/0x64
 [<c0404f70>] syscall_call+0x7/0xb
 =======================


Comment 5 Ra P. 2007-07-10 12:52:03 UTC
If a bug falls in bugzilla, and there are no kernel maintainers around to
comment on it, is it still a bug?

Another spectacular lockup!

BUG: soft lockup detected on CPU#0!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c0403281>] mwait_idle_with_hints+0x3b/0x3f
 [<c04033d6>] cpu_idle+0xa3/0xc4
 [<c071bb5f>] start_kernel+0x435/0x43d
 [<c071b25a>] unknown_bootoption+0x0/0x202
 =======================


Comment 6 Ra P. 2007-07-10 13:13:54 UTC
BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 [<c0403281>] mwait_idle_with_hints+0x3b/0x3f
 [<c04033d6>] cpu_idle+0xa3/0xc4
 =======================


Comment 7 Ra P. 2007-07-10 14:05:36 UTC
A shorter one this time:

BUG: soft lockup detected on CPU#1!
 [<c0451ea2>] softlockup_tick+0xa5/0xb4
 [<c042e930>] update_process_times+0x3b/0x5e
 [<c043d298>] tick_sched_timer+0x57/0x9a
 [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
 [<c043d241>] tick_sched_timer+0x0/0x9a
 [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
 [<c04059bc>] apic_timer_interrupt+0x28/0x30
 =======================

Comment 8 Christopher Brown 2007-09-17 21:13:32 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel? From a brief look, could
you also try booting with nohz=off as a boot parameter and tell me if you see
the same errors?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 9 Christopher Brown 2007-09-18 14:30:46 UTC
Please could you also boot with:

nosoftlockup

as a boot parameter.

Cheers
Chris
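
To confirm after a reboot that the parameters requested in comments 8 and 9 actually reached the kernel, the running command line can be checked. The sketch below is only an illustration: the function names are hypothetical, and it assumes the command line is readable from /proc/cmdline. Keep in mind that nosoftlockup turns the soft lockup detector off, so with it set the messages should stop even if the hang itself remains.

```python
def missing_boot_params(cmdline, wanted=("nohz=off", "nosoftlockup")):
    """Return the wanted parameters that do not appear in cmdline.

    cmdline is the kernel command line as one string; parameters are
    compared as whole whitespace-separated tokens.
    """
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

On the affected box, missing_boot_params(read_running_cmdline()) returning an empty list would mean both parameters are active.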

Comment 10 Christopher Brown 2008-01-09 15:59:36 UTC
As indicated previously, there has been no update on the progress of this bug,
so I am closing it as INSUFFICIENT_DATA. Please re-open it if the issue still
occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

