Red Hat Bugzilla – Bug 243083
soft lockup detected on CPU#0 and CPU#1
Last modified: 2007-11-30 17:12:06 EST
Description of problem:
The following traces appear in /var/log/messages with the F7 kernel (2.6.21-1.3194) on a Dell Precision 370 (P4 3GHz step1):

Trace 1:
Jun 6 14:21:46 linuxbf kernel: BUG: soft lockup detected on CPU#0!
Jun 6 14:21:46 linuxbf kernel: [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun 6 14:21:46 linuxbf kernel: [<c042e930>] update_process_times+0x3b/0x5e
Jun 6 14:21:46 linuxbf kernel: [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun 6 14:21:46 linuxbf kernel: [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun 6 14:21:46 linuxbf kernel: [<c043d245>] tick_sched_timer+0x0/0xbb
Jun 6 14:21:46 linuxbf kernel: [<c05b8578>] rt_check_expire+0x0/0x158
Jun 6 14:21:46 linuxbf kernel: [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun 6 14:21:46 linuxbf kernel: [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun 6 14:21:46 linuxbf kernel: [<c05b8578>] rt_check_expire+0x0/0x158
Jun 6 14:21:46 linuxbf kernel: [<c042007b>] find_busiest_group+0x207/0x4c5
Jun 6 14:21:46 linuxbf kernel: [<c042dcee>] run_timer_softirq+0x10a/0x17b
Jun 6 14:21:46 linuxbf kernel: [<c05b8578>] rt_check_expire+0x0/0x158
Jun 6 14:21:46 linuxbf kernel: [<c042b2e5>] __do_softirq+0x5d/0xba
Jun 6 14:21:46 linuxbf kernel: [<c04071b7>] do_softirq+0x59/0xb1
Jun 6 14:21:46 linuxbf kernel: [<c042b1c7>] ksoftirqd+0x0/0xc1
Jun 6 14:21:46 linuxbf kernel: [<c042b226>] ksoftirqd+0x5f/0xc1
Jun 6 14:21:46 linuxbf kernel: [<c0436da8>] kthread+0xb0/0xd8
Jun 6 14:21:46 linuxbf kernel: [<c0436cf8>] kthread+0x0/0xd8
Jun 6 14:21:46 linuxbf kernel: [<c0405b3f>] kernel_thread_helper+0x7/0x10
Jun 6 14:21:46 linuxbf kernel: =======================
Jun 6 14:49:50 linuxbf kernel: BUG: soft lockup detected on CPU#1!
Jun 6 14:49:50 linuxbf kernel: [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun 6 14:49:50 linuxbf kernel: [<c042e930>] update_process_times+0x3b/0x5e
Jun 6 14:49:50 linuxbf kernel: [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun 6 14:49:50 linuxbf kernel: [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun 6 14:49:50 linuxbf kernel: [<c043d245>] tick_sched_timer+0x0/0xbb
Jun 6 14:49:50 linuxbf kernel: [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun 6 14:49:50 linuxbf kernel: [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun 6 14:49:50 linuxbf kernel: =======================

Trace 2:
Jun 6 21:44:32 linuxbf kernel: Clocksource tsc unstable (delta = 501984757941 ns)
Jun 6 21:44:32 linuxbf ntpd[1985]: synchronized to LOCAL(0), stratum 10
Jun 6 21:44:32 linuxbf kernel: Time: hpet clocksource has been installed.
Jun 6 21:44:32 linuxbf kernel: BUG: soft lockup detected on CPU#0!
Jun 6 21:44:32 linuxbf kernel: [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun 6 21:44:32 linuxbf kernel: [<c042e930>] update_process_times+0x3b/0x5e
Jun 6 21:44:32 linuxbf kernel: [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun 6 21:44:32 linuxbf kernel: [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun 6 21:44:32 linuxbf kernel: [<c043d245>] tick_sched_timer+0x0/0xbb
Jun 6 21:44:32 linuxbf kernel: [<c05c3634>] inet_twdr_hangman+0x0/0x94
Jun 6 21:44:32 linuxbf kernel: [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun 6 21:44:32 linuxbf kernel: [<c042e863>] __mod_timer+0xa1/0xab
Jun 6 21:44:32 linuxbf kernel: [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun 6 21:44:32 linuxbf kernel: [<c05c3634>] inet_twdr_hangman+0x0/0x94
Jun 6 21:44:32 linuxbf kernel: [<c042007b>] find_busiest_group+0x207/0x4c5
Jun 6 21:44:32 linuxbf kernel: [<c042dcee>] run_timer_softirq+0x10a/0x17b
Jun 6 21:44:32 linuxbf kernel: [<c05c3634>] inet_twdr_hangman+0x0/0x94
Jun 6 21:44:32 linuxbf kernel: [<c042a588>] it_real_fn+0x12/0x16
Jun 6 21:44:32 linuxbf kernel: [<c042b2e5>] __do_softirq+0x5d/0xba
Jun 6 21:44:32 linuxbf kernel: [<c04071b7>] do_softirq+0x59/0xb1
Jun 6 21:44:32 linuxbf kernel: [<c042b1c7>] ksoftirqd+0x0/0xc1
Jun 6 21:44:32 linuxbf kernel: [<c042b226>] ksoftirqd+0x5f/0xc1
Jun 6 21:44:32 linuxbf kernel: [<c0436da8>] kthread+0xb0/0xd8
Jun 6 21:44:32 linuxbf kernel: [<c0436cf8>] kthread+0x0/0xd8
Jun 6 21:44:32 linuxbf kernel: [<c0405b3f>] kernel_thread_helper+0x7/0x10
Jun 6 21:44:32 linuxbf kernel: =======================
Jun 6 21:44:32 linuxbf kernel: sd 0:0:0:0: SCSI error: return code = 0x06000000
Jun 6 21:44:32 linuxbf kernel: end_request: I/O error, dev sda, sector 32452941
Jun 6 21:44:32 linuxbf kernel: EXT3-fs error (device dm-0): read_block_bitmap: Cannot read block bitmap - block_group = 123, block_bitmap = 4030464
Jun 6 21:44:32 linuxbf kernel: BUG: soft lockup detected on CPU#1!
Jun 6 21:44:32 linuxbf kernel: [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun 6 21:44:32 linuxbf kernel: [<c042e930>] update_process_times+0x3b/0x5e
Jun 6 21:44:32 linuxbf kernel: [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun 6 21:44:32 linuxbf kernel: [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun 6 21:44:32 linuxbf kernel: [<c043d245>] tick_sched_timer+0x0/0xbb
Jun 6 21:44:32 linuxbf kernel: [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun 6 21:44:32 linuxbf kernel: [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun 6 21:44:32 linuxbf kernel: [<c043007b>] do_notify_parent+0xf1/0x154
Jun 6 21:44:32 linuxbf kernel: [<c0403281>] mwait_idle_with_hints+0x3b/0x3f
Jun 6 21:44:32 linuxbf kernel: [<c04033d6>] cpu_idle+0xa3/0xc4
Jun 6 21:44:32 linuxbf kernel: =======================

Version-Release number of selected component (if applicable):

How reproducible:
The computer was running the latest kernel for FC5 without any problems until it was upgraded to F7. Then this bug appeared twice, as shown above. The system is also affected by bug #240982. The system is now unstable: it has hung a few times or slowed to a crawl. Fix badly needed.
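For anyone wanting to check how often this message shows up in their own logs, here is a minimal Python sketch that tallies "soft lockup detected" lines per CPU. It assumes the syslog message format shown in the traces above; the function name count_soft_lockups is made up for illustration and is not part of any existing tool.

```python
import re
from collections import Counter

# Matches the kernel message format seen in the traces above, e.g.
# "... kernel: BUG: soft lockup detected on CPU#0!"
LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")


def count_soft_lockups(lines):
    """Return a Counter mapping CPU number -> number of soft lockup reports.

    `lines` is any iterable of log lines (hypothetical helper, assuming the
    message format shown in this report).
    """
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_in_file(path="/var/log/messages"):
    """Convenience wrapper: tally soft lockups in a log file (usually needs root)."""
    with open(path, errors="replace") as log:
        return count_soft_lockups(log)
```

Calling count_in_file() on the affected machine gives a quick per-CPU count before and after changing a boot parameter, which is easier than scanning the traces by eye.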
Please try the kernel parameter clocksource=acpi_pm.
What I did:
- After my bug report, I stayed in non-hyperthreading mode (set in the BIOS). The previously reported kernel traces were in /var/log/messages, but I did not experience any problem with the computer for a full day of work (many cross-compilations, find(1) on /, etc.): no crawling, no crash.
- Read your query, set 'clocksource=acpi_pm' in /etc/grub.conf, and used the BIOS to re-activate hyperthreading. The system booted fine. I can see in /var/log/messages:
Jun 7 19:18:21 linuxbf kernel: Time: acpi_pm clocksource has been installed.
(times are local time in France)
Now the system works correctly, but I must go back home. I'll let the computer run and report further problems (or lack of!). (FYI: bug #240982 is still present.) Thanks.
This morning the computer was still running. However it was much less responsive than yesterday when I rebooted it. I started to fill the bugzilla form while the computer was crawling more and more until it froze. The only solution was to switch it off/on. I went back to single core operation (thru bios) and now it works correctly. Having added 'clocksource=acpi_pm' got rid of 'soft lockup' messages in /var/log/messages. I'll report later, but I think that the freezing problem is linked to bug #240982 and not the present one which has vanished with the 'clocksource' statement.
After running with 'clocksource=acpi_pm' and hyperthreading disabled for many hours now, I have seen no more "soft lockup" messages and no more crawling.
The computer ran for a whole weekend without any problem. IMHO 'clocksource=acpi_pm' fixed the problem.
Are you still running without hyperthreading enabled? Could someone please explain this parameter in basic detail? I can't find info on it.
when I say this parameter I mean the 'clocksource=acpi_pm' option.
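For background: the clocksource= boot parameter overrides the hardware timer the kernel uses for timekeeping, and acpi_pm selects the ACPI power management timer instead of the TSC (which the log in the description shows being marked unstable). A minimal Python sketch for checking which clocksource is active and which ones are available follows. It assumes the kernel exposes the clocksource sysfs directory at the path below; the function name read_clocksources is hypothetical.

```python
from pathlib import Path

# Assumption: the running kernel exposes the clocksource sysfs interface here.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current, available) clocksources read from sysfs.

    `current` is a string such as "acpi_pm"; `available` is a list of names.
    Raises OSError if the sysfs files do not exist on this kernel.
    """
    base = Path(sysfs_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

If 'clocksource=acpi_pm' took effect at boot, the first value returned should be "acpi_pm", matching the "acpi_pm clocksource has been installed" line in /var/log/messages.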
Yes, I'm still in single core mode, set through the BIOS. I'm waiting for a fix for bug #240982 before trying dual core mode again: I can't afford freezes these days.
Went back to hyperthreading set thru bios. Dropped parameter 'clocksource=acpi_pm'. Updated kernel to 2.6.21-1.3228. No more 'soft lockup' in /var/log/messages 40 minutes after reboot. Will report later if this error message is back.
The computer froze after one hour. No particular output in /var/log/messages. I was unable to ssh into the computer, and lost the mouse pointer when I hit ctrl-alt-f1 while trying to get a text terminal. Went back to the original F7 kernel, no dual core, clocksource=acpi_pm. Will retry later when I can afford to lose some more time...
I am experiencing the same problem, but only if I run my folding@home client. I get the messages in /var/log/messages and I cannot start new processes (but all the running ones are fine). If I kill the folding@home client, everything is fine.
Created attachment 156990: dmesg trace
This is a snippet of the kernel messages. I may try changing the clocksource when I reboot the machine.
I'm running folding@home also. Thanks for pointing that out to me. If there is a low-level problem with threads or sockets, then F@H may trigger an as yet unknown bug! I'll try later to reboot with 3228 and no F@H.
I can confirm the same behavior with 3228.
Went back to hyperthreading with kernel 3228, no clocksource parameter. I'm leaving the office now and won't be back until Tuesday. I'm leaving the computer idling without folding@home.
I have not needed to reboot since Friday with kernel 3228 (the computer has been running for 6 days). No more soft lockup messages. This kernel is fine for me, but I did not try it with F@H. Time to close this bug report?