Bug 243083 - soft lockup detected on CPU#0 and CPU#1
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: i386 Linux
Priority: low
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2007-06-07 04:58 EDT by Bernard Fouché
Modified: 2007-11-30 17:12 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-08-29 14:38:46 EDT

Attachments:
dmesg trace (4.55 KB, text/plain)
2007-06-14 09:10 EDT, Matteo Corti

Description Bernard Fouché 2007-06-07 04:58:06 EDT
Description of problem:

The following traces appear in /var/log/messages with the F7 kernel (2.6.21-1.3194)
on a Dell Precision 370 (P4 3 GHz step1):

Trace 1:
Jun  6 14:21:46 linuxbf kernel: BUG: soft lockup detected on CPU#0!
Jun  6 14:21:46 linuxbf kernel:  [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun  6 14:21:46 linuxbf kernel:  [<c042e930>] update_process_times+0x3b/0x5e
Jun  6 14:21:46 linuxbf kernel:  [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun  6 14:21:46 linuxbf kernel:  [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun  6 14:21:46 linuxbf kernel:  [<c043d245>] tick_sched_timer+0x0/0xbb
Jun  6 14:21:46 linuxbf kernel:  [<c05b8578>] rt_check_expire+0x0/0x158
Jun  6 14:21:46 linuxbf kernel:  [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun  6 14:21:46 linuxbf kernel:  [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun  6 14:21:46 linuxbf kernel:  [<c05b8578>] rt_check_expire+0x0/0x158
Jun  6 14:21:46 linuxbf kernel:  [<c042007b>] find_busiest_group+0x207/0x4c5
Jun  6 14:21:46 linuxbf kernel:  [<c042dcee>] run_timer_softirq+0x10a/0x17b
Jun  6 14:21:46 linuxbf kernel:  [<c05b8578>] rt_check_expire+0x0/0x158
Jun  6 14:21:46 linuxbf kernel:  [<c042b2e5>] __do_softirq+0x5d/0xba
Jun  6 14:21:46 linuxbf kernel:  [<c04071b7>] do_softirq+0x59/0xb1
Jun  6 14:21:46 linuxbf kernel:  [<c042b1c7>] ksoftirqd+0x0/0xc1
Jun  6 14:21:46 linuxbf kernel:  [<c042b226>] ksoftirqd+0x5f/0xc1
Jun  6 14:21:46 linuxbf kernel:  [<c0436da8>] kthread+0xb0/0xd8
Jun  6 14:21:46 linuxbf kernel:  [<c0436cf8>] kthread+0x0/0xd8
Jun  6 14:21:46 linuxbf kernel:  [<c0405b3f>] kernel_thread_helper+0x7/0x10
Jun  6 14:21:46 linuxbf kernel:  =======================
Jun  6 14:49:50 linuxbf kernel: BUG: soft lockup detected on CPU#1!
Jun  6 14:49:50 linuxbf kernel:  [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun  6 14:49:50 linuxbf kernel:  [<c042e930>] update_process_times+0x3b/0x5e
Jun  6 14:49:50 linuxbf kernel:  [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun  6 14:49:50 linuxbf kernel:  [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun  6 14:49:50 linuxbf kernel:  [<c043d245>] tick_sched_timer+0x0/0xbb
Jun  6 14:49:50 linuxbf kernel:  [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun  6 14:49:50 linuxbf kernel:  [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun  6 14:49:50 linuxbf kernel:  =======================


Trace 2:
Jun  6 21:44:32 linuxbf kernel: Clocksource tsc unstable (delta = 501984757941 ns)
Jun  6 21:44:32 linuxbf ntpd[1985]: synchronized to LOCAL(0), stratum 10
Jun  6 21:44:32 linuxbf kernel: Time: hpet clocksource has been installed.
Jun  6 21:44:32 linuxbf kernel: BUG: soft lockup detected on CPU#0!
Jun  6 21:44:32 linuxbf kernel:  [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun  6 21:44:32 linuxbf kernel:  [<c042e930>] update_process_times+0x3b/0x5e
Jun  6 21:44:32 linuxbf kernel:  [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun  6 21:44:32 linuxbf kernel:  [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun  6 21:44:32 linuxbf kernel:  [<c043d245>] tick_sched_timer+0x0/0xbb
Jun  6 21:44:32 linuxbf kernel:  [<c05c3634>] inet_twdr_hangman+0x0/0x94
Jun  6 21:44:32 linuxbf kernel:  [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun  6 21:44:32 linuxbf kernel:  [<c042e863>] __mod_timer+0xa1/0xab
Jun  6 21:44:32 linuxbf kernel:  [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun  6 21:44:32 linuxbf kernel:  [<c05c3634>] inet_twdr_hangman+0x0/0x94
Jun  6 21:44:32 linuxbf kernel:  [<c042007b>] find_busiest_group+0x207/0x4c5
Jun  6 21:44:32 linuxbf kernel:  [<c042dcee>] run_timer_softirq+0x10a/0x17b
Jun  6 21:44:32 linuxbf kernel:  [<c05c3634>] inet_twdr_hangman+0x0/0x94
Jun  6 21:44:32 linuxbf kernel:  [<c042a588>] it_real_fn+0x12/0x16
Jun  6 21:44:32 linuxbf kernel:  [<c042b2e5>] __do_softirq+0x5d/0xba
Jun  6 21:44:32 linuxbf kernel:  [<c04071b7>] do_softirq+0x59/0xb1
Jun  6 21:44:32 linuxbf kernel:  [<c042b1c7>] ksoftirqd+0x0/0xc1
Jun  6 21:44:32 linuxbf kernel:  [<c042b226>] ksoftirqd+0x5f/0xc1
Jun  6 21:44:32 linuxbf kernel:  [<c0436da8>] kthread+0xb0/0xd8
Jun  6 21:44:32 linuxbf kernel:  [<c0436cf8>] kthread+0x0/0xd8
Jun  6 21:44:32 linuxbf kernel:  [<c0405b3f>] kernel_thread_helper+0x7/0x10
Jun  6 21:44:32 linuxbf kernel:  =======================
Jun  6 21:44:32 linuxbf kernel: sd 0:0:0:0: SCSI error: return code = 0x06000000
Jun  6 21:44:32 linuxbf kernel: end_request: I/O error, dev sda, sector 32452941
Jun  6 21:44:32 linuxbf kernel: EXT3-fs error (device dm-0): read_block_bitmap:
Cannot read block bitmap - block_group = 123, block_bitmap = 4030464
Jun  6 21:44:32 linuxbf kernel: BUG: soft lockup detected on CPU#1!
Jun  6 21:44:32 linuxbf kernel:  [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun  6 21:44:32 linuxbf kernel:  [<c042e930>] update_process_times+0x3b/0x5e
Jun  6 21:44:32 linuxbf kernel:  [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun  6 21:44:32 linuxbf kernel:  [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun  6 21:44:32 linuxbf kernel:  [<c043d245>] tick_sched_timer+0x0/0xbb
Jun  6 21:44:32 linuxbf kernel:  [<c0419c40>] smp_apic_timer_interrupt+0x6f/0x80
Jun  6 21:44:32 linuxbf kernel:  [<c04059bc>] apic_timer_interrupt+0x28/0x30
Jun  6 21:44:32 linuxbf kernel:  [<c043007b>] do_notify_parent+0xf1/0x154
Jun  6 21:44:32 linuxbf kernel:  [<c0403281>] mwait_idle_with_hints+0x3b/0x3f
Jun  6 21:44:32 linuxbf kernel:  [<c04033d6>] cpu_idle+0xa3/0xc4
Jun  6 21:44:32 linuxbf kernel:  =======================
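
Editorial note: the traces above can be summarized mechanically. The sketch below assumes syslog lines shaped like the ones in this report; the function name `summarize_log` and the regular expressions are illustrative, not part of any existing tool. It lists each soft lockup with its timestamp and CPU number, and converts the "Clocksource ... unstable" delta from nanoseconds to seconds (the delta of 501984757941 ns in Trace 2 is roughly 502 seconds).

```python
import re

# Matches e.g. "Jun  6 14:21:46 linuxbf kernel: BUG: soft lockup detected on CPU#0!"
LOCKUP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"BUG: soft lockup detected on CPU#(?P<cpu>\d+)!"
)

# Matches e.g. "Clocksource tsc unstable (delta = 501984757941 ns)"
UNSTABLE_RE = re.compile(
    r"Clocksource (?P<name>\w+) unstable \(delta = (?P<delta>-?\d+) ns\)"
)


def summarize_log(lines):
    """Return (lockups, unstable) found in an iterable of syslog lines.

    lockups:  list of (timestamp string, CPU number)
    unstable: list of (clocksource name, delta in seconds)
    """
    lockups = []
    unstable = []
    for line in lines:
        m = LOCKUP_RE.match(line)
        if m:
            lockups.append((m.group("stamp"), int(m.group("cpu"))))
            continue
        m = UNSTABLE_RE.search(line)
        if m:
            # The kernel reports the delta in nanoseconds.
            unstable.append((m.group("name"), int(m.group("delta")) / 1e9))
    return lockups, unstable
```

To run it against a real log, pass an open file object, for example `summarize_log(open("/var/log/messages"))`; reading that file usually requires root.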

Version-Release number of selected component (if applicable):


How reproducible:

The computer ran the latest FC5 kernel without any problems until it was upgraded
to F7. Then this bug appeared twice, as shown above. The system is also affected by
bug #240982. It is now unstable: it has hung a few times or slowed to a crawl.
A fix is badly needed.
Comment 1 Chuck Ebbert 2007-06-07 12:16:10 EDT
please try kernel parameter
    clocksource=acpi_pm
Comment 2 Bernard Fouché 2007-06-07 13:29:24 EDT
What I did:

- After my bug report, I stayed in non-hyperthreading mode (set in the BIOS). I had
the previously reported kernel traces in /var/log/messages but did not experience
any problem with the computer for a full day of work (many cross-compilations,
find(1) on /, etc.): no crawling, no crash.

- I read your request, set 'clocksource=acpi_pm' in /etc/grub.conf, and used the BIOS to
re-activate hyperthreading. The system booted fine. I can see in /var/log/messages:

Jun  7 19:18:21 linuxbf kernel: Time: acpi_pm clocksource has been installed.

(times are local time in France)

Now the system works correctly, but I must go back home. I'll leave the computer
running and report any further problems (or the lack of them!). (FYI, bug #240982 is still present.)

Thanks.
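
Editorial note: the grub.conf change described in this comment amounts to appending the parameter to the `kernel` line of the boot entry. The sketch below is a minimal illustration, assuming the legacy GRUB layout in which each boot entry has a line beginning with `kernel`; the helper name `add_kernel_param` is hypothetical. It works on text only and does not write to /etc/grub.conf.

```python
def add_kernel_param(grub_conf_text, param):
    """Return grub.conf text with `param` appended to each `kernel` line.

    Lines that already carry exactly this parameter are left unchanged.
    A different existing clocksource= value is NOT replaced by this sketch.
    """
    out = []
    for line in grub_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "kernel" and param not in fields[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    # Preserve the trailing newline if the input had one.
    trailer = "\n" if grub_conf_text.endswith("\n") else ""
    return "\n".join(out) + trailer
```

Keep a backup of the original file and a known-good boot entry before applying any such change by hand, so the machine can still boot if the new parameter causes trouble.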
Comment 3 Bernard Fouché 2007-06-08 03:32:22 EDT
This morning the computer was still running. However it was much less responsive
than yesterday when I rebooted it. I started to fill the bugzilla form while the
computer was crawling more and more until it froze. The only solution was to
switch it off/on.

I went back to single-core operation (through the BIOS) and now it works correctly.

Adding 'clocksource=acpi_pm' got rid of the 'soft lockup' messages in
/var/log/messages.

I'll report later, but I think that the freezing problem is linked to bug
#240982 and not to the present one, which has vanished with the 'clocksource' parameter.
Comment 4 Bernard Fouché 2007-06-08 11:13:10 EDT
With 'clocksource=acpi_pm' set and hyperthreading disabled for many hours now, I
have had no more "soft lockup" messages and no crawling.
Comment 5 Bernard Fouché 2007-06-11 04:13:20 EDT
The computer ran for a weekend without any problem. IMHO 'clocksource=acpi_pm'
fixed the problem.
Comment 6 Matt Darcy 2007-06-11 04:47:32 EDT
Are you still running without hyperthreading enabled?
Could someone explain this parameter in basic detail, please? I can't find info
on it.
Comment 7 Matt Darcy 2007-06-11 06:59:16 EDT
When I say this parameter, I mean the 'clocksource=acpi_pm' option.
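
Editorial note: the `clocksource=` boot parameter overrides the kernel's default choice of the hardware counter it uses for timekeeping. In this report, `tsc` is the CPU's Time Stamp Counter, `hpet` is the High Precision Event Timer, and `acpi_pm` is the ACPI Power Management Timer. On kernels that expose the clocksource sysfs directory, the active and available sources can be read back as sketched below; the function name `read_clocksources` is illustrative, and the sysfs path is an assumption that may not hold on every kernel build.

```python
from pathlib import Path

# Assumed sysfs location of the clocksource controls.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    base = Path(sysfs_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as exc:
        # Not Linux, or this kernel does not expose the files.
        print(f"could not read clocksource info: {exc}")
    else:
        print("current:", current)
        print("available:", ", ".join(available))
```

After booting with `clocksource=acpi_pm`, the current source reported here would be expected to match the "acpi_pm clocksource has been installed" line quoted in Comment 2.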
Comment 8 Bernard Fouché 2007-06-11 11:32:47 EDT
Yes, I'm still in single-core mode, set through the BIOS. I'm waiting for a fix for bug
#240982 before trying dual-core mode again: I can't afford freezes these days.
Comment 9 Bernard Fouché 2007-06-14 04:30:41 EDT
Went back to hyperthreading, set through the BIOS. Dropped the parameter
'clocksource=acpi_pm'. Updated the kernel to 2.6.21-1.3228. No more 'soft lockup' in
/var/log/messages 40 minutes after reboot. Will report later if this error
message comes back.
Comment 10 Bernard Fouché 2007-06-14 05:39:12 EDT
The computer froze after one hour. No particular output in /var/log/messages. I was
unable to ssh into the computer and lost the mouse pointer when I hit ctrl-alt-f1 while
trying to get a text terminal. Went back to the original F7 kernel, no dual core,
clocksource=acpi_pm. Will retry later when I can afford to lose some more time...
Comment 11 Matteo Corti 2007-06-14 09:06:50 EDT
I am experiencing the same problem, but only if I run my folding@home client. I
get the messages in /var/log/messages and I cannot start new processes (but all
the running ones are fine). If I kill the folding@home client everything is fine.
Comment 12 Matteo Corti 2007-06-14 09:10:15 EDT
Created attachment 156990 [details]
dmesg trace

This is a snippet of the kernel messages. I may try changing the
clocksource when I next reboot the machine.
Comment 13 Bernard Fouché 2007-06-14 09:50:58 EDT
I'm running folding@home too. Thanks for pointing that out to me. If there is
a low-level problem with threads or sockets, then F@H may be triggering an as-yet
unknown bug! I'll try later to reboot with 3228 and no F@H.
Comment 14 Matteo Corti 2007-06-14 09:54:12 EDT
I can confirm the same behavior with 3228.
Comment 15 Bernard Fouché 2007-06-14 13:11:16 EDT
Went back to hyperthreading with kernel 3228, no clocksource parameter. I'm leaving
the office now and won't be back until Tuesday. I'm leaving the computer idling
without folding@home.
Comment 16 Bernard Fouché 2007-06-20 12:29:05 EDT
Have not needed to reboot since Friday with kernel 3228 (computer running 6 days).
No more soft lockup messages. This kernel is fine for me, but I did not try it
with F@H. Time to close this bug report?
