442920 – BUG: soft lockup - CPU#0 stuck for 61s!

Summary: BUG: soft lockup - CPU#0 stuck for 61s!

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	9
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-04-17 16:00 UTC by Pete Zaitcev
Modified:	2008-09-29 17:39 UTC (History)
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-09-29 17:39:31 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
/var/log/messages (unedited) (1.14 MB, text/plain) 2008-04-17 16:01 UTC, Pete Zaitcev
version, messages log and version information (10.29 KB, text/plain) 2008-06-08 03:52 UTC, jamie levy
System log of a similar problem (7.06 KB, text/plain) 2008-09-18 08:24 UTC, Bojan Smojver
Output of lspci -vv and lspci -nn (Shuttle K45) (12.41 KB, text/plain) 2008-09-25 01:26 UTC, Bojan Smojver

Description Pete Zaitcev 2008-04-17 16:00:05 UTC

Description of problem:

Desktop dies. Unable to switch to text mode.

Version-Release number of selected component (if applicable):

kernel-2.6.25-0.218.rc8.git7.fc9.x86_64

How reproducible:

Happens overnight or if the system is left idle for a few hours.

Steps to Reproduce:
1. Leave idle
2. Verify operation
  
Actual results:

Desktop hangs eventually. Error message is recorded.

Expected results:

No hang.

Additional info:

It looks like this started very recently. I'm pretty sure
kernel-2.6.25-0.195.rc8.git1.fc9.x86_64 is OK (judging from the
saved /var/log/messages).

I am unable to save dmesg, but there's /var/log/messages.

Here's an excerpt (will attach a complete one):

Apr 15 19:37:43 niphredil kernel: BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Apr 15 19:37:43 niphredil kernel: CPU 0:
Apr 15 19:37:43 niphredil kernel: Modules linked in: tun ipt_MASQUERADE
iptable_nat nf_nat bridge ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns
nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter ip_tables
xt_tcpudp ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand
powernow_k8 freq_table kvm_amd kvm arc4 ecb crypto_blkcipher b43 rfkill
snd_usb_audio mac80211 snd_usb_lib snd_rawmidi cfg80211 input_polldev
snd_hda_intel snd_seq_dummy dcdbas snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sdhci snd_timer joydev mmc_core
b44 ricoh_mmc snd_page_alloc snd_hwdep mii snd k8temp hwmon soundcore i2c_piix4
i2c_core ssb shpchp video sg output wmi battery ac button sr_mod cdrom
pata_atiixp dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod
ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: pcspkr]
Apr 15 19:37:43 niphredil kernel: Pid: 0, comm: swapper Not tainted 2.6.25-0.218.rc8.git7.fc9.x86_64 #1
Apr 15 19:37:43 niphredil kernel: RIP: 0010:[_spin_unlock_irqrestore+8/10] [_spin_unlock_irqrestore+8/10] _spin_unlock_irqrestore+0x8/0xa
Apr 15 19:37:43 niphredil kernel: RSP: 0018:ffffffff81455d98  EFLAGS: 00000293
Apr 15 19:37:43 niphredil kernel: RAX: 0000000000000000 RBX: ffffffff81455d98 RCX: ffffffff81455d98
Apr 15 19:37:43 niphredil kernel: RDX: 00001ec2439ee80e RSI: 0000000000000293 RDI: ffffffff81504220
Apr 15 19:37:43 niphredil kernel: RBP: ffffffff81455d28 R08: ffff8100010045b0 R09: 0000000000a68a32
Apr 15 19:37:43 niphredil kernel: R10: ffff81000100bf80 R11: ffffffff81455e98 R12: ffffffff810490f3
Apr 15 19:37:43 niphredil kernel: R13: ffffffff81455d18 R14: ffff8100010045b0 R15: 00000f9a74dc7969
Apr 15 19:37:43 niphredil kernel: FS:  00007f9bdbef1700(0000) GS:ffffffff813f2000(0000) knlGS:00000000f7f24940
Apr 15 19:37:43 niphredil kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 15 19:37:43 niphredil kernel: CR2: 00007fe126652000 CR3: 0000000000201000 CR4: 00000000000006a0
Apr 15 19:37:43 niphredil kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 15 19:37:43 niphredil kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 15 19:37:43 niphredil kernel: 
Apr 15 19:37:43 niphredil kernel: Call Trace:
Apr 15 19:37:43 niphredil kernel:  [tick_broadcast_oneshot_control+230/239] ? tick_broadcast_oneshot_control+0xe6/0xef
Apr 15 19:37:43 niphredil kernel:  [tick_notify+482/821] ? tick_notify+0x1e2/0x335
Apr 15 19:37:43 niphredil kernel:  [notifier_call_chain+51/91] ? notifier_call_chain+0x33/0x5b
Apr 15 19:37:43 niphredil kernel:  [raw_notifier_call_chain+15/17] ? raw_notifier_call_chain+0xf/0x11
Apr 15 19:37:43 niphredil kernel:  [clockevents_notify+43/92] ? clockevents_notify+0x2b/0x5c
Apr 15 19:37:43 niphredil kernel:  [acpi_state_timer_broadcast+65/67] ? acpi_state_timer_broadcast+0x41/0x43
Apr 15 19:37:43 niphredil kernel:  [acpi_idle_enter_bm+776/885] ? acpi_idle_enter_bm+0x308/0x375
Apr 15 19:37:43 niphredil kernel:  [menu_select+111/143] ? menu_select+0x6f/0x8f
Apr 15 19:37:43 niphredil kernel:  [cpuidle_idle_call+134/186] ? cpuidle_idle_call+0x86/0xba
Apr 15 19:37:43 niphredil kernel:  [cpuidle_idle_call+0/186] ? cpuidle_idle_call+0x0/0xba
Apr 15 19:37:43 niphredil kernel:  [default_idle+0/95] ? default_idle+0x0/0x5f
Apr 15 19:37:43 niphredil kernel:  [cpu_idle+160/232] ? cpu_idle+0xa0/0xe8
Apr 15 19:37:43 niphredil kernel:  [rest_init+90/92] ? rest_init+0x5a/0x5c
Apr 15 19:37:43 niphredil kernel:
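For anyone comparing their own /var/log/messages against this report, here is a minimal Python sketch (not part of any Fedora tool) that pulls the CPU, duration, and task out of the soft lockup header and decodes call-trace frames. In this log each frame appears twice, as `[symbol+offset/size]` in decimal and as `symbol+0xoffset/0xsize` in hex (482/821 is 0x1e2/0x335), so the sketch cross-checks the two. The regular expressions are assumptions fitted to this excerpt only; other kernel versions format traces differently, and each record is expected as one string (a record wrapped across lines still matches if the pieces are joined with whitespace).

```python
import re

# Soft-lockup header, e.g.
# "BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s*"
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Call-trace frame as printed in this log: decimal offset/size in
# brackets, then the same location again in hex (assumed format).
TRACE_RE = re.compile(
    r"\[(?P<sym>\w+)\+(?P<off>\d+)/(?P<size>\d+)\]\s*\?\s*"
    r"(?P=sym)\+0x(?P<hoff>[0-9a-f]+)/0x(?P<hsize>[0-9a-f]+)"
)


def parse_lockup(record):
    """Return (cpu, seconds, comm, pid) from a lockup header, or None."""
    m = LOCKUP_RE.search(record)
    if m is None:
        return None
    return int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])


def parse_frame(record):
    """Return (symbol, offset, size) from one call-trace frame, or None."""
    m = TRACE_RE.search(record)
    if m is None:
        return None
    off, size = int(m["off"]), int(m["size"])
    # The decimal and hex forms should name the same location.
    if off != int(m["hoff"], 16) or size != int(m["hsize"], 16):
        raise ValueError("decimal and hex offsets disagree: " + record)
    return m["sym"], off, size


if __name__ == "__main__":
    print(parse_lockup("kernel: BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]"))
    # prints (0, 61, 'swapper', 0)
    print(parse_frame("kernel:  [tick_notify+482/821] ? tick_notify+0x1e2/0x335"))
    # prints ('tick_notify', 482, 821)
```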

Comment 1 Pete Zaitcev 2008-04-17 16:01:13 UTC

Created attachment 302761
/var/log/messages (unedited)

Comment 2 Kevin Fenzi 2008-04-17 17:17:55 UTC

This looks similar to what I am now seeing in my multi-CPU KVM guests,
so perhaps the guests are fine and this is a more general problem?

See https://bugzilla.redhat.com/show_bug.cgi?id=438617

Comment 3 Pete Zaitcev 2008-04-17 18:48:03 UTC

In my case there's no KVM and/or Xen. Only the final trace in bug 438617
originated in an idle state, and there was no ACPI involved. I filed this
one because it looked different to me. It's easier to dup bugs than to
clone them anyway.

Comment 4 Chuck Ebbert 2008-04-27 03:32:36 UTC

davej reported what looks like the same thing in bug 444059 and there are some
similar reports for F8.

Can you try booting with 'processor.max_cstate=1'?
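To confirm that a test boot actually picked up the suggested parameter, one can read /proc/cmdline, which holds the command line the running kernel was booted with. Below is a minimal Python sketch; the helper names are hypothetical, the whitespace split ignores quoted values, and keeping the last occurrence of a repeated parameter is a simplification of this sketch, not a claim about how the kernel resolves duplicates.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line ("" for a bare
    flag), or None if absent. Simplified: splits on whitespace only."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""  # this sketch keeps the last occurrence
    return value


def max_cstate_capped(cmdline, limit=1):
    """True if processor.max_cstate is set to a number no greater than limit."""
    v = kernel_param(cmdline, "processor.max_cstate")
    return v is not None and v.isdigit() and int(v) <= limit
```

On the test machine, `max_cstate_capped(open("/proc/cmdline").read())` should then return True.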

Comment 5 Bug Zapper 2008-05-14 09:35:18 UTC

Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Dave Jones 2008-06-04 15:43:09 UTC

Pete, does that have an ATI chipset by any chance ?

If it's the same bug I saw, it's fixed by 'something' in .26-rc, but I've no
idea which changeset, as there are so many of them, and the bug takes a while to
reproduce, which makes bisecting difficult.
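To put a rough number on why bisecting is painful here: a binary search over N candidate changesets needs about ⌈log2 N⌉ test kernels in the worst case, and with an idle-triggered hang each "good" verdict is only trustworthy after hours of waiting. The Python sketch below uses hypothetical figures (10,000 changesets, 12 hours per test); neither number comes from this report.

```python
def bisect_steps(n_changesets):
    """Worst-case number of test kernels needed to isolate one bad
    changeset among n_changesets candidates: ceil(log2 n), in exact ints."""
    if n_changesets < 1:
        raise ValueError("need at least one candidate changeset")
    return (n_changesets - 1).bit_length()


def bisect_hours(n_changesets, hours_per_test):
    # Each step: build, boot, then wait long enough to trust a "good" verdict.
    return bisect_steps(n_changesets) * hours_per_test


if __name__ == "__main__":
    # Hypothetical figures, not taken from this bug report.
    print(bisect_steps(10_000), bisect_hours(10_000, 12))  # prints 14 168
```

With those assumed figures that is 14 test kernels and about a week of wall-clock time, before counting false "good" results from runs that simply did not wait long enough.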

Comment 7 jamie levy 2008-06-08 03:52:13 UTC

Created attachment 308630
version, messages log and version information

I am seeing the same problem with the latest i386 kernel on Fedora 8. The
attachment contains my kernel information and messages log. This has
happened twice, right after running Snort.

Comment 8 Solomon Peachy 2008-06-12 20:26:37 UTC

I've been seeing this bug for a while too, with both F8 and F9.  It's certainly
been there since 2.6.24, and is still present (though apparently not as bad)
with the 2.6.25.4-30.fc9.x86_64 kernel.  This machine (Ferrari 4000 laptop) has
an ATI chipset (RS480 aka 200M).  

I've never been able to recreate it with any reliability (beyond "it eventually
happens"), but it seems to be more easily triggered when the wireless card
(p54pci) has the RFKill switch on and the 802.11+ stack is trying to scan/find
something in the background.

Still, your note that "something fixed it in 2.6.26-rc" is encouraging.

Comment 9 Faisal Malallah 2008-08-14 08:40:36 UTC

I can confirm the same bug on F8 with the latest kernel. It now happens very frequently: whenever I leave my computer idle overnight, I find it hung in the morning with this message.

Comment 10 Bojan Smojver 2008-09-18 08:24:41 UTC

Created attachment 317045
System log of a similar problem

More or less the same as already reported. It appears to involve BIND (i.e. named). Note that this was just after an unsuccessful attempt to create an IPsec tunnel.

Comment 11 Bojan Smojver 2008-09-18 08:26:47 UTC

Please note that the attachment from comment #10 is from an i686 machine, so this is not x86_64-specific. Kernel is: 2.6.26.3-29.fc9.i686.

Comment 12 Bojan Smojver 2008-09-25 01:26:42 UTC

Created attachment 317642
Output of lspci -vv and lspci -nn (Shuttle K45)

Comment 13 Chuck Ebbert 2008-09-29 17:34:30 UTC

(In reply to comment #10)
> Created an attachment (id=317045)
> System log of the similar problem
> 
> More or less the same as already reported. Something to do with BIND (i.e.
> named). Note that this was just after unsuccessful attempt to create and IPSec
> tunnel.

That is not even close to being the same problem as the original report. The original was a lockup in the timer code, while this one is a lockup in the IPsec code.

Comment 14 Chuck Ebbert 2008-09-29 17:37:52 UTC

(In reply to comment #7)
> Created an attachment (id=308630)
> version, messages log and version information
> 
> I am seeing the same problem with the last kernel i386 Fedora 8.  The
> attachment contains my kernel information and messages log.  I have had this
> happen twice right after running Snort.

Also not the "same problem". This is a lockup in the wireless code.
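The distinction drawn in comments 13 and 14 rests on where the lockup sits in the call trace, not on the shared "BUG: soft lockup" headline. A rough Python heuristic along those lines is sketched below. The prefix table is a hypothetical illustration based on common kernel naming (`tick_`/`clockevents_` for timers, `xfrm` for IPsec, `ieee80211_` for mac80211); the traces from the attachments in comments 7 and 10 are not reproduced here, and generic frames such as `notifier_call_chain` are simply skipped.

```python
# Hypothetical prefix table: a rough heuristic, not an authoritative mapping.
SUBSYSTEM_PREFIXES = (
    ("timer/idle", ("tick_", "clockevents_", "acpi_idle_", "cpuidle_")),
    ("IPsec", ("xfrm",)),
    ("wireless", ("ieee80211_", "cfg80211_")),
)


def classify_trace(symbols):
    """Bucket a call trace (innermost frame first) by the first symbol
    whose prefix appears in SUBSYSTEM_PREFIXES; "unknown" if none match."""
    for sym in symbols:
        for label, prefixes in SUBSYSTEM_PREFIXES:
            if sym.startswith(prefixes):
                return label
    return "unknown"


if __name__ == "__main__":
    # Innermost frames from the original report's trace.
    print(classify_trace(["tick_broadcast_oneshot_control", "tick_notify",
                          "notifier_call_chain"]))  # prints timer/idle
```

Two reports that land in different buckets are, by this heuristic, likely separate bugs even when the headline message is identical.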

Comment 15 Chuck Ebbert 2008-09-29 17:39:31 UTC

Closing this bug. Anyone still having problems should open a separate bug report and attach information about their lockup to that.
