694818 – BUG: soft lockup - CPU#1 stuck for 67s! [while trying to free memory]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	15
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-04-08 14:17 UTC by Andy Lawrence
Modified:	2011-06-24 11:39 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-06-24 11:39:09 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
dmesg dump (65.45 KB, text/x-log) 2011-04-08 14:17 UTC, Andy Lawrence
messages dump (3.41 MB, text/x-log) 2011-04-08 14:18 UTC, Andy Lawrence

Description Andy Lawrence 2011-04-08 14:17:21 UTC

Description of problem:
Soft lockups occur 5-10 times per day and seem to happen more often under heavier CPU load.  The system stays somewhat responsive, but I must reboot to clear the fault.

This is a Sandy Bridge Laptop, see dmesg attached.

Version-Release number of selected component (if applicable):

Linux sandy 2.6.38.2-12.fc15.x86_64.debug #1 SMP Thu Apr 7 02:52:50 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux, Running XFCE 4.6 Desktop, all updates as of 4/7/11.

How reproducible:
Fairly reproducible, happens many times per day.

I have a debug kernel loaded with symbols, but I'm not 100% sure I am providing useful info. Please ask for anything further.

BUG: soft lockup - CPU#1 stuck for 67s! [kswapd0:57]
Modules linked in: fuse thinkpad_acpi coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant arc4 uvcvideo videodev v4l2_compat_ioctl32 microcode snd_hda_intel btusb bluetooth snd_hda_codec snd_hwdep snd_seq snd_seq_device joydev iwlagn snd_pcm i2c_i801 iwlcore mac80211 cfg80211 rfkill snd_timer iTCO_wdt snd soundcore iTCO_vendor_support snd_page_alloc e1000e wmi uinput ipv6 sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
irq event stamp: 85594154
hardirqs last  enabled at (85594153): [<ffffffff814b1673>] __mutex_unlock_slowpath+0x112/0x122
hardirqs last disabled at (85594154): [<ffffffff814b32ea>] save_args+0x6a/0x70
softirqs last  enabled at (85593584): [<ffffffff8105e1d0>] __do_softirq+0x1c4/0x1da
softirqs last disabled at (85593579): [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
CPU 1 
Modules linked in: fuse thinkpad_acpi coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant arc4 uvcvideo videodev v4l2_compat_ioctl32 microcode snd_hda_intel btusb bluetooth snd_hda_codec snd_hwdep snd_seq snd_seq_device joydev iwlagn snd_pcm i2c_i801 iwlcore mac80211 cfg80211 rfkill snd_timer iTCO_wdt snd soundcore iTCO_vendor_support snd_page_alloc e1000e wmi uinput ipv6 sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 57, comm: kswapd0 Not tainted 2.6.38.2-12.fc15.x86_64.debug #1 LENOVO 4239CTO/4239CTO
RIP: 0010:[<ffffffff810764c6>]  [<ffffffff810764c6>] arch_local_irq_restore+0x6/0xd
RSP: 0018:ffff88003776bcb0  EFLAGS: 00000246
RAX: 00000000051a1029 RBX: ffff88003773a3c0 RCX: 000000000000cb07
RDX: 0000000000000080 RSI: ffff88003773aac0 RDI: 0000000000000246
RBP: ffff88003776bcb0 R08: ffffffff8160a7e0 R09: ffff88003776bc30
R10: ffffffff00000003 R11: ffff8800753d3ec0 R12: ffffffff8100a5ce
R13: 0000000000000080 R14: ffff88003776bc30 R15: ffffffff81085591
FS:  0000000000000000(0000) GS:ffff880075000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000006b64024 CR3: 0000000001a03000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 57, threadinfo ffff88003776a000, task ffff88003773a3c0)
Stack:
 ffff88003776bce0 ffffffff814b167b ffff88003773aaf8 ffff880037363980
 0000000000000000 00000000000001de ffff88003776bcf0 ffffffff814b1691
 ffff88003776bd40 ffffffffa007a398 ffff88003776bd40 ffff880037363980
Call Trace:
 [<ffffffff814b167b>] __mutex_unlock_slowpath+0x11a/0x122
 [<ffffffff814b1691>] mutex_unlock+0xe/0x10
 [<ffffffffa007a398>] i915_gem_inactive_shrink+0x161/0x194 [i915]
 [<ffffffff810f4f56>] shrink_slab+0x6d/0x166
 [<ffffffff810f7c74>] kswapd+0x52e/0x79c
 [<ffffffff810f7746>] ? kswapd+0x0/0x79c
 [<ffffffff81073429>] kthread+0xa8/0xb0
 [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
 [<ffffffff814b3454>] ? restore_args+0x0/0x30
 [<ffffffff81073381>] ? kthread+0x0/0xb0
 [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
Code: 83 c4 38 89 c8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 0f 1f 44 00 00 bf fa ff ff ff e8 f7 fe ff ff 5d c3 55 48 89 e5 57 9d <0f> 1f 44 00 00 5d c3 55 48 89 e5 9c 58 0f 1f 44 00 00 48 89 c2 
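
For triage, the Call Trace section above can be reduced to a list of symbols, which makes the path from kswapd through shrink_slab into the i915 shrinker easier to see. The following is only a minimal Python sketch, assuming the trace has been saved as plain text in the format shown in this report; the name parse_call_trace is hypothetical and not part of any kernel tooling. Frames that the kernel prefixes with "?" were found on the stack but are not confirmed to be part of the call chain, so the sketch flags them as unreliable.

```python
import re

# Matches frame lines such as
#   " [<ffffffff814b167b>] __mutex_unlock_slowpath+0x11a/0x122"
#   " [<ffffffffa007a398>] i915_gem_inactive_shrink+0x161/0x194 [i915]"
#   " [<ffffffff810f7746>] ? kswapd+0x0/0x79c"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return (symbol, module, reliable) for each frame line found in text.

    parse_call_trace is a hypothetical helper for illustration only.
    It is meant to be fed the "Call Trace:" portion of an oops; other
    log lines with the same "[<addr>] symbol+off/len" shape also match.
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            reliable = m.group("unreliable") is None
            frames.append((m.group("symbol"), m.group("module"), reliable))
    return frames


# Frames copied from the trace in this report.
SAMPLE = """\
 [<ffffffff814b167b>] __mutex_unlock_slowpath+0x11a/0x122
 [<ffffffff814b1691>] mutex_unlock+0xe/0x10
 [<ffffffffa007a398>] i915_gem_inactive_shrink+0x161/0x194 [i915]
 [<ffffffff810f4f56>] shrink_slab+0x6d/0x166
 [<ffffffff810f7c74>] kswapd+0x52e/0x79c
 [<ffffffff810f7746>] ? kswapd+0x0/0x79c
"""

if __name__ == "__main__":
    for symbol, module, reliable in parse_call_trace(SAMPLE):
        if reliable:
            print(symbol, f"[{module}]" if module else "")
```

Only the frame carrying a module tag (here i915) comes from a loadable module; the others are in the core kernel image.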

I will upload the usual log files.

Happy to test kernel patches etc.

Comment 1 Andy Lawrence 2011-04-08 14:17:50 UTC

Created attachment 490805
dmesg dump

Comment 2 Andy Lawrence 2011-04-08 14:18:58 UTC

Created attachment 490806
messages dump

Comment 3 Andy Lawrence 2011-04-08 14:36:26 UTC

This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=643661 but with a different process name.

Comment 4 Chuck Ebbert 2011-04-14 05:02:12 UTC

The debug kernel has some heavy debugging options enabled. Does this also happen with a non-debug kernel? Also, your system looks to be under heavy memory pressure.

Comment 5 Andy Lawrence 2011-04-14 12:21:16 UTC

Yeah, I loaded the debug kernel hoping to provide more useful output when the bug hit.

Funny you mention memory: this has not happened since I installed another 4G of memory.  Before that it only had 2G (and no swap either), but I installed the extra memory just 2 days ago, so I have not had much time with it.

Thanks

Comment 6 James Bottomley 2011-04-27 16:01:19 UTC

I keep seeing something similar in an attempted Fedora 15 Beta install.  I've also got a Lenovo Sandy Bridge laptop with 2GB of RAM.

Best guess is that it's writeout related because anything that touches the filesystem hangs after it happens.  I can trigger it reliably by untarring a 90GB tar file of someone's home directory.

Whatever it is, it's also occurring with upstream, so I'll report it there.

Comment 7 Andy Lawrence 2011-04-27 23:59:41 UTC

James, since I added that additional 4G of memory it has not happened once!  Still a bug but....

Please post the upstream tracker once you file.

Thanks

Comment 8 James Bottomley 2011-04-29 13:55:57 UTC

(In reply to comment #7)
> Please post the upstream tracker once you file.

I'm not sure what you mean by this.  The thread is going on here:

http://marc.info/?t=130392066000001

The current theory is that it's a bad interaction between the shrinkers and the cgroup memory controller.

One apparent workaround is just to disable the cgroup memory controller.

Comment 9 Andy Lawrence 2011-04-29 15:34:36 UTC

(In reply to comment #8)
> (In reply to comment #7)
> > Please post the upstream tracker once you file.
> 
> I'm not sure what you mean by this.  The thread is going on here:
> 

You mentioned posting this upstream; I'm guessing at bugzilla.kernel.org?  If so, please scroll to the top of this bug report and add it to the "External Tracker" box.

Thanks

Comment 10 llevet 2011-06-01 14:33:48 UTC

Hi,

The same happens for me.

BUG: soft lockup - CPU#1 stuck for 67s! [kswapd0:35]

I'm on Fedora 15, kernel 2.6.38.6-27.fc15.x86_64 #1 SMP.
My CPU is an Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz.
Motherboard: Gigabyte Technology Co., Ltd. H67N-USB3-B3/H67N-USB3-B3, BIOS F5 03/31/2011

I have 2GB of memory. My HDD is in AHCI mode.

I can reproduce the bug like this:

dd if=/dev/zero of=/root/test.dat bs=50M count=50
dd if=/root/test.dat of=/dev/null
dd if=/dev/zero of=/root/test.dat bs=50M count=50
and the lockup occurs during this last command.
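
A note on the sizes involved in these commands: GNU dd interprets bs=50M as 50 MiB blocks, so each write pass produces a file larger than the 2GB of memory in this machine, which means the page cache cannot hold it and the kernel has to reclaim memory while the test runs. The following is a minimal Python sketch of that arithmetic, assuming "2GB" means 2 GiB; the function names are hypothetical.

```python
# Values taken from the dd commands and the memory size stated in this comment.
BLOCK_SIZE_BYTES = 50 * 1024 * 1024   # dd "bs=50M" means 50 MiB per block
BLOCK_COUNT = 50                      # dd "count=50"
RAM_BYTES = 2 * 1024 ** 3             # assumption: "2GB" read as 2 GiB


def file_size_bytes(block_size=BLOCK_SIZE_BYTES, count=BLOCK_COUNT):
    """Total bytes written by one dd pass (hypothetical helper)."""
    return block_size * count


def exceeds_ram(size_bytes, ram_bytes=RAM_BYTES):
    """True if a file of size_bytes cannot fit in memory of ram_bytes."""
    return size_bytes > ram_bytes


if __name__ == "__main__":
    size = file_size_bytes()
    print(f"test file: {size / 2**20:.0f} MiB, RAM: {RAM_BYTES / 2**20:.0f} MiB")
    print("exceeds RAM:", exceeds_ram(size))
```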

The computer still responds to ping, but I am unable to access anything.

I have tried disabling cpuspeed and adding cgroup_disable=memory to the kernel command line, with no success; the computer hangs each time all of the memory is put under load.
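
When testing a boot parameter such as cgroup_disable=memory, it helps to confirm that the option actually reached the running kernel. The following is a minimal Python sketch, assuming a Linux system where /proc/cmdline is readable; has_kernel_param and read_cmdline are hypothetical helper names. It only checks that the exact token was passed at boot, not that the memory controller is really disabled.

```python
def has_kernel_param(cmdline, param):
    """Return True if param appears as a whole token on the command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line; return "" if unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Not on Linux, or /proc is not mounted.
        return ""


if __name__ == "__main__":
    cmdline = read_cmdline()
    active = has_kernel_param(cmdline, "cgroup_disable=memory")
    print("cgroup_disable=memory on command line:", active)
```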

Thanks.

Ludo.

Comment 11 llevet 2011-06-07 13:22:39 UTC

Hi,

The bug is fixed in version 2.6.38.8:


http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06590bd718ed950c98828e30ef93204028f3210

So I am waiting for a new build of the FC15 kernel.


Thanks.

Ludo.

Comment 12 llevet 2011-06-19 21:17:22 UTC

Hi,

I just updated the kernel to 2.6.38.8-32.fc15.x86_64, but there is still a problem.
My computer doesn't hang now, but kswapd0 still takes 100% of one core after my test:

Three times after boot:
dd if=/dev/zero of=/root/test.dat bs=50M count=50
dd if=/root/test.dat of=/dev/null

and the result:

top - 23:13:27 up 12 min,  1 user,  load average: 1.01, 0.99, 0.62
Tasks: 106 total,   2 running, 104 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 25.2%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1937388k total,  1093916k used,   843472k free,     7596k buffers
Swap:  3899388k total,        0k used,  3899388k free,   906172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   35 root      20   0     0    0    0 R 99.8  0.0  10:17.97 kswapd0
    1 root      20   0 37076 4268 1496 S  0.0  0.2   0:00.93 systemd
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
...

As you can see, uptime is 12 minutes, kswapd0 has already used over 10 minutes of CPU time, and it does not stop.
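
The comparison between uptime and kswapd0's CPU time can be made explicit by converting the TIME+ column from the top output above to seconds and dividing by the uptime. The following is a minimal Python sketch, assuming TIME+ is in the minutes:seconds.hundredths form shown here (it does not handle other forms); the function names are hypothetical.

```python
def cpu_time_seconds(time_plus):
    """Convert a TIME+ value like "10:17.97" (min:sec.hundredths) to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def busy_fraction(time_plus, uptime_minutes):
    """Fraction of the uptime during which the process was on a CPU."""
    return cpu_time_seconds(time_plus) / (uptime_minutes * 60)


if __name__ == "__main__":
    # Values from the top output in this comment: "up 12 min", TIME+ 10:17.97.
    fraction = busy_fraction("10:17.97", 12)
    print(f"kswapd0 was running for {fraction:.0%} of the uptime")
```

A kernel thread that has been running for most of the time since boot, with almost no memory in use by applications, is consistent with the busy loop described in this comment.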

Thanks.

Ludo.

Comment 13 Chuck Ebbert 2011-06-24 11:39:09 UTC

(In reply to comment #12)
> Hi,
> 
> I just updated the kernel to 2.6.38.8-32.fc15.x86_64, but there is still a problem.
> My computer doesn't hang now, but kswapd0 still takes 100% of one core after my
> test:

That is bug 712019.  Since the computer no longer hangs with soft lockup messages, which is what the original report was about, I'll close this bug.
