Description of problem:
Soft lockup occurring 5-10 per day, seems to happen more at heavier CPU loads. The system somewhat responds but I must reboot to clear the fault. This is a Sandy Bridge Laptop, see dmesg attached.

Version-Release number of selected component (if applicable):
Linux sandy 2.6.38.2-12.fc15.x86_64.debug #1 SMP Thu Apr 7 02:52:50 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Running XFCE 4.6 Desktop, all updates as of 4/7/11.

How reproducible:
Fairly reproducible, happens many times per day. I have a debug kernel loaded with symbols but I'm not 100% sure I am providing useful info, please ask for anything further.

BUG: soft lockup - CPU#1 stuck for 67s! [kswapd0:57]
Modules linked in: fuse thinkpad_acpi coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant arc4 uvcvideo videodev v4l2_compat_ioctl32 microcode snd_hda_intel btusb bluetooth snd_hda_codec snd_hwdep snd_seq snd_seq_device joydev iwlagn snd_pcm i2c_i801 iwlcore mac80211 cfg80211 rfkill snd_timer iTCO_wdt snd soundcore iTCO_vendor_support snd_page_alloc e1000e wmi uinput ipv6 sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
irq event stamp: 85594154
hardirqs last enabled at (85594153): [<ffffffff814b1673>] __mutex_unlock_slowpath+0x112/0x122
hardirqs last disabled at (85594154): [<ffffffff814b32ea>] save_args+0x6a/0x70
softirqs last enabled at (85593584): [<ffffffff8105e1d0>] __do_softirq+0x1c4/0x1da
softirqs last disabled at (85593579): [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
CPU 1
Modules linked in: fuse thinkpad_acpi coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant arc4 uvcvideo videodev v4l2_compat_ioctl32 microcode snd_hda_intel btusb bluetooth snd_hda_codec snd_hwdep snd_seq snd_seq_device joydev iwlagn snd_pcm i2c_i801 iwlcore mac80211 cfg80211 rfkill snd_timer iTCO_wdt snd soundcore iTCO_vendor_support snd_page_alloc e1000e wmi uinput ipv6 sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 57, comm: kswapd0 Not tainted 2.6.38.2-12.fc15.x86_64.debug #1 LENOVO 4239CTO/4239CTO
RIP: 0010:[<ffffffff810764c6>] [<ffffffff810764c6>] arch_local_irq_restore+0x6/0xd
RSP: 0018:ffff88003776bcb0 EFLAGS: 00000246
RAX: 00000000051a1029 RBX: ffff88003773a3c0 RCX: 000000000000cb07
RDX: 0000000000000080 RSI: ffff88003773aac0 RDI: 0000000000000246
RBP: ffff88003776bcb0 R08: ffffffff8160a7e0 R09: ffff88003776bc30
R10: ffffffff00000003 R11: ffff8800753d3ec0 R12: ffffffff8100a5ce
R13: 0000000000000080 R14: ffff88003776bc30 R15: ffffffff81085591
FS: 0000000000000000(0000) GS:ffff880075000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000006b64024 CR3: 0000000001a03000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 57, threadinfo ffff88003776a000, task ffff88003773a3c0)
Stack:
 ffff88003776bce0 ffffffff814b167b ffff88003773aaf8 ffff880037363980
 0000000000000000 00000000000001de ffff88003776bcf0 ffffffff814b1691
 ffff88003776bd40 ffffffffa007a398 ffff88003776bd40 ffff880037363980
Call Trace:
 [<ffffffff814b167b>] __mutex_unlock_slowpath+0x11a/0x122
 [<ffffffff814b1691>] mutex_unlock+0xe/0x10
 [<ffffffffa007a398>] i915_gem_inactive_shrink+0x161/0x194 [i915]
 [<ffffffff810f4f56>] shrink_slab+0x6d/0x166
 [<ffffffff810f7c74>] kswapd+0x52e/0x79c
 [<ffffffff810f7746>] ? kswapd+0x0/0x79c
 [<ffffffff81073429>] kthread+0xa8/0xb0
 [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
 [<ffffffff814b3454>] ? restore_args+0x0/0x30
 [<ffffffff81073381>] ? kthread+0x0/0xb0
 [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
Code: 83 c4 38 89 c8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 0f 1f 44 00 00 bf fa ff ff ff e8 f7 fe ff ff 5d c3 55 48 89 e5 57 9d <0f> 1f 44 00 00 5d c3 55 48 89 e5 9c 58 0f 1f 44 00 00 48 89 c2

I will upload the usual log files. Happy to test kernel patches etc.
Created attachment 490805 [details] dmesg dump
Created attachment 490806 [details] messages dump
This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=643661 but a different process name.
The debug kernel has some heavy debugging options enabled, does this also happen with a non-debug kernel? Also, your system looks to be under heavy memory pressure.
Yeah, I loaded the debug kernel hoping to provide more useful output when the bug hit. Funny you mention memory, this has not happened since I installed another 4G of memory. Before it only had 2G (no swap either), but I just installed it 2 days ago so not much time with it. Thanks
I keep seeing something similar in an attempted Fedora 15 Beta install. I've also got a Lenovo Sandy Bridge laptop with 2GB of RAM. Best guess is that it's writeout related, because anything that touches the filesystem hangs after it happens. I can trigger it reliably by untarring a 90GB tar file of someone's home directory. Whatever it is, it's also occurring with upstream, so I'll report it there.
James, since I added that additional 4G of memory it has not happened once! Still a bug but.... Please post the upstream tracker once you file. Thanks
(In reply to comment #7)
> Please post the upstream tracker once you file.

I'm not sure what you mean by this. The thread is going on here:

http://marc.info/?t=130392066000001

The current theory is that it's a bad interaction between the shrinkers and the cgroup memory controller. One apparent workaround is just to disable the cgroup memory controller.
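For anyone trying that workaround, a minimal sketch of checking whether the running kernel was actually booted with the cgroup memory controller disabled. The boot parameter cgroup_disable=memory is the one named later in this bug; the helper name has_cgroup_memory_disabled is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether cgroup_disable=memory is on the kernel command line.
# has_cgroup_memory_disabled is a hypothetical helper name, not from this bug.

has_cgroup_memory_disabled() {
    # $1: a kernel command line string, e.g. the contents of /proc/cmdline
    for arg in $1; do
        case "$arg" in
            # Match the plain form and comma-separated lists of controllers.
            cgroup_disable=memory|cgroup_disable=memory,*|cgroup_disable=*,memory|cgroup_disable=*,memory,*)
                return 0 ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ] && has_cgroup_memory_disabled "$(cat /proc/cmdline)"; then
    echo "cgroup memory controller is disabled on the kernel command line"
else
    echo "cgroup_disable=memory not found on the kernel command line"
fi
```

This only inspects the boot parameters; it does not say whether the workaround helps on a given machine.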
(In reply to comment #8)
> (In reply to comment #7)
> > Please post the upstream tracker once you file.
>
> I'm not sure what you mean by this. The thread is going on here:
>

You mentioned posting this upstream, I'm guessing at bugzilla.kernel.org? If so please scroll to the top of this bug report and add it to the "External Tracker" box. Thanks
Hi,

The same happens for me:

BUG: soft lockup - CPU#1 stuck for 67s! [kswapd0:35]

I'm on Fedora 15, kernel: 2.6.38.6-27.fc15.x86_64 #1 SMP
My CPU is an Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz
Motherboard: Gigabyte Technology Co., Ltd. H67N-USB3-B3/H67N-USB3-B3, BIOS F5 03/31/2011
I have 2GB of memory. My HDD is in AHCI mode.

I can reproduce the bug with this:

dd if=/dev/zero of=/root/test.dat bs=50M count=50
dd if=/root/test.dat of=/dev/null
dd if=/dev/zero of=/root/test.dat bs=50M count=50

The lockup occurs during this last command. The computer still responds to ping, but I am unable to access anything else.

I have tried disabling cpuspeed and adding cgroup_disable=memory to the kernel command line, with no success: the computer hangs every time all of the memory is put under load.

Thanks.
Ludo.
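The write / read / write sequence above could be wrapped in a small script for repeated testing. This is only a sketch: the function name write_read_write and its parameters are hypothetical, and nothing here is confirmed to trigger the lockup on other hardware.

```shell
#!/bin/sh
# Sketch of the write / read / write sequence from the comment above.
# write_read_write and its parameter names are hypothetical.

write_read_write() {
    target=$1      # file to create, e.g. /root/test.dat
    block_size=$2  # dd block size, e.g. 50M
    count=$3       # number of blocks, e.g. 50

    # 1. Fill the page cache by writing zeros to the target file.
    dd if=/dev/zero of="$target" bs="$block_size" count="$count" 2>/dev/null || return 1
    # 2. Read the file back.
    dd if="$target" of=/dev/null 2>/dev/null || return 1
    # 3. Write it again; this is the step where the lockup was reported.
    dd if=/dev/zero of="$target" bs="$block_size" count="$count" 2>/dev/null || return 1
}

# The run described in the comment corresponds to (about 2.5 GB, as root):
#   write_read_write /root/test.dat 50M 50
```

The total size (50 blocks of 50M) exceeds the 2GB of RAM on the reporter's machine, which is presumably what forces reclaim; on a machine with more memory the sizes would need to be scaled up.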
Hi,

The bug is fixed in version 2.6.38.8:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06590bd718ed950c98828e30ef93204028f3210

So I am waiting for a new build of the F15 kernel.

Thanks.
Ludo.
Hi,

I have just updated the kernel to 2.6.38.8-32.fc15.x86_64, but there is still a problem. My computer doesn't hang now, but kswapd0 still takes 100% of one core after running my test 3 times after boot:

dd if=/dev/zero of=/root/test.dat bs=50M count=50
dd if=/root/test.dat of=/dev/null

and the result:

top - 23:13:27 up 12 min, 1 user, load average: 1.01, 0.99, 0.62
Tasks: 106 total, 2 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 25.2%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1937388k total, 1093916k used, 843472k free, 7596k buffers
Swap: 3899388k total, 0k used, 3899388k free, 906172k cached

  PID USER  PR  NI  VIRT   RES   SHR  S %CPU %MEM    TIME+  COMMAND
   35 root  20   0     0     0     0  R 99.8  0.0 10:17.97  kswapd0
    1 root  20   0 37076  4268  1496  S  0.0  0.2  0:00.93  systemd
    2 root  20   0     0     0     0  S  0.0  0.0  0:00.00  kthreadd
    3 root  20   0     0     0     0  S  0.0  0.0  0:00.00  ksoftirqd/0
    6 root  RT   0     0     0     0  S  0.0  0.0  0:00.00  migration/0
...

As you can see, uptime is 12 min and kswapd0 has already used 10 min of CPU time, and it doesn't stop.

Thanks.
Ludo.
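To put a number on that last observation, here is a rough sketch that converts the TIME+ value from top (minutes:seconds.hundredths) into a share of uptime. The helper name cpu_share_percent is hypothetical, and because the uptime shown by top is rounded to whole minutes the result is only approximate.

```shell
#!/bin/sh
# Sketch: express top's TIME+ field as a percentage of uptime.
# cpu_share_percent is a hypothetical helper name.

cpu_share_percent() {
    # $1: TIME+ value such as 10:17.97 (minutes:seconds.hundredths)
    # $2: uptime in minutes
    awk -v t="$1" -v up="$2" 'BEGIN {
        split(t, parts, ":")
        cpu_seconds = parts[1] * 60 + parts[2]
        printf "%.0f\n", 100 * cpu_seconds / (up * 60)
    }'
}

# Values taken from the top output above: kswapd0 TIME+ and uptime.
cpu_share_percent 10:17.97 12
```

With the figures above this works out to about 86% of the time since boot spent in kswapd0 on one core.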
(In reply to comment #12)
> Hi,
>
> I'm just update kernel to 2.6.38.8-32.fc15.x86_64 but the is already a problem.
> My computer don't hang now, but kswapd0 still take 100% off on core after my
> test :

That is bug 712019. Since the computer no longer hangs with softlockup messages, which is what the original report was about, I'll close this bug.