995982 – [abrt] BUG: soft lockup - CPU#7 stuck for 22s! [umount:969]

Bug 995982 - [abrt] BUG: soft lockup - CPU#7 stuck for 22s! [umount:969] [NEEDINFO]

Summary: [abrt] BUG: soft lockup - CPU#7 stuck for 22s! [umount:969]

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	19
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	abrt_hash:317f2eb9bb7646e7ef0b3ec29b5...
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-08-12 06:31 UTC by Stefan Schwarzer
Modified:	2014-03-10 14:44 UTC (History)
CC List:	7 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-03-10 14:44:29 UTC
Type:	---
Embargoed:
Dependent Products:
Flags:	jforbes: needinfo?

Attachments	(Terms of Use)
File: dmesg (69.67 KB, text/plain) 2013-08-12 06:31 UTC, Stefan Schwarzer	no flags	Details
Log of the lockup messages on 2013-08-12 (55.58 KB, application/gzip) 2013-08-12 06:38 UTC, Stefan Schwarzer	no flags	Details
View All

Description Stefan Schwarzer 2013-08-12 06:31:47 UTC

Description of problem:
Maybe this bug report should be classified differently than abrt is going to suggest. Please feel free to re-classify it.

It happens about once every few weeks that my PC gets stuck when trying to unmount an external USB disk after the nightly backup. "Every few weeks" means a month or maybe only one week passes. When a lockup happens, I can't use the machine anymore (including not being able to log in via ssh) and have to switch the machine off and on. Every time this lockup happens, when I get to the machine, its fan(s) is/are running louder as if under a high CPU load (which might actually be the case and would be in line with the "soft lockup" messages).

This is the first time I was notified about soft lockups by abrt and in /var/log/messages. I'll attach the section from /var/log/messages which has lots of "soft lockup" messages which occurred this morning.

Some more info on the backup problem: After having activated the MARK facility in syslog recently, I noticed that the lockups happen when my backup script tries to unmount the external disk after the backup. Using sync in the shell script has the same effect. I know the time the backup finished by the log message "/usr/bin/rsnapshot -v daily: completed successfully". I added a sleep command between the time the backup finishes and the unmount invocation. But even increasing this time to 2 or 3 hours still gives me occasional lockups. (I think setting the time to 2 hours helped a bit, but it's difficult to tell. I increased only recently to 3 hours, so can't tell anything about the effectiveness.) Two recent lockups happened with Fedora 19, but if I remember correctly, I had this problem also with Ubuntu.

There are two disks which are used alternatingly by the backup script, both are Western Digital My Book 1.5 TB devices. The output of lsusb is:
Bus 002 Device 004: ID 1058:1123 Western Digital Technologies, Inc. My Book 3.0
Bus 002 Device 003: ID 1058:1123 Western Digital Technologies, Inc. My Book 3.0

In another bug report, https://bugzilla.redhat.com/show_bug.cgi?id=951089 , there's a mention of SMART problems, but for both of my disks smartctl tells me "SMART overall-health self-assessment test result: PASSED" and for the short test "The previous self-test routine completed without error". That said, the disks may be a few years old by now, and if you think it's likely to help, I'll replace them. (That said, I think even if the disks were faulty, the kernel shouldn't lock up.)

Please let me know if you need more information.

By the way, even if there's no real fix available, it would already help me a lot if the system rebooted in such a lockup situation. Then I could at least again log into the system remotely after it's rebooted.

Additional info:
reporter:       libreport-2.1.6
BUG: soft lockup - CPU#7 stuck for 22s! [umount:969]
Modules linked in: fuse bnep bluetooth rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek acpi_cpufreq dcdbas mperf snd_hda_intel snd_hda_codec iTCO_wdt coretemp kvm_intel iTCO_vendor_support snd_ctxfi snd_hwdep snd_seq snd_seq_device broadcom snd_pcm kvm crc32c_intel snd_page_alloc joydev tg3 snd_timer snd soundcore i7core_edac i2c_i801 edac_core mei_me ptp pps_core microcode mei lpc_ich mfd_core serio_raw usblp uinput radeon i2c_algo_bit drm_kms_helper ttm drm usb_storage i2c_core
CPU: 7 PID: 969 Comm: umount Not tainted 3.10.5-201.fc19.x86_64 #1
Hardware name: Dell Inc. Vostro 430/054KM3, BIOS 2.4.0 01/13/2011
task: ffff880041566320 ti: ffff8801028be000 task.ti: ffff8801028be000
RIP: 0010:[<ffffffff8163ef23>]  [<ffffffff8163ef23>] _raw_spin_lock+0x13/0x30
RSP: 0018:ffff8801028bfe18  EFLAGS: 00000206
RAX: 0000000000005050 RBX: ffffffff81138e2b RCX: 0000000000000002
RDX: 000000000000004f RSI: ffff88010b016a38 RDI: ffff88010b016678
RBP: ffff8801028bfe18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff812c5ad4 R14: ffff8801028bfd88 R15: 0000000000000000
FS:  00007f06cb431880(0000) GS:ffff8801bfdc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f5ae259d000 CR3: 000000014e432000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8801028bfe60 ffffffff811d6edb ffff8801643bf800 ffff8801643bf8a0
 ffff8801643bf800 ffff8801643bf8a0 ffffffff81825320 ffff88016e432c40
 ffff88016e432c20 ffff8801028bfe88 ffffffff81199ae0 ffff8801b53c09c0
Call Trace:
 [<ffffffff811d6edb>] fsnotify_unmount_inodes+0x8b/0x1c0
 [<ffffffff81199ae0>] generic_shutdown_super+0x40/0xd0
 [<ffffffff81199d87>] kill_block_super+0x27/0x70
 [<ffffffff8119a0fd>] deactivate_locked_super+0x3d/0x60
 [<ffffffff8119a166>] deactivate_super+0x46/0x60
 [<ffffffff811b5075>] mntput_no_expire+0xc5/0x120
 [<ffffffff811b5e51>] SyS_umount+0x91/0x3a0
 [<ffffffff81647559>] system_call_fastpath+0x16/0x1b
Code: 75 e6 b8 01 00 00 00 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 <0f> b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 07 38 d0 75

Comment 1 Stefan Schwarzer 2013-08-12 06:31:54 UTC

Created attachment 785565 [details]
File: dmesg

Comment 2 Stefan Schwarzer 2013-08-12 06:38:17 UTC

Created attachment 785566 [details]
Log of the lockup messages on 2013-08-12

It seems that abrt didn't attach the log of the lockups I added in the UI.

Comment 3 Josh Boyer 2013-09-18 20:28:06 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 4 Stefan Schwarzer 2013-10-04 05:18:50 UTC

Unfortunately, the new kernel didn't solve my problem. I got a lockup again.

Comment 5 Seb L. 2013-11-22 10:20:49 UTC

I just got the exact same crash with the exact same call trace while attempting to unmount a few filesystems on Fedora 18, kernel version 3.11.4-101.fc18.x86_64:

Nov 21 20:13:22 siws kernel: [709332.494607] CPU: 6 PID: 21471 Comm: umount Tainted: PF        C O 3.11.4-101.fc18.x86_64 #1
Nov 21 20:13:22 siws kernel: [709332.494610] Hardware name: System manufacturer System Product Name/P6T6 WS REVOLUTION, BIOS 0905    12/24/2010
Nov 21 20:13:22 siws kernel: [709332.494613] task: ffff88034c3d3560 ti: ffff8801f414e000 task.ti: ffff8801f414e000
Nov 21 20:13:22 siws kernel: [709332.494615] RIP: 0010:[<ffffffff8166c619>]  [<ffffffff8166c619>] _raw_spin_lock+0x29/0x30
Nov 21 20:13:22 siws kernel: [709332.494623] RSP: 0018:ffff8801f414fde8  EFLAGS: 00000246
Nov 21 20:13:22 siws kernel: [709332.494625] RAX: 0000000000006b6b RBX: ffff8801f414fd58 RCX: 0000000000000002
Nov 21 20:13:22 siws kernel: [709332.494627] RDX: 000000000000006b RSI: ffff88013b1248a8 RDI: ffff88013b125490
Nov 21 20:13:22 siws kernel: [709332.494629] RBP: ffff8801f414fde8 R08: 0000000000000000 R09: 0000000000000000
Nov 21 20:13:22 siws kernel: [709332.494631] R10: 0000000000000000 R11: 0000000002000000 R12: ffff8801f414fd48
Nov 21 20:13:22 siws kernel: [709332.494633] R13: ffff8801f414fd48 R14: 0000000091827364 R15: ffff8801f414fd68
Nov 21 20:13:22 siws kernel: [709332.494636] FS:  00007f31b5597840(0000) GS:ffff88063fcc0000(0000) knlGS:0000000000000000
Nov 21 20:13:22 siws kernel: [709332.494639] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 21 20:13:22 siws kernel: [709332.494641] CR2: 00007f31b51930e0 CR3: 00000002c982c000 CR4: 00000000000007e0
Nov 21 20:13:22 siws kernel: [709332.494643] Stack:
Nov 21 20:13:22 siws kernel: [709332.494644]  ffff8801f414fe38 ffffffff811ef4db ffff880617325c00 ffff880617325ca0
Nov 21 20:13:22 siws kernel: [709332.494649]  ffff8801f414fe18 ffff880617325c00 ffff880617325ca0 ffffffff81828e00
Nov 21 20:13:22 siws kernel: [709332.494653]  ffff8805b7412640 ffff8805b7412620 ffff8801f414fe68 ffffffff811b17eb
Nov 21 20:13:22 siws kernel: [709332.494656] Call Trace:
Nov 21 20:13:22 siws kernel: [709332.494664]  [<ffffffff811ef4db>] fsnotify_unmount_inodes+0x8b/0x1c0
Nov 21 20:13:22 siws kernel: [709332.494669]  [<ffffffff811b17eb>] generic_shutdown_super+0x4b/0xe0
Nov 21 20:13:22 siws kernel: [709332.494674]  [<ffffffff811cc69d>] ? free_vfsmnt+0x3d/0x50
Nov 21 20:13:22 siws kernel: [709332.494677]  [<ffffffff811b18b0>] kill_block_super+0x30/0x80
Nov 21 20:13:22 siws kernel: [709332.494681]  [<ffffffff811b216d>] deactivate_locked_super+0x4d/0x80
Nov 21 20:13:22 siws kernel: [709332.494684]  [<ffffffff811b277e>] deactivate_super+0x4e/0x70
Nov 21 20:13:22 siws kernel: [709332.494688]  [<ffffffff811ccfd7>] mntput_no_expire+0xd7/0x130
Nov 21 20:13:22 siws kernel: [709332.494691]  [<ffffffff811ce0dc>] SyS_umount+0x9c/0x3a0
Nov 21 20:13:22 siws kernel: [709332.494696]  [<ffffffff81675299>] system_call_fastpath+0x16/0x1b
Nov 21 20:13:22 siws kernel: [709332.494698] Code: 90 90 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 07 38 d0 75 f7 <5d> c3 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 80 07 01 48 89

Comment 6 Justin M. Forbes 2014-01-03 22:06:43 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 7 Justin M. Forbes 2014-03-10 14:44:29 UTC

*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.

Note You need to log in before you can comment on or make changes to this bug.