Bug 1096572

Summary:

[abrt] BUG: soft lockup - CPU#6 stuck for 23s! [systemd-udevd:9995]

Product:

[Fedora] Fedora

Reporter:

Chuck Forsberg <caf>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED RAWHIDE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

rawhide

CC:

aviro, elad, fredex, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, twohotis

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Unspecified

URL:

https://retrace.fedoraproject.org/faf/reports/bthash/f8b0a799a4b341f461ad73e5a05a311c4858bb23

Whiteboard:

abrt_hash:cfe0d94510c04f964d605d77fec6769fc27958d5

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2014-11-06 00:03:47 UTC

Attachments:

Description	Flags
File: dmesg	none
The .config for the kernel	none

Description Chuck Forsberg 2014-05-12 04:17:19 UTC

Additional info:
reporter:       libreport-2.2.2
BUG: soft lockup - CPU#6 stuck for 23s! [systemd-udevd:9995]
Modules linked in: des_generic md4 nls_utf8 cifs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache bnep bluetooth fuse xt_CHECKSUM ipt_MASQUERADE ip6t_rpfilter ip6t_REJECT xt_conntrack cfg80211 ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw s5h1411 snd_virtuoso snd_oxygen_lib snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_mpu401_uart eeepc_wmi cx25840 asus_wmi cx23885 btcx_risc sparse_keymap rfkill iTCO_wdt altera_ci videobuf_dvb tda18271 iTCO_vendor_support altera_stapl snd_usb_audio tveeprom cx2341x snd_usbmidi_lib videobuf_dma_sg videobuf_core dvb_core snd_rawmidi snd_hwdep snd_seq snd_seq_device snd_pcm x86_pkg_temp_thermal rc_core coretemp v4l2_common r8169 videodev ums_realtek kvm_intel uas usb_storage usblp ftdi_sio snd_timer media mii joydev mei_me lpc_ich mfd_core snd i2c_i801 shpchp serio_raw mei soundcore kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode binfmt_misc sunrpc nouveau mxm_wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core wmi video
CPU: 6 PID: 9995 Comm: systemd-udevd Not tainted 3.15.0-0.rc5.git0.1.fc21.x86_64 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 0908 12/10/2013
task: ffff8807c0e3ce80 ti: ffff8807c148c000 task.ti: ffff8807c148c000
RIP: 0010:[<ffffffff81206e82>]  [<ffffffff81206e82>] dentry_kill+0x22/0x2b0
RSP: 0018:ffff8807c148dba8  EFLAGS: 00000246
RAX: 00000000004000c4 RBX: fefefefefefefeff RCX: dead000000200200
RDX: ffff8807e2922c80 RSI: 0000000000000000 RDI: ffff8807e2922840
RBP: ffff8807c148dbc8 R08: ffff8807e29228c0 R09: 8080808080808080
R10: fefefefefefefeff R11: ffff880751bdf840 R12: ffff880751bdfb40
R13: ffff8807c148db98 R14: ffff8807c148db30 R15: ffffffff810bbfa7
FS:  00007fe8074d0880(0000) GS:ffff88081ed80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe807504000 CR3: 000000079a2ea000 CR4: 00000000001407e0
Stack:
 ffff8807e29228c0 ffff8807e2922780 ffff8807c148dc10 ffff8807e2922840
 ffff8807c148dbf8 ffffffff8120734b ffff8807c148dc10 ffff880751bdf840
 0000000000000024 ffff8807c148de50 ffff8807c148dc48 ffffffff8120767c
Call Trace:
 [<ffffffff8120734b>] shrink_dentry_list+0x7b/0x120
 [<ffffffff8120767c>] check_submounts_and_drop+0x7c/0xb0
 [<ffffffff8126a96d>] kernfs_dop_revalidate+0x5d/0xd0
 [<ffffffff811f9f06>] lookup_fast+0x276/0x2f0
 [<ffffffff812f043c>] ? security_inode_permission+0x1c/0x30
 [<ffffffff811fb057>] link_path_walk+0x1d7/0xec0
 [<ffffffff8120fc24>] ? mntput+0x24/0x40
 [<ffffffff811fbe4e>] ? path_lookupat+0x10e/0xd70
 [<ffffffff811fa716>] ? getname_flags+0x56/0x1b0
 [<ffffffff811fbda7>] path_lookupat+0x67/0xd70
 [<ffffffff811fa692>] ? final_putname+0x22/0x50
 [<ffffffff81200f32>] ? user_path_at_empty+0x72/0xd0
 [<ffffffff811d2225>] ? kmem_cache_alloc+0x35/0x1f0
 [<ffffffff811fa716>] ? getname_flags+0x56/0x1b0
 [<ffffffff811fcada>] filename_lookup+0x2a/0xd0
 [<ffffffff81200f27>] user_path_at_empty+0x67/0xd0
 [<ffffffff811f492b>] SyS_readlink+0x5b/0x130
 [<ffffffff817119e9>] system_call_fastpath+0x16/0x1b
Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 8b 07 f6 c4 80 0f 85 1a 02 00 00 4c 8b 6f 30 <41> 89 f6 4d 85 ed 74 2e 49 8d bd 88 00 00 00 e8 4a 13 50 00 85

Comment 1 Chuck Forsberg 2014-05-12 04:17:24 UTC

Created attachment 894541
File: dmesg

Comment 2 Alexander Viro 2014-05-13 09:04:28 UTC

Reproducer would be nice.  It *might* be somebody managing to hog ->i_lock for obscenely long, but even then preempt would've kicked that sucker off CPU eventually - each pass through the loop in shrink_dentry_list() starts with no spinlocks held.  And I would really like to see how that had been triggered, preempt or no preempt - ability to create that much work for shrink_dentry_list() is really bad, especially if we have serious ->i_lock contention somehow.

What's the .config of that kernel, BTW?

Comment 3 Josh Boyer 2014-05-13 12:26:34 UTC

*** Bug 1097096 has been marked as a duplicate of this bug. ***

Comment 4 Josh Boyer 2014-05-13 12:33:13 UTC

Created attachment 895118
The .config for the kernel

.config attached.  It's just a stock Fedora rawhide kernel.  I have no idea what a reproducer would be.  Hopefully the reporter can fill that in.

Comment 5 Josh Boyer 2014-05-20 12:21:32 UTC

*** Bug 1099465 has been marked as a duplicate of this bug. ***

Comment 6 Josh Boyer 2014-05-23 22:48:42 UTC

*** Bug 1100910 has been marked as a duplicate of this bug. ***

Comment 7 Onyeibo Oku 2014-05-27 05:40:51 UTC

This bug is real and is messing with my ASUS N56VJ laptop badly.  It happens when I disconnect my Android phone (it has an SD card). I'd be happy to provide logs if needed.

Comment 8 Josh Boyer 2014-05-29 11:18:41 UTC

*** Bug 1102452 has been marked as a duplicate of this bug. ***

Comment 9 fred smith 2014-05-29 14:41:35 UTC

I reported 1102452, which I believe is reproducible. Here's what I was doing when it failed:

Using an external USB 3.0 storage device (http://www.newegg.com/Product/Product.aspx?Item=N82E16817332028) with the USB 3.0 cable it came with, find a file or directory tree of several gigabytes and copy it to the device repeatedly, removing it after each copy. It takes only 3 or 4 (or 5) copies for things to suddenly go bad. The device suddenly is no longer available, is no longer mounted, and at least sometimes the /dev entry is gone. Shortly thereafter the messages about the CPU being stuck start showing up and I get the abrt alarm offering to report the bug.
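
The copy/remove loop described above could be scripted roughly as follows. This is only a minimal sketch, assuming Python 3 is available on the test system; the function name `copy_remove_cycle` and the paths in the usage comment are hypothetical and do not come from the original report.

```python
import shutil
from pathlib import Path


def copy_remove_cycle(source: Path, target_dir: Path, iterations: int = 5) -> int:
    """Copy `source` into `target_dir`, then delete the copy, `iterations` times.

    Returns the number of completed cycles. Any OSError (for example, the
    device disappearing in the middle of a copy) propagates to the caller.
    """
    completed = 0
    for _ in range(iterations):
        if source.is_dir():
            # Directory tree: copy recursively, then remove the whole copy.
            copy = target_dir / source.name
            shutil.copytree(source, copy)
            shutil.rmtree(copy)
        else:
            # Single file: copy2 returns the path of the newly created file.
            copy = Path(shutil.copy2(source, target_dir))
            copy.unlink()
        completed += 1
    return completed


# Hypothetical usage; both paths are placeholders, not taken from the report:
# copy_remove_cycle(Path("/home/user/big.iso"), Path("/run/media/user/usb3"))
```

Small copies may be absorbed by the page cache without reaching the device, so a source of several gigabytes, as in the description, is probably needed for the loop to stress the USB link.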

I have no other USB 3.0 storage devices to test with.

This testing is on an Asus motherboard (http://www.newegg.com/Product/Product.aspx?Item=N82E16813131874) and AMD CPU (http://www.newegg.com/Product/Product.aspx?Item=N82E16819113286) with the latest (as of about a week ago) BIOS.

This storage device APPEARS to work OK with a USB 2.0 cable, but I may simply have not beaten on it hard or long enough. (NOTE that it works fine using eSATA.) In case it makes any difference, it is configured as RAID-1 with two 1TB drives. I've not tried it in any other configuration.

I tested with the nightly LIVE build because I wanted to try it with presumably a bleeding-edge kernel. I normally run CentOS 6.5 on that system, where pretty much the same things have been happening, but I was concerned that there may be a chipset/driver issue, and so wanted to see if a late-model kernel had solved the issue. Apparently not.

Comment 10 Josh Boyer 2014-05-29 14:47:58 UTC

Upstream (Al) has been poking at this the past few days.  I believe he has a fix now and it should make its way into rawhide soon.

Comment 11 fred smith 2014-05-29 22:28:45 UTC

a mostly OT comment:
If the upstream fix solves the problem, it would be wondrous for it to be backported to EL6 (and Centos 6)...

Comment 12 Josh Boyer 2014-05-31 16:13:54 UTC

Tomorrow's rawhide will contain the upstream fixes for this issue in kernel-3.15.0-rc7.git4.2.  Please test when you can.

Comment 13 fred smith 2014-06-04 00:37:03 UTC

I'm now in the midst of testing yesterday's nightly build using kernel 3.15.0-0.rc7.git4.2.fc21.x86_64. I've been banging on it for over half an hour, repeatedly running the commands that seemed to trigger the bug, and so far it's just quietly doing what I ask.

I'll continue to do this for a while longer, and will add another comment if I see any problems.