Bug 1096572
| Summary: | [abrt] BUG: soft lockup - CPU#6 stuck for 23s! [systemd-udevd:9995] |
|---|---|
| Product: | [Fedora] Fedora |
| Reporter: | Chuck Forsberg <caf> |
| Component: | kernel |
| Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED RAWHIDE |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified |
| Docs Contact: | |
| Priority: | unspecified |
| Version: | rawhide |
| CC: | aviro, elad, fredex, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, twohotis |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Unspecified |
| URL: | https://retrace.fedoraproject.org/faf/reports/bthash/f8b0a799a4b341f461ad73e5a05a311c4858bb23 |
| Whiteboard: | abrt_hash:cfe0d94510c04f964d605d77fec6769fc27958d5 |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2014-11-06 00:03:47 UTC |
| Type: | --- |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
Description
Chuck Forsberg
2014-05-12 04:17:19 UTC
Created attachment 894541 [details]
File: dmesg
Reproducer would be nice. It *might* be somebody managing to hog ->i_lock for obscenely long, but even then preempt would've kicked that sucker off CPU eventually - each pass through the loop in shrink_dentry_list() starts with no spinlocks held. And I would really like to see how had that been triggered, preempt or no preempt - ability to create that much work for shrink_dentry_list() is really bad, especially if we have serious ->i_lock contention somehow. What's the .config of that kernel, BTW?

*** Bug 1097096 has been marked as a duplicate of this bug. ***

Created attachment 895118 [details]
The .config for the kernel

.config attached. It's just a stock Fedora rawhide kernel. I have no idea what a reproducer would be. Hopefully the reporter can fill that in.

*** Bug 1099465 has been marked as a duplicate of this bug. ***

*** Bug 1100910 has been marked as a duplicate of this bug. ***

This bug is real and is messing with my ASUS N56VJ laptop badly. It happens when I disconnect my Android phone (it has an SD card). I'd be happy to provide logs if needed.

*** Bug 1102452 has been marked as a duplicate of this bug. ***

I reported 1102452, which I believe is reproducible. Here's what I was doing when it failed:

Using an external USB 3.0 storage device (http://www.newegg.com/Product/Product.aspx?Item=N82E16817332028) with the USB 3.0 cable it came with, find a file or directory tree of several gigabytes and copy it to the device repeatedly, removing the file after each copy. It takes only 3 or 4 (or 5) copies for things to suddenly go bad. The device suddenly is no longer available, is no longer mounted, and at least sometimes the /dev entry is gone. Shortly thereafter the messages about the CPU being stuck start showing up and I get the abrt alarm offering to report the bug.

I have no other USB 3.0 storage devices to test with. This testing is on an Asus motherboard (http://www.newegg.com/Product/Product.aspx?Item=N82E16813131874) and AMD CPU (http://www.newegg.com/Product/Product.aspx?Item=N82E16819113286) with the latest (as of about a week ago) BIOS. This storage device APPEARS to work OK with a USB 2.0 cable, but I may simply have not beaten on it hard or long enough. (NOTE that it works fine using eSATA.) In case it makes any difference, it is configured as RAID-1 with two 1TB drives. I've not tried it in any other configuration.

I tested with the nightly LIVE build because I wanted to try it with presumably a bleeding-edge kernel. I normally run CentOS-6.5 on that system, where pretty much the same things have been happening, but was concerned that there may be a chipset/driver issue, and so wanted to see if a late-model kernel had solved the issue. Apparently not.

Upstream (Al) has been poking at this the past few days. I believe he has a fix now and it should make its way into rawhide soon.

A mostly OT comment: if the upstream fix solves the problem, it would be wondrous for it to be backported to EL6 (and CentOS 6)...

Tomorrow's rawhide will contain the upstream fixes for this issue in kernel-3.15.0-rc7.git4.2. Please test when you can.

I'm now in the midst of testing yesterday's nightly build using kernel 3.15.0-0.rc7.git4.2.fc21.x86_64. I've been banging on it for over half an hour, repeatedly running the commands that seemed to trigger the bug, and so far it's just quietly doing what I ask. I'll continue to do this for a while longer, and will add another comment if I see any problems.
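
For anyone else who wants to retest, the copy/remove cycle described in the report for bug 1102452 could be scripted roughly as below. This is only a sketch of those steps, not the commands the reporter actually ran (those were never posted). The function name `copy_remove_cycle` and the paths in the usage note are made up for illustration, and the `os.sync()` call is an assumption added so the data is pushed out to the device rather than sitting in the page cache.

```python
import os
import shutil
from pathlib import Path


def copy_remove_cycle(source, target_dir, iterations=5):
    """Copy `source` into `target_dir` and delete the copy, `iterations` times.

    Returns the number of cycles that completed without an OSError.
    """
    source = Path(source)
    destination = Path(target_dir) / source.name
    completed = 0
    for _ in range(iterations):
        try:
            # Copy either a single file or a whole directory tree.
            if source.is_dir():
                shutil.copytree(source, destination)
            else:
                shutil.copy2(source, destination)
            # Assumption: flush dirty pages so the writes reach the device.
            if hasattr(os, "sync"):
                os.sync()
            # Remove the copy before the next pass, as in the report.
            if destination.is_dir():
                shutil.rmtree(destination)
            else:
                destination.unlink()
        except OSError as exc:
            # The report says the device disappears mid-run; stop at that point.
            print(f"cycle {completed + 1} failed: {exc}")
            break
        completed += 1
    return completed
```

A call such as `copy_remove_cycle("/home/user/bigtree", "/run/media/user/usbdisk", iterations=10)` (both paths hypothetical) would run ten passes. A return value lower than `iterations` means a copy or removal failed partway, which is the point at which to check dmesg for the soft lockup messages.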