Bug 746485
| Summary: | System crashes on rc9 but not rc8 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Bruno Wolff III <bruno> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 16 | CC: | bruno, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, satellitgo |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.1.0-0.rc10.git0.1.fc16 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-10-20 04:02:15 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 713566 | | |
| Attachments: | | | |
Description
Bruno Wolff III
2011-10-16 14:38:03 UTC
Created attachment 528393: traceback captured with netconsole
Because this doesn't seem to affect all systems, I am proposing this for NTH instead of blocker.

One other NTH note is that rc9 is currently in testing, not stable. (I thought it had moved there but misremembered.) The NTH will only apply if it gets moved to stable before final.

I misread the tags. It is currently tagged for both f16 and f16-updates-testing. So it is already an issue for final.

Looks like some kind of subtle deadlock:
CPU 0: [<c043640f>] __task_rq_lock+0x28/0x46
[<c0440fd6>] wake_up_new_task+0x3a/0xa3
CPU 1: [<c0435e84>] account_group_exec_runtime+0x2c/0x49
[<c0435fc4>] update_curr+0x123/0x139
CPU 2: [<c043640f>] __task_rq_lock+0x28/0x46
[<c0440fd6>] wake_up_new_task+0x3a/0xa3
CPU 3: [<c04363ba>] task_rq_lock+0x43/0x70
[<c0441414>] task_sched_runtime+0x1f/0x9f
This might be the bug discussed in this thread: http://lkml.org/lkml/2011/10/7/45
I'll look at testing the patch that seemed to work for him.

CPUs 0 and 2 are at kernel/sched.c:954:
raw_spin_lock(&rq->lock);
CPU 1 is at kernel/sched_stats.h:333:
spin_lock(&cputimer->lock);
CPU 3 is at kernel/sched.c:973:
raw_spin_lock(&rq->lock);
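
For illustration only, here is a minimal userspace sketch of what a lock-order inversion of this shape could look like. It assumes the traces mean that CPU 1 holds rq->lock while waiting for cputimer->lock, and that CPU 3 holds cputimer->lock while waiting for rq->lock. That is an inference from the call chains, not something confirmed in this report. The pthread mutexes and every name below are hypothetical stand-ins, not kernel code, and `pthread_mutex_trylock` is used so the sketch reports the conflict instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's locks. */
static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cputimer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Barriers keep both threads holding their first lock while each tries its second. */
static pthread_barrier_t holding_first;
static pthread_barrier_t tried_second;

struct lock_path {
    pthread_mutex_t *first;   /* lock taken first on this path */
    pthread_mutex_t *second;  /* lock this path wants next */
    int second_was_busy;      /* set to 1 if the second lock was already held */
};

static void *take_locks(void *arg)
{
    struct lock_path *p = arg;

    pthread_mutex_lock(p->first);
    pthread_barrier_wait(&holding_first);

    /* trylock instead of lock: record the conflict rather than block forever. */
    if (pthread_mutex_trylock(p->second) == 0)
        pthread_mutex_unlock(p->second);
    else
        p->second_was_busy = 1;

    pthread_barrier_wait(&tried_second);
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns how many of the two paths found their second lock already held. */
int count_blocked_paths(void)
{
    /* Assumed lock orders, modelled on the CPU 1 and CPU 3 traces. */
    struct lock_path like_cpu1 = { &rq_lock, &cputimer_lock, 0 };
    struct lock_path like_cpu3 = { &cputimer_lock, &rq_lock, 0 };
    pthread_t a, b;
    int rc;

    rc = pthread_barrier_init(&holding_first, NULL, 2);
    assert(rc == 0);
    rc = pthread_barrier_init(&tried_second, NULL, 2);
    assert(rc == 0);

    rc = pthread_create(&a, NULL, take_locks, &like_cpu1);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, take_locks, &like_cpu3);
    assert(rc == 0);

    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_barrier_destroy(&holding_first);
    pthread_barrier_destroy(&tried_second);

    return like_cpu1.second_was_busy + like_cpu3.second_was_busy;
}
```

Built with `cc -pthread` and called from a `main`, `count_blocked_paths()` returns 2: each path finds its second lock held by the other. With blocking acquisition, as with the kernel's spin locks, neither path could make progress.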
More recent patch: http://article.gmane.org/gmane.linux.kernel/1204676

Patch committed, will be in the next test kernel.

Thanks. I'll test it as soon as it shows up. My build with the later patch is still running and may not finish until after I go to sleep tonight. I might try running the kernel I built with the older patch just to confirm it is really the same issue.

I currently have three systems running 3.1.0-0.rc10.git0.1.fc16.i686.PAE. So far no problems, but it's too soon to declare victory. I'll have an x86_64 system running the analogous kernel in a couple of hours. If things are all still working late tonight, then it is very likely that the problem is fixed.

kernel-3.1.0-0.rc10.git0.1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc10.git0.1.fc16

The two machines that were typically crashing in under a couple of hours have been up for about 8 and 6 hours now, so I think there is a pretty good chance the problem is fixed.

Package kernel-3.1.0-0.rc10.git0.1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.1.0-0.rc10.git0.1.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14609
then log in and leave karma (feedback).

kernel-3.1.0-0.rc10.git0.1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.