Bug 449676
| Field | Value |
|---|---|
| Summary | Turning a CPU offline causes panic |
| Product | Red Hat Enterprise MRG |
| Component | realtime-kernel |
| Status | CLOSED ERRATA |
| Severity | low |
| Priority | high |
| Version | beta |
| Target Milestone | 1.0.1 |
| Hardware | All |
| OS | Linux |
| Reporter | Arnaldo Carvalho de Melo <acme> |
| Assignee | Peter Zijlstra <pzijlstr> |
| CC | bhu, ghaskins, lwang, pzijlstr, srostedt |
| Doc Type | Bug Fix |
| Last Closed | 2008-08-26 19:57:14 UTC |
Description
Arnaldo Carvalho de Melo
2008-06-03 00:57:32 UTC
Problem is not Intel specific; it was diagnosed as a cpupri problem by Thomas Gleixner. A kernel rpm package with a hotfix patch is being built.
After talking to tglx, I suspect the issue is in how the root-domain and cpupri code interact with one another when building a domain. The CPUs in question should have been marked INVALID, which would remove them from the cpupri tables. Instead they were allowed to register with some priority that was != INVALID. This bug is likely to affect 23-rt, 24-rt, 25-rt, and sched-devel (and of course, any derivatives of such). I will hopefully post a solution soon.
Survives a lot longer, but if we keep offlining/onlining a CPU in an infinite loop we eventually OOPS. Different backtrace this time, will attach.
Created attachment 308312 [details]
new oops
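To make the cpupri diagnosis above concrete, here is a minimal Python sketch of a cpupri-style table. It is a simplified model, not the kernel code: the class, method names, and the `INVALID` sentinel value are all assumptions chosen for illustration. It shows the invariant described in the comment: a CPU marked INVALID must disappear from every priority bucket, so a lookup can never hand back an offline CPU.

```python
INVALID = -1  # hypothetical sentinel: CPU is not tracked in any priority bucket


class CpuPriTable:
    """Toy model of a priority -> set-of-CPUs table (names are illustrative)."""

    def __init__(self, nr_cpus, nr_priorities):
        # Every CPU starts out INVALID, i.e. absent from all buckets.
        self.cpu_to_pri = [INVALID] * nr_cpus
        self.pri_to_cpus = [set() for _ in range(nr_priorities)]

    def set(self, cpu, new_pri):
        """Move a CPU to a new priority bucket, or drop it if new_pri is INVALID."""
        old_pri = self.cpu_to_pri[cpu]
        if old_pri == new_pri:
            return
        if old_pri != INVALID:
            self.pri_to_cpus[old_pri].discard(cpu)
        if new_pri != INVALID:
            self.pri_to_cpus[new_pri].add(cpu)
        self.cpu_to_pri[cpu] = new_pri

    def offline(self, cpu):
        """Taking a CPU offline must remove it from the table entirely."""
        self.set(cpu, INVALID)

    def find_lowest(self, allowed):
        """Return (priority, candidate CPUs) for the lowest occupied bucket."""
        for pri, cpus in enumerate(self.pri_to_cpus):
            candidates = cpus & allowed
            if candidates:
                return pri, candidates
        return None
```

In this model the failure mode described above corresponds to skipping `offline()` (or calling `set()` with a valid priority afterwards), which leaves a dead CPU visible to `find_lowest()`.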
Created attachment 308318 [details]
Proposed fix
With Gregory's patch applied it now takes a lot longer, but we eventually OOPS.
Created attachment 308373 [details]
new oops, with Gregory patch applied
Created attachment 308386 [details]
Proposed fix
I have incorporated Peter Zijlstra's feedback, and also fixed some additional
holes that I found would still allow the cpupri table to get updated. Please
retest!
Created attachment 308389 [details]
Proposed fix
Hmmm... I refreshed the patch, but Firefox seems to have uploaded the old
one. Let's try again.
Created attachment 313693 [details]
sysfs CPU classes, usage example does random CPU off/onlining
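The attachment itself is not reproduced in this report. As a rough idea of what such a reproducer could look like, the following Python sketch toggles CPUs through the standard sysfs `online` files under `/sys/devices/system/cpu`. The class and function names are hypothetical and are not taken from the attachment; the `base` parameter exists only so the code can be pointed at a scratch directory instead of the real sysfs tree.

```python
import os
import random


class SysfsCpu:
    """Hypothetical wrapper around /sys/devices/system/cpu/cpuN/online."""

    def __init__(self, index, base="/sys/devices/system/cpu"):
        self.index = index
        self.online_path = os.path.join(base, f"cpu{index}", "online")

    def is_online(self):
        # The file holds "1" when the CPU is online and "0" when it is offline.
        with open(self.online_path) as f:
            return f.read().strip() == "1"

    def set_online(self, online):
        # Writing to the real sysfs file requires root privileges.
        with open(self.online_path, "w") as f:
            f.write("1" if online else "0")


def toggle_random(cpus, iterations, rng=random):
    """Flip the online state of a randomly chosen CPU, `iterations` times."""
    for _ in range(iterations):
        cpu = rng.choice(cpus)
        cpu.set_online(not cpu.is_online())
```

A stress run against real hardware would build the list from hot-pluggable CPUs only (on many systems cpu0 has no `online` file) and call `toggle_random` with a large iteration count, as root, on a test machine.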
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2008-0585.html