Bug 606687
- Summary: HARDWARE ERROR on intel-sunriseridge-01 when unloading igb
- Product: Red Hat Enterprise Linux 6
- Component: kernel
- Version: 6.0
- Status: CLOSED CURRENTRELEASE
- Severity: medium
- Priority: low
- Target Milestone: rc
- Hardware: All
- OS: Linux
- Doc Type: Bug Fix
- Reporter: Stefan Assmann <sassmann>
- Assignee: Stefan Assmann <sassmann>
- QA Contact: Network QE <network-qe>
- CC: agospoda, alexander.h.duyck, hjia, jane.lv, jlv, john.ronciak, jvillalo, keve.a.gabbert, kzhang, luyu, maciej.sosnowski, prarit
- Last Closed: 2010-11-11 16:16:03 UTC
- Bug Blocks: 580574
Description
Stefan Assmann
2010-06-22 08:56:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Reproduced with vanilla 2.6.33 and vanilla 2.6.34:

    PROCESSOR 0:206e6 TIME 1277204113 SOCKET 0 APIC 0
    No human readable MCE decoding support on this CPU type.
    Run the message through 'mcelog --ascii' to decode.
    This is not a software problem!
    Machine check: Processor context corrupt
    Kernel panic - not syncing: Fatal Machine check
    Pid: 0, comm: swapper Tainted: G M 2.6.34 #3
    Call Trace:
    <#MC> [<ffffffff8149bb2d>] panic+0x7d/0xfe
    [<ffffffff8101e092>] mce_panic+0x1e2/0x210
    [<ffffffff8101f8a8>] do_machine_check+0xa28/0xa70
    [<ffffffff8101423f>] ? mwait_idle+0x6f/0xd0
    [<ffffffff8149ee5c>] machine_check+0x1c/0x30
    [<ffffffff8101423f>] ? mwait_idle+0x6f/0xd0
    <<EOE>> [<ffffffff81009dc6>] cpu_idle+0xb6/0x110
    [<ffffffff81495c1b>] start_secondary+0x25d/0x2a0
    Rebooting in 30 seconds..

Forgot to mention the trace seen here in the initial report.

Just confirmed that it works on RHEL5 by running the following 10 times:

    modprobe -r igb ; sleep 3 ; modprobe igb

Cc'ing Alex. Looks like another DCA issue: module (un)loading succeeds when I blacklist ioatdma.

Maciej, this looks like another DCA problem. Could you look into it?

Yes. We will try to reproduce it locally. In the meantime I have informed the Sunrise Ridge team about this issue and filed a bug in their database. As I understand it, this issue is observed on Sunrise Ridge, not Emerald Ridge; could you confirm? Thanks.

(In reply to comment #7)
> As I understand, this issue is observed on Sunrise Ridge, not Emerald Ridge -
> could you confirm? Thanks.

Confirmed!

Created attachment 430414: patch with workaround (based on kernel 2.6.32)

I am attaching a patch with a proposed workaround for the issue. The patch is based on kernel 2.6.32.
The workaround should work for both:

- Bug 572732: Unloading igb module causes system reset
- Bug 606687: HARDWARE ERROR on intel-sunriseridge-01 when unloading igb

Please let me know if it works on your side. Thanks.

Hi Maciej, works great on intel-sunriseridge-01 so far! Reloaded igb 20 times without a problem. Care to explain what was going wrong?

Good news, thanks. The patch is actually a workaround: to avoid the platform reset/MCE, the dca module blocks DCA providers if an Emerald Ridge or Sunrise Ridge platform is detected.

Is this the code we're going to see upstream? It would be good to have an upstream reference to get it included in RHEL 6.0.

The problem has not been fully root-caused yet. Please include the workaround patch in your kernel to avoid the problem with RHEL 6. Once we have root-caused this issue, we will provide an appropriate solution, if one is needed, to Red Hat and the upstream kernel.

Patch(es) available on kernel-2.6.32-52.el6

Reproduced on the -44 kernel (HARDWARE ERROR when unloading igb); verified on -63: no crash, igb works fine.

*** Bug 624602 has been marked as a duplicate of this bug. ***

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
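The thread repeats two checks by hand: reloading igb in a loop (10 cycles on RHEL5, 20 on the patched kernel) and confirming that the kernel build is one listed as carrying the workaround (kernel-2.6.32-52.el6 or later). Below is a minimal POSIX shell sketch of both. The function names `reload_igb` and `has_dca_workaround`, the `MODPROBE` override, and the assumption that RHEL 6 release strings look like `2.6.32-NN.el6[.arch]` are illustrative only and do not come from the report or from any Red Hat tooling. On an affected machine without the workaround, the loop may end in a machine check panic like the trace above instead of returning.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the patch or of any Red Hat tooling)
# for re-running the checks described in this report.

# Repeat the unload/reload cycle used in this report.
# Set MODPROBE=echo for a dry run that only prints the commands.
reload_igb() {
    iterations=${1:-10}   # number of unload/reload cycles
    delay=${2:-3}         # seconds to wait between unload and reload
    modprobe_cmd=${MODPROBE:-modprobe}
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "cycle $i/$iterations"
        # Unloading igb is the step that triggered the MCE on the
        # affected Sunrise Ridge system.
        $modprobe_cmd -r igb || return 1
        sleep "$delay"
        $modprobe_cmd igb || return 1
        i=$((i + 1))
    done
    echo "completed $iterations cycles"
}

# First RHEL 6 build this report lists as carrying the workaround.
FIXED_BASE="2.6.32-52"

# Returns 0 if a RHEL 6 kernel release is 2.6.32-52.el6 or later,
# 1 if it is older, and 2 for non-el6 kernels, where this comparison
# says nothing. Relies on GNU sort -V (version sort).
has_dca_workaround() {
    release=${1:-$(uname -r)}
    case $release in
        *.el6*) ;;
        *) return 2 ;;
    esac
    base=${release%%.el6*}   # "2.6.32-71.el6.x86_64" -> "2.6.32-71"
    oldest=$(printf '%s\n%s\n' "$FIXED_BASE" "$base" | sort -V | head -n 1)
    [ "$oldest" = "$FIXED_BASE" ]
}
```

After sourcing the file as root, `has_dca_workaround && reload_igb 20 3` roughly mirrors the 20-cycle check reported on intel-sunriseridge-01. Comparing only the part of the release before `.el6` keeps the version check independent of the architecture suffix.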