Red Hat Bugzilla – Full Text Bug Listing
|Summary:||[Brocade/Dell 5.3 bug] hts failing memory test with EDAC i5000 Non-Fatal error|
|Product:||Red Hat Enterprise Linux 5||Reporter:||Janice Vatcher <jvatcher>|
|Component:||kernel||Assignee:||Aristeu Rozanski <arozansk>|
|Status:||CLOSED ERRATA||QA Contact:||Martin Jenner <mjenner>|
|Version:||5.2||CC:||andriusb, coughlan, cward, dzickus, gnichols, lwang, martinez, mbroz, mgahagan, mmcallis, rlandry, syeghiay, tao, william|
|Fixed In Version:||Doc Type:||Bug Fix|
|:||494734 (view as bug list)||Environment:|
|Last Closed:||2009-01-20 14:45:32 EST||Type:||---|
Description Janice Vatcher 2008-11-17 13:50:11 EST
Description of problem:
Failing memory test in HTS. Error in messages:
Nov 16 04:44:02 hba081036 kernel: EDAC i5000: NON-Retry Errors, bits= 0x800

Version-Release number of selected component (if applicable):

How reproducible:
Run hts against the system

Steps to Reproduce:
1. run hts discover
2. run hts certify
3.

Actual results:
memory - FAIL

Expected results:
memory - PASS

Additional info:
Comment 1 Andrius Benokraitis 2008-11-17 14:32:29 EST
Janice - do you have any logs we can look at?
Comment 2 Rob Landry 2008-11-17 14:40:57 EST
The error in the description looks to be coming from the kernel, either the VM or the driver. While I could see how hts might aggravate whatever the problem is, I would not suspect it to be the root cause. Reassigning to kernel for their assessment (hts or system logs should assist in their review as well).
Comment 3 Janice Vatcher 2008-11-17 14:46:26 EST
Created attachment 323790
Here is the message file.

Andrius,
Here is the messages file from the server.
Janice
Comment 4 Andrius Benokraitis 2008-11-17 15:03:29 EST
Janice - can you post the HTS logs? It can be captured in the INFO test.
Comment 5 Milan Broz 2008-11-20 10:22:43 EST
I see this on my Dell 490 too. I'll try to fiddle with the memory banks, but I expect it is just a too-verbose message...

Console is flooded with:
EDAC i5000 MC0: NON-FATAL ERROR Found!!! 1st NON-FATAL Err Reg= 0x800
EDAC i5000: NON-Retry Errors, bits= 0x800

Linux 2.6.18-123.el5xen #1 SMP Mon Nov 10 18:45:33 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Seems similar to bug #458133?
Comment 6 Aristeu Rozanski 2008-11-21 08:39:54 EST
Yes, please test the kernel package available at: http://people.redhat.com/arozansk/bz458133/
Comment 7 Milan Broz 2008-11-21 10:04:04 EST
Yes, 2.6.18-120.el5.458133xe is ok here, no messages. The message mentioned in comment #5 repeats every second on the non-patched kernel; it makes the physical console mostly unusable, or at least floods the logs. Please consider this a blocker for RHEL5.3... My HW is a standard Dell Precision 690 workstation.
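To gauge how badly a messages file is flooded, the two message shapes quoted in this report can be counted with a short script. This is only a sketch: the regular expression is derived from the two sample lines in comments above, the function name `count_i5000_messages` is made up for illustration, and the third sample line is an invented placeholder for unrelated log traffic.

```python
import re
from collections import Counter

# Pattern built from the two message shapes quoted in this bug report.
# The optional " MC<n>" part covers the "EDAC i5000 MC0:" variant.
EDAC_I5000 = re.compile(
    r"EDAC i5000(?: MC\d+)?: (NON-FATAL ERROR Found!!!|NON-Retry Errors)"
)


def count_i5000_messages(lines):
    """Count i5000 EDAC messages per message type in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EDAC_I5000.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The first two lines are copied from this report; the last one is a
# hypothetical unrelated line used only to show that it is ignored.
sample = [
    "EDAC i5000 MC0: NON-FATAL ERROR Found!!! 1st NON-FATAL Err Reg= 0x800",
    "Nov 16 04:44:02 hba081036 kernel: EDAC i5000: NON-Retry Errors, bits= 0x800",
    "Nov 16 04:44:03 hba081036 kernel: unrelated message",
]

if __name__ == "__main__":
    for kind, n in count_i5000_messages(sample).items():
        print(f"{kind}: {n}")
```

To run it against a real log, pass an open file object instead of `sample`, for example `count_i5000_messages(open(path))` with `path` pointing at the messages file.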
Comment 12 Aristeu Rozanski 2008-11-21 13:31:06 EST
*** Bug 458133 has been marked as a duplicate of this bug. ***
Comment 14 Don Zickus 2008-12-02 17:20:22 EST
in kernel-2.6.18-125.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 16 Chris Ward 2008-12-04 05:22:56 EST
Brocade, what is the current status of this bug fix? The fix should be present in the latest RHEL5.3 Snapshot. Please test and send feedback ASAP.
Comment 17 Chris Ward 2008-12-04 10:46:52 EST
Apologies, this fix should be present in Snapshot 5, which is scheduled for release next week.
Comment 18 Chris Ward 2008-12-08 06:53:28 EST
~~ Snapshot 5 is now available @ partners.redhat.com ~~

Partners, RHEL 5.3 Snapshot 5 is now available for testing. Please send us your testing feedback on this important bug fix / feature request AS SOON AS POSSIBLE.

If you are unable to test, indicate this in a comment or escalate to your Partner Manager. If we do not receive your test feedback, this bug will be AT RISK of being dropped from the release.

If you have VERIFIED the fix, please add PartnerVerified to the Bugzilla Keywords field, along with a description of the test results.

If you encounter a new bug, CLONE this bug and ask your Partner Manager to review it. We are no longer accepting new bugs into the release, bar critical regressions.
Comment 19 Chris Ward 2008-12-11 13:01:11 EST
Brocade, any update?
Comment 21 Chris Ward 2008-12-16 11:29:30 EST
~~~ Attention Partners ~~~ The *last* RHEL 5.3 Snapshot 6 is now available at partners.redhat.com. A fix for this bug should be present. Please test and update this bug with test results as soon as possible. If the fix present in Snap6 meets all the expected requirements for this bug, please add the keyword PartnerVerified. If any new bugs are discovered, please CLONE this bug and describe the issues encountered there.
Comment 22 Janice Vatcher 2008-12-19 17:24:32 EST
I loaded the 5.3 kernel on the same system and ran the certification test twice. I did not receive any EDAC errors in dmesg.
Comment 24 errata-xmlrpc 2009-01-20 14:45:32 EST
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
Comment 26 Murray McAllister 2009-05-19 21:55:57 EDT
The following bug description for this issue was too long to include in the errata:

The i5000_edac module reported all types of errors (including errors not completely documented) and did not use edac_mc functions to report errors. Not using the edac_mc functions prevented the error messages from being filtered or silenced. On certain systems, this resulted in the console being flooded with errors, for example:

EDAC i5000 MC0: NON-FATAL ERROR Found!!! 1st NON-FATAL Err Reg= [hex value]
EDAC i5000: NON-Retry Errors, bits= [hex value]

Removing the i5000_edac module prevented these errors; however, it may have prevented other important messages from being reported. After installing an update, the i5000_edac module uses the edac_mc functions to report errors, which resolves this issue.

Note: After an update, the i5000_edac module will not report errors that are not completely documented: these will be disabled by default. To re-enable these messages, use the i5000_edac "misc_messages=1" module parameter.
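One way to make the "misc_messages=1" parameter persistent is an `options` line in the module configuration file. The sketch below only renders that line and adds it to configuration text if it is missing. It assumes the standard modprobe `options <module> <param>=<value>` syntax; the helper names `options_line` and `ensure_option` are invented for illustration, and which file to edit (for example /etc/modprobe.conf) is an assumption that is not stated in this report.

```python
def options_line(module, **params):
    """Render a modprobe-style 'options' line for a module and its parameters."""
    rendered = " ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"options {module} {rendered}"


def ensure_option(config_text, line):
    """Return config_text with `line` appended, unless it is already present."""
    existing = [entry.strip() for entry in config_text.splitlines()]
    if line in existing:
        return config_text
    # Add a separating newline only if the text does not already end with one.
    separator = "" if not config_text or config_text.endswith("\n") else "\n"
    return config_text + separator + line + "\n"


if __name__ == "__main__":
    # Parameter name and value are taken from the note above.
    line = options_line("i5000_edac", misc_messages=1)
    # Hypothetical existing config content, for illustration only.
    print(ensure_option("alias eth0 e1000\n", line), end="")
```

The helpers work on strings rather than on a file so that the result can be reviewed before anything is written to disk; the module has to be reloaded for a changed parameter to take effect.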