Bug 646505 - Kernel warning at boot: i7core_edac: probe of 0000:80:14.0 failed with error -22
Summary: Kernel warning at boot: i7core_edac: probe of 0000:80:14.0 failed with error...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mauro Carvalho Chehab
QA Contact: Chao Ye
URL:
Whiteboard:
Depends On:
Blocks: 658418
 
Reported: 2010-10-25 13:58 UTC by Gerhard Wichert
Modified: 2018-11-14 16:41 UTC (History)

Fixed In Version: kernel-2.6.32-112.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 658418 (view as bug list)
Environment:
Last Closed: 2011-05-23 20:27:28 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport (3.59 MB, application/x-xz-compressed-tar)
2010-10-25 13:58 UTC, Gerhard Wichert


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0542 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update 2011-05-19 11:58:07 UTC

Description Gerhard Wichert 2010-10-25 13:58:50 UTC
Created attachment 455545 [details]
sosreport

Description of problem:
At kernel boot, while EDAC driver is loaded, the kernel outputs a warning message:

Oct 13 14:17:19 js-rx600s5 kernel: EDAC MC: Ver: 2.1.0 Sep  1 2010
Oct 13 14:17:19 js-rx600s5 kernel: EDAC i7core: Driver loaded.
Oct 13 14:17:19 js-rx600s5 kernel: i7core_edac: probe of 0000:80:14.0 failed with error -22

Only the first memory controller will be initialized.

Version-Release number of selected component (if applicable):
MC Driver for Intel i7 Core memory controllers - Ver: 1.0.0

How reproducible:
100%

Steps to Reproduce:
1. Boot RHEL6 on Fujitsu PRIMERGY RX600 S5.
  
Actual results:
Warning message is printed and only the first memory controller is initialized.


Expected results:
No warning and all memory controllers will be initialized.

Additional info:
There are 2 PCI_DEVICE_ID_INTEL_X58_HUB_MGMT (0x342e) in the system, the first at 0000:00:14.0 and the second at 0000:80:14.0. The warning message will be issued for the second one and you'll only find entries for the first one in sysfs.

Comment 2 Leonard den Ottolander 2010-11-01 00:20:21 UTC
This should probably be filed as a kernel bug not a midnight commander (mc) bug.

Changing component to kernel.

Comment 4 Mauro Carvalho Chehab 2010-11-22 19:54:44 UTC
Gerhard,

What kernel version are you using? I've sent some patches fixing some probe/remove logic that should remove the error message.

Btw, the error message is bogus, as the i7core_edac driver probes the two memory controllers at once. Even with this message, you should be seeing the two devices created at /sys/devices/system/edac/mc/.

Comment 5 Gerhard Wichert 2010-11-23 10:28:24 UTC
Mauro,

We got the issue first on kernel 2.6.32-71.el6.x86_64 (see sosreport), but retested with the RHEL6.0 GA version with the same result.

And btw, /sys/devices/system/edac/mc/ is empty.

Comment 6 Mauro Carvalho Chehab 2010-11-23 10:46:47 UTC
Hmm... According to Intel's ark:

http://ark.intel.com/Product.aspx?id=46492&code=Intel%C2%AE+Xeon%C2%AE+Processor+E7540+%2818M+Cache%2c+2.00+GHz%2c+6.40+GT%2fs+Intel%C2%AE+QPI%29

This processor is based on the Nehalem-EX design. The driver doesn't support Nehalem-EX, as it uses a completely different memory controller. There's a patch pending for the RHEL6 kernel that will return -ENODEV instead of -EINVAL, so that the error message is no longer displayed.
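
Why the return code matters: the Linux driver core treats -ENODEV and -ENXIO from a probe routine as "this driver does not handle the device" and stays quiet, while any other negative value (such as -EINVAL, which is -22 on Linux) produces the "probe of ... failed with error" warning. The following is a minimal userspace sketch of that reporting rule, not kernel code; report_probe_result() is a hypothetical helper name introduced only for illustration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Userspace model of how a probe() return value is reported.
 * report_probe_result() is a hypothetical helper, not a kernel function.
 * Returns 1 if a warning line was printed, 0 otherwise.
 */
int report_probe_result(const char *drv, const char *dev, int ret)
{
    if (ret == 0)
        return 0;   /* probe succeeded, nothing to report */

    if (ret == -ENODEV || ret == -ENXIO)
        return 0;   /* "not my device": no warning is printed */

    /* Any other error is reported in the format seen in this bug. */
    printf("%s: probe of %s failed with error %d\n", drv, dev, ret);
    return 1;
}
```

Called as report_probe_result("i7core_edac", "0000:80:14.0", -EINVAL), the sketch prints a line in the same format as the warning in the description; with -ENODEV it prints nothing.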

Comment 7 Gerhard Wichert 2010-11-23 14:46:24 UTC
Ah yes ... But then we shouldn't see "EDAC i7core: Driver loaded." in the boot messages.

Comment 8 Gerhard Wichert 2010-11-23 16:01:53 UTC
It seems the problem is that i7core_get_devices() returns 0 if no devices are found, but it should return -ENODEV.
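
The pattern described in this comment can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the driver's actual code: struct found_dev and get_devices_sketch() are hypothetical names, and the lookup is reduced to a scan over an array of device IDs. The point is only that "no match" is reported as -ENODEV rather than as success.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a discovered PCI device; not a kernel type. */
struct found_dev {
    unsigned short device_id;
};

/*
 * Count devices matching wanted_id and store the count in *count.
 * Returns 0 if at least one device matched, -ENODEV if none did.
 * Returning 0 in the "none found" case would make the caller
 * believe the lookup succeeded.
 */
int get_devices_sketch(const struct found_dev *devs, size_t n,
                       unsigned short wanted_id, size_t *count)
{
    size_t i;

    *count = 0;
    for (i = 0; i < n; i++) {
        if (devs[i].device_id == wanted_id)
            (*count)++;
    }

    return (*count == 0) ? -ENODEV : 0;
}
```

A caller that propagates the -ENODEV result unchanged from its init path would then fail to load quietly on unsupported hardware.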

Comment 9 Mauro Carvalho Chehab 2010-11-23 17:55:13 UTC
I've added some test kernels with some corrections for i7core_edac due to:
   https://bugzilla.redhat.com/show_bug.cgi?id=603124#c7

Test kernels are available at:

http://people.redhat.com/~mchehab/.bz603124/

Could you please test if this solves the issue?

Comment 10 Gerhard Wichert 2010-11-24 10:21:04 UTC
Test kernel shows the same error messages.

Comment 11 Mauro Carvalho Chehab 2010-11-30 10:01:15 UTC
(In reply to comment #10)
> Test kernel shows the same error messages.

Ok. I'm adding a new clause to return -ENODEV and not print the success message, if no memory controller is found.

In my tests on an E7540-based machine, with a debug kernel, it works properly:

EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2172: MC: drivers/edac/i7core_edac.c: i7core_exit()
EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2151: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1295: Found bus 0
EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1300: Last bus 0

(no "driver loaded" message).

I'm generating the rpm's for it. I'll be posting them when build finishes.

Comment 12 Mauro Carvalho Chehab 2010-11-30 11:34:49 UTC
A test on a Nehalem device properly shows the number of discovered memory controllers:

EDAC MC: Ver: 2.1.0 Nov 30 2010
EDAC MC0: Giving out device to 'i7core_edac.c' 'i7 core #0': DEV 0000:3f:03.0
EDAC PCI0: Giving out device to module 'i7core_edac' controller 'EDAC PCI controller': DEV '0000:3f:03.0' (POLLED)
EDAC i7core: Driver loaded, 1 memory controller(s) found.
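
The behavior shown in comments 11 and 12 (no "Driver loaded" line when nothing is found, and a controller count when something is) could be modeled roughly like this. It is a userspace sketch with a hypothetical helper name, format_load_message(), and it writes into a caller-supplied buffer instead of the kernel log; the real driver code may be structured differently.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the load message only when at least one memory controller
 * was found. format_load_message() is a hypothetical helper.
 * Returns 0 on success, -ENODEV when mc_count is zero or negative
 * (in which case buf is left untouched and no message is produced).
 */
int format_load_message(int mc_count, char *buf, size_t len)
{
    if (mc_count <= 0)
        return -ENODEV;

    snprintf(buf, len,
             "EDAC i7core: Driver loaded, %d memory controller(s) found.",
             mc_count);
    return 0;
}
```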

Comment 16 J.H.M. Dassen (Ray) 2010-12-03 09:39:57 UTC
=== In Red Hat Customer Portal Case 00367515 ===
--- Comment by Wichert, Gerhard on 03/12/2010 10:29 ---

Hi Ray,

With the provided packages the error message doesn't occur anymore and the info message "EDAC i7core: Driver loaded." is gone, too.

Regards,
Gerhard

Comment 17 RHEL Program Management 2011-01-17 23:00:18 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 20 Aristeu Rozanski 2011-02-03 16:15:34 UTC
Patch(es) available on kernel-2.6.32-112.el6

Comment 23 Han Pingtian 2011-03-10 09:00:14 UTC
Looks like I can reproduce this bug on intel-s3e36-01.lab.bos.redhat.com, using -118.el6 kernel:

...
EDAC MC: Ver: 2.1.0 Sep  1 2010
EDAC i7core: Driver loaded.
i7core_edac: probe of 0000:80:14.0 failed with error -22
...

And there is nothing in /sys/devices/system/edac/mc/:

[root@intel-s3e36-01 ~]# ls /sys/devices/system/edac/mc/
[root@intel-s3e36-01 ~]#

Comment 25 Han Pingtian 2011-03-16 08:06:27 UTC
With -122.el6, this error message no longer appears:

...
sd 0:2:0:0: Attached scsi generic sg0 type 0
sr 1:0:0:0: Attached scsi generic sg1 type 5
EDAC MC: Ver: 2.1.0 Mar  9 2011
ioatdma: Intel(R) QuickData Technology Driver 4.00
  alloc irq_desc for 43 on node -1
...

But there is nothing under sysfs directory:

[root@intel-s3e36-01 ~]# ls /sys/devices/system/edac/mc/
[root@intel-s3e36-01 ~]#

Comment 26 Mauro Carvalho Chehab 2011-03-16 10:31:01 UTC
(In reply to comment #25)
> With -122.el6, this error message no longer appears:
> 
> ...
> sd 0:2:0:0: Attached scsi generic sg0 type 0
> sr 1:0:0:0: Attached scsi generic sg1 type 5
> EDAC MC: Ver: 2.1.0 Mar  9 2011
> ioatdma: Intel(R) QuickData Technology Driver 4.00
>   alloc irq_desc for 43 on node -1
> ...
> 
> But there is nothing under sysfs directory:
> 
> [root@intel-s3e36-01 ~]# ls /sys/devices/system/edac/mc/
> [root@intel-s3e36-01 ~]#

Yes, that's the expected behavior on machines that use an unsupported memory controller. Basically, the E75xx machines use the Nehalem-EX design, with a completely different memory controller. The memory controllers on those machines are undocumented and aren't supported by the EDAC driver.

What the driver does when it notices an unsupported device is to return -ENODEV.
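
For anyone who wants to check this programmatically rather than with ls, a small userspace sketch follows. It assumes that registered controllers appear as entries named mc0, mc1, and so on under the EDAC sysfs directory; count_mc_entries() is a hypothetical helper name. A result of 0 corresponds to the empty listing shown in comments 23 and 25.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/*
 * Count directory entries named "mc" followed by a digit.
 * count_mc_entries() is a hypothetical helper for illustration.
 * Returns the number of matching entries, or -1 if the directory
 * cannot be opened (for example, when EDAC is not present at all).
 */
int count_mc_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, "mc", 2) == 0 &&
            isdigit((unsigned char)ent->d_name[2]))
            count++;
    }

    closedir(dir);
    return count;
}
```

A typical call would be count_mc_entries("/sys/devices/system/edac/mc").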

Comment 29 Chao Ye 2011-04-29 04:22:26 UTC
Based on comment #26 and comment #27, changing status to VERIFIED.

Comment 30 errata-xmlrpc 2011-05-23 20:27:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

