Bug 646505
| Field | Value |
|---|---|
| Summary | Kernel warning at boot: i7core_edac: probe of 0000:80:14.0 failed with error -22 |
| Product | Red Hat Enterprise Linux 6 |
| Component | kernel |
| Version | 6.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | low |
| Reporter | Gerhard Wichert <gerhard.wichert> |
| Assignee | Mauro Carvalho Chehab <mchehab> |
| QA Contact | Chao Ye <cye> |
| CC | cward, gasmith, jwest, leonard-rh-bugzilla, lwang, phan, qcai |
| Target Milestone | rc |
| Fixed In Version | kernel-2.6.32-112.el6 |
| Doc Type | Bug Fix |
| Bug Blocks | 658418 |
| Last Closed | 2011-05-23 20:27:28 UTC |

This should probably be filed as a kernel bug, not a midnight commander (mc) bug. Changing component to kernel.

Gerhard,

What kernel version are you using? I've sent some patches fixing some probe/remove logic that should remove the error message. Btw, the error message is bogus, as the i7core_edac driver probes the two memory controllers at once. Even with this message, you should see the two devices created at /sys/devices/system/edac/mc/.

Mauro,

We got the issue first on kernel 2.6.32-71.el6.x86_64 (see sosreport), but retested with the RHEL6.0 GA version with the same result. And btw, /sys/devices/system/edac/mc/ is empty.

Hmm... According to Intel's ark:

http://ark.intel.com/Product.aspx?id=46492&code=Intel%C2%AE+Xeon%C2%AE+Processor+E7540+%2818M+Cache%2c+2.00+GHz%2c+6.40+GT%2fs+Intel%C2%AE+QPI%29

This processor is based on the Nehalem-EX design. The driver doesn't support Nehalem-EX, as it uses a completely different memory controller. There's a patch pending for the RHEL6 kernel that will return -ENODEV instead of -EINVAL, so that the error message is no longer displayed.

Ah yes ... But then we shouldn't see "EDAC i7core: Driver loaded." in the boot messages. It seems the problem is that i7core_get_devices() returns 0 if no devices are found, but it should return -ENODEV.

I've added some test kernels with some corrections for i7core_edac due to:

https://bugzilla.redhat.com/show_bug.cgi?id=603124#c7

Test kernels are available at:

http://people.redhat.com/~mchehab/.bz603124/

Could you please test if this solves the issue?

Test kernel shows the same error messages.

(In reply to comment #10)
> Test kernel shows the same error messages.

Ok. I'm adding a new clause to return -ENODEV and not print the success message if no memory controller is found. In my tests on an E7540-based machine, with a debug kernel, it works properly:

    EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2172: MC: drivers/edac/i7core_edac.c: i7core_exit()
    EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2151: MC: drivers/edac/i7core_edac.c: i7core_init()
    EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1295: Found bus 0
    EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1300: Last bus 0

(no "driver loaded" message).

I'm generating the RPMs for it. I'll post them when the build finishes.

A test on a Nehalem device properly shows the number of discovered memory controllers:

    EDAC MC: Ver: 2.1.0 Nov 30 2010
    EDAC MC0: Giving out device to 'i7core_edac.c' 'i7 core #0': DEV 0000:3f:03.0
    EDAC PCI0: Giving out device to module 'i7core_edac' controller 'EDAC PCI controller': DEV '0000:3f:03.0' (POLLED)
    EDAC i7core: Driver loaded, 1 memory controller(s) found.

=== In Red Hat Customer Portal Case 00367515 ===
--- Comment by Wichert, Gerhard on 03/12/2010 10:29 ---

Hi Ray,

With the provided packages the error message doesn't occur anymore and the info message "EDAC i7core: Driver loaded." is gone, too.

Regards,
Gerhard

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available on kernel-2.6.32-112.el6

Looks like I can reproduce this bug on intel-s3e36-01.lab.bos.redhat.com, using the -118.el6 kernel:

    ...
    EDAC MC: Ver: 2.1.0 Sep 1 2010
    EDAC i7core: Driver loaded.
    i7core_edac: probe of 0000:80:14.0 failed with error -22
    ...

And there is nothing in /sys/devices/system/edac/mc/:

    [root@intel-s3e36-01 ~]# ls /sys/devices/system/edac/mc/
    [root@intel-s3e36-01 ~]#

With -122.el6, this error message no longer appears:

    ...
    sd 0:2:0:0: Attached scsi generic sg0 type 0
    sr 1:0:0:0: Attached scsi generic sg1 type 5
    EDAC MC: Ver: 2.1.0 Mar 9 2011
    ioatdma: Intel(R) QuickData Technology Driver 4.00
    alloc irq_desc for 43 on node -1
    ...

But there is nothing under the sysfs directory:

    [root@intel-s3e36-01 ~]# ls /sys/devices/system/edac/mc/
    [root@intel-s3e36-01 ~]#

(In reply to comment #25)
> With -122.el6, this error message no longer appears:
>
> ...
> sd 0:2:0:0: Attached scsi generic sg0 type 0
> sr 1:0:0:0: Attached scsi generic sg1 type 5
> EDAC MC: Ver: 2.1.0 Mar 9 2011
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> alloc irq_desc for 43 on node -1
> ...
>
> But there is nothing under the sysfs directory:
>
> [root@intel-s3e36-01 ~]# ls /sys/devices/system/edac/mc/
> [root@intel-s3e36-01 ~]#

Yes, that's the expected behavior on machines that use an unsupported memory controller. Basically, the E75xx machines use the Nehalem-EX design, with a completely different memory controller. The memory controller on those machines is undocumented and isn't supported by the EDAC driver. When the driver notices an unsupported device, it returns -ENODEV.

Based on comment#26 and comment#27, change status to VERIFIED.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

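
The fix discussed in the comments above (return -ENODEV and skip the success message when no memory controller is found) can be modelled roughly as follows. This is only a minimal Python sketch of the decision, not the kernel code: the function name `i7core_init_model` and its `controllers_found` parameter are hypothetical, and the real driver is C code in drivers/edac/i7core_edac.c. The message text is the one quoted from the Nehalem test above.

```python
import errno


def i7core_init_model(controllers_found):
    """Hypothetical model of the init decision described in this bug.

    Returns a (return_code, messages) tuple. This is an illustration of the
    behavior discussed in the comments, not the actual driver logic.
    """
    messages = []
    if controllers_found == 0:
        # Unsupported hardware (for example Nehalem-EX): report "no such
        # device" and do not print the "Driver loaded" success message.
        return -errno.ENODEV, messages
    messages.append(
        "EDAC i7core: Driver loaded, %d memory controller(s) found."
        % controllers_found
    )
    return 0, messages


if __name__ == "__main__":
    for count in (0, 1):
        code, msgs = i7core_init_model(count)
        print(count, code, msgs)
```

The point of the sketch is that a quiet -ENODEV for unsupported hardware replaces the earlier combination of a success message followed by a probe failure with -EINVAL.
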
Created attachment 455545: sosreport

Description of problem:
At kernel boot, while the EDAC driver is loaded, the kernel outputs a warning message:

    Oct 13 14:17:19 js-rx600s5 kernel: EDAC MC: Ver: 2.1.0 Sep 1 2010
    Oct 13 14:17:19 js-rx600s5 kernel: EDAC i7core: Driver loaded.
    Oct 13 14:17:19 js-rx600s5 kernel: i7core_edac: probe of 0000:80:14.0 failed with error -22

Only the first memory controller will be initialized.

Version-Release number of selected component (if applicable):
MC Driver for Intel i7 Core memory controllers - Ver: 1.0.0

How reproducible:
100%

Steps to Reproduce:
1. Boot RHEL6 on Fujitsu PRIMERGY RX600 S5.

Actual results:
Warning message is printed and only the first memory controller is initialized.

Expected results:
No warning and all memory controllers will be initialized.

Additional info:
There are 2 PCI_DEVICE_ID_INTEL_X58_HUB_MGMT (0x342e) in the system, the first at 0000:00:14.0 and the second at 0000:80:14.0. The warning message will be issued for the second one and you'll only find entries for the first one in sysfs.
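
When triaging a report like this one, the two checks used in this bug (decoding the errno in the probe failure line, and looking for entries under /sys/devices/system/edac/mc/) can be scripted. The following is a minimal sketch, assuming Python is available on the affected host. The helper names `decode_probe_failures` and `list_edac_controllers` are hypothetical, and the `mcN` entry naming is an assumption based on the usual EDAC sysfs layout.

```python
import errno
import os
import re

# Matches kernel lines such as:
#   i7core_edac: probe of 0000:80:14.0 failed with error -22
PROBE_FAILURE = re.compile(
    r"(?P<driver>\S+): probe of (?P<device>\S+) failed with error (?P<code>-\d+)"
)


def decode_probe_failures(log_text):
    """Return (driver, device, code, errno_name) for each probe failure found."""
    failures = []
    for match in PROBE_FAILURE.finditer(log_text):
        code = int(match.group("code"))
        # errno.errorcode maps positive errno numbers to symbolic names.
        name = errno.errorcode.get(-code, "unknown")
        failures.append((match.group("driver"), match.group("device"), code, name))
    return failures


def list_edac_controllers(mc_dir="/sys/devices/system/edac/mc"):
    """Return the sorted mcN entries in mc_dir, or [] if it is missing or empty."""
    try:
        entries = os.listdir(mc_dir)
    except FileNotFoundError:
        return []
    # Assumption: memory controllers appear as mc0, mc1, ... in this directory.
    return sorted(e for e in entries if re.fullmatch(r"mc\d+", e))


if __name__ == "__main__":
    line = ("Oct 13 14:17:19 js-rx600s5 kernel: i7core_edac: "
            "probe of 0000:80:14.0 failed with error -22")
    print(decode_probe_failures(line))
    print(list_edac_controllers())
```

An empty list from `list_edac_controllers` corresponds to the empty `ls` output shown in the comments, which is the expected result on hardware the driver does not support.
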