755208 – amd64_edac_mod fails to load in 2.6.41.1-1.fc15.x86_64

Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	15
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance

Reported:	2011-11-19 14:06 UTC by Ian Malone
Modified:	2011-11-29 08:34 UTC
CC List:	5 users
Last Closed:	2011-11-29 08:34:10 UTC

Description Ian Malone 2011-11-19 14:06:22 UTC

kernel-2.6.41.1-1.fc15.x86_64

My entire dmesg ring consists of this repeated at ~ 0.03s intervals:

[  327.533760] AMD64 EDAC driver v3.4.0
[  327.533995] EDAC amd64: DRAM ECC disabled.
[  327.534042] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[  327.534046]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
[  327.534050]  (Note that use of the override may cause unknown side effects.)
[  327.543442] modprobe[22733]: FATAL: Error inserting amd64_edac_mod (/lib/modules/2.6.41.1-1.fc15.x86_64/kernel/drivers/edac/amd64_edac_mod.ko): No such device

This also causes a very long boot, presumably while waiting for a systemd unit to time out. However, I can't see far enough back in the message ring to confirm that.

How reproducible:
100%. It doesn't happen when booting into the previous kernel.


Additional info:
Possibly unrelated and maybe a systemd bug, but during boot in text mode I see a whole lot of these (typed in):
Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias X instead.

Where X includes netdev-snd_ice1724, netdev-snd_ac97_codec, netdev-fat, netdev-vfat.

Comment 1 Ian Malone 2011-11-19 23:03:25 UTC

While looking into another problem, I noticed that when booting an earlier kernel (2.6.40.8-3.bz731672.fc15.x86_64) I get one instance of this:
[   20.708375] AMD64 EDAC driver v3.4.0
[   20.708472] EDAC amd64: DRAM ECC disabled.
[   20.708483] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[   20.708485]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
[   20.708486]  (Note that use of the override may cause unknown side effects.)

...but not the "modprobe[22733]: FATAL" error. And it doesn't repeat.

Comment 2 Ian Malone 2011-11-20 13:45:14 UTC

This is no longer happening, though it did occur for several reboots. There have been no further rpm updates in the meantime. I had run a SELinux relabel; is it conceivable that a labelling error was involved?

Comment 3 Ian Malone 2011-11-20 21:48:11 UTC

That was premature; it's back again. There are still no package changes, and I don't see any unexpected SELinux alerts (just an rkhunter mailx one and a gnome-session-check-accelerated-helper one).

Looks like the context is okay, so the relabel was a coincidence:

$ matchpathcon /lib/modules/2.6.41.1-1.fc15.x86_64/kernel/drivers/edac/amd64_edac_mod.ko
/lib/modules/2.6.41.1-1.fc15.x86_64/kernel/drivers/edac/amd64_edac_mod.ko	system_u:object_r:modules_object_t:s0

Comment 4 Ian Malone 2011-11-21 23:52:15 UTC

After about 50 minutes of uptime, udevd goes crazy and takes 100% CPU, rendering the system nearly unresponsive. After killing it, there are no more attempts to load this module.

Comment 5 Dave Jones 2011-11-22 17:39:05 UTC

Can you run udevadm monitor for a while (with udev running) and attach the output? It's probably going to be looping, so you won't need to run it for long.

Comment 6 Ian Malone 2011-11-22 23:01:52 UTC

Like this, then repeating:

monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

UDEV  [1322001412.051980] add      /bus/pci/drivers/amd64_edac (drivers)
KERNEL[1322001412.055699] remove   /module/amd64_edac_mod (module)
UDEV  [1322001412.056815] add      /module/amd64_edac_mod (module)
UDEV  [1322001412.061686] remove   /bus/pci/drivers/amd64_edac (drivers)
KERNEL[1322001412.065637] add      /module/amd64_edac_mod (module)
KERNEL[1322001412.065831] add      /bus/pci/drivers/amd64_edac (drivers)
KERNEL[1322001412.065844] remove   /bus/pci/drivers/amd64_edac (drivers)
KERNEL[1322001412.070680] remove   /module/amd64_edac_mod (module)
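
For anyone wanting to summarise a longer capture than the excerpt above, here is a minimal sketch that parses saved `udevadm monitor` output and lists device paths that are being both added and removed. The line format is inferred only from the excerpt in this comment, and the function names (parse_events, count_by_devpath, flapping) are made up for illustration; they are not part of udev.

```python
import re
from collections import Counter

# Matches lines such as:
#   KERNEL[1322001412.055699] remove   /module/amd64_edac_mod (module)
# The pattern is an assumption based on the excerpt in this report.
EVENT_RE = re.compile(
    r"^(?P<source>UDEV|KERNEL)\s*\[(?P<time>\d+\.\d+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]*)\)\s*$"
)


def parse_events(text):
    """Return (source, time, action, devpath, subsystem) tuples; skip other lines."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            events.append(
                (m["source"], float(m["time"]), m["action"],
                 m["devpath"], m["subsystem"])
            )
    return events


def count_by_devpath(events, source="KERNEL"):
    """Count events per device path for one event source."""
    return Counter(devpath for src, _, _, devpath, _ in events if src == source)


def flapping(events, source="KERNEL", min_cycles=1):
    """List devpaths seen at least min_cycles times as 'add' and as 'remove'."""
    adds, removes = Counter(), Counter()
    for src, _, action, devpath, _ in events:
        if src != source:
            continue
        if action == "add":
            adds[devpath] += 1
        elif action == "remove":
            removes[devpath] += 1
    return sorted(p for p in adds if min(adds[p], removes[p]) >= min_cycles)
```

Fed the eight event lines above, flapping() reports both /bus/pci/drivers/amd64_edac and /module/amd64_edac_mod. On a real capture, a higher min_cycles would help separate a genuine loop from ordinary hotplug activity.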

However, I think I've got it solved now. A udev rule entered to work around bug https://bugzilla.redhat.com/show_bug.cgi?id=753648 had a stray newline in it, so it read:
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="rt2500pci", KERNEL=="wlan*",
RUN="/sbin/iw $name set power_save off"

rather than
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="rt2500pci", KERNEL=="wlan*", RUN="/sbin/iw $name set power_save off"

It happened to be picking on this module, but I did get it into a state where it was upset with powernow-k8, and noticed that it then tried to run
/sbin/iw powernow-k8 set power_save off
which I suspect means the orphaned RUN= line was being applied to every device udev processed, not just wlan*. I think this bug can be closed unless the looping it provokes is considered a udev bug.
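
A rough way to catch this kind of split rule before it bites is sketched below. It assumes the usual udev rules layout (one rule per line, a trailing backslash for continuation, == and != for match keys, the other operators for assignments) and flags lines that end in a comma or that contain assignments with no match key. The names join_continuations and lint_rules are hypothetical; this is not a udev tool and it does not validate full rule syntax.

```python
import re

# key (optionally with {attr}), operator, quoted value.
# '==' and '!=' are treated as match operators; '=', '+=' and ':=' as assignments.
PAIR_RE = re.compile(r'([A-Za-z_]+(?:\{[^}]*\})?)\s*(==|!=|\+=|:=|=)\s*"([^"]*)"')


def join_continuations(text):
    """Merge lines ending in a backslash with the next line.

    Returns (first_line_number, logical_line) pairs.
    """
    logical = []
    buf, start = "", None
    for lineno, raw in enumerate(text.splitlines(), 1):
        if start is None:
            start = lineno
        if raw.rstrip().endswith("\\"):
            buf += raw.rstrip()[:-1]
            continue
        buf += raw
        logical.append((start, buf))
        buf, start = "", None
    if buf:
        logical.append((start, buf))
    return logical


def lint_rules(text):
    """Return (line_number, message) warnings for suspicious rule lines."""
    warnings = []
    for lineno, line in join_continuations(text):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        ops = [op for _, op, _ in PAIR_RE.findall(stripped)]
        if stripped.endswith(","):
            warnings.append(
                (lineno, "ends with a comma; rule may be split across lines"))
        if ops and not any(op in ("==", "!=") for op in ops):
            warnings.append(
                (lineno, "no match keys; assignments apply to every event"))
    return warnings
```

Run against the broken two-line version of the rule above, this warns about line 1 (trailing comma) and line 2 (RUN= with nothing to match on); the corrected single-line rule produces no warnings.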

Thanks for looking into this, pushing me towards udev helped.
