Bug 299821 - Error when enabling EDAC
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-09-21 04:42 EDT by Jeremy Sanders
Modified: 2008-01-03 18:35 EST (History)
Fixed In Version: 2.6.23
Doc Type: Bug Fix
Last Closed: 2008-01-03 18:35:53 EST
External Trackers: Linux Kernel 9121

Description Jeremy Sanders 2007-09-21 04:42:05 EDT
Description of problem:

I tried to enable edac on Fedora 7 on x86-64:

# /sbin/modprobe edac_mc
# echo 1 > /sys/devices/system/edac/pci/check_pci_parity

This gives lots of errors in the logs:

Sep 21 09:34:45 xpc17 kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Sep 21 09:34:45 xpc17 kernel: in_atomic():0, irqs_disabled():1
Sep 21 09:34:45 xpc17 kernel:
Sep 21 09:34:45 xpc17 kernel: Call Trace:
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81046c99>] down_read+0x15/0x24
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff811223b8>] pci_get_subsys+0x81/0x113
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81247e17>] schedule_timeout+0x85/0xad
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff88bfbb9e>] :edac_mc:edac_kernel_thread+0x9e/0x104
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff88bfbb00>] :edac_mc:edac_kernel_thread+0x0/0x104
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81044320>] kthread+0x47/0x73
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff8100a978>] child_rip+0xa/0x12
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff810442d9>] kthread+0x0/0x73
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff8100a96e>] child_rip+0x0/0x12
[repeated]
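
The second line of each report carries the relevant detail: `irqs_disabled():1` shows that `down_read()`, which may sleep, was reached from the EDAC kernel thread with interrupts off. A minimal sketch for counting such reports in a saved log, assuming syslog-style lines like those above. The function name `find_sleep_bugs` and the sample text are illustrative and not part of any EDAC or kernel tool:

```python
import re

# Matches the flags line that follows each "sleeping function" report.
FLAGS_RE = re.compile(r"in_atomic\(\):(\d+), irqs_disabled\(\):(\d+)")
BUG_MARKER = "BUG: sleeping function called from invalid context"


def find_sleep_bugs(log_text):
    """Return one (in_atomic, irqs_disabled) tuple per report in log_text."""
    reports = []
    pending = False  # True after a BUG line, until its flags line is seen
    for line in log_text.splitlines():
        if BUG_MARKER in line:
            pending = True
            continue
        match = FLAGS_RE.search(line)
        if pending and match:
            reports.append((int(match.group(1)), int(match.group(2))))
            pending = False
    return reports


# Illustrative sample modeled on the log excerpt above (assumption: one
# report per BUG line, flags on a following line).
SAMPLE = """\
Sep 21 09:34:45 xpc17 kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Sep 21 09:34:45 xpc17 kernel: in_atomic():0, irqs_disabled():1
Sep 21 09:34:45 xpc17 kernel: Call Trace:
"""

if __name__ == "__main__":
    for in_atomic, irqs_disabled in find_sleep_bugs(SAMPLE):
        print(f"in_atomic={in_atomic} irqs_disabled={irqs_disabled}")
```

The sketch tolerates a wrapped BUG line, since it only waits for the next flags line after seeing the marker.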

Version-Release number of selected component (if applicable):

kernel-2.6.22.4-65.fc7


xpc17:/sys/devices/system/edac/pci:# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 43
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 1
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4824.37
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 43
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 1
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4821.18
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

xpc17:/sys/devices/system/edac/pci:# uname -a
Linux xpc17.ast.cam.ac.uk 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
Comment 1 Andrew Gilmore 2007-09-25 14:30:20 EDT
Have also seen very similar errors on FC6.
kernel-2.6.20-1.2962.fc6
kernel-2.6.22.7-57.fc6

Error (from 2.6.22.7-57):

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b4d6>] down_read+0x12/0x28
 [<c04f7892>] pci_get_subsys+0x71/0xf3
 [<c04f792a>] pci_get_device+0x16/0x19
 [<f8d438cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d4383b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c04383c2>] kthread+0x38/0x5e
 [<c043838a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================

I can trigger the errors by:
/sbin/modprobe e752x_edac

and the errors stop when:
/sbin/modprobe -r e752x_edac

CPUs are dual Xeon 3.4s, system is a Dell Precision 670.

Have seen occasional syslog messages:
EDAC e752x: Non-Fatal Error PCI Express B

When attempting to explore this, I began seeing this error.
Any chance this is a real memory issue? I will begin testing for that.
Comment 2 Christopher Brown 2007-10-03 12:01:28 EDT
I see this as well on x86.

EDAC MC: Ver: 2.0.1 Sep 27 2007
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d8183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================
EDAC PCI: Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Detected Parity Error on 0000:00:19.0
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d8183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
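
Reading the trace from the bottom up gives the path into the failing call: `edac_kernel_thread` in the `edac_mc` module calls `pci_get_device`, which reaches `pci_get_subsys` and then `down_read`. A minimal sketch for pulling symbol, offset, and module out of frames in this format, assuming lines shaped like ` [<addr>] symbol+0xOFF/0xSIZE [module]`. The helper `parse_frames` is hypothetical and only covers the i386-style layout shown in this comment:

```python
import re

# One frame: address, symbol, offset/size in hex, optional [module] suffix.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?"
)


def parse_frames(trace_text):
    """Return a list of dicts, one per frame line found in trace_text."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "symbol": match.group(2),
                "offset": int(match.group(3), 16),
                "module": match.group(5),  # None for built-in kernel code
            })
    return frames


# Frames copied from the trace above (top of stack first).
TRACE = """\
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
"""

if __name__ == "__main__":
    for frame in parse_frames(TRACE):
        origin = frame["module"] or "kernel"
        print(f"{frame['symbol']} (+{frame['offset']:#x}) in {origin}")
```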
Comment 3 Chuck Ebbert 2007-10-03 12:35:57 EDT
Message reporting this to the maintainer and linux-kernel was ignored.

http://lkml.org/lkml/2007/9/21/537
Comment 4 Christopher Brown 2007-10-04 06:42:15 EDT
Filed upstream. I CC'd you, Chuck; hope this is okay.

http://bugzilla.kernel.org/show_bug.cgi?id=9121
Comment 5 Christopher Brown 2007-12-18 11:13:39 EST
Fixed in 2.6.23. A fix for 2.6.22 is available for testing at:

http://bugzilla.kernel.org/show_bug.cgi?id=9121
Comment 6 Christopher Brown 2008-01-03 18:35:53 EST
Closing CURRENTRELEASE as I no longer see this with the latest kernel.

Cheers
Chris
