Bug 299821 - Error when enabling EDAC
Summary: Error when enabling EDAC
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-09-21 08:42 UTC by Jeremy Sanders
Modified: 2008-01-03 23:35 UTC
CC List: 2 users

Fixed In Version: 2.6.23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-03 23:35:53 UTC
Type: ---
Embargoed:




Links
System        ID    Private  Priority  Status  Summary  Last Updated
Linux Kernel  9121  0        None      None    None     2019-08-08 08:17:44 UTC

Description Jeremy Sanders 2007-09-21 08:42:05 UTC
Description of problem:

I tried to enable edac on Fedora 7 on x86-64:

# /sbin/modprobe edac_mc
# echo 1 > /sys/devices/system/edac/pci/check_pci_parity

This gives lots of errors in the logs:

Sep 21 09:34:45 xpc17 kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Sep 21 09:34:45 xpc17 kernel: in_atomic():0, irqs_disabled():1
Sep 21 09:34:45 xpc17 kernel:
Sep 21 09:34:45 xpc17 kernel: Call Trace:
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81046c99>] down_read+0x15/0x24
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff811223b8>] pci_get_subsys+0x81/0x113
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81247e17>] schedule_timeout+0x85/0xad
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff88bfbb9e>] :edac_mc:edac_kernel_thread+0x9e/0x104
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff88bfbb00>] :edac_mc:edac_kernel_thread+0x0/0x104
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81044320>] kthread+0x47/0x73
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff8100a978>] child_rip+0xa/0x12
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff810442d9>] kthread+0x0/0x73
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff8100a96e>] child_rip+0x0/0x12
[repeated]

Version-Release number of selected component (if applicable):

kernel-2.6.22.4-65.fc7


xpc17:/sys/devices/system/edac/pci:# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 43
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 1
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4824.37
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 43
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 1
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4821.18
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

xpc17:/sys/devices/system/edac/pci:# uname -a
Linux xpc17.ast.cam.ac.uk 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 Andrew Gilmore 2007-09-25 18:30:20 UTC
Have also seen very similar errors on FC6.
kernel-2.6.20-1.2962.fc6
kernel-2.6.22.7-57.fc6

Error (from 2.6.22.7-57):

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b4d6>] down_read+0x12/0x28
 [<c04f7892>] pci_get_subsys+0x71/0xf3
 [<c04f792a>] pci_get_device+0x16/0x19
 [<f8d438cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d4383b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c04383c2>] kthread+0x38/0x5e
 [<c043838a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================

I can trigger the errors by:
/sbin/modprobe e752x_edac

and the errors stop when:
/sbin/modprobe -r e752x_edac

CPUs are dual Xeon 3.4s, system is a Dell Precision 670.

Have seen occasional syslog messages:
EDAC e752x: Non-Fatal Error PCI Express B

While investigating those messages, I began seeing the BUG error above.
Could this be a real memory issue? I will begin testing for that.

Comment 2 Christopher Brown 2007-10-03 16:01:28 UTC
I see this as well on x86.

EDAC MC: Ver: 2.0.1 Sep 27 2007
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d8183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================
EDAC PCI: Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Detected Parity Error on 0000:00:19.0
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d8183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10


Comment 3 Chuck Ebbert 2007-10-03 16:35:57 UTC
A message reporting this to the maintainer and to the linux-kernel list was ignored:

http://lkml.org/lkml/2007/9/21/537


Comment 4 Christopher Brown 2007-10-04 10:42:15 UTC
Filed upstream; I CC'd you, Chuck, I hope that is okay:

http://bugzilla.kernel.org/show_bug.cgi?id=9121

Comment 5 Christopher Brown 2007-12-18 16:13:39 UTC
Fixed in 2.6.23. A fix for 2.6.22 is available for testing at:

http://bugzilla.kernel.org/show_bug.cgi?id=9121

Comment 6 Christopher Brown 2008-01-03 23:35:53 UTC
Closing CURRENTRELEASE as I no longer see this with the latest kernel.

Cheers
Chris

