Description of problem:
I tried to enable EDAC on Fedora 7 on x86-64:

# /sbin/modprobe edac_mc
# echo 1 > /sys/devices/system/edac/pci/check_pci_parity

This gives lots of errors in the logs:

Sep 21 09:34:45 xpc17 kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Sep 21 09:34:45 xpc17 kernel: in_atomic():0, irqs_disabled():1
Sep 21 09:34:45 xpc17 kernel:
Sep 21 09:34:45 xpc17 kernel: Call Trace:
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81046c99>] down_read+0x15/0x24
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff811223b8>] pci_get_subsys+0x81/0x113
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81247e17>] schedule_timeout+0x85/0xad
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff88bfbb9e>] :edac_mc:edac_kernel_thread+0x9e/0x104
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff88bfbb00>] :edac_mc:edac_kernel_thread+0x0/0x104
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff81044320>] kthread+0x47/0x73
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff8100a978>] child_rip+0xa/0x12
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff810442d9>] kthread+0x0/0x73
Sep 21 09:34:45 xpc17 kernel:  [<ffffffff8100a96e>] child_rip+0x0/0x12
[repeated]

Version-Release number of selected component (if applicable):
kernel-2.6.22.4-65.fc7

xpc17:/sys/devices/system/edac/pci:# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping : 1
cpu MHz : 2400.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips : 4824.37
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping : 1
cpu MHz : 2400.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips : 4821.18
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

xpc17:/sys/devices/system/edac/pci:# uname -a
Linux xpc17.ast.cam.ac.uk 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
I have also seen very similar errors on FC6, with kernel-2.6.20-1.2962.fc6 and kernel-2.6.22.7-57.fc6.

Error (from 2.6.22.7-57):

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b4d6>] down_read+0x12/0x28
 [<c04f7892>] pci_get_subsys+0x71/0xf3
 [<c04f792a>] pci_get_device+0x16/0x19
 [<f8d438cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d4383b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c04383c2>] kthread+0x38/0x5e
 [<c043838a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================

I can trigger the errors with:

/sbin/modprobe e752x_edac

and the errors stop after:

/sbin/modprobe -r e752x_edac

The CPUs are dual 3.4 GHz Xeons; the system is a Dell Precision 670. I have seen occasional syslog messages:

EDAC e752x: Non-Fatal Error PCI Express B

While investigating those, I began seeing the BUG messages above. Any chance this is a real memory issue? I will begin testing for that.
I see this as well on x86:

EDAC MC: Ver: 2.0.1 Sep 27 2007
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d8183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================
EDAC PCI: Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Detected Parity Error on 0000:00:19.0
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8d818cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8d8183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
A message reporting this to the maintainer and linux-kernel was ignored: http://lkml.org/lkml/2007/9/21/537
Filed upstream and CC'd you, Chuck; hope this is okay: http://bugzilla.kernel.org/show_bug.cgi?id=9121
Fixed in 2.6.23. A fix for 2.6.22 is available for testing at: http://bugzilla.kernel.org/show_bug.cgi?id=9121
Closing as CURRENTRELEASE, since I no longer see this with the latest kernel.

Cheers,
Chris