From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Description of problem:
This is a dual Athlon, 1 gig registered ECC DDR RAM, will try 2.4.18-4 but it doesn't look ext3-related (the only big local filesystem is reiserfs over s/w raid0). I do suspect the hardware on this machine. If someone could tell me "that looks like a bad x", I'd be very grateful. More details on request :-/

Unable to handle kernel paging request at virtual address 0200f82b
 printing eip:
c0137dc0
*pde = 00000000
Oops: 0000
nls_iso8859-1 nls_cp437 vfat fat soundcore nfs tuner tvaudio bttv videodev i2c
CPU:    0
EIP:    0010:[<c0137dc0>]    Not tainted
EFLAGS: 00010206

EIP is at page_remove_rmap [kernel] 0x50 (2.4.18-3)
eax: 0200f827   ebx: c1df9c38   ecx: c1000030   edx: c3a19168
esi: c3a19168   edi: c33bc618   ebp: 3fe37025   esp: c6b87eb0
ds: 0018   es: 0018   ss: 0018
Process crond (pid: 7463, stackpage=c6b87000)
Stack: 00100000 c3a19168 0005a000 c0126ab1 00000020 00000000 42100000 c6b85420
       42000000 00000000 42100000 c6b85420 c011c6e6 00000000 c6b86000 00000000
       00000000 00000000 c6b86000 c6b860b4 00100000 0012c000 42000000 00000001
Call Trace: [<c0126ab1>] do_zap_page_range [kernel] 0x181
[<c011c6e6>] sys_wait4 [kernel] 0x396
[<c0127010>] zap_page_range [kernel] 0x50
[<c01297da>] exit_mmap [kernel] 0xca
[<c0117e36>] mmput [kernel] 0x26
[<c011c183>] do_exit [kernel] 0xb3
[<c011c6e6>] sys_wait4 [kernel] 0x396
[<c0108913>] system_call [kernel] 0x33

Code: 39 70 04 75 0d 53 57 50 e8 a3 02 00 00 83 c4 0c eb 08 89 c7

$ mount
/dev/sda3 on / type ext3 (rw)
none on /proc type proc (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/sda2 on /boot type ext3 (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /export type reiserfs (rw,noatime,notail)
none on /dev/shm type tmpfs (rw)
none on /tmp type tmpfs (rw)
[ + plus some autofs / nfs stuff ]

Version-Release number of selected component (if applicable):

How reproducible:
Didn't try

Steps to Reproduce:
possibly hardware problem?

Additional info:
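For anyone comparing several oopses from this machine, the call trace above can be pulled out mechanically. What follows is only a rough sketch in Python, assuming the 2.4-style entry layout seen in this report (`[<address>] symbol [module] 0xoffset`); the names `FRAME_RE`, `parse_call_trace` and `SAMPLE` are made up for illustration and are not part of any kernel tool.

```python
import re

# Assumed layout of one trace entry, taken from the oops in this report:
#   [<c0126ab1>] do_zap_page_range [kernel] 0x181
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\s+\[(\w+)\]\s+0x([0-9a-f]+)"
)


def parse_call_trace(oops_text):
    """Return a list of (address, symbol, module, offset) tuples.

    Lines such as 'EIP: 0010:[<c0137dc0>] Not tainted' do not match,
    because no '[module]' field follows the word after the address.
    """
    frames = []
    for addr, symbol, module, offset in FRAME_RE.findall(oops_text):
        frames.append((int(addr, 16), symbol, module, int(offset, 16)))
    return frames


# First three entries of the trace in this report, used as sample input.
SAMPLE = """\
Call Trace: [<c0126ab1>] do_zap_page_range [kernel] 0x181
[<c011c6e6>] sys_wait4 [kernel] 0x396
[<c0127010>] zap_page_range [kernel] 0x50
"""

if __name__ == "__main__":
    for addr, symbol, module, offset in parse_call_trace(SAMPLE):
        print(f"{addr:08x}  {symbol}+{offset:#x} [{module}]")
```

If later oopses keep landing in the same functions, that is worth noting alongside the memory test results; a trace that moves around from crash to crash would be a different data point.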
If you don't trust your memory, the memtest86 program (search on www.freshmeat.net for it if needed) is a pretty good tester of RAM chips.
It had passed a few passes of memtest86 before being put into production. I'll re-run it (we have had RAM go dodgy in the past), but it is ECC RAM, so (to quote Alan Cox):

"memtest86 will give fairly honest answers on ECC RAM. It'll see errors that ECC didnt correct or were caused by chipset/cache/wiring capacitance etc. Those are the same errors the kernel will see"
Well you could also try to see if the "ecc" kernel module (included in all RH's recent kernels) detects ECC faults... it's supposed to report ECC soft-failures to syslog
Ahhh! memtest86 3.0 has ECC "stuff" in it. Bingo. My bad.