65608 – VM oops in 2.4.18-3

Summary: VM oops in 2.4.18-3

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	athlon
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-05-28 14:26 UTC by Need Real Name
Modified:	2007-04-18 16:42 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2002-05-28 16:45:39 UTC
Embargoed:

Description Need Real Name 2002-05-28 14:26:26 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Description of problem:
This is a dual Athlon with 1 GB of registered ECC DDR RAM. I will try
2.4.18-4, but the oops doesn't look ext3-related (the only big local
filesystem is reiserfs over software RAID0).

I do suspect the hardware on this machine. If someone could tell me "that
looks like a bad x", I'd be very grateful. More details on request :-/

Unable to handle kernel paging request at virtual address 0200f82b
 printing eip:
c0137dc0
*pde = 00000000
Oops: 0000
nls_iso8859-1 nls_cp437 vfat fat soundcore nfs tuner tvaudio bttv videodev i2c
CPU:    0
EIP:    0010:[<c0137dc0>]    Not tainted
EFLAGS: 00010206

EIP is at page_remove_rmap [kernel] 0x50 (2.4.18-3)
eax: 0200f827   ebx: c1df9c38   ecx: c1000030   edx: c3a19168
esi: c3a19168   edi: c33bc618   ebp: 3fe37025   esp: c6b87eb0
ds: 0018   es: 0018   ss: 0018
Process crond (pid: 7463, stackpage=c6b87000)
Stack: 00100000 c3a19168 0005a000 c0126ab1 00000020 00000000 42100000 c6b85420
       42000000 00000000 42100000 c6b85420 c011c6e6 00000000 c6b86000 00000000
       00000000 00000000 c6b86000 c6b860b4 00100000 0012c000 42000000 00000001
Call Trace: [<c0126ab1>] do_zap_page_range [kernel] 0x181
[<c011c6e6>] sys_wait4 [kernel] 0x396
[<c0127010>] zap_page_range [kernel] 0x50
[<c01297da>] exit_mmap [kernel] 0xca
[<c0117e36>] mmput [kernel] 0x26
[<c011c183>] do_exit [kernel] 0xb3
[<c011c6e6>] sys_wait4 [kernel] 0x396
[<c0108913>] system_call [kernel] 0x33


Code: 39 70 04 75 0d 53 57 50 e8 a3 02 00 00 83 c4 0c eb 08 89 c7
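
One way to read the register dump and Code line above is to check whether the faulting address matches the first instruction's memory operand. The sketch below does this under two assumptions that are not stated in the report: that the Code bytes start at EIP (as 2.4-era oops output normally does), and that the kernel uses the default i386 3G/1G split, so valid kernel pointers are at or above 0xC0000000. The tiny decoder is purely illustrative and handles only this one instruction form.

```python
# Values copied from the oops text above.
FAULT_ADDR = 0x0200F82B   # "virtual address 0200f82b"
EAX = 0x0200F827          # "eax: 0200f827"
CODE = bytes.fromhex("39 70 04 75 0d 53 57 50 e8 a3 02 00 00 83 c4 0c eb 08 89 c7")

# Assumption: default i386 3G/1G split, so kernel virtual addresses start here.
PAGE_OFFSET = 0xC0000000

# 32-bit register numbering used by the x86 ModR/M byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_cmp_disp8(code):
    """Decode only `39 /r` with mod=01: cmp [reg + disp8], reg32."""
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x39 or mod != 1 or rm == 4:  # rm == 4 would need a SIB byte
        raise ValueError("not the simple cmp [reg+disp8], reg form")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG32[rm], disp, REG32[reg]


base, disp, src = decode_cmp_disp8(CODE)
print(f"first instruction: cmp [{base}{disp:+#x}], {src}")
print("fault address == eax + disp:", EAX + disp == FAULT_ADDR)
print("eax is a plausible kernel pointer:", EAX >= PAGE_OFFSET)
```

If those assumptions hold, the oops is a read through eax + 4 where eax is not a kernel address at all, which would be consistent with a corrupted pointer in memory rather than a filesystem problem. It does not by itself say whether the corruption came from hardware or software.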

$ mount
/dev/sda3 on / type ext3 (rw)
none on /proc type proc (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/sda2 on /boot type ext3 (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /export type reiserfs (rw,noatime,notail)
none on /dev/shm type tmpfs (rw)
none on /tmp type tmpfs (rw)

[ plus some autofs / NFS mounts ]


Version-Release number of selected component (if applicable):


How reproducible:
Didn't try

Steps to Reproduce:
possibly hardware problem?
	

Additional info:

Comment 1 Arjan van de Ven 2002-05-28 14:39:57 UTC

If you don't trust your memory, the memtest86 program (search
www.freshmeat.net for it if needed) is a pretty good tester of RAM chips.

Comment 2 Need Real Name 2002-05-28 16:25:46 UTC

It had completed a few passes of memtest86 without errors before being put
into production. I'll re-run it (we have had RAM go dodgy in the past), but
it is ECC RAM, so (to quote Alan Cox):
  memtest86 will give fairly honest answers on ECC RAM. It'll see errors
  that ECC didnt correct or were caused by chipset/cache/wiring
  capacitance etc. Those are the same errors the kernel will see

Comment 3 Arjan van de Ven 2002-05-28 16:45:33 UTC

Well, you could also check whether the "ecc" kernel module (included in all of
RH's recent kernels) detects ECC faults. It's supposed to report ECC
soft failures to syslog.

Comment 4 Need Real Name 2002-05-28 17:33:46 UTC

Ahhh! memtest86 3.0 has ECC "stuff" in it. Bingo. My bad.
