From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040514

Description of problem:
Apache serves files off of an NFS share from a FC2 system. The NFS share is mounted using automount with the following options:

rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock

Apache hung, apparently while trying to serve up a listing of a directory on the NFS share. Prior to the kernel Oops in invalidate_inode_pages, a number of "memory.c:101: bad pmd" messages were printed:

memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00000100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00005100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00005000.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00005000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00005100.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00004100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00006000.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00006000.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00005000.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00004100.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00004100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00004100.
The Oops:

Unable to handle kernel paging request at virtual address 0100000b
 printing eip:
c0132068
*pde = 00000000
Oops: 0000
nfs nfsd lockd sunrpc autofs 3c59x ext3 jbd aic7xxx sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c0132068>] Not tainted
EFLAGS: 00010203

EIP is at invalidate_inode_pages [kernel] 0x18 (2.4.22-1.2188.nptl)
eax: 00000001 ebx: 0100000b ecx: 00008000 edx: 00000000
esi: 0100000b edi: d7af8e74 ebp: d7af8dc0 esp: d74e9d80
ds: 0068 es: 0068 ss: 0068
Process httpd (pid: 18374, stackpage=d74e9000)
Stack: c7bff400 00000000 00000000 d7af8f64 e0983c29 d7af8dc0 c167d208 001f73f5
       c01595fd c7bff400 001f73f5 c167d208 00000000 00000000 0002f121 00000000
       00000001 4018ccb7 0002f121 00000000 00000000 38b5b9f0 0002f121 00000000
Call Trace:
[<e0983c29>] __nfs_refresh_inode [nfs] 0x389 (0xd74e9d90)
[<c01595fd>] get_new_inode [kernel] 0x4d (0xd74e9da0)
[<e09830a1>] __nfs_fhget [nfs] 0x121 (0xd74e9df0)
[<e09800b5>] nfs_lookup [nfs] 0x135 (0xd74e9e20)
[<c0120006>] do_exit [kernel] 0xd6 (0xd74e9e34)
[<c0150000>] sys_symlink [kernel] 0x80 (0xd74e9ee4)
[<c014db17>] real_lookup [kernel] 0xc7 (0xd74e9f04)
[<c014e1ca>] link_path_walk [kernel] 0x55a (0xd74e9f20)
[<c014e7f7>] path_lookup [kernel] 0x37 (0xd74e9f60)
[<c014ea89>] __user_walk [kernel] 0x49 (0xd74e9f70)
[<c014aa0f>] sys_lstat64 [kernel] 0x1f (0xd74e9f8c)
[<c01095f7>] system_call [kernel] 0x33 (0xd74e9fc0)

Code: 8b 36 8b 43 18 c1 e8 04 83 e0 01 75 1e 0f ab 43 18 19 c0 85

Version-Release number of selected component (if applicable):
kernel-2.4.22-1.2188.nptl

How reproducible:
Couldn't Reproduce

Additional info:
I want to add that the machine hung not long after the Oops was generated, while I was trying to gather more information. SysRq responded well enough to attempt a sync of the drives, but the sync never completed, so I used SysRq to reboot the machine.
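For triage, the register dump in the Oops can be cross-checked against the faulting address: in this report, 0100000b also appears in ebx and esi. The following is a minimal Python sketch of that check. The function name `registers_matching_fault` and the regular expressions are illustrative assumptions, not part of any kernel tooling, and the excerpt is copied from the Oops in this report.

```python
import re

def registers_matching_fault(oops_text):
    """Return names of 32-bit registers whose value equals the faulting address.

    Hypothetical helper for reading a 2.4-style i386 Oops; it only compares
    the hex strings printed in the report.
    """
    m = re.search(r"virtual address ([0-9a-f]{8})", oops_text)
    if m is None:
        return []
    fault = m.group(1)
    # Matches pairs such as "ebx: 0100000b"; two-letter segment registers
    # like "ds:" are deliberately not matched.
    regs = re.findall(r"\b(e[a-z]{2}):\s*([0-9a-f]{8})\b", oops_text)
    return [name for name, value in regs if value == fault]

# Excerpt taken from the Oops in this report.
excerpt = (
    "Unable to handle kernel paging request at virtual address 0100000b\n"
    "eax: 00000001 ebx: 0100000b ecx: 00008000 edx: 00000000\n"
    "esi: 0100000b edi: d7af8e74 ebp: d7af8dc0 esp: d74e9d80\n"
)
matches = registers_matching_fault(excerpt)
print(matches)
```

On this excerpt the sketch reports ebx and esi, which shows only that the bad address was sitting in those registers when the fault occurred; it says nothing about where the value came from.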
That looks like massive memory corruption. Can you test the machine with memtest86 for a day, just to rule out bad RAM?
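To get a feel for the logged values, one option is to tally the "bad pmd" messages and see which bits ever appear set. The sketch below is only a summarizing aid under assumed names (`tally_bad_pmd`, `bits_seen`); it does not diagnose the fault, and memtest86 remains the actual check for bad RAM. The sample uses the first four messages from the report.

```python
import re
from collections import Counter

def tally_bad_pmd(log_text):
    """Count how often each value appears in 'bad pmd XXXXXXXX.' messages."""
    values = re.findall(r"bad pmd ([0-9a-fA-F]{8})\.", log_text)
    return Counter(int(v, 16) for v in values)

def bits_seen(counter):
    """OR all distinct values together to show which bits ever appear set."""
    mask = 0
    for value in counter:
        mask |= value
    return mask

# First four messages from the report; paste the full log to tally all of them.
sample = (
    "memory.c:101: bad pmd 00001000.\n"
    "memory.c:101: bad pmd 00007100.\n"
    "memory.c:101: bad pmd 00000100.\n"
    "memory.c:101: bad pmd 00007100.\n"
)
counts = tally_bad_pmd(sample)
for value, n in sorted(counts.items()):
    print(f"{value:08x} seen {n} time(s)")
print(f"bits ever set: {bits_seen(counts):08x}")
```

Feeding in the complete log from the report gives the full distribution.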
I was afraid you'd confirm that. I had a hunch that it might have been caused by faulty hardware, but the machine has been solid for a couple of years now. I'll give memtest86 a run as soon as I can.
You were right: memtest86 #6 is generating lots of errors. Closing the bug, and thanks for the suggestion to test the hardware.