Bug 125574

Summary: Oops in invalidate_inode_pages - nfs related
Product: [Fedora] Fedora
Reporter: David Rees <drees76>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 1
Hardware: athlon   
OS: Linux   
Last Closed: 2004-06-09 00:34:09 UTC

Description David Rees 2004-06-08 21:50:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040514

Description of problem:
Apache serves files from an NFS share exported by an FC2 system. The
NFS share is mounted using automount with the following options:
rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock

Apache hung, apparently while trying to serve a directory listing
from the NFS share.

Prior to the kernel Oops in invalidate_inode_pages, a number of
"memory.c:101: bad pmd" messages were printed:

memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00000100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00005100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00005000.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00005000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00003100.
memory.c:101: bad pmd 00005100.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00004100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00006000.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00003000.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00006000.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00005000.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00002100.
memory.c:101: bad pmd 00001000.
memory.c:101: bad pmd 00004100.
memory.c:101: bad pmd 00002000.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00007100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00001100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00004000.
memory.c:101: bad pmd 00004100.
memory.c:101: bad pmd 00006100.
memory.c:101: bad pmd 00004100.

The Oops:
Unable to handle kernel paging request at virtual address 0100000b
 printing eip:
c0132068
*pde = 00000000
Oops: 0000
nfs nfsd lockd sunrpc autofs 3c59x ext3 jbd aic7xxx sd_mod scsi_mod  
CPU:    0
EIP:    0060:[<c0132068>]    Not tainted
EFLAGS: 00010203

EIP is at invalidate_inode_pages [kernel] 0x18 (2.4.22-1.2188.nptl)
eax: 00000001   ebx: 0100000b   ecx: 00008000   edx: 00000000
esi: 0100000b   edi: d7af8e74   ebp: d7af8dc0   esp: d74e9d80
ds: 0068   es: 0068   ss: 0068
Process httpd (pid: 18374, stackpage=d74e9000)
Stack: c7bff400 00000000 00000000 d7af8f64 e0983c29 d7af8dc0 c167d208 001f73f5
       c01595fd c7bff400 001f73f5 c167d208 00000000 00000000 0002f121 00000000
       00000001 4018ccb7 0002f121 00000000 00000000 38b5b9f0 0002f121 00000000
Call Trace:   [<e0983c29>] __nfs_refresh_inode [nfs] 0x389 (0xd74e9d90)
[<c01595fd>] get_new_inode [kernel] 0x4d (0xd74e9da0)
[<e09830a1>] __nfs_fhget [nfs] 0x121 (0xd74e9df0)
[<e09800b5>] nfs_lookup [nfs] 0x135 (0xd74e9e20)
[<c0120006>] do_exit [kernel] 0xd6 (0xd74e9e34)
[<c0150000>] sys_symlink [kernel] 0x80 (0xd74e9ee4)
[<c014db17>] real_lookup [kernel] 0xc7 (0xd74e9f04)
[<c014e1ca>] link_path_walk [kernel] 0x55a (0xd74e9f20)
[<c014e7f7>] path_lookup [kernel] 0x37 (0xd74e9f60)
[<c014ea89>] __user_walk [kernel] 0x49 (0xd74e9f70)
[<c014aa0f>] sys_lstat64 [kernel] 0x1f (0xd74e9f8c)
[<c01095f7>] system_call [kernel] 0x33 (0xd74e9fc0)


Code: 8b 36 8b 43 18 c1 e8 04 83 e0 01 75 1e 0f ab 43 18 19 c0 85 
 


Version-Release number of selected component (if applicable):
kernel-2.4.22-1.2188.nptl

How reproducible:
Couldn't Reproduce


Additional info:

Comment 1 David Rees 2004-06-08 21:52:28 UTC
I want to add that the machine hung shortly after the Oops was
generated, while I was trying to gather more information. SysRq was
responsive enough to attempt to sync the drives, but the sync never
completed, so I used SysRq to reboot the machine.

Comment 2 Dave Jones 2004-06-08 23:15:13 UTC
That looks like massive memory corruption.
Can you give it a test with memtest86 for a day, just to rule
out bad RAM?

Comment 3 David Rees 2004-06-08 23:28:03 UTC
I was afraid you'd confirm that. I had a hunch that it might have been
caused by faulty hardware, but the machine has been solid for a couple
of years now. I'll give memtest86 a run as soon as I can.

Comment 4 David Rees 2004-06-09 00:34:09 UTC
You were right: memtest86 test #6 is generating lots of errors. Closing
the bug, and thanks for the suggestion to test the hardware.