Bug 144266

Summary: nfsd Kernel Oops on x86_64 system
Product: [Fedora] Fedora
Reporter: JM <igeorgex>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED RAWHIDE
QA Contact: Ben Levenson <benl>
Severity: high
Docs Contact:
Priority: medium    
Version: 3   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-06 07:38:02 EDT Type: ---

Description JM 2005-01-05 08:42:03 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
On an AMD Opteron system (4 CPUs) running the 2.6.9-1.681_FC3smp kernel,
I have twice hit the following kernel Oops:

-----------------
Jan  4 10:00:27 foobar kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Jan  4 10:00:27 foobar kernel: [<0000000000000000>]
Jan  4 10:00:27 foobar kernel: PML4 1fb000067 PGD 1fb001067 PMD 0
Jan  4 10:00:27 foobar kernel: Oops: 0010 [1] SMP
Jan  4 10:00:27 foobar kernel: CPU 3
Jan  4 10:00:27 foobar kernel: Modules linked in: iptable_nat ip_conntrack iptable_mangle iptable_filter ip_tables nfsd exportfs autofs4 i2c_dev i2c_core nfs lockd sunrpc xfs joydev button battery ac ohci_hcd uhci_hcd ehci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_sil libata mptscsih mptbase sd_mod scsi_mod
Jan  4 10:00:27 foobar kernel: Pid: 2732, comm: nfsd Not tainted 2.6.9-1.681_FC3smp
Jan  4 10:00:27 foobar kernel: RIP: 0010:[<0000000000000000>] [<0000000000000000>]
Jan  4 10:00:27 foobar kernel: RSP: 0018:00000101faeb3d90  EFLAGS: 00010206
Jan  4 10:00:27 foobar kernel: RAX: ffffffff80472d00 RBX: fffffffffffffff4 RCX: 0000010192e5e730
Jan  4 10:00:27 foobar kernel: RDX: 0000000000000000 RSI: 0000010192e5e318 RDI: 0000010197b1fa08
Jan  4 10:00:27 foobar kernel: RBP: 0000010192e5e318 R08: 0000000000000067 R09: 00002d3611a0ea82
Jan  4 10:00:27 foobar kernel: R10: 0000010197b1fa28 R11: ffffffff801c0938 R12: 00000101faeb3dc8
Jan  4 10:00:27 foobar kernel: R13: 0000000000000000 R14: 0000010197b1fa08 R15: 00000101fe62a088
Jan  4 10:00:27 foobar kernel: FS:  0000002a9589db00(0000) GS:ffffffff804a3900(0000) knlGS:00000000f7eb16c0
Jan  4 10:00:27 foobar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan  4 10:00:27 foobar kernel: CR2: 0000000000000000 CR3: 00000000fbf2a000 CR4: 00000000000006e0
Jan  4 10:00:27 foobar kernel: Process nfsd (pid: 2732, threadinfo 00000101faeb2000, task 000001007fc91800)
Jan  4 10:00:27 foobar kernel: Stack: ffffffff8017f8ab 0000010197b1fac0 0000010192e5e6d8 0000010197b1fa08
Jan  4 10:00:27 foobar kernel:        0000010101d0c008 000000000000000a ffffffff8017f931 0000000a11a0ea82
Jan  4 10:00:27 foobar kernel:        00000101fe62a088 0000010192e5e6d8
Jan  4 10:00:27 foobar kernel: Call Trace:<ffffffff8017f8ab>{__lookup_hash+227} <ffffffff8017f931>{lookup_one_len+94}
Jan  4 10:00:27 foobar kernel:        <ffffffffa023301c>{:nfsd:nfsd_lookup+936} <ffffffffa018ca93>{:sunrpc:svcauth_unix_accept+740}
Jan  4 10:00:27 foobar kernel:        <ffffffffa023b20a>{:nfsd:nfsd3_proc_lookup+198} <ffffffffa02306f9>{:nfsd:nfsd_dispatch+220}
Jan  4 10:00:27 foobar kernel:        <ffffffffa01891b8>{:sunrpc:svc_process+1120} <ffffffffa0230245>{:nfsd:nfsd+0}
Jan  4 10:00:27 foobar kernel:        <ffffffffa023047d>{:nfsd:nfsd+568} <ffffffff80110cdf>{child_rip+8}
Jan  4 10:00:27 foobar kernel:        <ffffffffa0230245>{:nfsd:nfsd+0} <ffffffff8015695a>{mempool_free_slab+0}
Jan  4 10:00:27 foobar kernel:        <ffffffffa0230245>{:nfsd:nfsd+0} <ffffffff80110cd7>{child_rip+0}
Jan  4 10:00:27 foobar kernel:
Jan  4 10:00:27 foobar kernel:
Jan  4 10:00:27 foobar kernel: Code:  Bad RIP value.
Jan  4 10:00:27 foobar kernel: RIP [<0000000000000000>] RSP <00000101faeb3d90>
Jan  4 10:00:27 foobar kernel: CR2: 0000000000000000
-----------------
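For anyone who wants to read the call chain out of an Oops like the one above, the frames can be pulled out with a short script. This is only a sketch: the name `parse_frames` and the regular expression are illustrative and not part of any kernel tool. It assumes the frame format seen in this log, `<address>{:module:symbol+offset}`, and that line wraps inside the trace have already been joined.

```python
import re

# One frame looks like <ffffffffa023301c>{:nfsd:nfsd_lookup+936}
# or, for core kernel code, <ffffffff8017f8ab>{__lookup_hash+227}.
# Assumption: this pattern only targets the format shown in this report.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(oops_text):
    """Return (address, module, symbol, offset) tuples in log order."""
    frames = []
    for addr, module, symbol, offset in FRAME_RE.findall(oops_text):
        # Frames without a :module: prefix belong to the core kernel image.
        frames.append((int(addr, 16), module or "kernel", symbol, int(offset)))
    return frames


# First three trace lines from this report, with the syslog prefix dropped.
SAMPLE = """\
Call Trace:<ffffffff8017f8ab>{__lookup_hash+227} <ffffffff8017f931>{lookup_one_len+94}
       <ffffffffa023301c>{:nfsd:nfsd_lookup+936} <ffffffffa018ca93>{:sunrpc:svcauth_unix_accept+740}
       <ffffffffa023b20a>{:nfsd:nfsd3_proc_lookup+198} <ffffffffa02306f9>{:nfsd:nfsd_dispatch+220}
"""

if __name__ == "__main__":
    for addr, module, symbol, offset in parse_frames(SAMPLE):
        print(f"{addr:#018x}  {module:<8} {symbol}+{offset}")
```

The first frame listed is the one nearest the crash, so here the fault was reached from `__lookup_hash`, called via `lookup_one_len` from `nfsd_lookup`. An RIP of 0 together with "Bad RIP value" is consistent with a call through a NULL function pointer, although the log alone does not show which pointer.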

The result was a corrupted XFS filesystem (I could repair it, but lost
some files). I have not found a way to reproduce the Oops reliably, but
it seems to happen when the NFS server is under heavy load.

This is a very nasty bug: because it leaves the XFS filesystem
corrupted, the AMD Opteron system is not really usable as an NFS
server.

A quick Google search turned up this report of an identical kernel Oops:
https://www.redhat.com/archives/fedora-de-list/2004-December/msg00171.html
That report is from an FC2 x86_64 SMP system, so the bug may be related
to x86_64 SMP systems.

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-44 kernel-2.6.9-1.681_FC3smp

How reproducible:
Sometimes

Steps to Reproduce:
I could not reproduce the bug reliably; it appears to happen when the
NFS server is under heavy load.
Comment 1 JM 2005-09-06 07:31:12 EDT
It looks like this works with kernel 2.6.12-1.1376_FC3smp.