Bug 64604

Summary: NFS server stops working under high load
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <bartschies>
Component: kernel
Assignee: Pete Zaitcev <zaitcev>
Status: CLOSED WORKSFORME
QA Contact: Ben Levenson <benl>
Severity: high
Priority: medium    
Version: 7.2   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-06-30 21:42:55 UTC

Description Need Real Name 2002-05-08 15:39:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0rc1) Gecko/20020417

Description of problem:
When many users access lots of applications via NFS and transfer many gigabytes
of data, the NFS server freezes. Everything else works properly. Only a reboot
brings the NFS server back.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Users transfer many gigabytes of data from/to different UNIX platforms

Additional info:

here are the relevant log messages:
May  8 11:21:39 dax kernel: nfsd Security: /// bad export.
May  8 11:31:44 dax kernel: nfsd Security: /// bad export.
May  8 11:34:18 dax kernel: nfsd Security: /// bad export.
May  8 11:40:42 dax kernel: nfsd Security: /// bad export.
May  8 11:42:02 dax last message repeated 8 times

May  8 14:57:05 dax kernel: nfsd: request from insecure port (c1606c21:50756)!
May  8 14:57:05 dax kernel: Unable to handle kernel paging request at virtual address 75d8d283
May  8 14:57:05 dax kernel:  printing eip:
May  8 14:57:05 dax kernel: f89bacf6
May  8 14:57:05 dax kernel: *pde = 00000000
May  8 14:57:05 dax kernel: Oops: 0000
May  8 14:57:05 dax kernel: CPU:    1
May  8 14:57:05 dax kernel: EIP:    0010:[<f89bacf6>]    Not tainted
May  8 14:57:05 dax kernel: EFLAGS: 00010206
May  8 14:57:05 dax kernel: eax: 00000088   ebx: f6f24a6c   ecx: 00000022   edx: f6f24a64
May  8 14:57:05 dax kernel: esi: 75d8d283   edi: f7370058   ebp: f7370054   esp: f73a5da4
May  8 14:57:05 dax kernel: ds: 0018   es: 0018   ss: 0018
May  8 14:57:05 dax kernel: Process lockd (pid: 869, stackpage=f73a5000)
May  8 14:57:05 dax kernel: Stack: f6f24a08 f6f24a08 f89c914b f7370058 f6f24a64 f737001c f6a5f920 f6f24a6c
May  8 14:57:05 dax kernel:        f6a5f920 f89c9af0 f6a5f9a0 f89b72f9 f7e6605c f701aa80 f89c9af0 f6a5f9a0
May  8 14:57:05 dax kernel:        f89c9b04 f7370040 f6f24a08 f7e6605c f89b1910 f7e6605c f7370040 f6f24a08
May  8 14:57:05 dax kernel: Call Trace: [<f89c914b>] [<f89c9af0>] [<f89b72f9>] [<f89c9af0>] [<f89c9b04>]
May  8 14:57:05 dax kernel:    [<f89b1910>] [<f89b55f9>] [<f89b1596>] [<f89b1527>] [<f89c3c3a>] [<f89ca500>]
May  8 14:57:05 dax kernel:    [<f89ca4d3>] [<f89ca500>] [<f89ca014>] [skb_checksum+84/800] [<f89c8e77>] [<f89c8e8c>]
May  8 14:57:05 dax kernel:    [<f89ca4d3>] [<f89ca500>] [<f89ca014>] [<c023a684>] [<f89c8e77>] [<f89c8e8c>]
May  8 14:57:05 dax kernel:    [<f89c935f>] [<f89ceadc>] [<f89b8116>] [<f89ce208>] [<f89ce22c>] [<f89c4e0d>]
May  8 14:57:05 dax kernel:    [kernel_thread+38/48] [<f89c4c40>]
May  8 14:57:05 dax kernel:    [<c0105876>] [<f89c4c40>]
May  8 14:57:05 dax kernel: 
May  8 14:57:05 dax kernel: Code: f3 a5 a8 02 74 02 66 a5 a8 01 74 01 a4 8b 02 8b 54 24 0c 5e
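
Note on the "request from insecure port (c1606c21:50756)" line that precedes the
oops: the value in parentheses looks like the client's IPv4 address printed as
eight hex digits, followed by the source port. The sketch below decodes it,
assuming the most significant byte comes first (an assumption about this
kernel's log format, not something confirmed in this report). The helper names
are made up for illustration. The port check reflects the usual "secure" export
rule that requests must come from a port below 1024.

```python
import ipaddress

# IPPORT_RESERVED: ports below this are "privileged" on UNIX systems.
RESERVED_PORT_LIMIT = 1024


def decode_nfsd_peer(token):
    """Split 'hexaddr:port' from the nfsd message into (dotted IPv4, port).

    Assumes the hex address is printed most significant byte first.
    """
    addr_hex, port = token.split(":")
    return str(ipaddress.IPv4Address(int(addr_hex, 16))), int(port)


def is_privileged_port(port):
    """True if the source port would satisfy a 'secure' export."""
    return port < RESERVED_PORT_LIMIT


addr, port = decode_nfsd_peer("c1606c21:50756")
print(addr, port, is_privileged_port(port))  # 193.96.108.33 50756 False
```

Under that assumption the client is 193.96.108.33, and source port 50756 is
above the reserved range, which is why nfsd complained about it.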

Comment 1 Need Real Name 2002-05-10 10:45:29 UTC
Correction:
I had assumed that high load causes the NFS server to die, but that does not
seem to be true. The real trigger is a simple mount request from a client. In my
case an HP-UX 11.0 workstation tried to mount an NFS-exported directory. The
output is:

nfs mount: get_fh: dax/raid/software: server not responding : RPC: Timed out
nfs mount: retry: backgrounding: /hosts/dax/software
nfs mount: get_fh: dax:: RPC: Timed out
nfs mount: retry: retrying(1) for: /hosts/dax/software after 5 seconds
nfs mount: get_fh: dax:: RPC: Timed out
nfs mount: retry: backgrounding: /hosts/dax/projects
nfs mount: nfs mountget_fh: dax:: RPC: Program not registered
: get_fh: dax:: RPC: Program not registered
nfs mount: retry: retrying(1) for: /hosts/dax/projects after 5 seconds
nfs mount: retry: backgrounding: /hosts/dax/home
nfs mount: get_fh: dax:: RPC: Program not registeremount: cd
nfs mount: retry: retrying(1) for: /hosts/dax/home after 5 seconds
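
The client output above shows two different RPC failures. "Timed out" means no
reply came back at all. "Program not registered" means the server's portmapper
answered but has no registration for the requested program. To see how the
failures split, a rough Python sketch like the following can tally them from a
saved copy of the output. The function name and the sample text are
illustrative only. The second pattern matches on a prefix because some lines in
the output are interleaved and cut off mid-word.

```python
import re

# Patterns for the two RPC status strings seen in the mount output.
# "Program not register" is a deliberate prefix match, since interleaved
# background mounts can truncate the word "registered".
PATTERNS = {
    "timed out": re.compile(r"RPC: Timed out"),
    "program not registered": re.compile(r"RPC: Program not register"),
}


def tally_rpc_errors(text):
    """Count occurrences of each RPC failure kind in the given output."""
    return {label: len(p.findall(text)) for label, p in PATTERNS.items()}


sample = (
    "nfs mount: get_fh: dax:: RPC: Timed out\n"
    "nfs mount: get_fh: dax:: RPC: Program not registered\n"
    "nfs mount: get_fh: dax:: RPC: Program not registeremount: cd\n"
)
print(tally_rpc_errors(sample))
```

A shift from timeouts to "Program not registered" would suggest the RPC service
went away on the server while the portmapper kept running, but that is only an
interpretation of the output, not something verified here.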

The NFS server also stopped when a SuSE Linux 8.0 box tried to mount NFS
shares. This behavior is reproducible.

The additional log message from the nfs server is:
May 10 12:11:14 dax rpc.mountd: authenticated mount request from neuwerk:725 for /raid/home (/raid/home)

Comment 2 Pete Zaitcev 2002-09-10 22:20:20 UTC
Please make sure the errata is applied.