From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0rc1) Gecko/20020417

Description of problem:
When many users access many applications via NFS and transfer many gigabytes of data, the NFS server freezes. Everything else keeps working properly. Only a reboot brings the NFS server back.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Users transfer many gigabytes of data from/to different UNIX platforms
2.
3.

Additional info:
Here are the relevant log messages:

May 8 11:21:39 dax kernel: nfsd Security: /// bad export.
May 8 11:31:44 dax kernel: nfsd Security: /// bad export.
May 8 11:34:18 dax kernel: nfsd Security: /// bad export.
May 8 11:40:42 dax kernel: nfsd Security: /// bad export.
May 8 11:42:02 dax last message repeated 8 times
May 8 14:57:05 dax kernel: nfsd: request from insecure port (c1606c21:50756)!
May 8 14:57:05 dax kernel: Unable to handle kernel paging request at virtual address 75d8d283
May 8 14:57:05 dax kernel: printing eip:
May 8 14:57:05 dax kernel: f89bacf6
May 8 14:57:05 dax kernel: *pde = 00000000
May 8 14:57:05 dax kernel: Oops: 0000
May 8 14:57:05 dax kernel: CPU: 1
May 8 14:57:05 dax kernel: EIP: 0010:[<f89bacf6>] Not tainted
May 8 14:57:05 dax kernel: EFLAGS: 00010206
May 8 14:57:05 dax kernel: eax: 00000088 ebx: f6f24a6c ecx: 00000022 edx: f6f24a64
May 8 14:57:05 dax kernel: esi: 75d8d283 edi: f7370058 ebp: f7370054 esp: f73a5da4
May 8 14:57:05 dax kernel: ds: 0018 es: 0018 ss: 0018
May 8 14:57:05 dax kernel: Process lockd (pid: 869, stackpage=f73a5000)
May 8 14:57:05 dax kernel: Stack: f6f24a08 f6f24a08 f89c914b f7370058 f6f24a64 f737001c f6a5f920 f6f24a6c
May 8 14:57:05 dax kernel: f6a5f920 f89c9af0 f6a5f9a0 f89b72f9 f7e6605c f701aa80 f89c9af0 f6a5f9a0
May 8 14:57:05 dax kernel: f89c9b04 f7370040 f6f24a08 f7e6605c f89b1910 f7e6605c f7370040 f6f24a08
May 8 14:57:05 dax kernel: Call Trace: [<f89c914b>] [<f89c9af0>] [<f89b72f9>] [<f89c9af0>] [<f89c9b04>]
May 8 14:57:05 dax kernel: [<f89b1910>] [<f89b55f9>] [<f89b1596>] [<f89b1527>] [<f89c3c3a>] [<f89ca500>]
May 8 14:57:05 dax kernel: [<f89ca4d3>] [<f89ca500>] [<f89ca014>] [skb_checksum+84/800] [<f89c8e77>] [<f89c8e8c>]
May 8 14:57:05 dax kernel: [<f89ca4d3>] [<f89ca500>] [<f89ca014>] [<c023a684>] [<f89c8e77>] [<f89c8e8c>]
May 8 14:57:05 dax kernel: [<f89c935f>] [<f89ceadc>] [<f89b8116>] [<f89ce208>] [<f89ce22c>] [<f89c4e0d>]
May 8 14:57:05 dax kernel: [kernel_thread+38/48] [<f89c4c40>]
May 8 14:57:05 dax kernel: [<c0105876>] [<f89c4c40>]
May 8 14:57:05 dax kernel:
May 8 14:57:05 dax kernel: Code: f3 a5 a8 02 74 02 66 a5 a8 01 74 01 a4 8b 02 8b 54 24 0c 5e
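
A side note for anyone reading the log: the "request from insecure port (c1606c21:50756)" message logged just before the oops identifies the client that sent the request. The sketch below assumes the hex field is the client's IPv4 address printed as a 32-bit number in host byte order and that the second field is the source port; that matches how 2.4-era nfsd appears to format this message, but treat it as an assumption. The function name decode_nfsd_peer is made up for illustration.

```python
import ipaddress


def decode_nfsd_peer(token):
    """Turn 'hexaddr:port' from the nfsd log line into (dotted IPv4, port).

    Assumption: the hex field is the IPv4 address as a 32-bit integer,
    most significant octet first, and the second field is the source port.
    """
    hex_addr, port = token.split(":")
    address = ipaddress.IPv4Address(int(hex_addr, 16))
    return str(address), int(port)


if __name__ == "__main__":
    # Value copied from the log line above.
    print(decode_nfsd_peer("c1606c21:50756"))  # ('193.96.108.33', 50756)
```

If that reading is right, the request came from 193.96.108.33 on source port 50756. A source port above 1023 is what nfsd reports as "insecure" when the export does not allow non-reserved ports.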
Correction: I had assumed that high load causes the NFS server to die, but that does not seem to be the case. The real trigger is a simple mount request from a client. In my case an HP-UX 11.0 workstation tried to mount an NFS-exported directory. The output is:

nfs mount: get_fh: dax/raid/software: server not responding : RPC: Timed out
nfs mount: retry: backgrounding: /hosts/dax/software
nfs mount: get_fh: dax:: RPC: Timed out
nfs mount: retry: retrying(1) for: /hosts/dax/software after 5 seconds
nfs mount: get_fh: dax:: RPC: Timed out
nfs mount: retry: backgrounding: /hosts/dax/projects
nfs mount: nfs mountget_fh: dax:: RPC: Program not registered : get_fh: dax:: RPC: Program not registered
nfs mount: retry: retrying(1) for: /hosts/dax/projects after 5 seconds
nfs mount: retry: backgrounding: /hosts/dax/home
nfs mount: get_fh: dax:: RPC: Program not registeremount: cd
nfs mount: retry: retrying(1) for: /hosts/dax/home after 5 seconds

The NFS server also stopped when a SuSE Linux 8.0 box tried to mount NFS shares. This behavior is reproducible. The additional log message from the NFS server is:

May 10 12:11:14 dax rpc.mountd: authenticated mount request from neuwerk:725 for /raid/home (/raid/home)
Please make sure the errata is applied.