I have three Red Hat systems: one is an NFS server and two are NFS clients. They are:

server:  Red Hat 6.0, kernel 2.2.13 SMP, knfsd 1.4.7-7
client1: Red Hat 6.0, kernel 2.2.12-20smp, knfsd 1.4.7-7, knfsd-clients 1.4.7-7
client2: Red Hat 6.0, kernel 2.2.13, knfsd 1.4.7-7, knfsd-clients 1.4.7-7

The client systems are periodically losing contact with the server. When this happens it is not for a long time, maybe several seconds, but it causes pauses on the client side while people are trying to access files. These pauses are very frustrating to the users, and this is happening about 30 times a day. Here are log messages from the systems:

server:
Jan 12 14:21:40 khan kernel: fh_verify: sweets/.nfs0003d803000000e9 permission failure, acc=2, error=13
Jan 12 14:59:02 khan kernel: fh_verify: a/admin permission failure, acc=1, error=13
Jan 12 14:59:49 khan kernel: fh_verify: a/admin permission failure, acc=1, error=13
Jan 12 14:59:49 khan kernel: fh_verify: a/admin permission failure, acc=1, error=13
Jan 12 15:01:38 khan kernel: fh_verify: d/daver permission failure, acc=1, error=13
nfsd_d_validate: invalid address feebbaca
nfsd_d_validate: invalid address feebbaca
get_empty_dquot: pruning 465

client1:
Jan 11 14:13:21 spock kernel: nfs: task 23473 can't get a request slot
Jan 12 14:01:08 spock kernel: nfs: server khan2 OK
Jan 12 14:02:03 spock kernel: nfs: server khan2 not responding, still trying
Jan 12 14:02:03 spock kernel: nfs: server khan2 OK
Jan 12 14:09:39 spock kernel: nfs: server khan2 not responding, still trying
Jan 12 14:09:39 spock kernel: nfs: server khan2 not responding, still trying
Jan 12 14:09:50 spock kernel: nfs: task 37169 can't get a request slot
Jan 12 14:09:58 spock kernel: nfs: server khan2 OK
Jan 12 14:21:42 spock kernel: NFS: can't silly-delete sweets/.nfs0003d803000000e9, error=-13

client2:
Jan 12 15:07:41 locutus kernel: nfs: task 34523 can't get a request slot
Jan 12 15:07:42 locutus kernel: nfs: task 34524 can't get a request slot
Jan 12 15:14:12 locutus kernel: nfs: server khan2 not responding, still trying
Jan 12 15:14:15 locutus kernel: nfs: server khan2 OK
Jan 12 15:14:15 locutus kernel: nfs: server khan2 OK
Jan 12 15:15:33 locutus kernel: nfs: server khan2 not responding, still trying
Jan 12 15:15:34 locutus kernel: nfs: server khan2 not responding, still trying
Jan 12 15:15:39 locutus kernel: nfs: server khan2 OK
Jan 12 15:15:39 locutus kernel: nfs: server khan2 OK

Let me know if there is any more information I can provide to aid in solving this problem.

Thanks,
Luke
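(For readers hitting the same "not responding, still trying" / "can't get a request slot" pattern: the pauses can often be softened on the client by tuning NFS mount options. This is only an illustrative mitigation, not the fix eventually identified in this report; the export and mount point names below are placeholders.)

```
# Illustrative /etc/fstab entry on a client: keep "hard" semantics but make
# interrupts possible and stretch the retry behavior so brief server stalls
# cause fewer visible errors. timeo is in tenths of a second.
khan2:/export  /mnt/export  nfs  rw,hard,intr,timeo=14,retrans=5  0 0
```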
There is some heavy networking going on there that causes the kernel on the client side to run out of available sockets. I doubt this is related to the NFS server. Adjusted priorities and severity of the problem report.
Do you know where I would look to see about adjusting the number of sockets on the client system?
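(If socket buffer limits are indeed the bottleneck, which is an assumption here, the 2.2-era kernels expose the relevant knobs under /proc/sys/net/core. A minimal sketch for inspecting them:)

```shell
# Read the current default and maximum socket receive buffer sizes.
rmem_default=$(cat /proc/sys/net/core/rmem_default)
rmem_max=$(cat /proc/sys/net/core/rmem_max)
echo "default receive buffer: $rmem_default bytes (max: $rmem_max)"
# To raise the ceiling, as root (262144 is an illustrative value, not a
# recommendation from this thread):
#   echo 262144 > /proc/sys/net/core/rmem_max
```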
assigned to johnsonm
We have 3 Dell Precision 420 workstations, 2 with single CPUs (the clients/desktops) and one with two CPUs (intended as a compute/file/print/web server). Each workstation exports file systems via NFS to the other two. Accessing the NFS-mounted file systems on the server from the clients often results in hangups ("NFS task xxx can't get a request slot") on the clients lasting from a few seconds up to several minutes. This must be a problem with the SMP kernel: if I run the single-processor kernel (RH 2.2.16-22 in both cases) on the server, the problem does not exist. There is also no problem accessing the file systems on the clients from the server. Network traffic is always low; ping gives times around 150 microseconds even during an NFS hangup. The load on the server is also very low.
The running out of slots and other problems listed above were mainly fixed by the newer nfsd that shipped with Red Hat Linux 7.2. See the rpc.nfsd man page for how to set the number of threads to start; on Red Hat systems this is configured in /etc/rc.d/init.d/nfs.
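(A sketch of the relevant part of the Red Hat init script, for context. The exact contents vary by release; the excerpt below is illustrative, not copied from a specific version.)

```shell
# Excerpt-style illustration of /etc/rc.d/init.d/nfs: RPCNFSDCOUNT is the
# number of nfsd threads rpc.nfsd starts. Raising it from the shipped
# default can help a loaded server answer requests promptly.
RPCNFSDCOUNT=16

# Later in the script's start) case, the count is passed to rpc.nfsd:
#   daemon rpc.nfsd $RPCNFSDCOUNT
```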