Bug 8424

Summary: NFS server, not responding..
Product: [Retired] Red Hat Linux
Reporter: luke
Component: nfs-utils
Assignee: Michael K. Johnson <johnsonm>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 7.0
CC: frenzel
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-01-25 03:22:35 UTC

Description luke 2000-01-12 23:12:20 UTC
I have three Red Hat systems: one is an NFS server and two are NFS clients.
They are:
server:
Redhat 6.0
Kernel: 2.2.13 SMP
knfsd: 1.4.7-7

client1:
Redhat 6.0
kernel: 2.2.12-20smp
knfsd: 1.4.7-7
knfsd clients: 1.4.7-7

client2:
Redhat 6.0
kernel: 2.2.13
knfsd: 1.4.7-7
knfsd clients: 1.4.7-7

The client systems typically lose contact with the server.
When this happens it does not last long, maybe several seconds, but it
causes pauses on the client side as people try to access files.  These
pauses are very frustrating to the users.  This happens about 30 times a
day.

Here are log messages from the systems:
server:
Jan 12 14:21:40 khan kernel: fh_verify: sweets/.nfs0003d803000000e9 permission failure, acc=2, error=13
Jan 12 14:59:02 khan kernel: fh_verify: a/admin permission failure, acc=1, error=13
Jan 12 14:59:49 khan kernel: fh_verify: a/admin permission failure, acc=1, error=13
Jan 12 14:59:49 khan kernel: fh_verify: a/admin permission failure, acc=1, error=13
Jan 12 15:01:38 khan kernel: fh_verify: d/daver permission failure, acc=1, error=13
nfsd_d_validate: invalid address feebbaca
nfsd_d_validate: invalid address feebbaca
get_empty_dquot: pruning 465

client1:
Jan 11 14:13:21 spock kernel: nfs: task 23473 can't get a request slot
Jan 12 14:01:08 spock kernel: nfs: server khan2 OK
Jan 12 14:02:03 spock kernel: nfs: server khan2 not responding, still trying
Jan 12 14:02:03 spock kernel: nfs: server khan2 OK
Jan 12 14:09:39 spock kernel: nfs: server khan2 not responding, still trying
Jan 12 14:09:39 spock kernel: nfs: server khan2 not responding, still trying
Jan 12 14:09:50 spock kernel: nfs: task 37169 can't get a request slot
Jan 12 14:09:58 spock kernel: nfs: server khan2 OK
Jan 12 14:21:42 spock kernel: NFS: can't silly-delete sweets/.nfs0003d803000000e9, error=-13

client2:
Jan 12 15:07:41 locutus kernel: nfs: task 34523 can't get a request slot
Jan 12 15:07:42 locutus kernel: nfs: task 34524 can't get a request slot
Jan 12 15:14:12 locutus kernel: nfs: server khan2 not responding, still trying
Jan 12 15:14:15 locutus kernel: nfs: server khan2 OK
Jan 12 15:14:15 locutus kernel: nfs: server khan2 OK
Jan 12 15:15:33 locutus kernel: nfs: server khan2 not responding, still trying
Jan 12 15:15:34 locutus kernel: nfs: server khan2 not responding, still trying
Jan 12 15:15:39 locutus kernel: nfs: server khan2 OK
Jan 12 15:15:39 locutus kernel: nfs: server khan2 OK

Let me know if there is any more information I can provide to aid the
solving of this problem.

Thanks,

Luke
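
For anyone wanting to quantify pauses like the ones in the client logs above,
here is a minimal Python sketch that pairs each "not responding" message with
the next "OK" message for the same client and server and reports the gap in
seconds. The function names (parse_ts, outages) and the year default are
assumptions made for illustration; syslog lines of this kind carry no year,
so one has to be supplied. It only looks at the start of each message, so it
should work whether or not the "still trying" tail has been wrapped onto a
separate line.

```python
import re
from datetime import datetime

# Matches kernel messages such as:
#   Jan 12 14:02:03 spock kernel: nfs: server khan2 not responding, still trying
#   Jan 12 14:02:03 spock kernel: nfs: server khan2 OK
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"nfs: server (?P<server>\S+) (?P<state>not responding|OK)"
)


def parse_ts(text, year=2000):
    """Parse a syslog timestamp; the year is an assumption (syslog omits it)."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def outages(lines, year=2000):
    """Return (client, server, seconds) for each not-responding/OK pair.

    Repeated "not responding" lines keep the earliest timestamp, and "OK"
    lines with no outage pending are ignored.
    """
    pending = {}  # (client, server) -> time the outage was first logged
    result = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. "can't get a request slot" or wrapped tails
        key = (m["host"], m["server"])
        when = parse_ts(m["ts"], year)
        if m["state"] == "not responding":
            pending.setdefault(key, when)
        elif key in pending:
            start = pending.pop(key)
            result.append((key[0], key[1], (when - start).total_seconds()))
    return result


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python outages.py < /var/log/messages
    for client, server, seconds in outages(sys.stdin):
        print(f"{client} lost {server} for {seconds:.0f}s")
```

Because syslog timestamps have one-second resolution, very short pauses show
up as 0 seconds; the figures are a lower bound on what users actually see.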

Comment 1 Cristian Gafton 2000-01-13 17:20:59 UTC
There is some heavy networking going on there that makes the kernel on the
client side run out of available sockets. I doubt this is related to the NFS
server.

Adjusted priorities and severity of the problem report.

Comment 2 luke 2000-03-01 21:25:59 UTC
Do you know where I would look to see about adjusting the number of sockets on
the client system?

Comment 3 Cristian Gafton 2000-08-09 02:36:26 UTC
Assigned to johnsonm.

Comment 4 frenzel 2001-02-27 22:13:48 UTC
We have 3 Dell Precision 420 workstations, 2 with single CPUs 
(the clients/desktops), one with two CPUs (intended as 
compute/file/print/web server). Each workstation exports file systems via NFS 
to the other two. Accessing the NFS mounted file systems on the server from 
the clients often results in hangups ("NFS task xxx can't get a request slot") 
of the clients for a few seconds up to several minutes. 
This must be a problem with the SMP kernel - if I run the single-processor 
kernel (RH 2.2.16-22 in both cases) on the server, 
the problem does not exist. There is also no problem in accessing the file 
systems on the clients from the server. Network traffic is always low, and ping
gives times around 150 microseconds even during an NFS hangup. The load on the
server is also very low.

Comment 5 Stephen John Smoogen 2003-01-25 03:22:35 UTC
The slot exhaustion and the other problems listed above were mainly fixed by
the newer nfsd that shipped in Red Hat Linux 7.2. The number of server threads
to start is described in the rpc.nfsd man page and should be configured in
/etc/rc.d/init.d/nfs
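
As a rough illustration of checking or changing the thread count in the nfs
init script mentioned above, here is a small Python sketch. It assumes the
script sets the count through a shell variable named RPCNFSDCOUNT and passes
it to rpc.nfsd; that variable name is an assumption about Red Hat init
scripts of that era, so check your own copy before relying on it. The
function names are hypothetical, and the code works on the script text only,
leaving reading and writing the file to the caller.

```python
import re

# Assumed form of the setting inside the init script, e.g.:
#   RPCNFSDCOUNT=8
ASSIGN_RE = re.compile(r"^([ \t]*RPCNFSDCOUNT=)(\d+)", re.MULTILINE)


def nfsd_thread_count(script_text):
    """Return the configured thread count, or None if no assignment is found."""
    m = ASSIGN_RE.search(script_text)
    return int(m.group(2)) if m else None


def set_nfsd_thread_count(script_text, count):
    """Return script_text with the first RPCNFSDCOUNT assignment set to count."""
    if count < 1:
        raise ValueError("thread count must be at least 1")
    new_text, replaced = ASSIGN_RE.subn(
        lambda m: m.group(1) + str(count), script_text, count=1
    )
    if replaced == 0:
        raise ValueError("no RPCNFSDCOUNT assignment found")
    return new_text
```

After editing the script, the NFS service has to be restarted for a new
thread count to take effect.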