Red Hat Bugzilla – Bug 1420926
[RFE] improve NFSD thread counts stats
Last modified: 2018-02-26 02:24:38 EST
Description of problem:
Customer would like a command-line tool (e.g. nfsstat) or a script (e.g. systemtap) to generate output similar to Solaris 10.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.1 (Maipo)
Linux <host> 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
---
nfs-utils-1.3.0-0.21.el7_2.x86_64
libnfsidmap-0.25-11.el7.x86_64
---
NFSv3 NFSv4

How reproducible:
N/A

Steps to Reproduce:
N/A

Actual results:
There is no command-line tool (e.g. nfsstat), file (/proc/fs/nfsd/*), or script (e.g. systemtap) that generates the required NFSD thread counts.

[root@rhel7u1-nfs-server ~]# nfsstat
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0

[root@rhel7u1-nfs-server ~]# cat /proc/fs/nfsd/pool_threads
8
[root@rhel7u1-nfs-server ~]# cat /proc/fs/nfsd/threads
8

Expected results:
Similar to Solaris 10:

# echo ::svc_pool nfs | mdb -k
SVCPOOL = fffffea66aec45f0 -> POOL ID = NFS(1)
Non detached threads    = 1324
Detached threads        = 0
Max threads             = 1324
`redline'               = 1
Reserved threads        = 0
Thread lock             = mutex not held
Asleep threads          = 1318
Request lock            = mutex not held
Pending requests        = 0
Walking threads         = 0
Max requests from xprt  = 8
Stack size for svc_run  = 24576
Creator lock            = mutex not held
No of Master xprt's     = 148
rwlock for the mxprtlist= owner 0
master xprt list ptr    = ffffff5253196500

Additional info:
N/A
The nfsd threads service NFS requests, and a single client TCP connection will be issuing a lot of those. If some nfsd operations stall due to a bottleneck (storage, low memory, etc.), then a lot of nfsd threads may be in use even though the number of TCP connections doesn't have to be high. nfsd threads will generally go away after about 5 seconds of idle time, but it doesn't take a high NFS ops rate to touch every thread within 5 seconds once created. In the interim, most will be "asleep" - except that they aren't always asleep; in S10 they poll for work periodically.

Once again, compare the sum of the non-detached + detached threads vs. the max threads. One hint that more threads are necessary is that the pending requests count is frequently non-zero.

These four (4) parameters are the ones we use in our script on the Solaris NFS server to monitor the nfsd threads:

# echo ::svc_pool nfs | mdb -k
SVCPOOL = 6024a0658c0 -> POOL ID = NFS(1)
Non detached threads    = 0    <--- Threads invoked by client processes
Detached threads        = 0
Max threads             = 16   <--- Maximum nfsd threads limit
`redline'               = 1
Reserved threads        = 0
Thread lock             = mutex not held
Asleep threads          = 0    <--- Threads currently idle (sleeping); released after 5 seconds or so if not occupied by a client process
Request lock            = mutex not held
Pending requests        = 0    <--- If asleep threads drops to 0, pending requests becomes non-zero and clients start getting "nfs server not responding" errors
Walking threads         = 0
Max requests from xprt  = 8
Stack size for svc_run  = 0
Creator lock            = mutex not held
No of Master xprt's     = 1
rwlock for the mxprtlist= owner 0
master xprt list ptr    = 60249fb19c0
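For illustration, here is a minimal sketch of the kind of Solaris-side monitoring loop described above. The field parsing, the 5-second interval, and the warning threshold are assumptions; the actual customer script was not attached to this bug.

~~~~~~~~~
#!/bin/sh
# Hypothetical sketch: poll the four svc_pool fields of interest and
# warn when requests start queueing (pending requests > 0).
while sleep 5; do
    echo ::svc_pool nfs | mdb -k | awk -F= '
        /Non detached threads/ { nd      = $2 }
        /Max threads/          { max     = $2 }
        /Asleep threads/       { asleep  = $2 }
        /Pending requests/     { pending = $2 }
        END {
            printf "threads=%d/%d asleep=%d pending=%d\n", nd, max, asleep, pending
            if (pending + 0 > 0)
                print "WARNING: requests are queueing - consider raising max threads"
        }'
done
~~~~~~~~~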
Maybe the problem here is just one of documentation / education. The Linux nfsd threading model may bear so little similarity to the one in Solaris that those statistics just don't make sense here. Could one of the nfsd maintainers comment on nfsd thread pools, service queues, that sort of thing?
I *think* /proc/fs/nfsd/pool_stats may come closest to what they want--see https://www.kernel.org/doc/Documentation/filesystems/nfs/knfsd-stats.txt for an explanation. As noted there, the rate of change of those values is probably more interesting than the values themselves--so some utility to help interpret them would be useful. But this could probably be worked around with some scripting.

Also, the "th" line of /proc/net/rpc/nfsd is a histogram showing how many seconds thread usage has been at certain levels (10% of threads in use, 20% of threads in use, ...). Here again the rate of change may be what's interesting.

Note that Linux knfsd has a fixed number of threads, with the number configured by the administrator (see RPCNFSDCOUNT in /etc/sysconfig/nfs). (We've considered changing to a system that dynamically creates and destroys threads, but that project isn't currently active.)
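As a starting point for the scripting workaround mentioned above, a rough sketch that prints per-interval deltas of the /proc/fs/nfsd/pool_stats counters. Since the exact columns vary by kernel version (see knfsd-stats.txt), it diffs every numeric field generically rather than naming them:

~~~~~~~~~
#!/bin/sh
# Print deltas of the pool_stats counters every $1 seconds (default 5).
INTERVAL=${1:-5}
prev=$(mktemp) && curr=$(mktemp) || exit 1
trap 'rm -f "$prev" "$curr"' EXIT
grep -v '^#' /proc/fs/nfsd/pool_stats > "$prev"
while sleep "$INTERVAL"; do
    grep -v '^#' /proc/fs/nfsd/pool_stats > "$curr"
    # paste the previous and current samples side by side, then subtract
    # column-for-column; column 1 is the pool id and is printed as-is
    paste "$prev" "$curr" | awk '{
        n = NF / 2
        printf "pool %s:", $1
        for (i = 2; i <= n; i++)
            printf " %d", $(n + i) - $i
        print ""
    }'
    cp "$curr" "$prev"
done
~~~~~~~~~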
Copying in Justin & Paul, since this is stats related. Is there anything we can suggest with PCP to help here?
Just checked the metrics mentioned in comment #23 with the latest pcp version:
~~~~~~~~~
[root@ ~]# yum install -y pcp >/dev/null 2>&1
[root@ ~]# rpm -q pcp
pcp-3.12.2-4.el7.x86_64
[root@ ~]# systemctl start pmcd
[root@ ~]# pminfo nfs.server.threads
nfs.server.threads.total
nfs.server.threads.pools
nfs.server.threads.requests
nfs.server.threads.enqueued
nfs.server.threads.processed
nfs.server.threads.timedout
[root@ ~]#
~~~~~~~~~
As I'm not familiar with pcp, I'm not sure whether this is sufficient to verify it. Any comments or suggestions are welcome!
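If it helps with verification, one possible way to watch these metrics live - a sketch assuming the stock pmval/pminfo tools shipped in the pcp package (pmval rate-converts counter metrics by default):

~~~~~~~~~
# sample the nfsd thread counters every 2 seconds, 5 samples each
pmval -t 2 -s 5 nfs.server.threads.requests
pmval -t 2 -s 5 nfs.server.threads.enqueued
# fetch instantaneous values, e.g. the configured thread count
pminfo -f nfs.server.threads.total
~~~~~~~~~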
Test 732 in the qa directory includes tests for this functionality; I believe you can just run ./check 732 in the qa directory of the built tree.
(In reply to Justin Mitchell from comment #32)
> Test 732 in the qa directory includes tests for this functionality; I
> believe you can just run ./check 732 in the qa directory of the built tree.

I checked Test 732 and got the output below (since it's too long, I've skipped the parts that are irrelevant to nfs.server.threads.*):

QA output created by 732
== Checking metric descriptors and values - nfsrpc-root-001.tgz
...
nfs.server.threads.total value 0
nfs.server.threads.pools value 0
nfs.server.threads.requests value 0
nfs.server.threads.enqueued value 0
nfs.server.threads.processed value 0
nfs.server.threads.timedout value 0
...
nfs.server.threads.total value 8
nfs.server.threads.pools value 1
nfs.server.threads.requests value 49
nfs.server.threads.enqueued value 4
nfs.server.threads.processed value 45
nfs.server.threads.timedout value 0
...
== done
Thanks Justin and Zhibin for the help. Closing this one, as pcp-3.12.2-1.el7 (Bug 1472153) has covered this issue; see comment #33.