Bug 1420926 - [RFE] improve NFSD thread counts stats
Summary: [RFE] improve NFSD thread counts stats
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Steve Dickson
QA Contact: Yongcheng Yang
Marek Suchánek
Depends On: 1472153
Blocks: 1420851 1469559
Reported: 2017-02-09 21:51 UTC by Efraim Marquez-Arreaza
Modified: 2020-07-16 09:12 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2018-01-29 09:49:25 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0981 | 0 | normal | SHIPPED_LIVE | nfs-utils bug fix and enhancement update | 2018-04-10 15:06:13 UTC

Description Efraim Marquez-Arreaza 2017-02-09 21:51:23 UTC
Description of problem:
The customer would like a command-line tool (e.g. nfsstat) or a script (e.g. systemtap) that generates NFSD thread count output similar to what Solaris 10 provides.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.1 (Maipo)
Linux <host> 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Steps to Reproduce:

Actual results:
There is no command-line tool (e.g. nfsstat), file (/proc/fs/nfsd/*), or script (e.g. systemtap) that generates the required NFSD thread counts.

[root@rhel7u1-nfs-server ~]# nfsstat 
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0       

[root@rhel7u1-nfs-server ~]# cat /proc/fs/nfsd/pool_threads 
[root@rhel7u1-nfs-server ~]# cat /proc/fs/nfsd/threads

Expected results:

Similar to Solaris 10

# echo ::svc_pool nfs | mdb -k 
SVCPOOL = fffffea66aec45f0 -> POOL ID = NFS(1) 
Non detached threads = 1324 Detached threads = 0 
Max threads = 1324 
`redline' = 1 
Reserved threads = 0 
Thread lock = mutex not held 
Asleep threads = 1318 
Request lock = mutex not held 
Pending requests = 0 
Walking threads = 0 
Max requests from xprt = 8 
Stack size for svc_run = 24576 
Creator lock = mutex not held 
No of Master xprt's = 148 
rwlock for the mxprtlist= owner 0 
master xprt list ptr = ffffff5253196500 

Additional info:

Comment 7 Efraim Marquez-Arreaza 2017-03-02 15:51:42 UTC
The nfsd threads service NFS requests - and a single client TCP connection can issue a lot of those.

If some nfsd operations stall due to a bottleneck (storage, low memory, etc.), then a lot of nfsd threads might be in use - and the number of TCP connections doesn't have to be high.

nfsd threads will generally go away after about 5 seconds of idle time - but it doesn't take a high NFS ops rate to touch every thread within 5 seconds once they are created.

In the interim, most will be "asleep" - except that they aren't always asleep - in S10 they poll for work periodically.

Once again, compare the sum of the non-detached + detached threads vs. the max threads. One hint that more threads are necessary is that pending requests is frequently non-zero.

These are the four (4) parameters we use in our script on the Solaris NFS server to monitor the nfsd threads.

   # echo ::svc_pool nfs | mdb -k

   SVCPOOL = 6024a0658c0 -> POOL ID = NFS(1)
   Non detached threads    = 0   <--- Threads invoked by client processes
   Detached threads        = 0
   Max threads             = 16  <--- Maximum nfsd threads limit
   `redline'               = 1
   Reserved threads        = 0
   Thread lock     = mutex not held
   Asleep threads          = 0   <--- Threads currently idle (sleeping); they are released after about 5 seconds if not occupied by a client process
   Request lock    = mutex not held
   Pending requests        = 0   <--- If asleep threads drops to 0, pending requests becomes non-zero and clients start getting "NFS server not responding" errors
   Walking threads         = 0
   Max requests from xprt  = 8
   Stack size for svc_run  = 0
   Creator lock    = mutex not held
   No of Master xprt's     = 1
   rwlock for the mxprtlist= owner 0
   master xprt list ptr    = 60249fb19c0
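
The comparison described in this comment (threads in use against the maximum, plus a check on pending requests) could be scripted roughly as follows. This is only a sketch, assuming the `::svc_pool nfs` output has one `name = value` field per line as in the listing above; the function names `parse_svc_pool` and `thread_pressure` are hypothetical and not part of any existing tool.

```python
import re

# Sample values copied from the listing above.
SAMPLE = """\
SVCPOOL = 6024a0658c0 -> POOL ID = NFS(1)
Non detached threads    = 0   <--- Threads invoked by client processes
Detached threads        = 0
Max threads             = 16  <--- Maximum nfsd threads limit
Asleep threads          = 0
Pending requests        = 0
"""


def parse_svc_pool(text):
    """Return {field name: int} for lines of the form 'name = <decimal>'."""
    fields = {}
    for line in text.splitlines():
        # Drop any '<---' annotation before matching.
        line = line.split("<---")[0]
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def thread_pressure(fields):
    """Summarise the four monitored parameters (hypothetical helper)."""
    in_use = fields["Non detached threads"] + fields["Detached threads"]
    return {
        "in_use": in_use,
        "max": fields["Max threads"],
        "asleep": fields["Asleep threads"],
        # A single non-zero sample is only a hint; the comment above
        # says to watch for this being *frequently* non-zero.
        "pending_nonzero": fields["Pending requests"] > 0,
    }


summary = thread_pressure(parse_svc_pool(SAMPLE))
print(summary)
```

In practice the text would come from running `echo ::svc_pool nfs | mdb -k` on the Solaris server and sampling repeatedly, since one reading says little about how often requests are pending.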

Comment 8 Frank Ch. Eigler 2017-03-10 16:50:24 UTC
Maybe the problem here is just one of documentation / education.  The Linux nfsd threading model may bear so little similarity to the one in Solaris that those statistics just don't make sense here.

Could one of the nfsd maintainers comment on nfsd thread pools, service queues, that sort of thing?

Comment 9 J. Bruce Fields 2017-03-10 19:21:15 UTC
I *think* /proc/fs/nfsd/pool_stats may come closest to what they want--see https://www.kernel.org/doc/Documentation/filesystems/nfs/knfsd-stats.txt for explanation.  As noted there, rate of change of those values is probably more interesting than the values themselves--so some utility to help interpret would be useful.  But this could probably be worked around with some scripting.
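
As an illustration of the "some scripting" workaround, here is a minimal Python sketch that turns two samples of /proc/fs/nfsd/pool_stats into per-second rates. It assumes the file starts with a `#`-prefixed header line naming the columns (the first being `pool`) followed by one line of integers per pool; the column names are read from the header rather than hard-coded, because they can differ between kernel versions. The function names and the sample numbers are invented for the example.

```python
def parse_pool_stats(text):
    """Return {pool id: {column name: counter}} from pool_stats text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header looks like '# pool <col> <col> ...'; skip the 'pool' column.
    names = lines[0].lstrip("#").split()[1:]
    pools = {}
    for ln in lines[1:]:
        values = [int(v) for v in ln.split()]
        pools[values[0]] = dict(zip(names, values[1:]))
    return pools


def rates(before, after, seconds):
    """Per-second change of every counter between two samples."""
    return {
        pool: {
            name: (after[pool][name] - before[pool][name]) / seconds
            for name in after[pool]
        }
        for pool in after
        if pool in before
    }


# Invented sample data, two readings taken 2 seconds apart.
HEADER = "# pool packets-arrived sockets-enqueued threads-woken threads-timedout\n"
first = parse_pool_stats(HEADER + "0 100 5 95 0\n")
second = parse_pool_stats(HEADER + "0 160 11 149 0\n")
print(rates(first, second, 2.0))
```

On a live server the two samples would be read from /proc/fs/nfsd/pool_stats with a sleep in between; see the knfsd-stats.txt document linked above for what each column means.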

Also, the "th" line of /proc/net/rpc/nfsd is a histogram showing how many seconds thread usage has been at certain levels (10% of threads in use, 20% of threads in use,...).  Here again the rate of change may be what's interesting.
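
A rough sketch of reading that line in Python follows. It assumes the line has the form `th <threads> <times-all-busy> <histogram buckets in seconds...>`; the function name and the sample line are made up for illustration, and on some kernels the buckets may simply read as zeros.

```python
def parse_th_line(proc_text):
    """Extract the 'th' line from /proc/net/rpc/nfsd text, or None."""
    for line in proc_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "th":
            return {
                "threads": int(parts[1]),
                "all_busy_count": int(parts[2]),
                # Seconds spent at each usage level, lowest bucket first.
                "histogram_seconds": [float(p) for p in parts[3:]],
            }
    return None


# Invented sample content; only the 'th' line is used.
SAMPLE_TH = "rc 0 0 0\nth 8 0 1.500 0.250 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000\n"
th = parse_th_line(SAMPLE_TH)
print(th)
```

As with pool_stats, comparing two readings taken some seconds apart is likely to be more informative than a single reading.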

Note that Linux knfsd has a fixed number of threads, with the number configured by the administrator (see RPCNFSDCOUNT in /etc/sysconfig/nfs).  (We've considered changing to a system that dynamically creates and destroys threads, but that project isn't currently active.)
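
For scripts that want to report the configured thread count next to the usage statistics, the setting can be read out of the sysconfig file. The following is a small sketch, assuming the file uses plain `KEY=value` shell-style assignments; the function name is hypothetical and the sample text is invented.

```python
def configured_thread_count(sysconfig_text):
    """Return the RPCNFSDCOUNT value as an int, or None if it is not set."""
    count = None
    for line in sysconfig_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not an assignment.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "RPCNFSDCOUNT":
            count = int(value.strip().strip("\"'"))
    return count


# Invented sample; a real script would read /etc/sysconfig/nfs instead.
SAMPLE_CONF = '# Number of nfs server processes to be started.\n#RPCNFSDCOUNT=8\nRPCNFSDCOUNT="16"\n'
print(configured_thread_count(SAMPLE_CONF))
```

The value currently in effect can be compared against the contents of /proc/fs/nfsd/threads on the running server.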

Comment 12 Steve Whitehouse 2017-07-19 16:30:14 UTC
Copying in Justin & Paul, since this is stats related. Is there anything we can suggest with PCP to help here?

Comment 31 Yongcheng Yang 2018-01-25 08:49:37 UTC
Just checked the metrics mentioned in comment #23 with latest pcp version:
[root@ ~]# yum install -y pcp >/dev/null 2>&1
[root@ ~]# rpm -q pcp
[root@ ~]# systemctl start pmcd
[root@ ~]# pminfo nfs.server.threads
[root@ ~]#

As I'm not familiar with pcp, I'm not sure whether this is sufficient to verify it.

Any comments or suggestions are welcome!

Comment 32 Alice Mitchell 2018-01-25 11:57:32 UTC
Test 732 in the qa directory includes tests for this functionality; I believe you can just run ./check 732 in the qa directory of the built tree.

Comment 33 Zhibin Li 2018-01-29 09:25:57 UTC
(In reply to Justin Mitchell from comment #32)
> Test 732 in the qa directory includes tests for this functionality, i
> believe you can just ./check 732 in the qa directory of built tree.

I ran Test 732 and got the output below (since it's too long, I skipped the parts that are irrelevant to nfs.server.threads.*).

QA output created by 732
== Checking metric descriptors and values - nfsrpc-root-001.tgz


    value 0

    value 0

    value 0

    value 0

    value 0

    value 0


    value 8

    value 1

    value 49

    value 4

    value 45

    value 0


== done
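
For checking such output in a script, a small parser can collect the values by metric name. This is a minimal sketch, assuming each metric is printed as an unindented name line followed by an indented `value N` line (the layout `pminfo -f` uses for single-valued metrics); the function name and the metric names in the sample are placeholders, not real PCP metric names.

```python
def parse_pminfo_values(text):
    """Return {metric name: int value} from 'name' / '    value N' pairs."""
    values = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if line[0].isspace():
            parts = stripped.split()
            # Only handle the simple single-valued form 'value N'.
            if current and len(parts) == 2 and parts[0] == "value":
                values[current] = int(parts[1])
        else:
            current = stripped
    return values


# Placeholder metric names; the numbers are taken from the output above.
SAMPLE_OUT = "example.metric.a\n    value 8\n\nexample.metric.b\n    value 49\n"
print(parse_pminfo_values(SAMPLE_OUT))
```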

Comment 34 Yongcheng Yang 2018-01-29 09:49:25 UTC
Thanks Justin and Zhibin for the help.

Closing this one, as pcp-3.12.2-1.el7 (Bug 1472153) covers this issue, as shown in comment #33.
