Red Hat Bugzilla – Bug 1420926
[RFE] improve NFSD thread counts stats
Last modified: 2018-02-26 02:24:38 EST
Description of problem:
Customer would like a command-line tool (e.g. nfsstat) or a script (e.g. systemtap) to generate output similar to Solaris 10.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.1 (Maipo)
Linux <host> 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
---
nfs-utils-1.3.0-0.21.el7_2.x86_64
libnfsidmap-0.25-11.el7.x86_64
---
NFSv3 NFSv4

How reproducible:
N/A

Steps to Reproduce:
N/A

Actual results:
There is no command-line tool (e.g. nfsstat), file (/proc/fs/nfsd/*), or script (e.g. systemtap) that generates the required NFSD thread counts.

[root@rhel7u1-nfs-server ~]# nfsstat
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0

[root@rhel7u1-nfs-server ~]# cat /proc/fs/nfsd/pool_threads
8
[root@rhel7u1-nfs-server ~]# cat /proc/fs/nfsd/threads
8

Expected results:
Similar to Solaris 10:

# echo ::svc_pool nfs | mdb -k
SVCPOOL = fffffea66aec45f0 -> POOL ID = NFS(1)
Non detached threads    = 1324
Detached threads        = 0
Max threads             = 1324
`redline'               = 1
Reserved threads        = 0
Thread lock             = mutex not held
Asleep threads          = 1318
Request lock            = mutex not held
Pending requests        = 0
Walking threads         = 0
Max requests from xprt  = 8
Stack size for svc_run  = 24576
Creator lock            = mutex not held
No of Master xprt's     = 148
rwlock for the mxprtlist= owner 0
master xprt list ptr    = ffffff5253196500

Additional info:
N/A
The nfsd threads service NFS requests, and a single client TCP connection will be issuing a lot of those. If some nfsd operations stall due to a bottleneck (storage, low memory, etc.), then a lot of nfsd threads may be in use even though the number of TCP connections doesn't have to be high. nfsd threads will generally go away after about 5 seconds of idle time, but it doesn't take a high NFS ops rate to touch every thread within 5 seconds once created. In the interim, most will be "asleep" - except that they aren't always asleep; in S10 they poll for work periodically.

Once again, compare the sum of the non-detached + detached threads vs. the max threads. One hint that more threads are necessary is that the pending requests count is frequently non-zero.

These four (4) parameters are the ones we use in our script on the Solaris NFS server to monitor the nfsd threads:

# echo ::svc_pool nfs | mdb -k
SVCPOOL = 6024a0658c0 -> POOL ID = NFS(1)
Non detached threads    = 0    <--- Threads invoked by client processes
Detached threads        = 0
Max threads             = 16   <--- Maximum nfsd threads limit
`redline'               = 1
Reserved threads        = 0
Thread lock             = mutex not held
Asleep threads          = 0    <--- Threads currently idle (sleeping); released after 5 seconds or so if not occupied by a client process
Request lock            = mutex not held
Pending requests        = 0    <--- If asleep threads drops to 0, pending requests becomes non-zero and clients start getting "nfs server not responding" errors
Walking threads         = 0
Max requests from xprt  = 8
Stack size for svc_run  = 0
Creator lock            = mutex not held
No of Master xprt's     = 1
rwlock for the mxprtlist= owner 0
master xprt list ptr    = 60249fb19c0
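For illustration, here is a minimal sketch of the kind of Solaris-side monitoring loop described above. The field parsing, the 5-second interval, and the warning threshold are assumptions; the actual customer script was not attached to this bug.

~~~~~~~~~
#!/bin/sh
# Hypothetical sketch: poll the four svc_pool fields of interest and
# warn when requests start queueing (pending requests > 0).
while sleep 5; do
    echo ::svc_pool nfs | mdb -k | awk -F= '
        /Non detached threads/ { nd      = $2 }
        /Max threads/          { max     = $2 }
        /Asleep threads/       { asleep  = $2 }
        /Pending requests/     { pending = $2 }
        END {
            printf "threads=%d/%d asleep=%d pending=%d\n", nd, max, asleep, pending
            if (pending + 0 > 0)
                print "WARNING: requests are queueing - consider raising max threads"
        }'
done
~~~~~~~~~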
Maybe the problem here is just one of documentation / education. The Linux nfsd threading model may bear so little similarity to the one in Solaris that those statistics just don't make sense here. Could one of the nfsd maintainers comment on nfsd thread pools, service queues, that sort of thing?
I *think* /proc/fs/nfsd/pool_stats may come closest to what they want--see https://www.kernel.org/doc/Documentation/filesystems/nfs/knfsd-stats.txt for an explanation. As noted there, the rate of change of those values is probably more interesting than the values themselves--so some utility to help interpret them would be useful. But this could probably be worked around with some scripting.

Also, the "th" line of /proc/net/rpc/nfsd is a histogram showing how many seconds thread usage has been at certain levels (10% of threads in use, 20% of threads in use, ...). Here again the rate of change may be what's interesting.

Note that Linux knfsd has a fixed number of threads, with the number configured by the administrator (see RPCNFSDCOUNT in /etc/sysconfig/nfs). (We've considered changing to a system that dynamically creates and destroys threads, but that project isn't currently active.)
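As a starting point for the scripting workaround mentioned above, a rough sketch that prints per-interval deltas of the /proc/fs/nfsd/pool_stats counters. Since the exact columns vary by kernel version (see knfsd-stats.txt), it diffs every numeric field generically rather than naming them:

~~~~~~~~~
#!/bin/sh
# Print deltas of the pool_stats counters every $1 seconds (default 5).
INTERVAL=${1:-5}
prev=$(mktemp) && curr=$(mktemp) || exit 1
trap 'rm -f "$prev" "$curr"' EXIT
grep -v '^#' /proc/fs/nfsd/pool_stats > "$prev"
while sleep "$INTERVAL"; do
    grep -v '^#' /proc/fs/nfsd/pool_stats > "$curr"
    # paste the previous and current samples side by side, then subtract
    # column-for-column; column 1 is the pool id and is printed as-is
    paste "$prev" "$curr" | awk '{
        n = NF / 2
        printf "pool %s:", $1
        for (i = 2; i <= n; i++)
            printf " %d", $(n + i) - $i
        print ""
    }'
    cp "$curr" "$prev"
done
~~~~~~~~~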
Copying in Justin & Paul, since this is stats related. Is there anything we can suggest with PCP to help here?
Just checked the metrics mentioned in comment #23 with the latest pcp version:
~~~~~~~~~
[root@ ~]# yum install -y pcp >/dev/null 2>&1
[root@ ~]# rpm -q pcp
pcp-3.12.2-4.el7.x86_64
[root@ ~]# systemctl start pmcd
[root@ ~]# pminfo nfs.server.threads
nfs.server.threads.total
nfs.server.threads.pools
nfs.server.threads.requests
nfs.server.threads.enqueued
nfs.server.threads.processed
nfs.server.threads.timedout
[root@ ~]#
~~~~~~~~~
As I'm not familiar with pcp, I'm not sure whether this is sufficient to verify it. Any comments or suggestions are welcome!
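If it helps with verification, one possible way to watch these metrics live - a sketch assuming the stock pmval/pminfo tools shipped in the pcp package (pmval rate-converts counter metrics by default):

~~~~~~~~~
# sample the nfsd thread counters every 2 seconds, 5 samples each
pmval -t 2 -s 5 nfs.server.threads.requests
pmval -t 2 -s 5 nfs.server.threads.enqueued
# fetch instantaneous values, e.g. the configured thread count
pminfo -f nfs.server.threads.total
~~~~~~~~~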
Test 732 in the qa directory includes tests for this functionality; I believe you can just run ./check 732 in the qa directory of the built tree.
(In reply to Justin Mitchell from comment #32)
> Test 732 in the qa directory includes tests for this functionality; I
> believe you can just run ./check 732 in the qa directory of the built tree.

I checked Test 732 and got the output below (since it's too long, I've skipped the parts that are irrelevant to nfs.server.threads.*):

QA output created by 732
== Checking metric descriptors and values - nfsrpc-root-001.tgz
...
nfs.server.threads.total value 0
nfs.server.threads.pools value 0
nfs.server.threads.requests value 0
nfs.server.threads.enqueued value 0
nfs.server.threads.processed value 0
nfs.server.threads.timedout value 0
...
nfs.server.threads.total value 8
nfs.server.threads.pools value 1
nfs.server.threads.requests value 49
nfs.server.threads.enqueued value 4
nfs.server.threads.processed value 45
nfs.server.threads.timedout value 0
...
== done
Thanks Justin and Zhibin for the help. Closing this one, as pcp-3.12.2-1.el7 (Bug 1472153) has covered this issue; see comment #33.