Bug 1420926
Summary: | [RFE] improve NFSD thread counts stats | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Efraim Marquez-Arreaza <emarquez> |
Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Yongcheng Yang <yoyang> |
Severity: | high | Docs Contact: | Marek Suchánek <msuchane> |
Priority: | high | ||
Version: | 7.1 | CC: | ajmitchell, bfields, coughlan, dwysocha, eguan, emarquez, fche, jiyin, jlayton, linche, pevans, steved, swhiteho, xzhou, yoyang, zhibli |
Target Milestone: | rc | Keywords: | FutureFeature, TestOnly |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-01-29 09:49:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1472153 | ||
Bug Blocks: | 1420851, 1469559 |
Description
Efraim Marquez-Arreaza
2017-02-09 21:51:23 UTC
The nfsd threads service NFS requests, and a single client TCP connection will be issuing a lot of those. If some nfsd operations stall due to some bottleneck (storage, low memory, etc.), then a lot of nfsd threads might be in use, and the number of TCP connections doesn't have to be high. nfsd threads will generally go away after about 5 seconds of idle time, but it doesn't take a high NFS ops rate to touch every thread within 5 seconds once created. In the interim, most will be "asleep" - except that they aren't always asleep: on Solaris 10 they poll for work periodically. Once again, compare the sum of the non-detached and detached threads against the max threads. One hint that more threads are necessary is that the pending-requests count is frequently non-zero. These are the four (4) parameters we use in our script on a Solaris NFS server to monitor the nfsd threads:

~~~~~~~~~
# echo ::svc_pool nfs | mdb -k
SVCPOOL = 6024a0658c0 -> POOL ID = NFS(1)
Non detached threads    = 0    <--- Threads invoked by client processes
Detached threads        = 0
Max threads             = 16   <--- Maximum nfsd threads limit
`redline'               = 1
Reserved threads        = 0
Thread lock             = mutex not held
Asleep threads          = 0    <--- Threads currently idle (sleeping); released in 5 seconds or so if not occupied by a client process
Request lock            = mutex not held
Pending requests        = 0    <--- If asleep threads drops to 0, pending requests becomes non-zero and clients start getting "nfs server not responding" errors
Walking threads         = 0
Max requests from xprt  = 8
Stack size for svc_run  = 0
Creator lock            = mutex not held
No of Master xprt's     = 1
rwlock for the mxprtlist= owner 0
master xprt list ptr    = 60249fb19c0
~~~~~~~~~

Maybe the problem here is just one of documentation / education. The Linux nfsd threading model may bear so little similarity to the one in Solaris that those statistics just don't make sense here. Could one of the nfsd maintainers comment on nfsd thread pools, service queues, that sort of thing?

I *think* /proc/fs/nfsd/pool_stats may come closest to what they want - see https://www.kernel.org/doc/Documentation/filesystems/nfs/knfsd-stats.txt for an explanation. As noted there, the rate of change of those values is probably more interesting than the values themselves, so some utility to help interpret them would be useful. But this could probably be worked around with some scripting.

Also, the "th" line of /proc/net/rpc/nfsd is a histogram showing for how many seconds thread usage has been at certain levels (10% of threads in use, 20% of threads in use, ...). Here again the rate of change may be what's interesting.

Note that Linux knfsd has a fixed number of threads, with the number configured by the administrator (see RPCNFSDCOUNT in /etc/sysconfig/nfs). (We've considered changing to a system that dynamically creates and destroys threads, but that project isn't currently active.)

Copying in Justin & Paul, since this is stats related. Is there anything we can suggest with PCP to help here?

Just checked the metrics mentioned in comment #23 with the latest pcp version:

~~~~~~~~~
[root@ ~]# yum install -y pcp >/dev/null 2>&1
[root@ ~]# rpm -q pcp
pcp-3.12.2-4.el7.x86_64
[root@ ~]# systemctl start pmcd
[root@ ~]# pminfo nfs.server.threads
nfs.server.threads.total
nfs.server.threads.pools
nfs.server.threads.requests
nfs.server.threads.enqueued
nfs.server.threads.processed
nfs.server.threads.timedout
[root@ ~]#
~~~~~~~~~

As I'm not familiar with pcp, I'm not sure whether this is sufficient to verify it. Any comments or suggestions are welcome!
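As a rough illustration of the "some scripting" idea for /proc/fs/nfsd/pool_stats mentioned earlier in this thread - a minimal sketch, not shipped tooling; it assumes the file layout described in knfsd-stats.txt (a "# pool ..." header line followed by one row of cumulative counters per pool) - the following loop prints per-interval deltas for each counter column:

~~~~~~~~~
#!/bin/bash
# Sketch: print per-interval deltas of the /proc/fs/nfsd/pool_stats counters.
# Assumes the first line is a "# pool ..." header and each remaining line is
# one pool id followed by cumulative numeric counters (see knfsd-stats.txt).
INTERVAL=${1:-5}
prev=$(grep -v '^#' /proc/fs/nfsd/pool_stats)
while sleep "$INTERVAL"; do
    curr=$(grep -v '^#' /proc/fs/nfsd/pool_stats)
    # Pair up the old and new sample line-by-line, then subtract column
    # by column (column 1 is the pool id, so it is printed, not diffed).
    paste <(echo "$prev") <(echo "$curr") | awk '{
        n = NF / 2
        printf "pool %s:", $1
        for (i = 2; i <= n; i++) printf " %d", $(n + i) - $(i)
        print ""
    }'
    prev=$curr
done
~~~~~~~~~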
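The "th" line lends itself to the same treatment; even something as simple as this sketch makes the rate of change visible (field meanings here are my reading of the format, roughly: thread count, a count of times all threads were busy at once, then the ten utilization buckets):

~~~~~~~~~
# Re-read the "th" line every 5 seconds to watch the histogram move.
# After the "th" keyword: total thread count, number of times all
# threads were in use simultaneously, then ten cumulative
# seconds-at-utilization buckets (<=10%, <=20%, ... of threads busy).
watch -n 5 "grep '^th' /proc/net/rpc/nfsd"
~~~~~~~~~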
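And since the knfsd thread count is fixed, raising it is an administrative step; on RHEL 7 that looks roughly like the following (a sketch using the RPCNFSDCOUNT setting named above; 16 is just an example value):

~~~~~~~~~
# Persistent setting: edit /etc/sysconfig/nfs and set, e.g.:
#   RPCNFSDCOUNT=16
# then restart the server so it takes effect:
systemctl restart nfs-server

# Or adjust the running server directly, without a restart:
rpc.nfsd 16
~~~~~~~~~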
Test 732 in the qa directory includes tests for this functionality; I believe you can just run ./check 732 in the qa directory of the built tree.

(In reply to Justin Mitchell from comment #32)
> Test 732 in the qa directory includes tests for this functionality; I
> believe you can just run ./check 732 in the qa directory of the built tree.

I checked Test 732 and got the output below (since it's too long, I have skipped the parts that are irrelevant to nfs.server.threads.*):

~~~~~~~~~
QA output created by 732
== Checking metric descriptors and values - nfsrpc-root-001.tgz
...
nfs.server.threads.total value 0
nfs.server.threads.pools value 0
nfs.server.threads.requests value 0
nfs.server.threads.enqueued value 0
nfs.server.threads.processed value 0
nfs.server.threads.timedout value 0
...
nfs.server.threads.total value 8
nfs.server.threads.pools value 1
nfs.server.threads.requests value 49
nfs.server.threads.enqueued value 4
nfs.server.threads.processed value 45
nfs.server.threads.timedout value 0
...
== done
~~~~~~~~~

Thanks Justin and Zhibin for the help. Closing this one, as pcp-3.12.2-1.el7 (Bug 1472153) has covered this issue per comment #33.
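For anyone who wants to eyeball the new metrics on a live server rather than through the QA suite, a minimal sketch using standard PCP tools (assumes pmcd is running, as in the earlier comment, and uses the metric names listed above):

~~~~~~~~~
# One-shot fetch of all the thread metrics with their current values:
pminfo -f nfs.server.threads

# Sample the request counter at 5-second intervals - the rate of change,
# not the absolute value, is what hints at thread starvation:
pmval -t 5sec nfs.server.threads.requests
~~~~~~~~~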