Bug 1371693
Summary: | Processes on nfs client have very high cpu usage in rpcauth_lookup_credcache | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Frank Sorenson <fsorenso> | |
Component: | kernel | Assignee: | Frank Sorenson <fsorenso> | |
kernel sub component: | NFS | QA Contact: | JianHong Yin <jiyin> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | unspecified | CC: | dwysocha, eguan, jiyin, jlayton, yoyang | |
Version: | 7.2 | Keywords: | Patch, Reproducer | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | kernel-3.10.0-592.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1565033 (view as bug list) | Environment: | ||
Last Closed: | 2017-08-02 00:46:17 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1298243, 1385242 | |||
Attachments: |
The mount is nfs v3 with sec=sys web108.store:/web108 on /mnt/web108 type nfs (rw,nosuid,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.47.40,mountvers=3,mountport=635,mountproto=tcp,local_lock=none,addr=192.168.xxx.xxx) I made a systemtap to show the number of calls to rpcauth_lookup_credcache, and how many calls to generic_match and unx_match resulted from those. The results are output every 10 seconds. Here are some results (I'll attach the rest of the output): Fri Aug 19 12:30:39 2016 CEST: perl: lookups: 1052 (generic_match: 4959398, unx_match: 1474836, errors: 0) webquota: lookups: 915 (generic_match: 544110, unx_match: 6586170, errors: 0) kworker/1:1H: lookups: 72 (generic_match: 0, unx_match: 423000, errors: 0) kworker/0:0H: lookups: 13 (generic_match: 0, unx_match: 76375, errors: 0) Fri Aug 19 12:30:49 2016 CEST: perl: lookups: 8701 (generic_match: 34533944, unx_match: 6141059, errors: 0) webquota: lookups: 766 (generic_match: 652932, unx_match: 5426305, errors: 0) Fri Aug 19 12:30:59 2016 CEST: perl: lookups: 8256 (generic_match: 33662617, unx_match: 5104564, errors: 0) webquota: lookups: 1022 (generic_match: 544110, unx_match: 7388819, errors: 0) kworker/1:1H: lookups: 2 (generic_match: 0, unx_match: 14884, errors: 0) Created attachment 1199076 [details]
systemtap to monitor calls to rpcauth_lookup_credcache and the *_match functions
Could that mean that the hashing scheme is crappy here? Maybe we have some really disproportionately long hash chains? How big is the cache here? Looking now: nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits); ...and it seems like there was some brouhaha recently about how some of the standard pointer hashing routines actually end up with poor distribution in many cases. Maybe that's at least part of the problem here? The hash function itself is fine, but we're only hashing with the uid. Accesses by a single user using more than one egid (really fsgid, but setegid will set fsgid as well) will result in an additional rpc_cred for the new uid:gid pair, but it will simply be appended to the same hash chain. This obviously makes the hash useless, since it devolves into a linear search. I thrashed the credcache by doing a stat() of a directory entry as root after doing a setegid() for each gid from 500-100000, and confirmed that both unix_auth and generic_auth had all 100,000 entries in the hash[0] chain. After that, I did an '/bin/ls -1fl' of a directory containing 1000 files. with the default 4-bit uid-only hash, the ls took over 8 1/2 minutes (519 seconds), with 479 seconds of system time. I then added the gid to the hash, as we discussed (on 64-bit, we've been hashing a 32-bit uid using a 64-bit hash): diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a7e42f9..7a0266a 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -551,7 +551,8 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, *entry, *new; unsigned int nr; - nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits); + nr = hash_long((from_kuid(&init_user_ns, acred->uid) << 32) | + from_kgid(&init_user_ns, acred->gid), cache->hashbits); rcu_read_lock(); hlist_for_each_entry_rcu(entry, &cache->hashtable[nr], cr_hash) { with the new uid/gid hash, and using the default hash size of 4 bits, the ls took 4.33 seconds, with 3.94 seconds of system time. The longest hash chain was 6222, and the shortest was 6217 I also tested after bumping up the hash size to 6 bits (64 hash buckets). The ls took 2.36 seconds, with 1.82 seconds of system time. The longest hash chain was 1557 entries, and the shortest 1554 Looks like a winner to me. Nice! Minor nits: unsigned long is 32 bits on 32 bit platforms, so you want to use hash_64() there instead of hash_long(). Also, the bare "32" there is somewhat fragile. Maybe phrase that as "sizeof(gid_t) * 8" ? This sort of feedback is perfect, thanks. How about something like this? Will hash_64 do the right thing on 32-bit? diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a7e42f9..db18be1 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -538,6 +539,14 @@ rpcauth_cache_enforce_limit(void) rpcauth_cache_do_shrink(nr_to_scan); } +static unsigned int +rpcauth_hash_acred(struct auth_cred *acred, unsigned int hashbits) +{ + return hash_64(from_kgid(&init_user_ns, acred->gid) | + (from_kuid(&init_user_ns, acred->uid) << (sizeof(gid_t) * 8)), + hashbits); +} + /* * Look up a process' credentials in the authentication cache */ @@ -551,7 +560,7 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, *entry, *new; unsigned int nr; - nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits); + nr = rpcauth_hash_acred(acred, cache->hashbits); rcu_read_lock(); hlist_for_each_entry_rcu(entry, &cache->hashtable[nr], cr_hash) { Created attachment 1199794 [details]
program to stat a path as various gids
Next thought... The hashtable is created in rpcauth_init_credcache(), which is called in the initialization code for each auth. In fact, it's really the only thing that's done in the init code (both rpc_init_generic_auth and rpc_init_authunix contain only a 'return rpcauth_init_credcache(&....)'). Although the number of 'auth_hashbits' is configurable through sysfs by way of the module parameter 'auth_hashtable_size', both rpc_init_authunix and rpc_init_generic_auth are called in rpcauth_init_module. Since they're static, the number of hashbits can only be changed when the 'sunrpc' module is loaded. We probably want the new hashbits setting to have some effect though, wouldn't we? Can we create a new hashtable, rehash the existing entries in the old, and move them over to the new? call the shrinker with aggressive settings, then do the above? create a new hashtable and dump everything in the old? so sorry, too late to change now? (then should the parameter be changed to read-only?) Personally...that sounds like a lot of work for dubious benefit, and could have "interesting" effects on performance. I'd avoid dynamically resizing the hashtable unless we decide that we really need to go down that road. Making it r/o sounds reasonable though. Created attachment 1205271 [details]
patches, take 2
TODO: cast the uid prior to shifting to eliminate:
warning: left shift count >= width of type [enabled by default]
(from_kuid(&init_user_ns, acred->uid) << (sizeof(gid_t) * 8)),
Created attachment 1205626 [details]
add hash_cred() function to rpc_authops struct
Created attachment 1205627 [details]
add generic_auth hash_cred() function
Created attachment 1205628 [details]
add auth_unix hash_cred() function
Created attachment 1205629 [details]
add RPCSEC_GSS hash_cred() function
Created attachment 1205636 [details]
replace generic auth_cred hash with auth-specific function
upstream commits: e856a23 sunrpc: add hash_cred() function to rpc_authops struct 18028c9 sunrpc: add generic_auth hash_cred() function 1e035d0 sunrpc: add auth_unix hash_cred() function a960f8d sunrpc: add RPCSEC_GSS hash_cred() function 66cbd4b sunrpc: replace generic auth_cred hash with auth-specific function commit 66cbd4ba8ac4711e8bad97b5bded31ec298c7433 Author: Frank Sorenson <sorenson> Date: 2016-09-29 10:44:41 -0500 sunrpc: replace generic auth_cred hash with auth-specific function Replace the generic code to hash the auth_cred with the call to the auth-specific hash function in the rpc_authops struct. Signed-off-by: Frank Sorenson <sorenson> Signed-off-by: Anna Schumaker <Anna.Schumaker> commit a960f8d6db431f5785ee28dbe903d61d821f368e Author: Frank Sorenson <sorenson> Date: 2016-09-29 10:44:40 -0500 sunrpc: add RPCSEC_GSS hash_cred() function Add a hash_cred() function for RPCSEC_GSS, using only the uid from the auth_cred. Signed-off-by: Frank Sorenson <sorenson> Signed-off-by: Anna Schumaker <Anna.Schumaker> commit 1e035d065f3415809c056fb7537320a74c718537 Author: Frank Sorenson <sorenson> Date: 2016-09-29 10:44:39 -0500 sunrpc: add auth_unix hash_cred() function Add a hash_cred() function for auth_unix, using both the uid and gid from the auth_cred. Signed-off-by: Frank Sorenson <sorenson> Signed-off-by: Anna Schumaker <Anna.Schumaker> commit 18028c967e423673a055a30b1bf4e603ec64026d Author: Frank Sorenson <sorenson> Date: 2016-09-29 10:44:38 -0500 sunrpc: add generic_auth hash_cred() function Add a hash_cred() function for generic_auth, using both the uid and gid from the auth_cred. Signed-off-by: Frank Sorenson <sorenson> Signed-off-by: Anna Schumaker <Anna.Schumaker> commit e856a231d5d5742fe7c63e3a2b266bef668af5b4 Author: Frank Sorenson <sorenson> Date: 2016-09-29 10:44:37 -0500 sunrpc: add hash_cred() function to rpc_authops struct Currently, a single hash algorithm is used to hash the auth_cred for the credcache for all rpc_auth types. Add a hash_cred() function to the rpc_authops struct to allow a hash function specific to each auth flavor. Signed-off-by: Frank Sorenson <sorenson> Signed-off-by: Anna Schumaker <Anna.Schumaker> The program attached in comment 11 (the thrash_credcache-2.c) can be used to cause the conditions which result in the high cpu usage and slow performance: on the nfs client (example uses nfs v3, but nfs v4 is likely also affected): # mount server:/exports /mnt/credcache -o sec=sys,vers=3 # mkdir /mnt/credcache/test_files # for f in $(seq 1 1000) ; do touch /mnt/credcache/test_files/file_$f ; done # thrash_credcache-2 /mnt/credcache/test_files 100 100000 calling stat on '/mnt/credcache/test_files' with gids 100 through 100000 # /usr/bin/time /bin/ls -1fl /mnt/credcache/test_files/ >/dev/null The thrash_credcache-2 will take a long time to run on a system without the patches, and could probably be used on its own for testing, rather than the 'ls' command... just time how long it takes to run run... the difference will likely be very apparent. If this issue is present, the 'ls' command will take "a long time", and cpu usage will be very high. The precise amount of time is difficult to say, however my testing showed that the 'ls' of 1000 directory entries was taking 8 1/2 minutes to complete. With the patches, the 'ls' time was reduced to 4 seconds. Patch(es) committed on kernel repository and an interim kernel build is undergoing testing Patch(es) available on kernel-3.10.0-592.el7 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1842 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1842 |
Created attachment 1196075 [details] customer-provided pseudocode of reproducer Description of problem: Processes accessing nfs mounts have very high cpu usage in rpcauth_lookup_credcache, with rpcauth_lookup_credcache calling generic_match and unx_match thousands of times, causing very slow response time. Version-Release number of selected component (if applicable): RHEL 7.2 kernel 3.10.0-327.10.1.el7.x86_64 How reproducible: customer is able to reproduce the issue at will Steps to Reproduce: Customer has provided example pseudo-code which does the following: chroot into an nfs-mounted directory calls setuid and setgid to the user/group which owns the directory performs a recursive chmod(0700) of the directory recursively removes the directory tree Actual results: cpu utilization and nfs response time normal Expected results: cpu usage and response time remain Additional info: