Bug 1371693 - Processes on nfs client have very high cpu usage in rpcauth_lookup_credcache
Summary: Processes on nfs client have very high cpu usage in rpcauth_lookup_credcache
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.2
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Frank Sorenson
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks: 1298243 1385242
TreeView+ depends on / blocked
 
Reported: 2016-08-30 20:36 UTC by Frank Sorenson
Modified: 2020-09-10 09:45 UTC (History)
5 users (show)

Fixed In Version: kernel-3.10.0-592.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1565033 (view as bug list)
Environment:
Last Closed: 2017-08-02 00:46:17 UTC
Target Upstream Version:


Attachments (Terms of Use)
systemtap to monitor calls to rpcauth_lookup_credcache and the *_match functions (1.18 KB, text/plain)
2016-09-08 12:49 UTC, Frank Sorenson
no flags Details
program to stat a path as various gids (1.33 KB, text/x-csrc)
2016-09-10 17:57 UTC, Frank Sorenson
no flags Details
patches, take 2 (3.48 KB, patch)
2016-09-27 15:23 UTC, Frank Sorenson
no flags Details | Diff
add hash_cred() function to rpc_authops struct (1.79 KB, patch)
2016-09-28 18:55 UTC, Frank Sorenson
no flags Details | Diff
add generic_auth hash_cred() function (1.24 KB, patch)
2016-09-28 18:55 UTC, Frank Sorenson
no flags Details | Diff
add auth_unix hash_cred() function (1.08 KB, patch)
2016-09-28 18:56 UTC, Frank Sorenson
no flags Details | Diff
add RPCSEC_GSS hash_cred() function (1.09 KB, patch)
2016-09-28 18:57 UTC, Frank Sorenson
no flags Details | Diff
replace generic auth_cred hash with auth-specific function (877 bytes, patch)
2016-09-28 18:58 UTC, Frank Sorenson
no flags Details | Diff


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:1842 normal SHIPPED_LIVE Important: kernel security, bug fix, and enhancement update 2017-08-01 18:22:09 UTC

Description Frank Sorenson 2016-08-30 20:36:30 UTC
Created attachment 1196075 [details]
customer-provided pseudocode of reproducer

Description of problem:

Processes accessing nfs mounts have very high cpu usage in rpcauth_lookup_credcache, with rpcauth_lookup_credcache calling generic_match and unx_match thousands of times, causing very slow response time.


Version-Release number of selected component (if applicable):

RHEL 7.2 kernel 3.10.0-327.10.1.el7.x86_64


How reproducible:

customer is able to reproduce the issue at will


Steps to Reproduce:

Customer has provided example pseudo-code which does the following:
	chroot into an nfs-mounted directory
	calls setuid and setgid to the user/group which owns the directory
	performs a recursive chmod(0700) of the directory
	recursively removes the directory tree


Actual results:

cpu utilization and nfs response time normal


Expected results:

cpu usage and response time remain 


Additional info:

Comment 1 Frank Sorenson 2016-08-30 20:41:23 UTC
The mount is nfs v3 with sec=sys

web108.store:/web108 on /mnt/web108 type nfs (rw,nosuid,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.47.40,mountvers=3,mountport=635,mountproto=tcp,local_lock=none,addr=192.168.xxx.xxx)

Comment 2 Frank Sorenson 2016-08-30 20:44:10 UTC
I made a systemtap to show the number of calls to rpcauth_lookup_credcache, and how many calls to generic_match and unx_match resulted from those.  The results are output every 10 seconds.

Here are some results (I'll attach the rest of the output):

Fri Aug 19 12:30:39 2016 CEST: 
        perl: lookups: 1052 (generic_match: 4959398, unx_match: 1474836, errors: 0)
        webquota: lookups: 915 (generic_match: 544110, unx_match: 6586170, errors: 0)
        kworker/1:1H: lookups: 72 (generic_match: 0, unx_match: 423000, errors: 0)
        kworker/0:0H: lookups: 13 (generic_match: 0, unx_match: 76375, errors: 0)

Fri Aug 19 12:30:49 2016 CEST: 
        perl: lookups: 8701 (generic_match: 34533944, unx_match: 6141059, errors: 0)
        webquota: lookups: 766 (generic_match: 652932, unx_match: 5426305, errors: 0)

Fri Aug 19 12:30:59 2016 CEST: 
        perl: lookups: 8256 (generic_match: 33662617, unx_match: 5104564, errors: 0)
        webquota: lookups: 1022 (generic_match: 544110, unx_match: 7388819, errors: 0)
        kworker/1:1H: lookups: 2 (generic_match: 0, unx_match: 14884, errors: 0)

Comment 6 Frank Sorenson 2016-09-08 12:49:00 UTC
Created attachment 1199076 [details]
systemtap to monitor calls to rpcauth_lookup_credcache and the *_match functions

Comment 7 Jeff Layton 2016-09-08 13:01:21 UTC
Could that mean that the hashing scheme is crappy here? Maybe we have some really disproportionately long hash chains? How big is the cache here? Looking now:

    nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits);  

...and it seems like there was some brouhaha recently about how some of the standard pointer hashing routines actually end up with poor distribution in many cases. Maybe that's at least part of the problem here?

Comment 8 Frank Sorenson 2016-09-09 06:21:29 UTC
The hash function itself is fine, but we're only hashing with the uid.

Accesses by a single user using more than one egid (really fsgid, but setegid will set fsgid as well) will result in an additional rpc_cred for the new uid:gid pair, but it will simply be appended to the same hash chain.  This obviously makes the hash useless, since it devolves into a linear search.


I thrashed the credcache by doing a stat() of a directory entry as root after doing a setegid() for each gid from 500-100000, and confirmed that both unix_auth and generic_auth had all 100,000 entries in the hash[0] chain.


After that, I did an '/bin/ls -1fl' of a directory containing 1000 files.

with the default 4-bit uid-only hash, the ls took over 8 1/2 minutes (519 seconds), with 479 seconds of system time.


I then added the gid to the hash, as we discussed (on 64-bit, we've been hashing a 32-bit uid using a 64-bit hash):

diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index a7e42f9..7a0266a 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -551,7 +551,8 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
                        *entry, *new;
        unsigned int nr;
 
-       nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits);
+       nr = hash_long((from_kuid(&init_user_ns, acred->uid) << 32) |
+                from_kgid(&init_user_ns, acred->gid), cache->hashbits);
 
        rcu_read_lock();
        hlist_for_each_entry_rcu(entry, &cache->hashtable[nr], cr_hash) {


with the new uid/gid hash, and using the default hash size of 4 bits, the ls took 4.33 seconds, with 3.94 seconds of system time.  The longest hash chain was 6222, and the shortest was 6217

I also tested after bumping up the hash size to 6 bits (64 hash buckets).  The ls took 2.36 seconds, with 1.82 seconds of system time.  The longest hash chain was 1557 entries, and the shortest 1554

Looks like a winner to me.

Comment 9 Jeff Layton 2016-09-09 13:41:11 UTC
Nice!

Minor nits: unsigned long is 32 bits on 32 bit platforms, so you want to use hash_64() there instead of hash_long(). Also, the bare "32" there is somewhat fragile. Maybe phrase that as  "sizeof(gid_t) * 8" ?

Comment 10 Frank Sorenson 2016-09-10 17:44:23 UTC
This sort of feedback is perfect, thanks.

How about something like this?  Will hash_64 do the right thing on 32-bit?


diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index a7e42f9..db18be1 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -538,6 +539,14 @@ rpcauth_cache_enforce_limit(void)
 	rpcauth_cache_do_shrink(nr_to_scan);
 }
 
+static unsigned int
+rpcauth_hash_acred(struct auth_cred *acred, unsigned int hashbits)
+{
+	return hash_64(from_kgid(&init_user_ns, acred->gid) |
+		(from_kuid(&init_user_ns, acred->uid) << (sizeof(gid_t) * 8)),
+		hashbits);
+}
+
 /*
  * Look up a process' credentials in the authentication cache
  */
@@ -551,7 +560,7 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
 			*entry, *new;
 	unsigned int nr;
 
-	nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits);
+	nr = rpcauth_hash_acred(acred, cache->hashbits);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(entry, &cache->hashtable[nr], cr_hash) {

Comment 11 Frank Sorenson 2016-09-10 17:57:36 UTC
Created attachment 1199794 [details]
program to stat a path as various gids

Comment 12 Frank Sorenson 2016-09-12 14:30:18 UTC
Next thought...

The hashtable is created in rpcauth_init_credcache(), which is called in the initialization code for each auth.  In fact, it's really the only thing that's done in the init code (both rpc_init_generic_auth and rpc_init_authunix contain only a 'return rpcauth_init_credcache(&....)').

Although the number of 'auth_hashbits' is configurable through sysfs by way of the module parameter 'auth_hashtable_size', both rpc_init_authunix and rpc_init_generic_auth are called in rpcauth_init_module.  Since they're static, 
the number of hashbits can only be changed when the 'sunrpc' module is loaded.


We probably want the new hashbits setting to have some effect though, wouldn't we?  Can we
	create a new hashtable, rehash the existing entries in the old, and move them over to the new?
	call the shrinker with aggressive settings, then do the above?
	create a new hashtable and dump everything in the old?
	so sorry, too late to change now?  (then should the parameter be changed to read-only?)

Comment 13 Jeff Layton 2016-09-12 14:42:19 UTC
Personally...that sounds like a lot of work for dubious benefit, and could have "interesting" effects on performance. I'd avoid dynamically resizing the hashtable unless we decide that we really need to go down that road.

Making it r/o sounds reasonable though.

Comment 14 Frank Sorenson 2016-09-27 15:23:25 UTC
Created attachment 1205271 [details]
patches, take 2

TODO: cast the uid prior to shifting to eliminate:
	warning: left shift count >= width of type [enabled by default]
	(from_kuid(&init_user_ns, acred->uid) << (sizeof(gid_t) * 8)),

Comment 15 Frank Sorenson 2016-09-28 18:55:09 UTC
Created attachment 1205626 [details]
add hash_cred() function to rpc_authops struct

Comment 16 Frank Sorenson 2016-09-28 18:55:49 UTC
Created attachment 1205627 [details]
add generic_auth hash_cred() function

Comment 17 Frank Sorenson 2016-09-28 18:56:24 UTC
Created attachment 1205628 [details]
add auth_unix hash_cred() function

Comment 18 Frank Sorenson 2016-09-28 18:57:04 UTC
Created attachment 1205629 [details]
add RPCSEC_GSS hash_cred() function

Comment 19 Frank Sorenson 2016-09-28 18:58:04 UTC
Created attachment 1205636 [details]
replace generic auth_cred hash with auth-specific function

Comment 20 Frank Sorenson 2016-10-14 21:59:21 UTC
upstream commits:

e856a23 sunrpc: add hash_cred() function to rpc_authops struct
18028c9 sunrpc: add generic_auth hash_cred() function
1e035d0 sunrpc: add auth_unix hash_cred() function
a960f8d sunrpc: add RPCSEC_GSS hash_cred() function
66cbd4b sunrpc: replace generic auth_cred hash with auth-specific function


commit 66cbd4ba8ac4711e8bad97b5bded31ec298c7433
Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-09-29 10:44:41 -0500

    sunrpc: replace generic auth_cred hash with auth-specific function
    
    Replace the generic code to hash the auth_cred with the call to
    the auth-specific hash function in the rpc_authops struct.
    
    Signed-off-by: Frank Sorenson <sorenson@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit a960f8d6db431f5785ee28dbe903d61d821f368e
Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-09-29 10:44:40 -0500

    sunrpc: add RPCSEC_GSS hash_cred() function
    
    Add a hash_cred() function for RPCSEC_GSS, using only the
    uid from the auth_cred.
    
    Signed-off-by: Frank Sorenson <sorenson@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit 1e035d065f3415809c056fb7537320a74c718537
Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-09-29 10:44:39 -0500

    sunrpc: add auth_unix hash_cred() function
    
    Add a hash_cred() function for auth_unix, using both the
    uid and gid from the auth_cred.
    
    Signed-off-by: Frank Sorenson <sorenson@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit 18028c967e423673a055a30b1bf4e603ec64026d
Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-09-29 10:44:38 -0500

    sunrpc: add generic_auth hash_cred() function
    
    Add a hash_cred() function for generic_auth, using both the
    uid and gid from the auth_cred.
    
    Signed-off-by: Frank Sorenson <sorenson@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit e856a231d5d5742fe7c63e3a2b266bef668af5b4
Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-09-29 10:44:37 -0500

    sunrpc: add hash_cred() function to rpc_authops struct
    
    Currently, a single hash algorithm is used to hash the auth_cred for
    the credcache for all rpc_auth types.  Add a hash_cred() function to
    the rpc_authops struct to allow a hash function specific to each
    auth flavor.
    
    Signed-off-by: Frank Sorenson <sorenson@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Comment 22 Frank Sorenson 2016-10-20 21:57:30 UTC
The program attached in comment 11 (the thrash_credcache-2.c) can be used to cause the conditions which result in the high cpu usage and slow performance:

on the nfs client (example uses nfs v3, but nfs v4 is likely also affected):
# mount server:/exports /mnt/credcache -o sec=sys,vers=3
# mkdir /mnt/credcache/test_files
# for f in $(seq 1 1000) ; do touch /mnt/credcache/test_files/file_$f ; done

# thrash_credcache-2 /mnt/credcache/test_files 100 100000
calling stat on '/mnt/credcache/test_files' with gids 100 through 100000

# /usr/bin/time /bin/ls -1fl /mnt/credcache/test_files/ >/dev/null



The thrash_credcache-2 will take a long time to run on a system without the patches, and could probably be used on its own for testing, rather than the 'ls' command...  just time how long it takes to run run...  the difference will likely be very apparent.


If this issue is present, the 'ls' command will take "a long time", and cpu usage will be very high.  The precise amount of time is difficult to say, however my testing showed that the 'ls' of 1000 directory entries was taking 8 1/2 minutes to complete.

With the patches, the 'ls' time was reduced to 4 seconds.

Comment 27 Rafael Aquini 2017-03-08 19:38:12 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 29 Rafael Aquini 2017-03-09 15:40:36 UTC
Patch(es) available on kernel-3.10.0-592.el7

Comment 33 errata-xmlrpc 2017-08-02 00:46:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842

Comment 34 errata-xmlrpc 2017-08-02 01:39:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842


Note You need to log in before you can comment on or make changes to this bug.