Bug 1528207

Summary: rpc.gssd uses a lot of memory with krb5 mounts
Product: Red Hat Enterprise Linux 6 Reporter: Pierguido Lambri <plambri>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED ERRATA QA Contact: Yongcheng Yang <yoyang>
Severity: high Docs Contact:
Priority: high    
Version: 6.9CC: agaikwad, cww, plambri, steved, swhiteho, toneata, xzhou, yoyang
Target Milestone: rcKeywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: nfs-utils-1.2.3-78.el6_10.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1807110 (view as bug list) Environment:
Last Closed: 2020-03-10 10:15:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1807110, 1809277    
Attachments:
Description Flags
Patch for memleak in find_keytab_entry() none

Description Pierguido Lambri 2017-12-21 09:58:42 UTC
Description of problem:
After few months rpc.gssd reports a huge amount of used memory:

from top:

 1794 root      20   0 7357m 2.2g 1788 S  0.0 14.1  35:27.29 rpc.gssd

Version-Release number of selected component (if applicable):
Various versions report this problem.
The newest packages where this is seen are:

nfs-utils-1.2.3-64.el6
libtirpc-0.2.1-11.el6

How reproducible:
Very often by customer

Steps to Reproduce:
1. Mount a NFS share that uses krb5
2. Use the share for a while
3. Check the used memory

Actual results:
The used memory is very high

Expected results:
Used Memory shouldn't be that high

Additional info:

This takes some time. It's reported that high values are seen after few weeks (or months)

Comment 5 Pierguido Lambri 2018-02-12 16:10:29 UTC
*** Bug 1544471 has been marked as a duplicate of this bug. ***

Comment 14 Pierguido Lambri 2018-09-20 06:22:48 UTC
Created attachment 1485016 [details]
Patch for memleak in find_keytab_entry()

Comment 15 Pierguido Lambri 2018-09-20 06:25:05 UTC
With this patch I had no more leaks in find_keytab_entry().
This is the valgrind output after around 12 hours:

...
==00:08:42:22.758 2388== 2,048 bytes in 1 blocks are still reachable in loss record 45 of 48
==00:08:42:22.758 2388==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==00:08:42:22.758 2388==    by 0x10D6E1: init_client_list (gssd_proc.c:471)
==00:08:42:22.758 2388==    by 0x10D327: gssd_run (gssd_main_loop.c:203)
==00:08:42:22.758 2388==    by 0x10D103: main (gssd.c:222)
==00:08:42:22.758 2388== 
==00:08:42:22.758 2388== 4,096 bytes in 1 blocks are still reachable in loss record 46 of 48
==00:08:42:22.758 2388==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==00:08:42:22.758 2388==    by 0x4E2FB69: clnt_vc_create (clnt_vc.c:209)
==00:08:42:22.758 2388==    by 0x1124E5: nfs_get_tcpclient (rpc_socket.c:415)
==00:08:42:22.758 2388==    by 0x10DA36: create_auth_rpc_client (gssd_proc.c:883)
==00:08:42:22.758 2388==    by 0x10EC61: process_krb5_upcall (gssd_proc.c:1017)
==00:08:42:22.758 2388==    by 0x10F5BB: handle_gssd_upcall (gssd_proc.c:1214)
==00:08:42:22.758 2388==    by 0x10D482: gssd_run (gssd_main_loop.c:83)
==00:08:42:22.758 2388==    by 0x10D103: main (gssd.c:222)
==00:08:42:22.758 2388== 
==00:08:42:22.758 2388== 28,672 bytes in 7 blocks are still reachable in loss record 47 of 48
==00:08:42:22.758 2388==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:08:42:22.758 2388==    by 0x10D263: gssd_run (gssd_main_loop.c:113)
==00:08:42:22.758 2388==    by 0x10D103: main (gssd.c:222)
==00:08:42:22.758 2388== 
==00:08:42:22.758 2388== 49,152 bytes in 1 blocks are still reachable in loss record 48 of 48
==00:08:42:22.758 2388==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==00:08:42:22.758 2388==    by 0x4E2FBB2: clnt_vc_create (clnt_vc.c:219)
==00:08:42:22.758 2388==    by 0x1124E5: nfs_get_tcpclient (rpc_socket.c:415)
==00:08:42:22.758 2388==    by 0x10DA36: create_auth_rpc_client (gssd_proc.c:883)
==00:08:42:22.758 2388==    by 0x10EC61: process_krb5_upcall (gssd_proc.c:1017)
==00:08:42:22.758 2388==    by 0x10F5BB: handle_gssd_upcall (gssd_proc.c:1214)
==00:08:42:22.758 2388==    by 0x10D482: gssd_run (gssd_main_loop.c:83)
==00:08:42:22.758 2388==    by 0x10D103: main (gssd.c:222)
==00:08:42:22.758 2388== 
==00:08:42:22.758 2388== LEAK SUMMARY:
==00:08:42:22.758 2388==    definitely lost: 16 bytes in 1 blocks
==00:08:42:22.758 2388==    indirectly lost: 0 bytes in 0 blocks
==00:08:42:22.758 2388==      possibly lost: 0 bytes in 0 blocks
==00:08:42:22.758 2388==    still reachable: 87,908 bytes in 69 blocks
==00:08:42:22.758 2388==         suppressed: 0 bytes in 0 blocks
...

Comment 23 Steve Whitehouse 2019-10-08 11:31:10 UTC
Steve D, please review the proposed fix in comment #14. If it meets the z-stream criteria, of being small and targetted then we should fix this I think.

Comment 27 Steve Dickson 2020-02-12 18:54:26 UTC
It turns out this was added to upstream last Aug
commit 9a94f8bb8f1eec22a8ce4deb92c7ce42b5730b66
Author: Steve Dickson <steved>
Date:   Thu Aug 22 09:37:55 2019 -0400

    nfs-utils: Removed a number of Coverity Scan RESOURCE_LEAK errors

    Signed-off-by: Steve Dickson <steved>

and part of this patch will fixed the leak you are seeing (hopefully ;-) )

diff --git a/utils/gssd/krb5_util.c b/utils/gssd/krb5_util.c
index 454a6eb..f68be85 100644
--- a/utils/gssd/krb5_util.c
+++ b/utils/gssd/krb5_util.c
@@ -912,6 +912,8 @@ find_keytab_entry(krb5_context context, krb5_keytab kt,
                k5err = gssd_k5_err_msg(context, code);
                printerr(3, "%s while getting keytab entry for '%s'\n",
                     k5err, spn);
+               free(k5err);
+               k5err = NULL;
                /*
                 * We tried the active directory machine account
                 * with the hostname part as-is and failed...
@@ -1231,6 +1233,8 @@ gssd_destroy_krb5_machine_creds(void)
            k5err = gssd_k5_err_msg(context, code);
            printerr(0, "WARNING: %s while destroying credential "
                    "cache '%s'\n", k5err, ple->ccname);
+           free(k5err);
+           k5err = NULL;
        }
    }
    krb5_free_context(context);


I do have a question about this...

+    if (adhostoverride)
+        free(adhostoverride);

 the man page for krb5_appdefault_string says 
"def_val is the value to return if no value is found in krb5.conf(5)"
Is it known that 'def_val' is free-able? 

Also what happens if adhostoverride is freed above.
Would this be free the memory twice? 

It appears upstream has a similar problem... I'll
cc you on my upstream proposal.

Comment 28 Steve Dickson 2020-02-12 19:06:57 UTC
I forgot to cc you but here is what I came up with
commit 6c43c80764e17da1d95eefede45a0edfdf726ac8 
Author: Steve Dickson <steved>
Date:   Wed Feb 12 13:56:53 2020 -0500

    gssd: Closed a memory leak in find_keytab_entry()
    
    When 'adhostoverride' is "not set", which
    is most of the time, adhostoverride is not freed.
    
    Signed-off-by: Steve Dickson <steved>

diff --git a/utils/gssd/krb5_util.c b/utils/gssd/krb5_util.c
index a1c43d2..85f60ae 100644
--- a/utils/gssd/krb5_util.c
+++ b/utils/gssd/krb5_util.c
@@ -799,7 +799,7 @@ find_keytab_entry(krb5_context context, krb5_keytab kt,
        int tried_all = 0, tried_default = 0, tried_upper = 0;
        krb5_principal princ;
        const char *notsetstr = "not set";
-       char *adhostoverride;
+       char *adhostoverride = NULL;
 
 
        /* Get full target hostname */
@@ -827,7 +827,6 @@ find_keytab_entry(krb5_context context, krb5_keytab kt,
                                adhostoverride);
                /* No overflow: Windows cannot handle strings longer than 19 chars */
                strcpy(myhostad, adhostoverride);
-               free(adhostoverride);
        } else {
                strcpy(myhostad, myhostname);
                for (i = 0; myhostad[i] != 0; ++i) {
@@ -836,6 +835,8 @@ find_keytab_entry(krb5_context context, krb5_keytab kt,
                myhostad[i] = '$';
                myhostad[i+1] = 0;
        }
+       if (adhostoverride)
+               krb5_free_string(context, adhostoverride);
 
        if (!srchost) {
                retval = get_full_hostname(myhostname, myhostname, sizeof(myhostname));

Comment 29 Pierguido Lambri 2020-02-13 08:54:28 UTC
Yours looks much better. 
Now I don't remember exactly how I came up with the "free(adhostoverride)", 
I guess I missed the fact that it could be set to def_val 
(and also the fact that you should be using krb5_free_string().

Comment 46 Yongcheng Yang 2020-03-03 09:50:13 UTC
Moving to VERIFIED according to comment #42. I'll include this test into our regression tests.

Thanks Pierguido for the help!

Comment 49 errata-xmlrpc 2020-03-10 10:15:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0749