Bug 1379482 - Indirect memory leak bug in /src/gp_creds.c (gssproxy).
Summary: Indirect memory leak bug in /src/gp_creds.c (gssproxy).
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gssproxy
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Robbie Harwood
QA Contact: Abhijeet Kasurde
URL: https://pagure.io/gssproxy/pull-reque...
Whiteboard:
Depends On:
Blocks: 1298243 1399979
 
Reported: 2016-09-26 23:36 UTC by Thomas Gardner
Modified: 2020-09-10 09:49 UTC (History)
CC List: 7 users

Fixed In Version: gssproxy-0.6.2-4.el7
Doc Type: No Doc Update
Doc Text:
Fixed several memory leaks in gssproxy. (Group 1379005, 1379482, 1379616, 1380490 together as a single line item.)
Clone Of:
Environment:
Last Closed: 2017-08-01 20:55:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2033 0 normal SHIPPED_LIVE gssproxy bug fix update 2017-08-01 18:34:35 UTC

Description Thomas Gardner 2016-09-26 23:36:23 UTC
Description of problem:

While looking into multiple memory leaks, I came across this.  It may
or may not be the cause of the particular leak I'm currently looking
into, but it could be, and it's certainly a bug.  In /src/gp_creds.c,
in gp_get_cred_environment, we have:

[...]
    338     for (d = 0; d < svc->krb5.cred_count; d++) {
    339         p = strchr(svc->krb5.cred_store[d], ':');
    340         if (!p) {
    341             GPERROR("Invalid cred_store value"
    342                     "no ':' separator found in [%s].\n",
    343                     svc->krb5.cred_store[d]);
    344             ret = EINVAL;
    345             goto done;
    346         }
    347 
    348         if (strncmp(svc->krb5.cred_store[d], "client_keytab:", 14) == 0) {
    349             ck_num = c;
    350         } else if (strncmp(svc->krb5.cred_store[d], "keytab:", 7) == 0) {
    351             k_num = c;
    352         }
    353 
    354         ret = asprintf(&str, "%.*s", (int)(p - svc->krb5.cred_store[d]),
    355                                      svc->krb5.cred_store[d]);
    356         if (ret == -1) {
    357             ret = ENOMEM;
    358             goto done;
    359         }
    360         cs->elements[c].key = str;
    361 
    362         fmtstr = p + 1;
    363         cs->elements[c].value = get_formatted_string(fmtstr, target_uid);
    364         if (!cs->elements[c].value) {
    365             GPDEBUG("Failed to build credential store formatted string.\n");
    366             ret = ENOMEM;
    367             goto done;
    368         }
    369 
    370         c++;
    371     }
    372     cs->count = c;
[...]

"done" looks like:

[...]
    410 done:
    411     if (ret) {
    412         free_cred_store_elements(cs);
    413     }
    414     return ret;
    415 }
[...]

free_cred_store_elements looks like:

[...]
    206 static void free_cred_store_elements(gss_key_value_set_desc *cs)
    207 {
    208     int i;
    209 
    210     for (i = 0; i < cs->count; i++) {
    211         safefree(cs->elements[i].key);
    212         safefree(cs->elements[i].value);
    213     }
    214     safefree(cs->elements);
    215 }
[...]

safefree looks like:

     60 #define safefree(ptr) do { \
     61     free(no_const(ptr)); \
     62     ptr = NULL; \
     63 } while(0)

OK, cs->elements does get calloc(3)ed earlier in the
gp_get_cred_environment function.  For now, I'm going to assume that
cs->count comes into this function initialized to 0.  If it doesn't,
I'll file another bug for that, because other stuff can break (causing
segfaults and whatnot) if that's not the case.  Anyway, for now, we'll
just assume it's initialized properly and deal with that separately
later if it turns out not to be.
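
(For reference, glossing over the exact typedef names and const
qualifiers, the "cs" here is a GSS-API credential-store key/value set,
which looks roughly like this (paraphrased, not copied, from the
GSS-API extension headers):

    /* rough stand-ins for gss_key_value_element_desc and
     * gss_key_value_set_desc from gssapi_ext.h */
    struct kv_element {
        const char *key;        /* e.g. "client_keytab" or "keytab" */
        const char *value;      /* e.g. "/etc/krb5.keytab" */
    };

    struct kv_set {
        unsigned int count;     /* OM_uint32 in the real headers */
        struct kv_element *elements;   /* the calloc(3)ed array */
    };

so cs->count is the only thing free_cred_store_elements has to go on
when it decides how many key/value pairs to free.)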

So, I can see what you're trying to do in that loop.  You're trying
to fill in only the elements of the cs->elements array that matter.
So, rather than having cs->count track the index from the source (the
"d" loop counter), we just keep track of how many elements we update
in the "c" temporary variable, and at the end of the loop we update
cs->count to equal the number of elements we actually put in there.
Unfortunately, there are several places in that loop where we can
error out and exit the function via the "done" label, which should
free up everything in cs->elements and then free cs->elements itself.
However, if we ever do actually hit one of those conditions in that
loop, we never set cs->count, so it's either some random number (if it
wasn't initialized properly by whatever created it; I'll file a
separate bug later if that turns out to be the case) or, if it is
properly initialized to 0, none of the "cs->elements[i].key" or
"cs->elements[i].value" memory will be freed.  Either way, it's a bug.
Given that valgrind had this:

==24113== 259,728 (148,416 direct, 111,312 indirect) bytes in 4,638 blocks are definitely lost in loss record 82 of 85
==24113==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24113==    by 0x408063: gp_get_cred_environment (gp_creds.c:331)
==24113==    by 0x408063: gp_add_krb5_creds (gp_creds.c:461)
==24113==    by 0x40DD15: gp_accept_sec_context (gp_rpc_accept_sec_context.c:79)
==24113==    by 0x40ADC0: gp_rpc_execute (gp_rpc_process.c:343)
==24113==    by 0x40ADC0: gp_rpc_process_call (gp_rpc_process.c:400)
==24113==    by 0x4073CB: gp_handle_query (gp_workers.c:447)
==24113==    by 0x4073CB: gp_worker_main (gp_workers.c:401)
==24113==    by 0x6822DC4: start_thread (pthread_create.c:308)
==24113==    by 0x6B2DCEC: clone (clone.S:113)

to say about the leak I'm currently tracking down, I'd say this could
easily be at least part of the problem, because _not_ freeing a bunch
of ...value and ...key pairs, but actually freeing the ...elements
array that contains pointers to those two things would generate a
bunch of indirect leaks, and I think it would show up in the loss record
indicated because it's actually the freeing of ...elements that causes
the indirect leak, and the code stack indicated would be what allocates
that ...elements memory.  I think that's the way it would work....

Anyway, it's not the whole problem, for certain, because I don't
think this would cause any of the direct leaks, but it could certainly
cause indirect ones, and it's certainly a bug.
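
(Quick side note on how valgrind does its direct/indirect bookkeeping,
since I had to think about it for a minute: a block is "definitely
lost" (direct) when nothing points at it any more, and "indirectly
lost" when the only pointers to it live inside a block that is itself
lost; the loss record's stack is the allocation site of the outermost
lost block.  A trivial illustration, nothing to do with gssproxy:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* valgrind reports this array as directly lost at this
         * calloc(), and the strdup(3)ed strings hanging off it as
         * indirectly lost, all in one loss record */
        char **arr = calloc(4, sizeof(*arr));
        for (int i = 0; arr && i < 4; i++)
            arr[i] = strdup("leak me");
        arr = NULL;     /* drop the only pointer to the array */
        return 0;
    }

Either way, the record is anchored at whatever allocated the outermost
block, which here would be the calloc() of ...elements (the
gp_creds.c:331 frame in the trace above).)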

The good news is that I think it's probably pretty easy to fix.
The way the code currently stands, I think just moving the
initialization of the "c" temporary variable to 0 up to where it's
defined at the top of the function (instead of just before the loop),
and then changing the "done" code to:

done:
    if (!cs->count) cs->count = c;
    if (ret) {
        free_cred_store_elements(cs);
    }
    return ret;

should probably cover it without making anything any worse.  Verify that
for yourself, though.  That was the first thing to pop into my wee-tiny
little brain, and it "looks" right to me, but....
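
(Another way to close the same hole, for whatever it's worth, would be
to keep cs->count in step with "c" inside the loop instead of only
setting it after the loop, so the cleanup always knows how many
entries are populated no matter where we bail out.  Just a sketch of
that idea, not what actually went upstream (the linked pull request is
where the real fix lives):

[...]
        fmtstr = p + 1;
        cs->elements[c].value = get_formatted_string(fmtstr, target_uid);
        if (!cs->elements[c].value) {
            GPDEBUG("Failed to build credential store formatted string.\n");
            ret = ENOMEM;
            goto done;
        }

        c++;
        cs->count = c;    /* keep the count in step with what's filled in */
    }
[...]

with the "cs->count = c;" that currently sits after the loop then
becoming redundant.)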

Version-Release number of selected component (if applicable):

gssproxy-0.4.1-13.el7

How reproducible:

Every time.  Just look at the code, and there it will be until someone
changes it!  :-)

Steps to Reproduce:
1. Look at the code.
2. See the bug.
3. Slap self in forehead.  Pounding head on desk, weeping and wailing
are all optional, of course.

Actual results:

Indirect memory leak.

Expected results:

Memory not leaked, indirectly or otherwisely.

Additional info:

Comment 6 Luc Lalonde 2017-03-22 10:46:21 UTC
We're seeing a memory leak in gssproxy:

top - 16:45:35 up 145 days, 19:28,  1 user,  load average: 6,88, 9,05, 10,41
Tasks: 235 total,   4 running, 231 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6,3 us,  9,7 sy,  0,0 ni, 30,6 id, 52,4 wa,  0,0 hi,  1,0 si,  0,0 st
KiB Mem : 16268684 total,  1565936 free,  3803752 used, 10898996 buff/cache
KiB Swap:  8388604 total,     8872 free,  8379732 used.  3639696 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                     
  745 root      20   0 10,970g 3,228g   1568 S   0,0 20,8  84:12.88 gssproxy     

After killing and restarting gssproxy:

top - 16:54:56 up 145 days, 19:37,  1 user,  load average: 5,70, 6,66, 8,63
Tasks: 234 total,   2 running, 232 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29,2 us, 10,9 sy,  0,0 ni, 32,2 id, 26,0 wa,  0,0 hi,  1,8 si,  0,0 st
KiB Mem : 16268684 total,  1601892 free,   404088 used, 14262704 buff/cache
KiB Swap:  8388604 total,  7884064 free,   504540 used.  7037336 avail Mem 

Here's the version that was causing the problem:
gssproxy-0.4.1-8.el7_2.x86_64

We are seeing this on an NFSv4 (krb5) setup, kernel-3.10.0-327.36.3.el7.x86_64.

Comment 7 Thomas Gardner 2017-03-24 14:40:00 UTC
(In reply to Luc Lalonde from comment #6)
> We're seeing a memory leak in gssproxy:
> [rest of the quote of comment 6 trimmed; see above]

A LOT of memory leaks (some quite large) were fixed in gssproxy-0.6.2-4.el7.  There were others that I didn't even track down from the valgrind output I had from the customer I was helping at the time, but I tracked down and submitted bugs for the biggest ones.  I recommend that you update to the latest version, and if you still see a lot of memory being leaked, open a ticket with RH support and we can work with you to track down some more.

Comment 8 Luc Lalonde 2017-03-24 16:30:19 UTC
I'm using CentOS (7.2.1511) and I've updated to the latest available version:

gssproxy-0.4.1-13.el7.x86_64

So I guess I have to bide my time until the update finishes QA and reaches the downstream repos.

Comment 11 Abhijeet Kasurde 2017-05-22 14:04:25 UTC
Verified using GSSProxy :: gssproxy-0.7.0-3.el7.x86_64

Marking BZ as verified as sanityonly.

Comment 12 errata-xmlrpc 2017-08-01 20:55:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2033

