Bug 1379482
| Summary: | Indirect memory leak bug in /src/gp_creds.c (gssproxy) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Thomas Gardner <thgardne> |
| Component: | gssproxy | Assignee: | Robbie Harwood <rharwood> |
| Status: | CLOSED ERRATA | QA Contact: | Abhijeet Kasurde <akasurde> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.4 | CC: | dpal, fs-qe, ipa-qe, luc.lalonde, nsoman, rharwood, yoyang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| URL: | https://pagure.io/gssproxy/pull-request/33 | | |
| Whiteboard: | | | |
| Fixed In Version: | gssproxy-0.6.2-4.el7 | Doc Type: | No Doc Update |
| Doc Text: | Fixed several memory leaks in gssproxy. (Group 1379005, 1379482, 1379616, 1380490 together as a single line item.) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 20:55:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1298243, 1399979 | | |
---

Comment 6 (Luc Lalonde): We're seeing a memory leak in gssproxy:

```
top - 16:45:35 up 145 days, 19:28,  1 user,  load average: 6,88, 9,05, 10,41
Tasks: 235 total,   4 running, 231 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6,3 us,  9,7 sy,  0,0 ni, 30,6 id, 52,4 wa,  0,0 hi,  1,0 si,  0,0 st
KiB Mem : 16268684 total,  1565936 free,  3803752 used, 10898996 buff/cache
KiB Swap:  8388604 total,     8872 free,  8379732 used.  3639696 avail Mem

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
  745 root  20   0 10,970g 3,228g  1568 S  0,0 20,8 84:12.88 gssproxy
```

After killing and restarting gssproxy:

```
top - 16:54:56 up 145 days, 19:37,  1 user,  load average: 5,70, 6,66, 8,63
Tasks: 234 total,   2 running, 232 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29,2 us, 10,9 sy,  0,0 ni, 32,2 id, 26,0 wa,  0,0 hi,  1,8 si,  0,0 st
KiB Mem : 16268684 total,  1601892 free,   404088 used, 14262704 buff/cache
KiB Swap:  8388604 total,  7884064 free,   504540 used.  7037336 avail Mem
```

Here's the version that was causing the problem: gssproxy-0.4.1-8.el7_2.x86_64

We are seeing this on NFSv4 (krb5), kernel-3.10.0-327.36.3.el7.x86_64.

---

(In reply to Luc Lalonde from comment #6)

A LOT of memory leaks (some quite large) were fixed in gssproxy-0.6.2-4.el7. There were others that I didn't even track down from the valgrind output I had from the customer I was helping at the time, but I tracked down and submitted bugs for the biggest ones. I recommend that you update to the latest one, and if you still see a lot of memory being leaked, open a ticket with RH support and we can work with you to track down some more.

---

I'm using CentOS (7.2.1511) and I've updated to the latest available version: gssproxy-0.4.1-13.el7.x86_64. So I guess I have to bide my time until the update finishes QA and reaches the downstream repos.

---

Verified using GSSProxy :: gssproxy-0.7.0-3.el7.x86_64. Marking BZ as verified as sanity-only.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2033
Description of problem:

While looking into multiple memory leaks, I came across this. It may or may not be the cause of the particular leak I'm currently looking into, but it could be, and it's certainly a bug.

In /src/gp_creds.c, in gp_get_cred_environment, we have:

```c
338     for (d = 0; d < svc->krb5.cred_count; d++) {
339         p = strchr(svc->krb5.cred_store[d], ':');
340         if (!p) {
341             GPERROR("Invalid cred_store value"
342                     "no ':' separator found in [%s].\n",
343                     svc->krb5.cred_store[d]);
344             ret = EINVAL;
345             goto done;
346         }
347
348         if (strncmp(svc->krb5.cred_store[d], "client_keytab:", 14) == 0) {
349             ck_num = c;
350         } else if (strncmp(svc->krb5.cred_store[d], "keytab:", 7) == 0) {
351             k_num = c;
352         }
353
354         ret = asprintf(&str, "%.*s", (int)(p - svc->krb5.cred_store[d]),
355                        svc->krb5.cred_store[d]);
356         if (ret == -1) {
357             ret = ENOMEM;
358             goto done;
359         }
360         cs->elements[c].key = str;
361
362         fmtstr = p + 1;
363         cs->elements[c].value = get_formatted_string(fmtstr, target_uid);
364         if (!cs->elements[c].value) {
365             GPDEBUG("Failed to build credential store formatted string.\n");
366             ret = ENOMEM;
367             goto done;
368         }
369
370         c++;
371     }
372     cs->count = c;
```

"done" looks like:

```c
410 done:
411     if (ret) {
412         free_cred_store_elements(cs);
413     }
414     return ret;
415 }
```

free_cred_store_elements looks like:

```c
206 static void free_cred_store_elements(gss_key_value_set_desc *cs)
207 {
208     int i;
209
210     for (i = 0; i < cs->count; i++) {
211         safefree(cs->elements[i].key);
212         safefree(cs->elements[i].value);
213     }
214     safefree(cs->elements);
215 }
```

safefree looks like:

```c
60 #define safefree(ptr) do { \
61     free(no_const(ptr)); \
62     ptr = NULL; \
63 } while(0)
```

OK, cs->elements does get calloc(3)ed earlier in the gp_get_cred_environment function. For now, I'm going to assume that cs->count comes into this function initialized to 0.
If it doesn't, I'll file another bug for that, because other stuff can break (segfaults and whatnot) if that's not the case. Anyway, for now we'll just assume it's initialized properly and deal with that separately later if it turns out not to be.

So, I can see what you're trying to do in that loop: fill in only the elements of the cs->elements array that matter. Rather than having cs->count track the index from the source (the "d" loop counter), we keep track of how many elements we update in the "c" temporary variable, and at the end of the loop we set cs->count to the number of elements we actually put in there.

Unfortunately, there are several places in that loop where we can error out and exit the function via the "done" label, which is supposed to free up everything in cs->elements and then free cs->elements itself. However, if we ever hit one of those conditions in that loop, we never set cs->count. So it's either some random number, if it wasn't initialized properly by whatever created it (I'll file a separate bug later if that turns out to be the case), or, if it was properly initialized to 0, none of the cs->elements[i].key and cs->elements[i].value memory will be freed. Either way, it's a bug.
Given that valgrind had this to say about the leak I'm currently tracking down:

```
==24113== 259,728 (148,416 direct, 111,312 indirect) bytes in 4,638 blocks are definitely lost in loss record 82 of 85
==24113==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24113==    by 0x408063: gp_get_cred_environment (gp_creds.c:331)
==24113==    by 0x408063: gp_add_krb5_creds (gp_creds.c:461)
==24113==    by 0x40DD15: gp_accept_sec_context (gp_rpc_accept_sec_context.c:79)
==24113==    by 0x40ADC0: gp_rpc_execute (gp_rpc_process.c:343)
==24113==    by 0x40ADC0: gp_rpc_process_call (gp_rpc_process.c:400)
==24113==    by 0x4073CB: gp_handle_query (gp_workers.c:447)
==24113==    by 0x4073CB: gp_worker_main (gp_workers.c:401)
==24113==    by 0x6822DC4: start_thread (pthread_create.c:308)
==24113==    by 0x6B2DCEC: clone (clone.S:113)
```

I'd say this could easily be at least part of the problem. _Not_ freeing a bunch of ...key and ...value pairs, while actually freeing the ...elements array that contains the pointers to them, would generate a bunch of indirect leaks, and I think it would show up in the loss record indicated, because it's the handling of ...elements that causes the indirect leak, and the code stack indicated is what allocates that ...elements memory. I think that's the way it would work.... Anyway, it's not the whole problem, for certain, because I don't think this would cause any of the direct leaks, but it could certainly cause indirect ones, and it's certainly a bug.

The good news is that I think it's probably pretty easy to fix. The way the code currently stands, I think just moving the initialization of the "c" temporary variable to 0 up to where it's defined at the top of the function (instead of just before the loop), and then changing the "done" code to:

```c
done:
    if (!cs->count)
        cs->count = c;
    if (ret) {
        free_cred_store_elements(cs);
    }
    return ret;
```

should probably cover it without making anything any worse.
Verify that for yourself, though. That was the first thing to pop into my wee-tiny little brain, and it "looks" right to me, but....

Version-Release number of selected component (if applicable): gssproxy-0.4.1-13.el7

How reproducible: Every time. Just look at the code, and there it will be until someone changes it! :-)

Steps to Reproduce:
1. Look at the code.
2. See the bug.
3. Slap self in forehead. (Pounding head on desk, weeping, and wailing are all optional, of course.)

Actual results: Indirect memory leak.

Expected results: Memory not leaked, indirectly or otherwisely.

Additional info: