Hide Forgot
+++ This bug was initially created as a clone of Bug #1311287 +++ +++ This bug was initially created as a clone of Bug #1309730 +++ Description of problem: selabel_lookup leaks memory per call on RHEL 6.7 under Samba. Richard Sharpe of the Samba Team reported, that his malloc_hook experiment suggests that selabel_linux on selinux-2.0.94-5.8.el6 (for RHEL/CentOS 6.7) is leaking ~10 MB of memory under Samba. Following is the stack trace: [2016/02/15 14:12:37.713576, 0] ../source3/smbd/server.c:92(samba_malloc_hook) 2048 allocate at 0x7f8bbf2fd930 [2016/02/15 14:12:37.713724, 0] ../source3/smbd/server.c:85(samba_malloc_hook) malloc size of 2048 requested [2016/02/15 14:12:37.714175, 0] ../source3/lib/util.c:901(log_stack_trace) BACKTRACE: 46 stack frames: #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f8bbc22e542] #1 smbd(+0xa16c) [0x7f8bbe84416c] #2 /lib64/libc.so.6(__libc_calloc+0x331) [0x7f8bba78fa51] #3 /lib64/libc.so.6(+0xbe7db) [0x7f8bba7d37db] #4 /lib64/libc.so.6(+0xc91a9) [0x7f8bba7de1a9] #5 /lib64/libc.so.6(regexec+0xc3) [0x7f8bba7de553] #6 /lib64/libselinux.so.1(+0xefc7) [0x7f8baeb60fc7] #7 /lib64/libselinux.so.1(+0xd22a) [0x7f8baeb5f22a] #8 /lib64/libselinux.so.1(selabel_lookup+0xe) [0x7f8baeb5f36e] #9 /lib64/libkrb5support.so.0(+0x3af6) [0x7f8bbe653af6] #10 /lib64/libkrb5support.so.0(krb5int_push_fscreatecon_for+0x8c) [0x7f8bbe653dec] #11 /lib64/libkrb5.so.3(+0x78d01) [0x7f8bbe70cd01] #12 /lib64/libkrb5.so.3(+0x78b8e) [0x7f8bbe70cb8e] #13 /lib64/libkrb5.so.3(+0x78ecf) [0x7f8bbe70cecf] #14 /lib64/libkrb5.so.3(krb5_get_server_rcache+0x1a1) [0x7f8bbe709101] #15 /lib64/libkrb5.so.3(+0x6d584) [0x7f8bbe701584] #16 /lib64/libkrb5.so.3(krb5_rd_req_decoded+0x2a) [0x7f8bbe7018ea] #17 /lib64/libkrb5.so.3(krb5_rd_req+0xbd) [0x7f8bbe7008fd] #18 /lib64/libgssapi_krb5.so.2(+0x1b4d5) [0x7f8bb1f6a4d5] #19 /lib64/libgssapi_krb5.so.2(+0x1cc3a) [0x7f8bb1f6bc3a] #20 /lib64/libgssapi_krb5.so.2(+0x1cd89) [0x7f8bb1f6bd89] #21 /lib64/libgssapi_krb5.so.2(gss_accept_sec_context+0x20a) [0x7f8bb1f5e53a] #22 /usr/lib/samba/libgse-samba4.so(+0xd830) [0x7f8bb6f1e830] #23 /usr/lib/samba/libgse-samba4.so(+0xe6be) [0x7f8bb6f1f6be] #24 /usr/lib/libgensec.so.0(gensec_update_ev+0xc8) [0x7f8bb6d0034d] #25 /usr/lib/libgensec.so.0(+0xb650) [0x7f8bb6cee650] #26 /usr/lib/libgensec.so.0(+0xc717) [0x7f8bb6cef717] #27 /usr/lib/libgensec.so.0(+0xdd93) [0x7f8bb6cf0d93] #28 /usr/lib/libgensec.so.0(+0x1d8d4) [0x7f8bb6d008d4] #29 /usr/lib/libtevent.so.0(tevent_common_loop_immediate+0x1f9) [0x7f8bbaaae384] #30 /usr/lib/libsmbconf.so.0(run_events_poll+0x57) [0x7f8bbc24fdf5] #31 /usr/lib/libsmbconf.so.0(+0x464a2) [0x7f8bbc2504a2] #32 /usr/lib/libtevent.so.0(_tevent_loop_once+0xfc) [0x7f8bbaaad449] #33 /usr/lib/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7f8bbaaad6c1] #34 /usr/lib/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7f8bbaaad78c] #35 /usr/lib/samba/libsmbd-base-samba4.so(smbd_process+0xc49) [0x7f8bbdd5244b] #36 smbd(+0xb815) [0x7f8bbe845815] #37 /usr/lib/libsmbconf.so.0(run_events_poll+0x544) [0x7f8bbc2502e2] #38 /usr/lib/libsmbconf.so.0(+0x465b8) [0x7f8bbc2505b8] #39 /usr/lib/libtevent.so.0(_tevent_loop_once+0xfc) [0x7f8bbaaad449] #40 /usr/lib/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7f8bbaaad6c1] #41 /usr/lib/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7f8bbaaad78c] #42 smbd(+0xc5a9) [0x7f8bbe8465a9] #43 smbd(main+0x159c) [0x7f8bbe847cf7] #44 /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f8bba733d5d] #45 smbd(+0x6029) [0x7f8bbe840029] [2016/02/15 14:12:37.716145, 0] ../source3/smbd/server.c:92(samba_malloc_hook) 2048 allocate at 0x7f8bbf2ff710 After messing with changing the krb5-libs-1.10.3, Richard hacked out the guts of the call to krb5int_push_fscreatecon_for above and the problem went away and Samba now has a nice small heap. See the thread on the samba-technical mailinglist: https://lists.samba.org/archive/samba-technical/2016-February/112237.html --- Additional comment from Richard Sharpe on 2016-02-18 10:35:35 EST --- Versions: RHEL/CentOS 6.7, krb5-libs-1.10.3-37.el6.x86_64 and libselinux-2.0.94-5.8.el6.x86_64. After I hacked out this code in the Kerberos library I stopped seeing the leak: --- krb5-1.10.3/src/util/support/selinux.c.orig 2016-02-17 12:54:22.102145416 -0800 +++ krb5-1.10.3/src/util/support/selinux.c 2016-02-17 13:01:23.185096415 -0800 @@ -252,18 +252,7 @@ void * krb5int_push_fscreatecon_for(const char *pathname) { - struct stat st; - void *retval; - k5_once(&labeled_once, label_mutex_init); - if (k5_mutex_lock(&labeled_mutex) == 0) { - if (stat(pathname, &st) != 0) { - st.st_mode = S_IRUSR | S_IWUSR; - } - retval = push_fscreatecon(pathname, st.st_mode); - return retval ? retval : (void *) -1; - } else { - return NULL; - } + return NULL; } void --- Additional comment from Jeremy Allison on 2016-02-18 14:35 EST --- Andreas, Here is the patch. If the init() function inside libselinux-2.0.94/src/label_file.c fails it doesn't properly clean up the half-constructed regexp regex_t contexts (or any of the other allocated blobs). Let me know if this works. --- Additional comment from Petr Lautrbach on 2016-02-18 15:35:09 EST --- There seems to be a leak in selabel_lookup() which tries to translate raw contexts via mcstransd. selabel_lookup_raw() could be used in krb5 to prevent it: --- krb5-1.10.3/src/util/support/selinux.c.orig 2016-02-18 21:11:50.420462236 +0100 +++ krb5-1.10.3/src/util/support/selinux.c 2016-02-18 21:12:02.833398079 +0100 @@ -152,7 +152,7 @@ NULL, 0); } if (selabel_ctx != NULL) { - if (selabel_lookup(selabel_ctx, &configuredsc, + if (selabel_lookup_raw(selabel_ctx, &configuredsc, fullpath, mode) != 0) { free(genpath); if (previous != NULL) { --- Additional comment from Miroslav Grepl on 2016-02-18 16:11:52 EST --- Nice catch. Actually *_raw libselinux functions should be used where it is possible (if there is no requirement for readable contexts). We can avoid the overhead caused by context translating. --- Additional comment from Jeremy Allison on 2016-02-18 16:41:27 EST --- Changing to using that function doesn't fix an underlying leak though. Plus I'm pretty sure the error path in the init() function will leak memory (see my attached patch). --- Additional comment from Richard Sharpe on 2016-02-18 21:17:40 EST --- So, I tried Jeremy's fix, and although I think he is correct about the problems there, it does not fix the problem. I will try one of the other suggestions. --- Additional comment from Richard Sharpe on 2016-02-19 10:47:14 EST --- It looks like that this problem is fixed by combining Petr's suggestion and Jeremy's suggestion. I need to do some more checking which I will do later today, but it does seem to be fixed now. --- Additional comment from Richard Sharpe on 2016-02-19 10:48:39 EST --- (In reply to Richard Sharpe from comment #12) > It looks like that this problem is fixed by combining Petr's suggestion and > Jeremy's suggestion. > > I need to do some more checking which I will do later today, but it does > seem to be fixed now. Sorry, I was wrong. One of the smbd's is showing an RSS of 31M which is a reliable indicator that it has the problem. --- Additional comment from Richard Sharpe on 2016-02-19 11:35:19 EST --- The test we have for this is to run lots of VDI clients against a Samba server, however, it is probable that a long-running smbtorture nbench test using Kerberos auth will also turn this up. I have a strategy for finding the problem. Since I am hooking malloc and am hitting on these 2048-byte allocations, I can add an smb_panic in that code path. If I have the debuginfo RPMs installed for both krb5-libs and libselinux I should get enough info to pinpoint exactly where the mallocs are coming from and maybe see how to fix it. --- Additional comment from Jeremy Allison on 2016-02-19 13:04:25 EST --- +1 on the smb_panic (or abort() strategy). I've used this successfully in the past to find problematic mallocs. --- Additional comment from Richard Sharpe on 2016-02-19 14:42:28 EST --- These appear to be the lines/locations accessed: (gdb) info line *0x7efeb085732b Line 234 of "label.c" starts at address 0x7efeb085732b <selabel_lookup_raw+11> and ends at 0x7efeb0857330 <selabel_lookup_raw+16>. (gdb) info line *0x7efeb085722a Line 200 of "label.c" starts at address 0x7efeb085721e <selabel_lookup_common+110> and ends at 0x7efeb085722d <selabel_lookup_common+125>. (gdb) info line *0x7efeb08586cb Line 621 of "label_file.c" starts at address 0x7efeb08586b8 <lookup+344> and ends at 0x7efeb08586cd <lookup+365>. They seem to match the new stack where I am using selabel_lookup_raw: BACKTRACE: 46 stack frames: #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7efebe843542] #1 smbd(+0xa16c) [0x7efec0e5916c] #2 /lib64/libc.so.6(__libc_calloc+0x331) [0x7efebcda4a51] #3 /lib64/libc.so.6(+0x38d40be7db) [0x7efebcde87db] #4 /lib64/libc.so.6(+0x38d40c91a9) [0x7efebcdf31a9] #5 /lib64/libc.so.6(regexec+0xc3) [0x7efebcdf3553] #6 /lib64/libselinux.so.1(+0xe6cb) [0x7efeb08586cb] #7 /lib64/libselinux.so.1(+0xd22a) [0x7efeb085722a] #8 /lib64/libselinux.so.1(selabel_lookup_raw+0xb) [0x7efeb085732b] #9 /lib64/libkrb5support.so.0(+0x3b5a) [0x7efeb0c6fb5a] #10 /lib64/libkrb5support.so.0(krb5int_push_fscreatecon_for+0x76) [0x7efeb0c6ffa6] #11 /lib64/libkrb5.so.3(+0x7be62) [0x7efeb495ce62] #12 /lib64/libkrb5.so.3(+0x7c196) [0x7efeb495d196] #13 /lib64/libkrb5.so.3(+0x7c935) [0x7efeb495d935] #14 /lib64/libkrb5.so.3(krb5_get_server_rcache+0x1a1) [0x7efeb49593f1] #15 /lib64/libkrb5.so.3(+0x700e7) [0x7efeb49510e7] #16 /lib64/libkrb5.so.3(krb5_rd_req_decoded+0x2a) [0x7efeb495145a] #17 /lib64/libkrb5.so.3(krb5_rd_req+0xf9) [0x7efeb4950489] #18 /lib64/libgssapi_krb5.so.2(+0x1c691) [0x7efeb3e6e691] #19 /lib64/libgssapi_krb5.so.2(+0x1dfca) [0x7efeb3e6ffca] #20 /lib64/libgssapi_krb5.so.2(+0x1e129) [0x7efeb3e70129] #21 /lib64/libgssapi_krb5.so.2(gss_accept_sec_context+0x233) [0x7efeb3e5fb43] --- Additional comment from Jeremy Allison on 2016-02-19 15:25:18 EST --- OK, I see the issue. Both selabel_lookup_raw() and selabel_lookup() call selabel_lookup_common() which can leak memory on the: static struct selabel_handle *selabel_ctx inside krb5/src/util/support/selinux.c. This fits with the regexec() source for the mallocs. Inside libselinux-2.0.94/src/label.c we have: ---------------------------------------------------------- static struct selabel_lookup_rec * selabel_lookup_common(struct selabel_handle *rec, int translating, const char *key, int type) { struct selabel_lookup_rec *lr; char *ptr = selabel_sub(key); if (ptr) { lr = rec->func_lookup(rec, ptr, type); free(ptr); } else { lr = rec->func_lookup(rec, key, type); } if (!lr) return NULL; if (compat_validate(rec, lr, "file_contexts", 0)) return NULL; if (translating && !lr->ctx_trans && selinux_raw_to_trans_context(lr->ctx_raw, &lr->ctx_trans)) return NULL; return lr; } ---------------------------------------------------------- The rec->func_lookup() call can allocate memory (on the regexec context), and if compat_validate() fails or the next if clause fails then the memory is left allocated on selabel_ctx inside krb5/src/util/support/selinux.c as there is no cleanup operation called. This memory will hang around until the cleanup_fscreatecon() function in krb5/src/util/support/selinux.c is called which calls selabel_close(selabel_ctx). However, that is never called explicitly, only as a MAKE_FINI_FUNCTION, which is a destructor called when the library is unloaded. Which never happens of course as this is a long-lived daemon. There needs to be a way to call selabel_close() and re-open on failure of selabel_lookup(). --- Additional comment from Jeremy Allison on 2016-02-19 15:30:34 EST --- Just to clarify: rec->func_lookup() can call static struct selabel_lookup_rec *lookup(struct selabel_handle *rec, const char *key, int type) inside libselinux-2.0.94/src/label_file.c. That function calls compile_regex(), which then calls regcomp(), but there's no way to call regfree() on an error outside this module - so the memory allocated on regcomp just sits around. The error handling in these functions is very poor - there's no proper cleanup at all. My original patch for error cleanup inside the init function is still needed also :-). --- Additional comment from Jeremy Allison on 2016-02-19 15:38:00 EST --- Richard, can you try this fix to krb5/src/util/support/selinux.c (this version is a patch to the patch that gets applied inside krb5-1.10.2 but it should be clear what is needed - the extra cleanup_fscreatecon() call if the selabel_lookup() call fails. This should be safe - it looks like the the selabel_ctx is re-created on the next request if it's set to NULL (which cleanup_fscreatecon() will do). diff -U 6 krb5-1.10.2-selinux-label.patch.orig krb5-1.10.2-selinux-label.patch --- krb5-1.10.2-selinux-label.patch.orig 2016-02-19 12:32:35.992091371 -0800 +++ krb5-1.10.2-selinux-label.patch 2016-02-19 12:33:34.708495344 -0800 @@ -623,12 +623,13 @@ + if (selabel_lookup(selabel_ctx, &configuredsc, + fullpath, mode) != 0) { + free(genpath); + if (previous != NULL) { + freecon(previous); + } ++ cleanup_fscreatecon(); + return NULL; + } + } +#else + if (matchpathcon(fullpath, mode, &configuredsc) != 0) { + free(genpath); --- Additional comment from Richard Sharpe on 2016-02-19 22:05:21 EST --- Hmm, that is a patch to a patch. My head hurts. OK, I think you are adding one line: a call to cleanup_fscreatecon() in the error path. I will apply that change using a separate patch ... and test it out. Then RedHat can figure out how to package it. --- Additional comment from Richard Sharpe on 2016-02-19 22:28:10 EST --- Unfortunately, that does not seem to be enough. I still see some smbds leaking lots of memory. I will have to try to figure it out this weekend. --- Additional comment from Jeremy Allison on 2016-02-19 23:00:19 EST --- Yep, it's a patch to a patch that just adds cleanup_fscreatecon() in the error path. Hmmm. You know, you might need to add cleanup_fscreatecon() on successful exit also. I'm at home now with no access to the source code I've been working on, but it might be that the selabel_ctx needs to get freed before anything gets freed of those compiled regexps. More on Monday when I'm back in at work (sorry). --- Additional comment from Richard Sharpe on 2016-02-19 23:10:58 EST --- What is curious, unless the info provided by gdb above is wrong, the memory that is leaked seems to be being allocated in regexec. Time to look at glibc. --- Additional comment from Richard Sharpe on 2016-02-20 10:15:03 EST --- Hmmm, since all the callers of krb5int_push_fscreatecon_for also call krb5_pop_fscreatecon immediately after they have done whatever they were doing, it seems like there should be a call to cleanup_fscreatecon() in krb5int_pop_fscreatecon. That's what I am going with. --- Additional comment from Richard Sharpe on 2016-02-20 10:43 EST --- This patch, along with Jeremy's earlier patch seems to fix the problem. I will report back later today on whether or not it is holding up. However, after 20 minutes of run time I am not seeing any smbds with large amounts of heap. --- Additional comment from Richard Sharpe on 2016-02-20 11:16:45 EST --- Hmmm, not yet fixed, or maybe a different problem. There is still one smbd that has crept up to 21MB in its heap. Checking the problem now. --- Additional comment from Richard Sharpe on 2016-02-20 11:20:24 EST --- This time around there seems to be 2553 segments of 3680 bytes in the heap that should not be there. This still points to the krb5/selinux axis of evil, so I will look at tracking this down. --- Additional comment from Jeremy Allison on 2016-02-20 11:36:24 EST --- "What is curious, unless the info provided by gdb above is wrong, the memory that is leaked seems to be being allocated in regexec" No it's not curious, that's what I kept telling you above :-). The regcomp() calls allocate this memory on the selabel_ctx, and it won't get freed until the loop calling regfree() gets done when the ctx cleanup function is done. Can you use the smb_panic method method to track down the malloc location of the remaining 3680 allocations ? Then we can nail that too. --- Additional comment from Richard Sharpe on 2016-02-20 11:54:56 EST --- (In reply to Jeremy Allison from comment #28) > "What is curious, unless the info provided by gdb above is wrong, the memory > that is leaked seems to be being allocated in regexec" > > No it's not curious, that's what I kept telling you above :-). The regcomp() > calls allocate this memory on the selabel_ctx, and it won't get freed until > the loop calling regfree() gets done when the ctx cleanup function is done. > > Can you use the smb_panic method method to track down the malloc location of > the remaining 3680 allocations ? Then we can nail that too. If I get desperate I will. However, I noticed that there are other return paths out of push_fscreatecon that return NULL but have already done a lookup. So, I changed approaches and am cleaning up in krb5int_push_fscreatcon_for as well as krb5int_pop_fscreatecon. It that fixes this last problem I will call it a day. If not, then on to the smb_panic method. --- Additional comment from Jeremy Allison on 2016-02-20 13:07:44 EST --- Yeah, someone *seriously* needs to run Coverity over this code. --- Additional comment from Richard Sharpe on 2016-02-20 16:46:00 EST --- OK, after changing Samba to try to detect the 3680-byte segments I am no longer seeing the leak with my final fixed change for krb5-1.10.3, so I will post the here. I think the problem I saw was due to Samba's DEBUG facility causing problems but am not sure. I have run the test for more than four hours and am not seeing any smbds with large amounts of heap. --- Additional comment from Richard Sharpe on 2016-02-20 17:25 EST --- This is a better fix, I think. It applies last and seems to resolve the issue. --- Additional comment from Jeremy Allison on 2016-02-21 01:50:01 EST --- Looks like you're calling cleanup_fscreatecon() outside of the protection of the mutex k5_mutex_unlock(&labeled_mutex). Is that safe ? --- Additional comment from Richard Sharpe on 2016-02-21 10:25:07 EST --- (In reply to Jeremy Allison from comment #33) > Comment on attachment 1128878 [details] > I think this is a more correct fix > > Looks like you're calling cleanup_fscreatecon() outside of the protection of > the mutex k5_mutex_unlock(&labeled_mutex). Is that safe ? Good point. For Samba it probably doesn't matter as we will only ever call this from one thread, but we should keep other users in mind. I need to move the final cleanup_fscreatecon() to before we release the mutex.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2591.html