Bug 1309730
| Summary: | selinux leaks 10-12MB memory in Samba | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Andreas Schneider <asn> | ||||||||||
| Component: | libselinux | Assignee: | Petr Lautrbach <plautrba> | ||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Milos Malik <mmalik> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | high | ||||||||||||
| Version: | 6.7 | CC: | asn, gdeschner, jhrozek, jra, lslebodn, lvrabec, madam, mgrepl, mmalik, mrogers, plautrba, realrichardsharpe, rharwood, sbose, ssekidde | ||||||||||
| Target Milestone: | rc | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | All | ||||||||||||
| OS: | Linux | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | libselinux-2.0.94-7.el6 | Doc Type: | Bug Fix | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | |||||||||||||
| : | 1311287 (view as bug list) | Environment: | |||||||||||
| Last Closed: | 2016-05-10 21:41:03 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Bug Depends On: | 1311287 | ||||||||||||
| Bug Blocks: | |||||||||||||
| Attachments: |
|
||||||||||||
|
Description
Andreas Schneider
2016-02-18 14:59:15 UTC
Versions: RHEL/CentOS 6.7, krb5-libs-1.10.3-37.el6.x86_64 and libselinux-2.0.94-5.8.el6.x86_64.
After I hacked out this code in the Kerberos library I stopped seeing the leak:
--- krb5-1.10.3/src/util/support/selinux.c.orig 2016-02-17 12:54:22.102145416 -0800
+++ krb5-1.10.3/src/util/support/selinux.c 2016-02-17 13:01:23.185096415 -0800
@@ -252,18 +252,7 @@
void *
krb5int_push_fscreatecon_for(const char *pathname)
{
- struct stat st;
- void *retval;
- k5_once(&labeled_once, label_mutex_init);
- if (k5_mutex_lock(&labeled_mutex) == 0) {
- if (stat(pathname, &st) != 0) {
- st.st_mode = S_IRUSR | S_IWUSR;
- }
- retval = push_fscreatecon(pathname, st.st_mode);
- return retval ? retval : (void *) -1;
- } else {
- return NULL;
- }
+ return NULL;
}
void
I initially used gdb-heap to find the problem. Then I added __malloc_hook to capture the above stacktrace, which was possible because Samba is essentially singlethreaded. Created attachment 1128298 [details]
Proposed patch (not yet tested)
Andreas, Here is the patch. If the init() function inside
libselinux-2.0.94/src/label_file.c fails it doesn't
properly clean up the half-constructed regexp
regex_t contexts (or any of the other allocated
blobs).
Let me know if this works.
There seems to be a leak in selabel_lookup() which tries to translate raw contexts via mcstransd. selabel_lookup_raw() could be used in krb5 to prevent it:
--- krb5-1.10.3/src/util/support/selinux.c.orig 2016-02-18 21:11:50.420462236 +0100
+++ krb5-1.10.3/src/util/support/selinux.c 2016-02-18 21:12:02.833398079 +0100
@@ -152,7 +152,7 @@
NULL, 0);
}
if (selabel_ctx != NULL) {
- if (selabel_lookup(selabel_ctx, &configuredsc,
+ if (selabel_lookup_raw(selabel_ctx, &configuredsc,
fullpath, mode) != 0) {
free(genpath);
if (previous != NULL) {
Nice catch. Actually *_raw libselinux functions should be used where it is possible (if there is no requirement for readable contexts). We can avoid the overhead caused by context translating. Changing to using that function doesn't fix an underlying leak though. Plus I'm pretty sure the error path in the init() function will leak memory (see my attached patch). So, I tried Jeremy's fix, and although I think he is correct about the problems there, it does not fix the problem. I will try one of the other suggestions. (In reply to Petr Lautrbach from comment #5) > There seems to be a leak in selabel_lookup() which tries to translate raw > contexts via mcstransd. selabel_lookup_raw() could be used in krb5 to > prevent it: > > --- krb5-1.10.3/src/util/support/selinux.c.orig 2016-02-18 > 21:11:50.420462236 +0100 > +++ krb5-1.10.3/src/util/support/selinux.c 2016-02-18 > 21:12:02.833398079 +0100 > @@ -152,7 +152,7 @@ > NULL, 0); > } > if (selabel_ctx != NULL) { > - if (selabel_lookup(selabel_ctx, > &configuredsc, > + if (selabel_lookup_raw(selabel_ctx, > &configuredsc, > fullpath, mode) != 0) { > free(genpath); > if (previous != NULL) { OK, I will try this to see if it resolves the problem. OK, I have applied the change suggested by Petr, but I see an smbd with the following stack: root@NTNX-10-5-37-99-A-NVM:/home/some-login# cat /proc/6764/smaps | tail -80 | head -32 7f060d6de000-7f060d726000 rw-p 00000000 00:00 0 [heap] Size: 288 kB Rss: 276 kB Pss: 260 kB Shared_Clean: 0 kB Shared_Dirty: 16 kB Private_Clean: 0 kB Private_Dirty: 260 kB Referenced: 264 kB Anonymous: 276 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB VmFlags: rd wr mr mw me ac 7f060d726000-7f060ec0f000 rw-p 00000000 00:00 0 [heap] Size: 21412 kB Rss: 21252 kB Pss: 21252 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 21252 kB Referenced: 21252 kB Anonymous: 21252 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB VmFlags: rd wr mr mw me ac At this stage, SeLinux is indistinguishable from a heap of shit. I will try with both patches ... It looks like that this problem is fixed by combining Petr's suggestion and Jeremy's suggestion. I need to do some more checking which I will do later today, but it does seem to be fixed now. (In reply to Richard Sharpe from comment #12) > It looks like that this problem is fixed by combining Petr's suggestion and > Jeremy's suggestion. > > I need to do some more checking which I will do later today, but it does > seem to be fixed now. Sorry, I was wrong. One of the smbd's is showing an RSS of 31M which is a reliable indicator that it has the problem. The test we have for this is to run lots of VDI clients against a Samba server, however, it is probable that a long-running smbtorture nbench test using Kerberos auth will also turn this up. I have a strategy for finding the problem. Since I am hooking malloc and am hitting on these 2048-byte allocations, I can add an smb_panic in that code path. If I have the debuginfo RPMs installed for both krb5-libs and libselinux I should get enough info to pinpoint exactly where the mallocs are coming from and maybe see how to fix it. +1 on the smb_panic (or abort() strategy). I've used this successfully in the past to find problematic mallocs. These appear to be the lines/locations accessed: (gdb) info line *0x7efeb085732b Line 234 of "label.c" starts at address 0x7efeb085732b <selabel_lookup_raw+11> and ends at 0x7efeb0857330 <selabel_lookup_raw+16>. (gdb) info line *0x7efeb085722a Line 200 of "label.c" starts at address 0x7efeb085721e <selabel_lookup_common+110> and ends at 0x7efeb085722d <selabel_lookup_common+125>. (gdb) info line *0x7efeb08586cb Line 621 of "label_file.c" starts at address 0x7efeb08586b8 <lookup+344> and ends at 0x7efeb08586cd <lookup+365>. They seem to match the new stack where I am using selabel_lookup_raw: BACKTRACE: 46 stack frames: #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7efebe843542] #1 smbd(+0xa16c) [0x7efec0e5916c] #2 /lib64/libc.so.6(__libc_calloc+0x331) [0x7efebcda4a51] #3 /lib64/libc.so.6(+0x38d40be7db) [0x7efebcde87db] #4 /lib64/libc.so.6(+0x38d40c91a9) [0x7efebcdf31a9] #5 /lib64/libc.so.6(regexec+0xc3) [0x7efebcdf3553] #6 /lib64/libselinux.so.1(+0xe6cb) [0x7efeb08586cb] #7 /lib64/libselinux.so.1(+0xd22a) [0x7efeb085722a] #8 /lib64/libselinux.so.1(selabel_lookup_raw+0xb) [0x7efeb085732b] #9 /lib64/libkrb5support.so.0(+0x3b5a) [0x7efeb0c6fb5a] #10 /lib64/libkrb5support.so.0(krb5int_push_fscreatecon_for+0x76) [0x7efeb0c6ffa6] #11 /lib64/libkrb5.so.3(+0x7be62) [0x7efeb495ce62] #12 /lib64/libkrb5.so.3(+0x7c196) [0x7efeb495d196] #13 /lib64/libkrb5.so.3(+0x7c935) [0x7efeb495d935] #14 /lib64/libkrb5.so.3(krb5_get_server_rcache+0x1a1) [0x7efeb49593f1] #15 /lib64/libkrb5.so.3(+0x700e7) [0x7efeb49510e7] #16 /lib64/libkrb5.so.3(krb5_rd_req_decoded+0x2a) [0x7efeb495145a] #17 /lib64/libkrb5.so.3(krb5_rd_req+0xf9) [0x7efeb4950489] #18 /lib64/libgssapi_krb5.so.2(+0x1c691) [0x7efeb3e6e691] #19 /lib64/libgssapi_krb5.so.2(+0x1dfca) [0x7efeb3e6ffca] #20 /lib64/libgssapi_krb5.so.2(+0x1e129) [0x7efeb3e70129] #21 /lib64/libgssapi_krb5.so.2(gss_accept_sec_context+0x233) [0x7efeb3e5fb43] OK, I see the issue. Both selabel_lookup_raw() and selabel_lookup() call selabel_lookup_common() which can leak memory on the:
static struct selabel_handle *selabel_ctx
inside krb5/src/util/support/selinux.c. This fits with the regexec() source for the mallocs.
Inside libselinux-2.0.94/src/label.c we have:
----------------------------------------------------------
static struct selabel_lookup_rec *
selabel_lookup_common(struct selabel_handle *rec, int translating,
const char *key, int type)
{
struct selabel_lookup_rec *lr;
char *ptr = selabel_sub(key);
if (ptr) {
lr = rec->func_lookup(rec, ptr, type);
free(ptr);
} else {
lr = rec->func_lookup(rec, key, type);
}
if (!lr)
return NULL;
if (compat_validate(rec, lr, "file_contexts", 0))
return NULL;
if (translating && !lr->ctx_trans &&
selinux_raw_to_trans_context(lr->ctx_raw, &lr->ctx_trans))
return NULL;
return lr;
}
----------------------------------------------------------
The rec->func_lookup() call can allocate memory (on the regexec context), and if compat_validate() fails or the next if clause fails then the memory is left allocated on selabel_ctx inside krb5/src/util/support/selinux.c as there is no cleanup operation called.
This memory will hang around until the cleanup_fscreatecon() function in krb5/src/util/support/selinux.c is called which calls selabel_close(selabel_ctx).
However, that is never called explicitly, only as a MAKE_FINI_FUNCTION, which is a destructor called when the library is unloaded. Which never happens of course as this is a long-lived daemon.
There needs to be a way to call selabel_close() and re-open on failure of selabel_lookup().
Just to clarify:
rec->func_lookup() can call
static struct selabel_lookup_rec *lookup(struct selabel_handle *rec,
const char *key, int type)
inside libselinux-2.0.94/src/label_file.c. That function calls compile_regex(), which then calls regcomp(), but there's no way to call regfree() on an error outside this module - so the memory allocated on regcomp just sits around.
The error handling in these functions is very poor - there's no proper cleanup at all.
My original patch for error cleanup inside the init function is still needed also :-).
Richard, can you try this fix to krb5/src/util/support/selinux.c (this version is a patch to the patch that gets applied inside krb5-1.10.2 but it should be clear what is needed - the extra cleanup_fscreatecon() call if the selabel_lookup() call fails. This should be safe - it looks like the the selabel_ctx is re-created on the next request if it's set to NULL (which cleanup_fscreatecon() will do).
diff -U 6 krb5-1.10.2-selinux-label.patch.orig krb5-1.10.2-selinux-label.patch
--- krb5-1.10.2-selinux-label.patch.orig 2016-02-19 12:32:35.992091371 -0800
+++ krb5-1.10.2-selinux-label.patch 2016-02-19 12:33:34.708495344 -0800
@@ -623,12 +623,13 @@
+ if (selabel_lookup(selabel_ctx, &configuredsc,
+ fullpath, mode) != 0) {
+ free(genpath);
+ if (previous != NULL) {
+ freecon(previous);
+ }
++ cleanup_fscreatecon();
+ return NULL;
+ }
+ }
+#else
+ if (matchpathcon(fullpath, mode, &configuredsc) != 0) {
+ free(genpath);
Hmm, that is a patch to a patch. My head hurts. OK, I think you are adding one line: a call to cleanup_fscreatecon() in the error path. I will apply that change using a separate patch ... and test it out. Then RedHat can figure out how to package it. Unfortunately, that does not seem to be enough. I still see some smbds leaking lots of memory. I will have to try to figure it out this weekend. Yep, it's a patch to a patch that just adds cleanup_fscreatecon() in the error path. Hmmm. You know, you might need to add cleanup_fscreatecon() on successful exit also. I'm at home now with no access to the source code I've been working on, but it might be that the selabel_ctx needs to get freed before anything gets freed of those compiled regexps. More on Monday when I'm back in at work (sorry). What is curious, unless the info provided by gdb above is wrong, the memory that is leaked seems to be being allocated in regexec. Time to look at glibc. Hmmm, since all the callers of krb5int_push_fscreatecon_for also call krb5_pop_fscreatecon immediately after they have done whatever they were doing, it seems like there should be a call to cleanup_fscreatecon() in krb5int_pop_fscreatecon. That's what I am going with. Created attachment 1128820 [details]
A second patch to fix the problem
This patch, along with Jeremy's earlier patch seems to fix the problem.
I will report back later today on whether or not it is holding up.
However, after 20 minutes of run time I am not seeing any smbds with large amounts of heap.
Hmmm, not yet fixed, or maybe a different problem. There is still one smbd that has crept up to 21MB in its heap. Checking the problem now. This time around there seems to be 2553 segments of 3680 bytes in the heap that should not be there. This still points to the krb5/selinux axis of evil, so I will look at tracking this down. "What is curious, unless the info provided by gdb above is wrong, the memory that is leaked seems to be being allocated in regexec" No it's not curious, that's what I kept telling you above :-). The regcomp() calls allocate this memory on the selabel_ctx, and it won't get freed until the loop calling regfree() gets done when the ctx cleanup function is done. Can you use the smb_panic method method to track down the malloc location of the remaining 3680 allocations ? Then we can nail that too. (In reply to Jeremy Allison from comment #28) > "What is curious, unless the info provided by gdb above is wrong, the memory > that is leaked seems to be being allocated in regexec" > > No it's not curious, that's what I kept telling you above :-). The regcomp() > calls allocate this memory on the selabel_ctx, and it won't get freed until > the loop calling regfree() gets done when the ctx cleanup function is done. > > Can you use the smb_panic method method to track down the malloc location of > the remaining 3680 allocations ? Then we can nail that too. If I get desperate I will. However, I noticed that there are other return paths out of push_fscreatecon that return NULL but have already done a lookup. So, I changed approaches and am cleaning up in krb5int_push_fscreatcon_for as well as krb5int_pop_fscreatecon. It that fixes this last problem I will call it a day. If not, then on to the smb_panic method. Yeah, someone *seriously* needs to run Coverity over this code. OK, after changing Samba to try to detect the 3680-byte segments I am no longer seeing the leak with my final fixed change for krb5-1.10.3, so I will post the here. I think the problem I saw was due to Samba's DEBUG facility causing problems but am not sure. I have run the test for more than four hours and am not seeing any smbds with large amounts of heap. Created attachment 1128878 [details]
I think this is a more correct fix
This is a better fix, I think.
It applies last and seems to resolve the issue.
Comment on attachment 1128878 [details]
I think this is a more correct fix
Looks like you're calling cleanup_fscreatecon() outside of the protection of the mutex k5_mutex_unlock(&labeled_mutex). Is that safe ?
(In reply to Jeremy Allison from comment #33) > Comment on attachment 1128878 [details] > I think this is a more correct fix > > Looks like you're calling cleanup_fscreatecon() outside of the protection of > the mutex k5_mutex_unlock(&labeled_mutex). Is that safe ? Good point. For Samba it probably doesn't matter as we will only ever call this from one thread, but we should keep other users in mind. I need to move the final cleanup_fscreatecon() to before we release the mutex. Created attachment 1129074 [details]
Updated to address Jeremy's concern
This is a small change to the earlier patch to address Jeremy's concern.
It has been running for a number of hours now and does not show any leaks as far as I can see.
I see that the Selinux changes were introduced to solve a problem with Apache or something. Can we get some testing with the changes on the other components that rely on KRB5 integraton with Selinux? Thank you for debugging this! I've created a bug for the krb5 portion. I suspect the answer is "yes", but do you see similar leak problems in rhel-7's krb5? (In reply to Robbie Harwood from comment #37) > Thank you for debugging this! I've created a bug for the krb5 portion. > > I suspect the answer is "yes", but do you see similar leak problems in > rhel-7's krb5? So, I haven't been testing with RHEL-7, but let me try. I could take the version of krb5 used on RHEL-7 and build it for RHEL-6 and try it. I couldn't find the SELINUX support stuff in the RHEL-7 RPM so I suspect that the problem does not occur. The changes causing the problems seem to have been introduced here: https://bugzilla.redhat.com/show_bug.cgi?id=845125 (In reply to Richard Sharpe from comment #38) > (In reply to Robbie Harwood from comment #37) > > Thank you for debugging this! I've created a bug for the krb5 portion. > > > > I suspect the answer is "yes", but do you see similar leak problems in > > rhel-7's krb5? > > So, I haven't been testing with RHEL-7, but let me try. Thanks! > I couldn't find the SELINUX support stuff in the RHEL-7 RPM so I suspect > that the problem does not occur. The patch is named krb5-1.13-selinux-label.patch in RHEL-7 and produces krb5/src/util/support/selinux.c; it looks to be the same in the relevant areas but I was hoping for confirmation that the bug applies there before I try to fix it. Petr, the first attachment fixes a memory leak in selinux. Do you plan to fix that memory leak? I will try to have a look at the RHEL 7 versions this weekend. I will update libselinux with the the first patch or a similar patch as upstream has already fixed this issue in the latest release and used same approach with closef(rec) I have had a few cursory attempts at reproducing this with RHEL7 (CentOS7) and have not had much success. That is because it seems to require multiple login sessions near to each other or something like that and I cannot use my main test bed. However, I looked at the code after building that RPM and it seems to suffer from the same faults as the code in 6.X. Thanks much Richard. I agree that it's likely the same issue, so I've made https://bugzilla.redhat.com/show_bug.cgi?id=1313457 for it Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0848.html |