Bug 658571 - libselinux memory leak caused by improper use of __thread
Summary: libselinux memory leak caused by improper use of __thread
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libselinux
Version: 6.0
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Daniel Walsh
QA Contact: Milos Malik
URL:
Whiteboard:
Depends On:
Blocks: 639247 656795 658657 693600
TreeView+ depends on / blocked
 
Reported: 2010-11-30 18:16 UTC by Eric Blake
Modified: 2015-09-28 02:20 UTC (History)
11 users (show)

Fixed In Version: libselinux-2.0.94-3.el6
Doc Type: Bug Fix
Doc Text:
libselinux used __thread variables to store malloc() data in order to minimize computation. Destructors cannot be associated with __thread variables, so malloc() data stored in a __thread void* variable could potentially cause memory leaks upon thread exit. For example, repeatedly starting and stopping domains with libvirt could trigger out-of-memory exceptions, since libvirt starts one thread per domain, and each thread uses libselinux calls such as fgetfilecon. libselinux has been updated to be thread-safe, preventing these potential memory leaks.
Clone Of:
: 658657 (view as bug list)
Environment:
Last Closed: 2011-05-19 14:18:38 UTC
Target Upstream Version:


Attachments (Terms of Use)
simpler reproduction program (496 bytes, text/plain)
2010-11-30 18:29 UTC, Eric Blake
no flags Details
Proposed fix (4.08 KB, patch)
2010-12-02 19:05 UTC, Eamon Walsh
no flags Details | Diff
Miroslav lets add this patch from the other bug. (22.76 KB, patch)
2011-04-05 17:20 UTC, Daniel Walsh
no flags Details | Diff
test case demonstrationg the destruction of innocent pthread_key. (987 bytes, text/x-csrc)
2011-04-15 18:41 UTC, Igor Grobman
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0751 normal SHIPPED_LIVE libselinux bug fix update 2011-05-18 18:08:59 UTC

Description Eric Blake 2010-11-30 18:16:37 UTC
Description of problem:
Use of __thread variables is great for creating a thread-safe variable, but only insofar as the contents of that variable can safely be abandoned on pthread_exit().  The moment you store malloc()d data into a __thread void* variable, you have leaked memory when the thread exits, since there is no way to associate a destructor with __thread variables.

The _only_ safe way to use thread-local caching of malloc()d data is to use pthread_key_create, and associate a destructor that will call free() on the resulting data when the thread exits.

libselinux is guilty of abusing __thread variables to store malloc()d data as a form of a cache, to minimize computation by reusing earlier results from the same thread.  As a result of this memory leak, repeated starting and stopping of domains via libvirt can result in the OOM killer triggering, since libvirt fires up a thread per domain, and each thread uses selinux calls such as fgetfilecon.

Version-Release number of selected component (if applicable):
libselinux-2.0.94-2.el6.x86_64
libvirt-0.8.1-27.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
0. These steps are run as root, assuming hardware kvm support and existence of a VM named fedora (adjust the steps below as appropriate); if desired, I can reduce this to a simpler test case that does not rely on libvirt, by using a single .c file that links against libselinux and repeatedly spawns threads.
1. service libvirtd stop
2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$!
3. virsh start fedora
4. kill $pid
  
Actual results:
The biggest leak reported is due to libselinux' abuse of __thread:

==26696== 829,730 (40 direct, 829,690 indirect) bytes in 1 blocks are definitely lost in loss record 500 of 500
==26696==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==26696==    by 0x3022E0D48C: selabel_open (label.c:165)
==26696==    by 0x3022E11646: matchpathcon_init_prefix (matchpathcon.c:296)
==26696==    by 0x3022E1190D: matchpathcon (matchpathcon.c:317)
==26696==    by 0x3033ED7FB5: SELinuxRestoreSecurityFileLabel (security_selinux.c:381)
==26696==    by 0x3033ED8539: SELinuxRestoreSecurityAllLabel (security_selinux.c:749)
==26696==    by 0x459153: qemuSecurityStackedRestoreSecurityAllLabel (qemu_security_stacked.c:257)
==26696==    by 0x43F0C5: qemudShutdownVMDaemon (qemu_driver.c:4311)
==26696==    by 0x4555C9: qemudStartVMDaemon (qemu_driver.c:4234)
==26696==    by 0x458416: qemudDomainObjStart (qemu_driver.c:7268)
==26696==    by 0x45896F: qemudDomainStart (qemu_driver.c:7308)
==26696==    by 0x3033E75412: virDomainCreate (libvirt.c:4881)
==26696== 

Basically, libvirt created a thread that used matchpathcon during 'virsh start fedora', and matchpathcon stuffed over 800k of malloc'd data into:

static __thread char **con_array;

which are then inaccessible when libvirt exits the thread as part of shutting down on SIGTERM.

Expected results:
valgrind should not report any memory leaks related to libselinux.


Additional info:
libvirtd has other leaks, and I am actively pursuing patches to plug them (https://bugzilla.redhat.com/show_bug.cgi?id=656795).  But leaks inside third-party libraries need an independent fix.

Comment 2 Eric Blake 2010-11-30 18:29:42 UTC
Created attachment 463796 [details]
simpler reproduction program

Comment 3 Eric Blake 2010-11-30 18:31:37 UTC
(In reply to comment #2)
> Created attachment 463796 [details]
> simpler reproduction program

$ gcc -o foo -Wall foo.c -pthread -lselinux
$ valgrind --quiet --leak-check=full ./foo 2
==5658== 54 bytes in 2 blocks are definitely lost in loss record 1 of 2
==5658==    at 0x4A05E46: malloc (vg_replace_malloc.c:195)
==5658==    by 0x37E3082A91: strdup (strdup.c:43)
==5658==    by 0x3864E111D3: selinux_raw_to_trans_context (setrans_client.c:314)
==5658==    by 0x3864E102B9: getfilecon (getfilecon.c:64)
==5658==    by 0x400780: leaker (foo.c:10)
==5658==    by 0x37E3806D5A: start_thread (pthread_create.c:301)
==5658==    by 0x37E30E4AAC: clone (clone.S:115)
==5658== 
==5658== 54 bytes in 2 blocks are definitely lost in loss record 2 of 2
==5658==    at 0x4A05E46: malloc (vg_replace_malloc.c:195)
==5658==    by 0x37E3082A91: strdup (strdup.c:43)
==5658==    by 0x3864E111FA: selinux_raw_to_trans_context (setrans_client.c:317)
==5658==    by 0x3864E102B9: getfilecon (getfilecon.c:64)
==5658==    by 0x400780: leaker (foo.c:10)
==5658==    by 0x37E3806D5A: start_thread (pthread_create.c:301)
==5658==    by 0x37E30E4AAC: clone (clone.S:115)
==5658== 

A non-leaky solution should be silent with 0 exit status.

Comment 4 Daniel Walsh 2010-11-30 21:49:19 UTC
I think the best solution to this is to move to using selabel_open directly rather then matchpathcon within libvirt.

Comment 5 Eric Blake 2010-11-30 22:24:40 UTC
Changing libvirt to use selabel_open would only fix libvirt's leak, but not any other clients.  I still think that libselinux should be fixed to be thread-friendly.  The memory leak is in libselinux, whether or not libvirt is fixed to avoid libselinux.

Comment 6 Eric Blake 2010-11-30 22:48:43 UTC
Reassigning this bug back to libselinux, and cloning a new bug for libvirt to consider using selabel_open instead of matchpathcon

Comment 7 Eric Blake 2010-11-30 23:17:25 UTC
Looking at selinux.git, for at least libselinux/src/setrans_client.c, I can see that this has been contentious in the past:

commit a842c9dae863 (Jun 2009) converted __thread to pthread_key_t handling to resolve some crashes, and as a side effect, became multi-thread safe

commit 1ac1ff638250 (Jul 2009) reverted that patch, and instead avoided the crash by:

commit 8c372f665db4 (Jul 2009) instead ensuring that the __thread variables were properly initialized, but ended up reintroducing the leak for multi-thread processes

Comment 8 Eamon Walsh 2010-12-02 00:58:02 UTC
I think the pthread_key_t patch was reverted because we don't want libselinux to link to libpthread unconditionally.

The solution will need to use weak symbols for pthread_key_create, pthread_getspecific, and pthread_set_specific, with macros for the non-pthread case.  This is already done with pthread_once (see selinux_internal.h)

Will try to put something together tomorrow.

Comment 9 Eric Blake 2010-12-02 01:33:45 UTC
Make(In reply to comment #8)
> I think the pthread_key_t patch was reverted because we don't want libselinux
> to link to libpthread unconditionally.

Makes sense to me.

> 
> The solution will need to use weak symbols for pthread_key_create,
> pthread_getspecific, and pthread_set_specific, with macros for the non-pthread
> case.  This is already done with pthread_once (see selinux_internal.h)

I wasn't too familiar with the libselinux code base, so thanks for stepping in and taking over.

> 
> Will try to put something together tomorrow.

I'm happy to review whatever you come up with.

Also, I thought of something else you can use to minimize code churn.  It should be a much smaller patch to just introduce one additional variable (per file that uses __thread), guarantee that it gets set to a non-0 thread-local value any time any of the other __thread variables are touched, and that all the __thread variables get cleaned as a side effect of the destructor for the one pthread_key_t variable (that is, you only need a single destructor for a new variable, rather than one destructor for each instance of a __thread variable, and the bulk of your code can benefit from the ease of use of __thread rather than the verbosity of pthread_{get,set}specific).

Comment 10 Eamon Walsh 2010-12-02 19:05:37 UTC
Created attachment 464338 [details]
Proposed fix

Comment 11 Eamon Walsh 2010-12-02 19:05:53 UTC
Please test patch.

Comment 12 Eric Blake 2010-12-02 20:53:59 UTC
(In reply to comment #11)
> Please test patch.

Tested.  It solves the leak on getfilecon when leaked with the simple program, and looks like it does the right thing for matchpathcon as well.  Thanks!

Comment 13 Daniel Walsh 2010-12-02 21:02:08 UTC
Kind of a scary package to update,  I think we should run it for a couple of days internally and in rawhide before we ship.

Comment 14 Eamon Walsh 2010-12-03 01:04:59 UTC
Fix is upstream along with 2 other patches related to thread-local storage.  libselinux 2.0.97.

Comment 16 Miroslav Grepl 2011-02-04 07:55:33 UTC
Fixed in libselinux-2.0.94-3.el6

Comment 19 Daniel Walsh 2011-04-05 14:00:07 UTC
Eamon this is causing other bugs.

Comment 20 Daniel Walsh 2011-04-05 14:01:12 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=693600

Comment 25 Daniel Walsh 2011-04-05 17:20:09 UTC
Created attachment 490044 [details]
Miroslav lets add this patch from the other bug.

Comment 30 Igor Grobman 2011-04-15 18:41:55 UTC
Created attachment 492476 [details]
test case demonstrationg the destruction of innocent pthread_key.

Comment 31 Igor Grobman 2011-04-15 18:42:16 UTC
(In reply to comment #19)
> Eamon this is causing other bugs.

I am also running into a side effect of this change on RHEL 6.1 beta.  I can't access the other bug, so can't verify if it is the same issue.  My problem happens when I dlopen() and then later dlclose() libselinux.  After dlclose(), the TSD key I created using pthread_key_create() is no longer accessible.  I attached gdb, and discovered that libselinux is destroying it in its destructor.  This is because libselinux's destructor_key is a static variable that can sometimes be left uninitialized.  Yet, the library destructor unconditionally calls pthread_key_delete() on this uninitialized value (zero, since it's static).  I will attach a test program that demonstrates this.

comment 25 has a patch that appears to remove all phtread_key-related functionality.  It should certainly fix my problem.  Are you planning to fix this in 6.1 beta, for GA or at some later point?

Comment 32 Eamon Walsh 2011-04-15 22:37:41 UTC
A patch from Dan has gone upstream to fix this.

Comment 33 Laura Bailey 2011-05-17 01:36:17 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
libselinux used __thread variables to store malloc() data in order to minimize computation. Destructors cannot be associated with __thread variables, so malloc() data stored in a __thread void* variable could potentially cause memory leaks upon thread exit. For example, repeatedly starting and stopping domains with libvirt could trigger out-of-memory exceptions, since libvirt starts one thread per domain, and each thread uses libselinux calls such as fgetfilecon. libselinux has been updated to be thread-safe, preventing these potential memory leaks.

Comment 34 errata-xmlrpc 2011-05-19 14:18:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0751.html


Note You need to log in before you can comment on or make changes to this bug.