Bug 658657

Summary: libvirtd should use selabel_open rather than matchpathcon
Product: Red Hat Enterprise Linux 6 Reporter: Eric Blake <eblake>
Component: libvirtAssignee: Eric Blake <eblake>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: low    
Version: 6.0CC: bsarathy, dallan, dyuan, eblake, gren, jiachen, kxiong, plyons, xen-maint
Target Milestone: rcKeywords: ZStream
Target Release: 6.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.8.7-1.el6 Doc Type: Bug Fix
Doc Text:
libvirt used a non-thread friendly SELinux API (matchpathcon) to get the default security context for a specified path. This led to a memory leak upon domain startup and shutdown. libvirt now uses improved SELinux APIs, so this memory leak no longer occurs.
Story Points: ---
Clone Of: 658571 Environment:
Last Closed: 2011-05-19 13:24:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 658571    
Bug Blocks: 656795, 672554    
Attachments:
Description Flags
the output of after starting guest
none
2011.4.21-valgrind.out none

Description Eric Blake 2010-11-30 22:50:54 UTC
+++ This bug was initially created as a clone of Bug #658571 +++

libselinux is guilty of abusing __thread variables to store malloc()d data as a form of a cache, to minimize computation by reusing earlier results from the same thread.  As a result of this memory leak, repeated starting and stopping of domains via libvirt can result in the OOM killer triggering, since libvirt fires up a thread per domain, and each thread uses selinux calls such as fgetfilecon.

The leaks in fgetfilecon can only be fixed upstream in libselinux.  However, matchpathcon() may be overkill for libvirt's purposes, and using the lighter-weight selabel_open() may avoid the worst of libvirt's memory problems:

Version-Release number of selected component (if applicable):
libselinux-2.0.94-2.el6.x86_64
libvirt-0.8.1-27.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
0. These steps are run as root, assuming hardware kvm support and existence of a VM named fedora (adjust the steps below as appropriate); if desired, I can reduce this to a simpler test case that does not rely on libvirt, by using a single .c file that links against libselinux and repeatedly spawns threads.
1. service libvirtd stop
2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$!
3. virsh start fedora
4. kill $pid
  
Actual results:
The biggest leak reported is due to libselinux' abuse of __thread:

==26696== 829,730 (40 direct, 829,690 indirect) bytes in 1 blocks are definitely lost in loss record 500 of 500
==26696==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==26696==    by 0x3022E0D48C: selabel_open (label.c:165)
==26696==    by 0x3022E11646: matchpathcon_init_prefix (matchpathcon.c:296)
==26696==    by 0x3022E1190D: matchpathcon (matchpathcon.c:317)
==26696==    by 0x3033ED7FB5: SELinuxRestoreSecurityFileLabel (security_selinux.c:381)
==26696==    by 0x3033ED8539: SELinuxRestoreSecurityAllLabel (security_selinux.c:749)
==26696==    by 0x459153: qemuSecurityStackedRestoreSecurityAllLabel (qemu_security_stacked.c:257)
==26696==    by 0x43F0C5: qemudShutdownVMDaemon (qemu_driver.c:4311)
==26696==    by 0x4555C9: qemudStartVMDaemon (qemu_driver.c:4234)
==26696==    by 0x458416: qemudDomainObjStart (qemu_driver.c:7268)
==26696==    by 0x45896F: qemudDomainStart (qemu_driver.c:7308)
==26696==    by 0x3033E75412: virDomainCreate (libvirt.c:4881)
==26696== 

Basically, libvirt created a thread that used matchpathcon during 'virsh start fedora', and matchpathcon stuffed over 800k of malloc'd data into:

static __thread char **con_array;

which are then inaccessible when libvirt exits the thread as part of shutting down on SIGTERM.

--- Additional comment from dwalsh on 2010-11-30 14:49:19 MST ---

I think the best solution to this is to move to using selabel_open directly rather then matchpathcon within libvirt.

--- Additional comment from eblake on 2010-11-30 15:24:40 MST ---

Changing libvirt to use selabel_open would only fix libvirt's leak, but not any other clients.  I still think that libselinux should be fixed to be thread-friendly.  The memory leak is in libselinux, whether or not libvirt is fixed to avoid libselinux.

--- Additional comment from eblake on 2010-11-30 15:48:43 MST ---

Reassigning this bug back to libselinux, and cloning a new bug for libvirt to consider using selabel_open instead of matchpathcon

Comment 2 Eric Blake 2010-12-01 04:12:44 UTC
patch posted upstream:
https://www.redhat.com/archives/libvir-list/2010-December/msg00001.html

still doesn't work around the fgetfilecon leak, but makes the situation much nicer by avoiding the leak of matchpathcon.

Comment 5 Jiri Denemark 2011-01-09 23:57:16 UTC
Built into libvirt-0.8.7-1.el6

Comment 6 koka xiong 2011-01-10 09:40:45 UTC
Verified with libvirt-0.8.7-1.el6,meeting the following problem:
1. service libvirtd stop
2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$!
3. virsh start fedora
error: Failed to start domain test
error: internal error Process exited while reading console log output: 17:37:27.765: 16656: debug : virCgroupNew:555 : New group /libvirt/qemu/test
17:37:27.770: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 0:cpu at /cgroup/cpu in 
17:37:27.770: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 1:cpuacct at /cgroup/cpuacct in 
17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 2:cpuset at /cgroup/cpuset in 
17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 3:memory at /cgroup/memory in 
17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 4:devices at /cgroup/devices in 
17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 5:freezer at /cgroup/freezer in 
17:37:27.772: 16656: debug : virCgroupMakeGroup:497 : Make group /libvirt/qemu/test
17:37:27.772: 16656: debug : virCgroupMakeGroup:509 : Make controller /cgroup/cpu/libvirt/qemu/test/
17:37:27.772: 16656: debug : virCgroupMakeGroup:509 : Make controller /cgroup/cpu

Comment 7 koka xiong 2011-01-10 10:16:56 UTC
Hi,Eric Blake
please ignore my comment 6
The guest shold no be able to be started,which is as expected.
but 
1. service libvirtd stop
2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$!
3. virsh start fedora
4. kill $pid
No log shows after kill $pid, where can I find the selinux information?

Comment 8 Eric Blake 2011-01-11 17:38:30 UTC
(In reply to comment #7)
> Hi,Eric Blake
> please ignore my comment 6
> The guest shold no be able to be started,which is as expected.
> but 
> 1. service libvirtd stop
> 2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$!
> 3. virsh start fedora
> 4. kill $pid
> No log shows after kill $pid, where can I find the selinux information?

Did you run the test with selinux enforcing?  There won't be any selinux leaks triggered if selinux is disabled.

No log from valgrind is generally a good sign - it means that valgrind didn't detect any leaks; but it is not matching my current experience (with libvirt-0.8.7-1.el6, libselinux-2.0.94-2.el6.x86_64, and libnl-1.1-12.el6.x86_64, I'm still seeing leaks in libvirt, although they are unrelated to this particular BZ).  Maybe you should omit the --quiet argument to valgrind, to be sure that you are capturing valgrind's output correctly by the presence of a valgrind header.  So, I suggest using:

valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1

so that you can check valgrind.out after kill $pid.

To verify the difference in selinux leak sizes, note that a buggy libselinux coupled with a buggy libvirt version will result in a huge loss record associated to matchpathcon (libselinux) and SELinuxRestoreSecurityFileLabel (libvirt), such as this line in comment #1:

829,730 (40 direct, 829,690 indirect)

a buggy libselinux coupled with a fixed libvirt will still leak, but the leak will be much smaller, such as this leak in fgetfilecon/virStorageBackendUpdateVolTargetInfoFD (but there should no longer be any large leak tied to matchpathcon):

==17240== 81 bytes in 3 blocks are definitely lost in loss record 396 of 555
==17240==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==17240==    by 0x302167FD51: strdup (in /lib64/libc-2.12.so)
==17240==    by 0x3022E13DF6: selinux_raw_to_trans_context (setrans_client.c:317)
==17240==    by 0x3022E0B1E9: fgetfilecon (fgetfilecon.c:64)
==17240==    by 0x49E89E: virStorageBackendUpdateVolTargetInfoFD (storage_backend.c:1147)

a fixed libselinux coupled with buggy libvirt will not leak in matchpathcon or in fgetfilecon, but by fixing libselinux, you are masking the ability of valgrind to detect whether libvirt was patched to avoid matchpathcon, so you have to verify this with a known-leaking libselinux.

Comment 9 koka xiong 2011-01-12 03:21:07 UTC
Created attachment 472940 [details]
the output of after starting guest

Comment 10 koka xiong 2011-01-12 03:21:54 UTC
Hi,Eric,I am verifying bug again,but still can't display the selinux message
libvirt-0.8.6-1.el6.x86_64
libselinux-2.0.94-2.el6.x86_64
libnl-1.1-12.el6.x86_64
1.service libvirtd stop
2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out
3.virsh start guest
In fact,after start the guest,the libvirtd died,so the kill $pid doesn't work
I will attach the output to you,after start the guest
it will show
==3265== Memcheck, a memory error detector
==3265== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==3265== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==3265== Command: /usr/sbin/libvirtd
==3265== 
==3265== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints
==3265==    This could cause spurious value errors to appear.
==3265==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3265== 
==3265== HEAP SUMMARY:
==3265==     in use at exit: 2,399,347 bytes in 17,812 blocks
==3265==   total heap usage: 862,369 allocs, 844,557 frees, 263,747,701 bytes allocated
==3265== 
==3265== execve(0xfdd3e20(/usr/libexec/qemu-kvm), 0xfbd42e0, 0x4c4b730) failed, errno 13
==3265== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
==3265== Add more stringent tests in PRE(sys_execve), or work out how to recover.

Comment 13 Eric Blake 2011-01-27 22:39:25 UTC
I'm wondering if this could be tested by using systemtap to probe both selabel_open and matchpathcon calls made by libvirtd (pre-patch would have mathpathcon calls, post-patch would not), rather than running under valgrind.  This is especially useful if libselinux has been updated in the meantime to a non-leaking version, which means that neither call will show up in valgrind reports.  I'm not an expert with stap yet, but it seems powerful enough to be used for the task.

Comment 14 koka xiong 2011-01-28 02:26:15 UTC
Verified with selinux=enforcing state
libvirt-0.8.6-1.el6.x86_64
libselinux-2.0.94-2.el6.x86_64
libnl-1.1-12.el6.x86_64
1.service libvirtd stop
2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1
3.virsh start guest
Look at the log
102 bytes in 3 blocks are definitely lost in loss record 1,999 of 2,936
==11495==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==11495==    by 0x382D07FD51: strdup (in /lib64/libc-2.12.so)
==11495==    by 0x382E813B8F: selinux_trans_to_raw_context (in /lib64/libselinux.so.1)
==11495==    by 0x382E813657: setfilecon (in /lib64/libselinux.so.1)
==11495==    by 0x3F8BF070DA: SELinuxSetFilecon (security_selinux.c:328)
==11495==    by 0x3F8BF07300: SELinuxRestoreSecurityFileLabel (security_selinux.c:408)

==11495==    by 0x3F8BF07829: SELinuxRestoreSecurityAllLabel (security_selinux.c:779)
==11495==    by 0x44C973: qemuSecurityStackedRestoreSecurityAllLabel (qemu_security_stacked.c:257)
==11495==    by 0x435BEF: qemudShutdownVMDaemon (qemu_driver.c:3444)
==11495==    by 0x448AFD: qemudStartVMDaemon (qemu_driver.c:3345)
==11495==    by 0x44BB62: qemudDomainObjStart (qemu_driver.c:6576)
==11495==    by 0x44C114: qemudDomainStartWithFlags (qemu_driver.c:6622)
==11495== 
==11495== 102 bytes in 3 blocks are definitely lost in loss record 2,000 of 2,936
==11495==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==11495==    by 0x382D07FD51: strdup (in /lib64/libc-2.12.so)
==11495==    by 0x382E813BB6: selinux_trans_to_raw_context (in /lib64/libselinux.so.1)
==11495==    by 0x382E813657: setfilecon (in /lib64/libselinux.so.1)
==11495==    by 0x3F8BF070DA: SELinuxSetFilecon (security_selinux.c:328)
==11495==    by 0x3F8BF07300: SELinuxRestoreSecurityFileLabel (security_selinux.c:408)
==11495==    by 0x3F8BF07829: SELinuxRestoreSecurityAllLabel (security_selinux.c:779)
==11495==    by 0x44C973: qemuSecurityStackedRestoreSecurityAllLabel (qemu_security_stacked.c:257)
==11495==    by 0x435BEF: qemudShutdownVMDaemon (qemu_driver.c:3444)
==11495==    by 0x448AFD: qemudStartVMDaemon (qemu_driver.c:3345)
==11495==    by 0x44BB62: qemudDomainObjStart (qemu_driver.c:6576)
==11495==    by 0x44C114: qemudDomainStartWithFlags (qemu_driver.c:6622)

No matchpathcon leak is detected.
So this bug is verified.

Comment 15 Martin Prpič 2011-04-15 14:25:01 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Starting and shutting down a domain led to a memory leak due to the use of a thread-unfriendly "matchpathcon" (which gets the default security context for the specified path) SELinux API. With this update, libvirt uses improved SELinux APIs and a memory leak no longer occurs.

Comment 16 Jason Chen 2011-04-21 10:18:21 UTC
Created attachment 493775 [details]
2011.4.21-valgrind.out

Comment 17 Jason Chen 2011-04-21 10:33:51 UTC
this bug is UN-fixed, so reopen it.

build list:
------------
libvirt-0.8.7-18.el6.x86_64 
libselinux-2.0.94-5.el6.x86_64
libnl-1.1-14.el6.x86_64
selinux=enforcing
Linux localhost.localdomain 2.6.32-131.0.1.el6.x86_64 #1 SMP Tue Apr 12 16:40:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

steps:
1.service libvirtd stop
2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1
3.execute "virsh start rhel6-x86_64", the domain can not start and complaining the following:
[root@localhost test]# virsh start rhel6-x86_64
error: Failed to start domain rhel6-x86_64
error: internal error Process exited while reading console log output: 

4 it is still lead to memory leak, some log line is as below(detail is in the attachment file named "2011.4.21-valgrind.out")
...............
==5099== LEAK SUMMARY:
==5099==    definitely lost: 1,264 bytes in 5 blocks
==5099==    indirectly lost: 0 bytes in 0 blocks
==5099==      possibly lost: 3,421 bytes in 43 blocks
==5099==    still reachable: 2,061,701 bytes in 13,159 blocks
==5099==         suppressed: 0 bytes in 0 blocks
==5099== Reachable blocks (those to which a pointer was found) are not shown.
==5099== To see them, rerun with: --leak-check=full --show-reachable=yes
==5099== 
==5099== For counts of detected and suppressed errors, rerun with: -v
==5099== ERROR SUMMARY: 36 errors from 33 contexts (suppressed: 32 from 9)
06:10:02.491: 5013: warning : qemudStartVMDaemon:3346 : Executing done /usr/libexec/qemu-kvm
06:10:02.604: 5013: error : qemudReadLogOutput:2072 : internal error Process exited while reading console log output:
...............

Comment 18 Eric Blake 2011-04-21 21:13:32 UTC
That leak trace includes snippets like this:

==5099== 56 bytes in 1 blocks are definitely lost in loss record 748 of 1,426
==5099==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==5099==    by 0x3BDB039D7B: virAllocN (in /usr/lib64/libvirt.so.0.8.7)
==5099==    by 0x418868: ??? (in /usr/sbin/libvirtd)

which means you don't have the debuginfo for the libvirtd that was running.  Can you please regenerate a trace with more details, after installing the libvirt-debuginfo package?

Comment 19 Dave Allan 2011-04-21 21:19:52 UTC
Since the original BZ was reported against a ~800k leak, and this one is
reported as 1k, it's unlikely to be the same bug.  I'm going to put this BZ
back into VERIFIED, and would you mind opening a new BZ against 6.2 to track
this?  Depending on where the leak turns out to be, we can consider it for
zstream.  Thanks, Dave

Comment 20 Dave Allan 2011-04-21 21:20:24 UTC
Sorry, putting back into ON_QA.

Comment 21 Jason Chen 2011-04-22 03:07:40 UTC
Since the original problem of this bug reported is fixed, and according to Dave's comment(#19), a new bug(https://bugzilla.redhat.com/show_bug.cgi?id=698861) was created to track the new issues. 

i will ask someone to help me to change this bug's status from "ON_QA" to "verified" as i am not authorized to do so currently

Comment 22 Dave Allan 2011-04-22 13:39:33 UTC
(In reply to comment #21)
> bug(https://bugzilla.redhat.com/show_bug.cgi?id=698861) was created to track
> the new issues. 

Great, thanks Jason.

Comment 25 Laura Bailey 2011-05-04 05:36:21 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Starting and shutting down a domain led to a memory leak due to the use of a thread-unfriendly "matchpathcon" (which gets the default security context for the specified path) SELinux API. With this update, libvirt uses improved SELinux APIs and a memory leak no longer occurs.+libvirt used a non-thread friendly SELinux API (matchpathcon) to get the default security context for a specified path. This led to a memory leak upon domain startup and shutdown. libvirt now uses improved SELinux APIs, so this memory leak no longer occurs.

Comment 26 errata-xmlrpc 2011-05-19 13:24:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html