Bug 658657
Summary: | libvirtd should use selabel_open rather than matchpathcon | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Eric Blake <eblake> | ||||||
Component: | libvirt | Assignee: | Eric Blake <eblake> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | low | ||||||||
Version: | 6.0 | CC: | bsarathy, dallan, dyuan, eblake, gren, jiachen, kxiong, plyons, xen-maint | ||||||
Target Milestone: | rc | Keywords: | ZStream | ||||||
Target Release: | 6.1 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | libvirt-0.8.7-1.el6 | Doc Type: | Bug Fix | ||||||
Doc Text: |
libvirt used a non-thread friendly SELinux API (matchpathcon) to get the default security context for a specified path. This led to a memory leak upon domain startup and shutdown. libvirt now uses improved SELinux APIs, so this memory leak no longer occurs.
|
Story Points: | --- | ||||||
Clone Of: | 658571 | Environment: | |||||||
Last Closed: | 2011-05-19 13:24:36 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 658571 | ||||||||
Bug Blocks: | 656795, 672554 | ||||||||
Attachments: |
|
Description
Eric Blake
2010-11-30 22:50:54 UTC
patch posted upstream: https://www.redhat.com/archives/libvir-list/2010-December/msg00001.html still doesn't work around the fgetfilecon leak, but makes the situation much nicer by avoiding the leak of matchpathcon. Proposed z-stream patch: http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-December/msg00311.html Built into libvirt-0.8.7-1.el6 Verified with libvirt-0.8.7-1.el6,meeting the following problem: 1. service libvirtd stop 2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$! 3. virsh start fedora error: Failed to start domain test error: internal error Process exited while reading console log output: 17:37:27.765: 16656: debug : virCgroupNew:555 : New group /libvirt/qemu/test 17:37:27.770: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 0:cpu at /cgroup/cpu in 17:37:27.770: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 1:cpuacct at /cgroup/cpuacct in 17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 2:cpuset at /cgroup/cpuset in 17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 3:memory at /cgroup/memory in 17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 4:devices at /cgroup/devices in 17:37:27.771: 16656: debug : virCgroupDetect:245 : Detected mount/mapping 5:freezer at /cgroup/freezer in 17:37:27.772: 16656: debug : virCgroupMakeGroup:497 : Make group /libvirt/qemu/test 17:37:27.772: 16656: debug : virCgroupMakeGroup:509 : Make controller /cgroup/cpu/libvirt/qemu/test/ 17:37:27.772: 16656: debug : virCgroupMakeGroup:509 : Make controller /cgroup/cpu Hi,Eric Blake please ignore my comment 6 The guest shold no be able to be started,which is as expected. but 1. service libvirtd stop 2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$! 3. virsh start fedora 4. kill $pid No log shows after kill $pid, where can I find the selinux information? (In reply to comment #7) > Hi,Eric Blake > please ignore my comment 6 > The guest shold no be able to be started,which is as expected. > but > 1. service libvirtd stop > 2. valgrind --quiet --leak-check=full /usr/sbin/libvirtd& pid=$! > 3. virsh start fedora > 4. kill $pid > No log shows after kill $pid, where can I find the selinux information? Did you run the test with selinux enforcing? There won't be any selinux leaks triggered if selinux is disabled. No log from valgrind is generally a good sign - it means that valgrind didn't detect any leaks; but it is not matching my current experience (with libvirt-0.8.7-1.el6, libselinux-2.0.94-2.el6.x86_64, and libnl-1.1-12.el6.x86_64, I'm still seeing leaks in libvirt, although they are unrelated to this particular BZ). Maybe you should omit the --quiet argument to valgrind, to be sure that you are capturing valgrind's output correctly by the presence of a valgrind header. So, I suggest using: valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1 so that you can check valgrind.out after kill $pid. To verify the difference in selinux leak sizes, note that a buggy libselinux coupled with a buggy libvirt version will result in a huge loss record associated to matchpathcon (libselinux) and SELinuxRestoreSecurityFileLabel (libvirt), such as this line in comment #1: 829,730 (40 direct, 829,690 indirect) a buggy libselinux coupled with a fixed libvirt will still leak, but the leak will be much smaller, such as this leak in fgetfilecon/virStorageBackendUpdateVolTargetInfoFD (but there should no longer be any large leak tied to matchpathcon): ==17240== 81 bytes in 3 blocks are definitely lost in loss record 396 of 555 ==17240== at 0x4A0515D: malloc (vg_replace_malloc.c:195) ==17240== by 0x302167FD51: strdup (in /lib64/libc-2.12.so) ==17240== by 0x3022E13DF6: selinux_raw_to_trans_context (setrans_client.c:317) ==17240== by 0x3022E0B1E9: fgetfilecon (fgetfilecon.c:64) ==17240== by 0x49E89E: virStorageBackendUpdateVolTargetInfoFD (storage_backend.c:1147) a fixed libselinux coupled with buggy libvirt will not leak in matchpathcon or in fgetfilecon, but by fixing libselinux, you are masking the ability of valgrind to detect whether libvirt was patched to avoid matchpathcon, so you have to verify this with a known-leaking libselinux. Created attachment 472940 [details]
the output of after starting guest
Hi,Eric,I am verifying bug again,but still can't display the selinux message libvirt-0.8.6-1.el6.x86_64 libselinux-2.0.94-2.el6.x86_64 libnl-1.1-12.el6.x86_64 1.service libvirtd stop 2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out 3.virsh start guest In fact,after start the guest,the libvirtd died,so the kill $pid doesn't work I will attach the output to you,after start the guest it will show ==3265== Memcheck, a memory error detector ==3265== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al. ==3265== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info ==3265== Command: /usr/sbin/libvirtd ==3265== ==3265== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints ==3265== This could cause spurious value errors to appear. ==3265== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper. ==3265== ==3265== HEAP SUMMARY: ==3265== in use at exit: 2,399,347 bytes in 17,812 blocks ==3265== total heap usage: 862,369 allocs, 844,557 frees, 263,747,701 bytes allocated ==3265== ==3265== execve(0xfdd3e20(/usr/libexec/qemu-kvm), 0xfbd42e0, 0x4c4b730) failed, errno 13 ==3265== EXEC FAILED: I can't recover from execve() failing, so I'm dying. ==3265== Add more stringent tests in PRE(sys_execve), or work out how to recover. I'm wondering if this could be tested by using systemtap to probe both selabel_open and matchpathcon calls made by libvirtd (pre-patch would have mathpathcon calls, post-patch would not), rather than running under valgrind. This is especially useful if libselinux has been updated in the meantime to a non-leaking version, which means that neither call will show up in valgrind reports. I'm not an expert with stap yet, but it seems powerful enough to be used for the task. Verified with selinux=enforcing state libvirt-0.8.6-1.el6.x86_64 libselinux-2.0.94-2.el6.x86_64 libnl-1.1-12.el6.x86_64 1.service libvirtd stop 2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1 3.virsh start guest Look at the log 102 bytes in 3 blocks are definitely lost in loss record 1,999 of 2,936 ==11495== at 0x4A0515D: malloc (vg_replace_malloc.c:195) ==11495== by 0x382D07FD51: strdup (in /lib64/libc-2.12.so) ==11495== by 0x382E813B8F: selinux_trans_to_raw_context (in /lib64/libselinux.so.1) ==11495== by 0x382E813657: setfilecon (in /lib64/libselinux.so.1) ==11495== by 0x3F8BF070DA: SELinuxSetFilecon (security_selinux.c:328) ==11495== by 0x3F8BF07300: SELinuxRestoreSecurityFileLabel (security_selinux.c:408) ==11495== by 0x3F8BF07829: SELinuxRestoreSecurityAllLabel (security_selinux.c:779) ==11495== by 0x44C973: qemuSecurityStackedRestoreSecurityAllLabel (qemu_security_stacked.c:257) ==11495== by 0x435BEF: qemudShutdownVMDaemon (qemu_driver.c:3444) ==11495== by 0x448AFD: qemudStartVMDaemon (qemu_driver.c:3345) ==11495== by 0x44BB62: qemudDomainObjStart (qemu_driver.c:6576) ==11495== by 0x44C114: qemudDomainStartWithFlags (qemu_driver.c:6622) ==11495== ==11495== 102 bytes in 3 blocks are definitely lost in loss record 2,000 of 2,936 ==11495== at 0x4A0515D: malloc (vg_replace_malloc.c:195) ==11495== by 0x382D07FD51: strdup (in /lib64/libc-2.12.so) ==11495== by 0x382E813BB6: selinux_trans_to_raw_context (in /lib64/libselinux.so.1) ==11495== by 0x382E813657: setfilecon (in /lib64/libselinux.so.1) ==11495== by 0x3F8BF070DA: SELinuxSetFilecon (security_selinux.c:328) ==11495== by 0x3F8BF07300: SELinuxRestoreSecurityFileLabel (security_selinux.c:408) ==11495== by 0x3F8BF07829: SELinuxRestoreSecurityAllLabel (security_selinux.c:779) ==11495== by 0x44C973: qemuSecurityStackedRestoreSecurityAllLabel (qemu_security_stacked.c:257) ==11495== by 0x435BEF: qemudShutdownVMDaemon (qemu_driver.c:3444) ==11495== by 0x448AFD: qemudStartVMDaemon (qemu_driver.c:3345) ==11495== by 0x44BB62: qemudDomainObjStart (qemu_driver.c:6576) ==11495== by 0x44C114: qemudDomainStartWithFlags (qemu_driver.c:6622) No matchpathcon leak is detected. So this bug is verified. Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Starting and shutting down a domain led to a memory leak due to the use of a thread-unfriendly "matchpathcon" (which gets the default security context for the specified path) SELinux API. With this update, libvirt uses improved SELinux APIs and a memory leak no longer occurs. Created attachment 493775 [details]
2011.4.21-valgrind.out
this bug is UN-fixed, so reopen it. build list: ------------ libvirt-0.8.7-18.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libnl-1.1-14.el6.x86_64 selinux=enforcing Linux localhost.localdomain 2.6.32-131.0.1.el6.x86_64 #1 SMP Tue Apr 12 16:40:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux steps: 1.service libvirtd stop 2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1 3.execute "virsh start rhel6-x86_64", the domain can not start and complaining the following: [root@localhost test]# virsh start rhel6-x86_64 error: Failed to start domain rhel6-x86_64 error: internal error Process exited while reading console log output: 4 it is still lead to memory leak, some log line is as below(detail is in the attachment file named "2011.4.21-valgrind.out") ............... ==5099== LEAK SUMMARY: ==5099== definitely lost: 1,264 bytes in 5 blocks ==5099== indirectly lost: 0 bytes in 0 blocks ==5099== possibly lost: 3,421 bytes in 43 blocks ==5099== still reachable: 2,061,701 bytes in 13,159 blocks ==5099== suppressed: 0 bytes in 0 blocks ==5099== Reachable blocks (those to which a pointer was found) are not shown. ==5099== To see them, rerun with: --leak-check=full --show-reachable=yes ==5099== ==5099== For counts of detected and suppressed errors, rerun with: -v ==5099== ERROR SUMMARY: 36 errors from 33 contexts (suppressed: 32 from 9) 06:10:02.491: 5013: warning : qemudStartVMDaemon:3346 : Executing done /usr/libexec/qemu-kvm 06:10:02.604: 5013: error : qemudReadLogOutput:2072 : internal error Process exited while reading console log output: ............... That leak trace includes snippets like this: ==5099== 56 bytes in 1 blocks are definitely lost in loss record 748 of 1,426 ==5099== at 0x4A04A28: calloc (vg_replace_malloc.c:467) ==5099== by 0x3BDB039D7B: virAllocN (in /usr/lib64/libvirt.so.0.8.7) ==5099== by 0x418868: ??? (in /usr/sbin/libvirtd) which means you don't have the debuginfo for the libvirtd that was running. Can you please regenerate a trace with more details, after installing the libvirt-debuginfo package? Since the original BZ was reported against a ~800k leak, and this one is reported as 1k, it's unlikely to be the same bug. I'm going to put this BZ back into VERIFIED, and would you mind opening a new BZ against 6.2 to track this? Depending on where the leak turns out to be, we can consider it for zstream. Thanks, Dave Sorry, putting back into ON_QA. Since the original problem of this bug reported is fixed, and according to Dave's comment(#19), a new bug(https://bugzilla.redhat.com/show_bug.cgi?id=698861) was created to track the new issues. i will ask someone to help me to change this bug's status from "ON_QA" to "verified" as i am not authorized to do so currently (In reply to comment #21) > bug(https://bugzilla.redhat.com/show_bug.cgi?id=698861) was created to track > the new issues. Great, thanks Jason. Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1 @@ -Starting and shutting down a domain led to a memory leak due to the use of a thread-unfriendly "matchpathcon" (which gets the default security context for the specified path) SELinux API. With this update, libvirt uses improved SELinux APIs and a memory leak no longer occurs.+libvirt used a non-thread friendly SELinux API (matchpathcon) to get the default security context for a specified path. This led to a memory leak upon domain startup and shutdown. libvirt now uses improved SELinux APIs, so this memory leak no longer occurs. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0596.html |