Bug 698861
Summary: | starting domain leaks memory | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jason Chen <jiachen> | ||||||||||||||
Component: | libvirt | Assignee: | Osier Yang <jyang> | ||||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||||||||||
Severity: | high | Docs Contact: | |||||||||||||||
Priority: | high | ||||||||||||||||
Version: | 6.1 | CC: | ajia, dallan, dyuan, jiachen, mzhan, veillard, xhu, yoyzhang | ||||||||||||||
Target Milestone: | rc | ||||||||||||||||
Target Release: | --- | ||||||||||||||||
Hardware: | x86_64 | ||||||||||||||||
OS: | Linux | ||||||||||||||||
Whiteboard: | |||||||||||||||||
Fixed In Version: | libvirt-0.9.3-1.el6 | Doc Type: | Bug Fix | ||||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||||
Clone Of: | Environment: | ||||||||||||||||
Last Closed: | 2011-12-06 11:06:12 UTC | Type: | --- | ||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||
Embargoed: | |||||||||||||||||
Attachments: |
|
It's two seperate problems, generally it's not good to folder more than one problems into one bug, that's not good to track, I'd like this bug is left for problem "memory leak". And please file another bug for the problem "domain can't startup". For the problem 1, it's caused by valgrind aborted starting the domain. see the log: ==3423== execve(0x10e68080(/usr/libexec/qemu-kvm), 0x10e6f150, 0x10e68190) failed, errno 13 ==3423== EXEC FAILED: I can't recover from execve() failing, so I'm dying. ==3423== Add more stringent tests in PRE(sys_execve), or work out how to recover. So, it's not a libvirt problem, though I'm not sure if it's a bug of valgrind. Can you provide the domain XML for the guest so that we can attempt to reproduce what you're seeing? (In reply to comment #4) > Can you provide the domain XML for the guest so that we can attempt to > reproduce what you're seeing? Dave, i reproduced this bug as i can not found the XML file of the domain tested before. some information is as below: 1) domain XML file ---------------------- <domain type='kvm'> <name>rhel-x86_64</name> <uuid>ab342148-e2db-fdca-b371-7adfc73fcf97</uuid> <memory>1048576</memory> <currentMemory>1048576</currentMemory> <vcpu>1</vcpu> <os> <type arch='x86_64' machine='rhel6.1.0'>hvm</type> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='none' io='threads'/> <source file='/var/lib/libvirt/images/rhel-x86_64.img'/> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <interface type='network'> <mac address='52:54:00:a8:6f:0b'/> <source network='default'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <input type='tablet' bus='usb'/> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes'/> <sound model='ich6'> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </sound> <video> <model type='cirrus' vram='9216' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </video> <memballoon model='virtio'> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </memballoon> </devices> </domain> 2 version list: -------------------- libvirt-0.8.7-18.el6.x86_64 libvirt-python-0.8.7-18.el6.x86_64 libvirt-client-0.8.7-18.el6.x86_64 libvirt-debuginfo-0.8.7-18.el6.x86_64 libselinux-2.0.94-5.1.el6.x86_64 libnl-1.1-14.el6.x86_64 Host: Linux dhcp-4-199.nay.redhat.com 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux 3 some memory leak log lines:(detail log is in the attachment named "20110613-valgrind.out") -------------------- ..... ==31727== LEAK SUMMARY: ==31727== definitely lost: 1,264 bytes in 5 blocks ==31727== indirectly lost: 0 bytes in 0 blocks ==31727== possibly lost: 3,421 bytes in 43 blocks ==31727== still reachable: 2,067,842 bytes in 13,276 blocks ==31727== suppressed: 0 bytes in 0 blocks ==31727== Reachable blocks (those to which a pointer was found) are not shown. ==31727== To see them, rerun with: --leak-check=full --show-reachable=yes .... Created attachment 504383 [details]
memory leak log file
The bug summary implies that if you start the domain without selinux=enforcing, that you don't see the leak; is that correct? (In reply to comment #7) > The bug summary implies that if you start the domain without selinux=enforcing, > that you don't see the leak; is that correct? No, my domain started with "selinux=enforcing",as below: --------------------------------------------- [root@dhcp-4-199 ~]# getenforce Enforcing Jason, I understand that you started the domain with selinux enforcing. If you start the guest with selinux permissive, do you see the leak? Created attachment 504791 [details]
20110615-valgrind.out
(In reply to comment #9) > Jason, I understand that you started the domain with selinux enforcing. If you > start the guest with selinux permissive, do you see the leak? (In reply to comment #9) > Jason, I understand that you started the domain with selinux enforcing. If you > start the guest with selinux permissive, do you see the leak? Dave, yes. i did a test(start the guest with selinux permissive) just now. it still lead to memory leak, and the domain can be started this time, some information is as blow: 1) ----------- [root@dhcp-4-199 ~]# getenforce Permissive 2) ---------------------------------------------------------- ..... ==3734== ==3734== LEAK SUMMARY: ==3734== definitely lost: 1,264 bytes in 5 blocks ==3734== indirectly lost: 0 bytes in 0 blocks ==3734== possibly lost: 3,421 bytes in 43 blocks ==3734== still reachable: 2,067,842 bytes in 13,276 blocks ==3734== suppressed: 0 bytes in 0 blocks ==3734== Reachable blocks (those to which a pointer was found) are not shown. ==3734== To see them, rerun with: --leak-check=full --show-reachable=yes ==3734== ==3734== For counts of detected and suppressed errors, rerun with: -v ==3734== Use --track-origins=yes to see where uninitialised values come from ==3734== ERROR SUMMARY: 40 errors from 37 contexts (suppressed: 32 from 9) 10:28:20.022: 3570: warning : qemudStartVMDaemon:3346 : Executing done /usr/libexec/qemu-kvm Domain rhel-x86_64 started ..... (detail log is in the attachment named “20110615-valgrind.out”) Ok, thanks for the info. I've revised the BZ summary to reflect that what you're seeing does not depend on selinux state. ==22835== 80 bytes in 1 blocks are definitely lost in loss record 916 of 1,572 ==22835== at 0x4C24481: calloc (vg_replace_malloc.c:418) ==22835== by 0x5077E91: virAlloc (memory.c:101) ==22835== by 0x508A607: virLastErrorObject (virterror.c:262) ==22835== by 0x508AA28: virResetLastError (virterror.c:403) ==22835== by 0x50E1757: virConnectGetURI (libvirt.c:1581) ==22835== by 0x42B7CA: remoteDispatchGetURI (remote_dispatch_bodies.h:3153) ==22835== by 0x436316: remoteDispatchClientRequest (dispatch.c:516) ==22835== by 0x41E716: qemudWorker (libvirtd.c:1618) ==22835== by 0x949ACCA: start_thread (in /lib64/libpthread-2.13.so) ==22835== by 0x9994C2C: clone (in /lib64/libc-2.13.so) Things like above are normal leak, libvirt keeps last virError object in thread local storage to record the error. -----------This is seperator -------------------- ==22835== 72 bytes in 2 blocks are definitely lost in loss record 897 of 1,572 ==22835== at 0x4C2515D: malloc (vg_replace_malloc.c:195) ==22835== by 0x9933C01: strdup (in /lib64/libc-2.13.so) ==22835== by 0x8B16FAA: selinux_raw_to_trans_context (setrans_client.c:317) ==22835== by 0x8B15E03: selabel_lookup_common (label.c:201) ==22835== by 0x8B1625D: selabel_lookup (label.c:212) ==22835== by 0x5212645: SELinuxRestoreSecurityFileLabel (security_selinux.c:425) ==22835== by 0x520F7D8: virSecurityStackRestoreSavedStateLabel (security_stack.c:300) ==22835== by 0x453CFB: qemuDomainSaveImageStartVM (qemu_driver.c:3523) ==22835== by 0x454137: qemudDomainObjStart (qemu_driver.c:3611) ==22835== by 0x454694: qemudDomainStartWithFlags (qemu_driver.c:3884) ==22835== by 0x50ECEC5: virDomainCreate (libvirt.c:6504) ==22835== by 0x432267: remoteDispatchDomainCreate (remote_dispatch_bodies.h:330) Thing likes above is wrong reporting by valgrind, we free "security_context_t fcon" at the "err" label. -----------This is seperator -------------------- ==22835== 968 bytes in 1 blocks are definitely lost in loss record 1,405 of 1,572 ==22835== at 0x4C2515D: malloc (vg_replace_malloc.c:195) ==22835== by 0x8DC8354: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.7.7) ==22835== by 0x8DC7894: __xmlDefaultSAXHandler (in /usr/lib64/libxml2.so.2.7.7) ==22835== by 0x8E1D318: xmlDefaultSAXHandlerInit (in /usr/lib64/libxml2.so.2.7.7) ==22835== by 0x8D592A1: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.7.7) ==22835== by 0x8D59A1B: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.7.7) ==22835== by 0x5089F3D: virXMLParseHelper (xml.c:682) ==22835== by 0x50A3A54: virDomainDefParse (domain_conf.c:6769) ==22835== by 0x442804: qemuDomainSaveImageOpen (qemu_driver.c:3408) ==22835== by 0x4540B1: qemudDomainObjStart (qemu_driver.c:3590) ==22835== by 0x454694: qemudDomainStartWithFlags (qemu_driver.c:3884) ==22835== by 0x50ECEC5: virDomainCreate (libvirt.c:6504) For things like above, It looks to me it's also wrong reporting by valrind, it might can't detect things are freed by "on_exit_thread" in "xmlGetGlobalState", though I'm not sure about it. Other leaks are fixed by following patch: commit 66cfb9ef0a25b9051391cc3522602dd0a6a70b5f Author: Osier Yang <jyang> Date: Wed Jun 22 21:36:29 2011 +0800 Fix memory leak in drv_init * src/drv_initscripts.c: "goto error" instead of "return -1", as "add_augeas_xfm_table" has already done some initialization on ncf->driver. == NB, the above commit is made on netcf. commit 20d5e9db829953ab8dd96a28c86be28c55a84c97 Author: Osier Yang <jyang> Date: Wed Jun 22 19:32:04 2011 +0800 qemu: Fix one memory leak * src/qemu/qemu_domain.c: (qemuDomainAppendLog) Free "message" in "cleanup". And the following two which are not got acked and commited yet. http://www.redhat.com/archives/libvir-list/2011-June/msg01104.html http://www.redhat.com/archives/libvir-list/2011-June/msg01103.html > And the following two which are not got acked and commited yet.
>
> http://www.redhat.com/archives/libvir-list/2011-June/msg01104.html
> http://www.redhat.com/archives/libvir-list/2011-June/msg01103.html
these two patches got committed upstream.
commit 91141474f85b933764b3271a74027952ecd5ccf0
Author: Osier Yang <jyang>
Date: Thu Jun 23 18:16:31 2011 +0800
util: Fix memory leak in virJSONParserHandleStartMap
commit d61820df98bdce78c5a1f3e4a8dd72bd81455272
Author: Osier Yang <jyang>
Date: Thu Jun 23 18:15:25 2011 +0800
qemu: Fix memory leak in qemuProcessWaitForMonitor
Move "VIR_FREE(buf) into label "closelog", so that "buf" could be
freed before returning.
> ==22835== 968 bytes in 1 blocks are definitely lost in loss record 1,405 of
> 1,572
> ==22835== at 0x4C2515D: malloc (vg_replace_malloc.c:195)
> ==22835== by 0x8DC8354: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.7.7)
> ==22835== by 0x8DC7894: __xmlDefaultSAXHandler (in
> /usr/lib64/libxml2.so.2.7.7)
> ==22835== by 0x8E1D318: xmlDefaultSAXHandlerInit (in
> /usr/lib64/libxml2.so.2.7.7)
> ==22835== by 0x8D592A1: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.7.7)
> ==22835== by 0x8D59A1B: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.7.7)
> ==22835== by 0x5089F3D: virXMLParseHelper (xml.c:682)
> ==22835== by 0x50A3A54: virDomainDefParse (domain_conf.c:6769)
> ==22835== by 0x442804: qemuDomainSaveImageOpen (qemu_driver.c:3408)
> ==22835== by 0x4540B1: qemudDomainObjStart (qemu_driver.c:3590)
> ==22835== by 0x454694: qemudDomainStartWithFlags (qemu_driver.c:3884)
> ==22835== by 0x50ECEC5: virDomainCreate (libvirt.c:6504)
>
> For things like above, It looks to me it's also wrong reporting by valrind, it
> might can't detect things are freed by "on_exit_thread" in "xmlGetGlobalState",
> though I'm not sure about it.
This is a global leak, but skip it as any fixing on that might introduce more problems.
Move to MODIFIED.
Wrong status, move to POST. The leaks for libxml2 cannot be removed, it's actually a global variable to the leak is "static" it's not important. libxml2 has some global data, you can't free them from a library, the program using libvirt may use libxml2 for other things, if one were to free the libxml2 global data from libvirt, and the program was still using libxml2 (possibly via another library) then the program would crash. So those parts of the leaks from libvirtd cannot be avoided, but we should verify that the other leaks coming from libvirt own allocations are cleared in libvirt-0.9.3-1.el6, Daniel Created attachment 511635 [details]
valgrind log for libvirt-0.9.3-1
Created attachment 511646 [details]
valgrind log for start the guest for 10 times
Created attachment 511649 [details]
valgrind log for start the guest for 100 times
Moved this bug to VERIFIED on no libvirtd own memory leak but not no any other leaks(such as libxml2) on starting domain. xhu, wrong operatioin? I see a needinfo with no message from you. Hi osier, Pls see comments 22 and 25 and help check valgrind log for starting domain 10 and 100 times. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1513.html |
Created attachment 494050 [details] detail log file of memory leak Description of problem: when start a domain with commands "virsh start *..", it led to memory leak, please take a look at it. Version-Release number of selected component (if applicable): libvirt-0.8.7-18.el6.x86_64 libvirt-debuginfo-0.8.7-18.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libnl-1.1-14.el6.x86_64 Linux localhost.localdomain 2.6.32-131.0.1.el6.x86_64 #1 SMP Tue Apr 12 16:40:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux How reproducible: Steps to Reproduce: 1.service libvirtd stop 2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1 3.execute "virsh start rhel6-x86_64", the domain can not start Actual results: 1. the domain can not start and complaining the following: [root@localhost test]# virsh start rhel6-x86_64 error: Failed to start domain rhel6-x86_64 error: internal error Process exited while reading console log output: 2. it also led to memory leak, some log lines is as below (detail log file is in the attachment named "2011.4.22-valgrind.out"): .... ==3422== LEAK SUMMARY: ==3422== definitely lost: 1,264 bytes in 5 blocks ==3422== indirectly lost: 0 bytes in 0 blocks ==3422== possibly lost: 3,421 bytes in 43 blocks ==3422== still reachable: 2,061,699 bytes in 13,159 blocks ==3422== suppressed: 0 bytes in 0 blocks ==3422== Reachable blocks (those to which a pointer was found) are not shown. ==3422== To see them, rerun with: --leak-check=full --show-reachable=yes ..... Expected results: 1. the domain can be started successfully 2. it should not led to memory leak Additional info: [root@localhost test]# getenforce Enforcing