Bug 698861

Summary: starting domain leaks memory
Product: Red Hat Enterprise Linux 6 Reporter: Jason Chen <jiachen>
Component: libvirtAssignee: Osier Yang <jyang>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.1CC: ajia, dallan, dyuan, jiachen, mzhan, veillard, xhu, yoyzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.9.3-1.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 11:06:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
detail log file of memory leak
none
memory leak log file
none
20110615-valgrind.out
none
valgrind log for libvirt-0.9.3-1
none
valgrind log for start the guest for 10 times
none
valgrind log for start the guest for 100 times none

Description Jason Chen 2011-04-22 03:01:25 UTC
Created attachment 494050 [details]
detail log file of memory leak

Description of problem:
   when start a domain with commands "virsh start *..", it led to memory leak, please take a look at it.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-18.el6.x86_64 
libvirt-debuginfo-0.8.7-18.el6.x86_64
libselinux-2.0.94-5.el6.x86_64
libnl-1.1-14.el6.x86_64
Linux localhost.localdomain 2.6.32-131.0.1.el6.x86_64 #1 SMP Tue Apr 12 16:40:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Steps to Reproduce:
1.service libvirtd stop
2.valgrind --leak-check=full /usr/sbin/libvirtd 2>&1 | tee valgrind.out& pid=$1
3.execute "virsh start rhel6-x86_64", the domain can not start
  
Actual results:
1. the domain can not start and complaining the following:
[root@localhost test]# virsh start rhel6-x86_64
error: Failed to start domain rhel6-x86_64
error: internal error Process exited while reading console log output: 

2. it also led to memory leak, some log lines is as below
(detail log file is in the attachment named "2011.4.22-valgrind.out"):
....
==3422== LEAK SUMMARY:
==3422==    definitely lost: 1,264 bytes in 5 blocks
==3422==    indirectly lost: 0 bytes in 0 blocks
==3422==      possibly lost: 3,421 bytes in 43 blocks
==3422==    still reachable: 2,061,699 bytes in 13,159 blocks
==3422==         suppressed: 0 bytes in 0 blocks
==3422== Reachable blocks (those to which a pointer was found) are not shown.
==3422== To see them, rerun with: --leak-check=full --show-reachable=yes
..... 

Expected results:
1. the domain can be started successfully
2. it should not led to memory leak

Additional info:
[root@localhost test]# getenforce 
Enforcing

Comment 2 Osier Yang 2011-04-26 02:57:41 UTC
It's two seperate problems, generally it's not good to folder more than one problems into one bug, that's not good to track, I'd like this bug is left for problem "memory leak". And please file another bug for the problem "domain can't startup".

Comment 3 Osier Yang 2011-04-26 07:02:21 UTC
For the problem 1, it's caused by valgrind aborted starting the domain. see the log:

==3423== execve(0x10e68080(/usr/libexec/qemu-kvm), 0x10e6f150, 0x10e68190) failed, errno 13
==3423== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
==3423== Add more stringent tests in PRE(sys_execve), or work out how to recover.

So, it's not a libvirt problem, though I'm not sure if it's a bug of valgrind.

Comment 4 Dave Allan 2011-06-08 19:36:09 UTC
Can you provide the domain XML for the guest so that we can attempt to reproduce what you're seeing?

Comment 5 Jason Chen 2011-06-13 07:24:50 UTC
(In reply to comment #4)
> Can you provide the domain XML for the guest so that we can attempt to
> reproduce what you're seeing?

Dave,
   i reproduced this bug as i can not found the XML file of the domain tested before. some information is as below:

1) domain XML file
----------------------
<domain type='kvm'>
  <name>rhel-x86_64</name>
  <uuid>ab342148-e2db-fdca-b371-7adfc73fcf97</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads'/>
      <source file='/var/lib/libvirt/images/rhel-x86_64.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <interface type='network'>
      <mac address='52:54:00:a8:6f:0b'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>


2 version list:
--------------------
libvirt-0.8.7-18.el6.x86_64
libvirt-python-0.8.7-18.el6.x86_64
libvirt-client-0.8.7-18.el6.x86_64
libvirt-debuginfo-0.8.7-18.el6.x86_64
libselinux-2.0.94-5.1.el6.x86_64
libnl-1.1-14.el6.x86_64
Host: Linux dhcp-4-199.nay.redhat.com 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

3 some memory leak log lines:(detail log is in the attachment named "20110613-valgrind.out")
--------------------
.....
==31727== LEAK SUMMARY:
==31727==    definitely lost: 1,264 bytes in 5 blocks
==31727==    indirectly lost: 0 bytes in 0 blocks
==31727==      possibly lost: 3,421 bytes in 43 blocks
==31727==    still reachable: 2,067,842 bytes in 13,276 blocks
==31727==         suppressed: 0 bytes in 0 blocks
==31727== Reachable blocks (those to which a pointer was found) are not shown.
==31727== To see them, rerun with: --leak-check=full --show-reachable=yes
....

Comment 6 Jason Chen 2011-06-13 07:26:28 UTC
Created attachment 504383 [details]
memory leak log file

Comment 7 Dave Allan 2011-06-13 15:19:13 UTC
The bug summary implies that if you start the domain without selinux=enforcing, that you don't see the leak; is that correct?

Comment 8 Jason Chen 2011-06-14 08:27:38 UTC
(In reply to comment #7)
> The bug summary implies that if you start the domain without selinux=enforcing,
> that you don't see the leak; is that correct?

No, my domain started with "selinux=enforcing",as below:
---------------------------------------------
[root@dhcp-4-199 ~]# getenforce 
Enforcing

Comment 9 Dave Allan 2011-06-14 14:04:38 UTC
Jason, I understand that you started the domain with selinux enforcing.  If you start the guest with selinux permissive, do you see the leak?

Comment 10 Jason Chen 2011-06-15 02:39:54 UTC
Created attachment 504791 [details]
20110615-valgrind.out

Comment 11 Jason Chen 2011-06-15 02:41:53 UTC
(In reply to comment #9)
> Jason, I understand that you started the domain with selinux enforcing.  If you
> start the guest with selinux permissive, do you see the leak?

(In reply to comment #9)
> Jason, I understand that you started the domain with selinux enforcing.  If you
> start the guest with selinux permissive, do you see the leak?

Dave,
    yes. i did a test(start the guest with selinux permissive) just now. it still lead to memory leak, and the domain can be started this time, some information is as blow:

1)
-----------
[root@dhcp-4-199 ~]# getenforce 
Permissive

2)
----------------------------------------------------------
.....
==3734== 
==3734== LEAK SUMMARY:
==3734==    definitely lost: 1,264 bytes in 5 blocks
==3734==    indirectly lost: 0 bytes in 0 blocks
==3734==      possibly lost: 3,421 bytes in 43 blocks
==3734==    still reachable: 2,067,842 bytes in 13,276 blocks
==3734==         suppressed: 0 bytes in 0 blocks
==3734== Reachable blocks (those to which a pointer was found) are not shown.
==3734== To see them, rerun with: --leak-check=full --show-reachable=yes
==3734== 
==3734== For counts of detected and suppressed errors, rerun with: -v
==3734== Use --track-origins=yes to see where uninitialised values come from
==3734== ERROR SUMMARY: 40 errors from 37 contexts (suppressed: 32 from 9)
10:28:20.022: 3570: warning : qemudStartVMDaemon:3346 : Executing done /usr/libexec/qemu-kvm
Domain rhel-x86_64 started
.....
(detail log is in the attachment named “20110615-valgrind.out”)

Comment 12 Dave Allan 2011-06-15 13:05:56 UTC
Ok, thanks for the info.  I've revised the BZ summary to reflect that what you're seeing does not depend on selinux state.

Comment 13 Osier Yang 2011-06-23 08:45:49 UTC
==22835== 80 bytes in 1 blocks are definitely lost in loss record 916 of 1,572
==22835==    at 0x4C24481: calloc (vg_replace_malloc.c:418)
==22835==    by 0x5077E91: virAlloc (memory.c:101)
==22835==    by 0x508A607: virLastErrorObject (virterror.c:262)
==22835==    by 0x508AA28: virResetLastError (virterror.c:403)
==22835==    by 0x50E1757: virConnectGetURI (libvirt.c:1581)
==22835==    by 0x42B7CA: remoteDispatchGetURI (remote_dispatch_bodies.h:3153)
==22835==    by 0x436316: remoteDispatchClientRequest (dispatch.c:516)
==22835==    by 0x41E716: qemudWorker (libvirtd.c:1618)
==22835==    by 0x949ACCA: start_thread (in /lib64/libpthread-2.13.so)
==22835==    by 0x9994C2C: clone (in /lib64/libc-2.13.so)

Things like above are normal leak, libvirt keeps last virError object in thread local storage to record the error.

-----------This is seperator --------------------

==22835== 72 bytes in 2 blocks are definitely lost in loss record 897 of 1,572
==22835==    at 0x4C2515D: malloc (vg_replace_malloc.c:195)
==22835==    by 0x9933C01: strdup (in /lib64/libc-2.13.so)
==22835==    by 0x8B16FAA: selinux_raw_to_trans_context (setrans_client.c:317)
==22835==    by 0x8B15E03: selabel_lookup_common (label.c:201)
==22835==    by 0x8B1625D: selabel_lookup (label.c:212)
==22835==    by 0x5212645: SELinuxRestoreSecurityFileLabel (security_selinux.c:425)
==22835==    by 0x520F7D8: virSecurityStackRestoreSavedStateLabel (security_stack.c:300)
==22835==    by 0x453CFB: qemuDomainSaveImageStartVM (qemu_driver.c:3523)
==22835==    by 0x454137: qemudDomainObjStart (qemu_driver.c:3611)
==22835==    by 0x454694: qemudDomainStartWithFlags (qemu_driver.c:3884)
==22835==    by 0x50ECEC5: virDomainCreate (libvirt.c:6504)
==22835==    by 0x432267: remoteDispatchDomainCreate (remote_dispatch_bodies.h:330)

Thing likes above is wrong reporting by valgrind, we free "security_context_t fcon" at the "err" label.

-----------This is seperator --------------------


==22835== 968 bytes in 1 blocks are definitely lost in loss record 1,405 of 1,572
==22835==    at 0x4C2515D: malloc (vg_replace_malloc.c:195)
==22835==    by 0x8DC8354: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.7.7)
==22835==    by 0x8DC7894: __xmlDefaultSAXHandler (in /usr/lib64/libxml2.so.2.7.7)
==22835==    by 0x8E1D318: xmlDefaultSAXHandlerInit (in /usr/lib64/libxml2.so.2.7.7)
==22835==    by 0x8D592A1: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.7.7)
==22835==    by 0x8D59A1B: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.7.7)
==22835==    by 0x5089F3D: virXMLParseHelper (xml.c:682)
==22835==    by 0x50A3A54: virDomainDefParse (domain_conf.c:6769)
==22835==    by 0x442804: qemuDomainSaveImageOpen (qemu_driver.c:3408)
==22835==    by 0x4540B1: qemudDomainObjStart (qemu_driver.c:3590)
==22835==    by 0x454694: qemudDomainStartWithFlags (qemu_driver.c:3884)
==22835==    by 0x50ECEC5: virDomainCreate (libvirt.c:6504)

For things like above, It looks to me it's also wrong reporting by valrind, it might can't detect things are freed by "on_exit_thread" in "xmlGetGlobalState", though I'm not sure about it.

Other leaks are fixed by following patch:

commit 66cfb9ef0a25b9051391cc3522602dd0a6a70b5f
Author: Osier Yang <jyang>
Date:   Wed Jun 22 21:36:29 2011 +0800

    Fix memory leak in drv_init
    
    * src/drv_initscripts.c: "goto error" instead of "return -1", as
      "add_augeas_xfm_table" has already done some initialization on
      ncf->driver.

== NB, the above commit is made on netcf.

commit 20d5e9db829953ab8dd96a28c86be28c55a84c97
Author: Osier Yang <jyang>
Date:   Wed Jun 22 19:32:04 2011 +0800

    qemu: Fix one memory leak
    
    * src/qemu/qemu_domain.c: (qemuDomainAppendLog)
    
      Free "message" in "cleanup".

And the following two which are not got acked and commited yet.

http://www.redhat.com/archives/libvir-list/2011-June/msg01104.html
http://www.redhat.com/archives/libvir-list/2011-June/msg01103.html

Comment 14 Osier Yang 2011-06-23 10:19:24 UTC
> And the following two which are not got acked and commited yet.
> 
> http://www.redhat.com/archives/libvir-list/2011-June/msg01104.html
> http://www.redhat.com/archives/libvir-list/2011-June/msg01103.html

these two patches got committed upstream.


commit 91141474f85b933764b3271a74027952ecd5ccf0
Author: Osier Yang <jyang>
Date:   Thu Jun 23 18:16:31 2011 +0800

    util: Fix memory leak in virJSONParserHandleStartMap

commit d61820df98bdce78c5a1f3e4a8dd72bd81455272
Author: Osier Yang <jyang>
Date:   Thu Jun 23 18:15:25 2011 +0800

    qemu: Fix memory leak in qemuProcessWaitForMonitor
    
    Move "VIR_FREE(buf) into label "closelog", so that "buf" could be
    freed before returning.

Comment 15 Osier Yang 2011-06-23 10:23:44 UTC
 
> ==22835== 968 bytes in 1 blocks are definitely lost in loss record 1,405 of
> 1,572
> ==22835==    at 0x4C2515D: malloc (vg_replace_malloc.c:195)
> ==22835==    by 0x8DC8354: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.7.7)
> ==22835==    by 0x8DC7894: __xmlDefaultSAXHandler (in
> /usr/lib64/libxml2.so.2.7.7)
> ==22835==    by 0x8E1D318: xmlDefaultSAXHandlerInit (in
> /usr/lib64/libxml2.so.2.7.7)
> ==22835==    by 0x8D592A1: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.7.7)
> ==22835==    by 0x8D59A1B: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.7.7)
> ==22835==    by 0x5089F3D: virXMLParseHelper (xml.c:682)
> ==22835==    by 0x50A3A54: virDomainDefParse (domain_conf.c:6769)
> ==22835==    by 0x442804: qemuDomainSaveImageOpen (qemu_driver.c:3408)
> ==22835==    by 0x4540B1: qemudDomainObjStart (qemu_driver.c:3590)
> ==22835==    by 0x454694: qemudDomainStartWithFlags (qemu_driver.c:3884)
> ==22835==    by 0x50ECEC5: virDomainCreate (libvirt.c:6504)
> 
> For things like above, It looks to me it's also wrong reporting by valrind, it
> might can't detect things are freed by "on_exit_thread" in "xmlGetGlobalState",
> though I'm not sure about it.
 
This is a global leak, but skip it as any fixing on that might introduce more problems.

Move to MODIFIED.

Comment 16 Osier Yang 2011-07-01 08:01:45 UTC
Wrong status, move to POST.

Comment 19 Daniel Veillard 2011-07-07 07:10:37 UTC
The leaks for libxml2 cannot be removed, it's actually a global variable
to the leak is "static" it's not important.

libxml2 has some global data, you can't free them from a library,
the program using libvirt may use libxml2 for other things, if one
were to free the libxml2 global data from libvirt, and the program
was still using libxml2 (possibly via another library) then the program
would crash.

So those parts of the leaks from libvirtd cannot be avoided, but we should
verify that the other leaks coming from libvirt own allocations are
cleared in libvirt-0.9.3-1.el6,

Daniel

Comment 20 xhu 2011-07-07 07:33:31 UTC
Created attachment 511635 [details]
valgrind log for libvirt-0.9.3-1

Comment 23 xhu 2011-07-07 08:34:24 UTC
Created attachment 511646 [details]
valgrind log for start the guest for 10 times

Comment 24 xhu 2011-07-07 08:57:24 UTC
Created attachment 511649 [details]
valgrind log for start the guest for 100 times

Comment 26 dyuan 2011-07-11 06:01:04 UTC
Moved this bug to VERIFIED on no libvirtd own memory leak but not no any other leaks(such as libxml2) on starting domain.

Comment 27 Osier Yang 2011-07-12 03:08:32 UTC
xhu, wrong operatioin? I see a needinfo with no message from you.

Comment 28 xhu 2011-07-12 05:30:03 UTC
Hi osier,
  Pls see comments 22 and 25 and help check valgrind log for starting domain 10 and 100 times.

Comment 29 errata-xmlrpc 2011-12-06 11:06:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html