Bug 994731

Summary: Documentation for virDomainLookupBy* should mention caller's responsibility to free virDomainPtr
Product: Red Hat Enterprise Linux 7 Reporter: mnguyen
Component: libvirtAssignee: Ján Tomko <jtomko>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.1CC: ajia, bsarathy, dyuan, eblake, mnguyen, mzhan, rbalakri, sowang, weizhan
Target Milestone: rc   
Target Release: 7.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.2.7-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 07:24:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Test program in C none

Description mnguyen 2013-08-07 21:55:11 UTC
Created attachment 784139 [details]
Test program in C

Description of problem:
-----------------------
The attached program (mytest.c) shows memory usage growing over time.

Using valgrind, memory leak info is as follows:

# valgrind --tool=memcheck --leak-check=full /tmp/mytest
==15689== Memcheck, a memory error detector.
==15689== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==15689== Using LibVEX rev 1658, a library for dynamic binary translation.
==15689== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==15689== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==15689== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==15689== For more details, rerun with: -v
==15689==
==15689==
==15689== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 1)
==15689== malloc/free: in use at exit: 69,027,717 bytes in 103,327 blocks.
==15689== malloc/free: 194,324 allocs, 90,997 frees, 11,196,969,836 bytes allocated.
==15689== For counts of detected errors, rerun with: -v
==15689== searching for pointers to 103,327 not-freed blocks.
==15689== checked 1,717,752 bytes.
==15689==
==15689== 32 bytes in 1 blocks are possibly lost in loss record 4 of 39
==15689==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==15689==    by 0x332C479431: strdup (in /lib64/libc-2.5.so)
==15689==    by 0x332F01669C: virHashAddEntry (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F0215E0: virGetDomain (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F0471E2: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F04C3A3: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F0292DC: virDomainLookupByName (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x40097D: main (in /tmp/mytest)
==15689==
==15689==
==15689== 1,316,679 (70,400 direct, 1,246,279 indirect) bytes in 110 blocks are definitely lost in loss record 31 of 39
==15689==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==15689==    by 0x332F0186FB: virAllocN (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F01674E: virHashCreate (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F0212AA: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F02B5D0: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x40096D: main (in /tmp/mytest)
==15689==
==15689==
==15689== 66,776,786 (268,776 direct, 66,508,010 indirect) bytes in 5,873 blocks are definitely lost in loss record 35 of 39
==15689==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==15689==    by 0x332F01871D: virAlloc (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F01672C: virHashCreate (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F0212AA: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F02B5D0: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x40096D: main (in /tmp/mytest)
==15689==
==15689==
==15689== 285,568 bytes in 45 blocks are possibly lost in loss record 36 of 39
==15689==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==15689==    by 0x332F0186FB: virAllocN (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F01674E: virHashCreate (in /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F021332: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x332F02B5D0: (within /usr/lib64/libvirt.so.0.6.3)
==15689==    by 0x40096D: main (in /tmp/mytest)
==15689==
==15689== LEAK SUMMARY:
==15689==    definitely lost: 339,176 bytes in 5,983 blocks.
==15689==    indirectly lost: 67,754,289 bytes in 96,866 blocks.
==15689==      possibly lost: 285,600 bytes in 46 blocks.
==15689==    still reachable: 648,652 bytes in 432 blocks.
==15689==         suppressed: 0 bytes in 0 blocks.
==15689== Reachable blocks (those to which a pointer was found) are not shown.
==15689== To see them, rerun with: --show-reachable=yes


Version-Release number of selected component (if applicable):
------------------------------------------------------------
Linux 2.6.18-164.15.1.el5

How reproducible:
-----------------
100%

Steps to Reproduce:
-------------------
1.  gcc -o mytest mytest.c -lvirt -I/usr/include/libvirt
2.  Set up a domain named "mytestdomain" to be used in the test program.
3.  Run the mytest program 
4.  ps -eo rss,cmd | grep mytest 

Actual results:
---------------
The rss memory continues growing.


Expected results:
----------------
The rss memory usage should be relatively flat.


Additional info:

Comment 1 Eric Blake 2013-08-07 22:06:26 UTC
(In reply to mnguyen from comment #0)

> 
> Version-Release number of selected component (if applicable):
> ------------------------------------------------------------
> Linux 2.6.18-164.15.1.el5

What's needed here is the version of libvirt, not the kernel, that you tested.  Furthermore, have you considered upgrading to RHEL 6?  There have been lots of memory leaks plugged in the meantime, and it may be easier for you to upgrade than to take the risk that libvirt developers can figure out which particular leak you are seeing, and determine whether it can even be backported without risking regressions on the backport.

Comment 2 mnguyen 2013-08-07 23:06:16 UTC
The libvirt info:

Name       : libvirt
Arch       : x86_64
Version    : 0.6.3
Release    : 20.1.el5_4
Size       : 7.1 M


Sorry it is not possible for us to upgrade to RHEL 6 at this point since we are on the verge of a major release of our product based on RHEL 5.  

I will try to do some test on RHEL 6, but just for experimentation only.

Thanks!

Comment 3 Eric Blake 2013-08-07 23:22:55 UTC
Even trying libvirt-0.8.2-29.el5 from RHEL 5.10 is better than sticking with the ancient history of libvirt-0.6.3-20.1.el5_4 from RHEL 5.4; while you may be unable to upgrade to RHEL 6, it should certainly be easier to upgrade to the latest RHEL 5.

Comment 4 Dave Allan 2013-08-08 13:34:23 UTC
(In reply to mnguyen from comment #2)
> Sorry it is not possible for us to upgrade to RHEL 6 at this point since we
> are on the verge of a major release of our product based on RHEL 5.  
> 
> I will try to do some test on RHEL 6, but just for experimentation only.

Please test against 5.10 also and let us know the result.

Comment 5 mnguyen 2013-08-08 18:47:59 UTC
I did a test using the attached program in RHEL6 environment.  Here is the libvirt info:

Name        : libvirt
Arch        : x86_64
Version     : 0.10.2
Release     : 18.el6
Size        : 5.6 M
Repo        : installed
From repo   : anaconda-CentOS-201303020151.x86_64
Summary     : Library providing a simple virtualization API
URL         : http://libvirt.org/
License     : LGPLv2+
Description : Libvirt is a C toolkit to interact with the virtualization capabilities
            : of recent versions of Linux (and other OSes). The main package includes
            : the libvirtd server exporting the virtualization support.

------------------------------------------

The results using the original attached program:

1. The call to virConnectClose() returned a "1".  I have no idea why there was still a reference to the connection in the logic.  Has something changed in libvirt API in RHEL 6?

2. The while loop did not go very far.  After around 20 iterations with no problem, I then got an error message:
libvir: XML-RPC error : Cannot recv data: Connection reset by peer

-------------------------------------------

So there is something changed in the way virConnectOpen and virConnectClose work in RHEL6.  I decided to remove the virConnectOpen and virConnectClose from the while loop, so basically the while loop now only tests 2 functions virDomainLookupByName and virDomainGetInfo.  Here is the modified code:

int
main(int argc, char **argv)
{
    virConnectPtr vp;
    virDomainPtr dom = NULL;
    virDomainInfo info;
    int ret;

    vp = virConnectOpen(NULL);
    virInitialize();

    while (1) {
        dom = virDomainLookupByName(vp, "mytestdomain");
        if (!dom) {
            printf("Fail look up\n");
            continue;
        }

        memset(&info, 0, sizeof(virDomainInfo));
        virDomainGetInfo(dom, &info);
        printf("state=%d\n", info.state);
        printf("maxMem=%d\n", info.maxMem);
        printf("memory=%d\n", info.memory);
        printf("nrVirtCpu=%d\n", info.nrVirtCpu);
        printf("cpuTime=%lld\n", info.cpuTime);
    }
    virConnectClose(vp);  // never get here anyway
    return 0;
}

Using the above modified program, I got major memory leak too.  
Since the libvirt API does not specify that the caller must free the return object from virDomainLookupByName, let me know if I need to free it by some other procedure.


Thus, if RHEL6 still has memory leak issue, should I try the libvirt-0.8.2-29.el5 from RHEL 5.10?

Comment 6 Dave Allan 2013-08-08 19:38:07 UTC
(In reply to mnguyen from comment #5)
> Using the above modified program, I got major memory leak too.  
> Since the libvirt API does not specify that the caller must free the return
> object from virDomainLookupByName, let me know if I need to free it by some
> other procedure.

You do need to free it by calling virDomainFree when you are done with it, and indeed http://libvirt.org/html/libvirt-libvirt.html#virDomainLookupByName does not mention that.  Thanks for reporting this, and this BZ will track getting the docs updated.

Comment 7 Eric Blake 2013-08-08 19:46:47 UTC
Pretty much ALL of the libvirt API that return a vir*Ptr require the caller to later call vir*Free when they are done with that object.  It's not just DomainLookupByName that has this convention (ie. if we update the docs to mention that returned objects need to be freed, we need to update a LOT of functions; the alternative is documenting the conventions once up front).  But the conventions have been unchanged since old releases.  What DID change between your older version and the RHEL6 virConnectClose is that we got smarter about reference counting, where a connection is no longer closed if you leak a reference (in RHEL5, once the connection closes, your attempts to use the leaked reference will segfault; in RHEL6, the connection remains open until all references are gone so you can't segfault).  The reason your loop stopped after 20 connect/close pairs is that the default limit on open connections is 20, to prevent a denial of service (ie. libvirtd is protecting itself from your leak); you can raise the limit on number of concurrent connections, but better is to fix your code to avoid the leak in the first place.

Comment 8 mnguyen 2013-08-08 19:49:32 UTC
YESSS!!  Calling virDomainFree() eliminated the memory leak problem in the test program.

This is tested in the original RHEL5.4 environment.
So we could close this bug after the document is updated to reflect the need to document the need to call virDomainFree.

Thank you for all your supports,
Minh Nguyen

Comment 9 Dave Allan 2013-08-08 19:53:04 UTC
(In reply to Eric Blake from comment #7)
> Pretty much ALL of the libvirt API that return a vir*Ptr require the caller
> to later call vir*Free when they are done with that object.  It's not just
> DomainLookupByName that has this convention (ie. if we update the docs to
> mention that returned objects need to be freed, we need to update a LOT of
> functions; the alternative is documenting the conventions once up front). 

I don't disagree, but I wasn't able to find a concrete statement to this effect, so I agree and we ought to have an API fundamentals section that documents all the conventions like this (maybe one exists, I just didn't find it when I looked).

Comment 10 Ján Tomko 2014-07-08 12:36:53 UTC
Proposed upstream patch:
https://www.redhat.com/archives/libvir-list/2014-July/msg00364.html

Comment 11 Eric Blake 2014-07-08 12:51:55 UTC
(In reply to Dave Allan from comment #9)
> (In reply to Eric Blake from comment #7)
> > Pretty much ALL of the libvirt API that return a vir*Ptr require the caller
> > to later call vir*Free when they are done with that object.  It's not just
> > DomainLookupByName that has this convention (ie. if we update the docs to
> > mention that returned objects need to be freed, we need to update a LOT of
> > functions; the alternative is documenting the conventions once up front). 
> 
> I don't disagree, but I wasn't able to find a concrete statement to this
> effect, so I agree and we ought to have an API fundamentals section that
> documents all the conventions like this (maybe one exists, I just didn't
> find it when I looked).

The proposed patch in comment 10 only adds an up-front notice; I think we want both (an up-front notice, AND per-function notices)

Comment 12 Ján Tomko 2014-07-08 16:01:25 UTC
I have sent an additional patch adding per-function notices:
https://www.redhat.com/archives/libvir-list/2014-July/msg00426.html

Comment 13 Ján Tomko 2014-07-09 07:38:42 UTC
Both patches are now pushed upstream:
commit 3701b519847c391e6d0755239f7e89e9362d15f5
Author:     Ján Tomko <jtomko>
CommitDate: 2014-07-09 09:22:20 +0200

    Document the need to free vir*Ptr objects
    
    https://bugzilla.redhat.com/show_bug.cgi?id=994731

commit 3d8d18f673aecfe6d452a173cba64ec37a67d76c
Author:     Ján Tomko <jtomko>
CommitDate: 2014-07-09 09:22:20 +0200

    Document the need to free vir*Ptr objects per-function
    
    Another patch for
    https://bugzilla.redhat.com/show_bug.cgi?id=994731

git describe: v1.2.6-89-g3d8d18f

Comment 15 Song Wang 2014-12-15 08:46:39 UTC
1.[root@localhost libvirt]# rpm -q libvirt
libvirt-1.2.8-10.el7.x86_64

2.[root@localhost libvirt]#  gcc -o mytest test.c -lvirt -I/usr/include/libvirt

3.[root@localhost libvirt]# valgrind --tool=memcheck --leak-check=full /home/songwang/libvirt/mytest
........
==22302== 1,663 (24 direct, 1,639 indirect) bytes in 1 blocks are definitely lost in loss record 197 of 226
==22302==    at 0x4C2BB0A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==22302==    by 0x4E9256F: virReallocN (viralloc.c:245)
==22302==    by 0x4E92639: virExpandN (viralloc.c:294)
==22302==    by 0x4FDEF55: virNetClientAddProgram (virnetclient.c:881)
==22302==    by 0x4FD0C57: doRemoteOpen (remote_driver.c:995)
==22302==    by 0x4FD2E45: remoteConnectOpen (remote_driver.c:1220)
==22302==    by 0x4F6C9F9: do_open (libvirt.c:1147)
==22302==    by 0x4F6F238: virConnectOpen (libvirt.c:1317)
==22302==    by 0x400973: main (in /home/songwang/libvirt/mytest)
==22302== 
==22302== 8,315 (240 direct, 8,075 indirect) bytes in 5 blocks are definitely lost in loss record 212 of 226
==22302==    at 0x4C2B934: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==22302==    by 0x4E929C4: virAllocVar (viralloc.c:560)
==22302==    by 0x4ED4495: virObjectNew (virobject.c:193)
==22302==    by 0x4FDF343: virNetClientProgramNew (virnetclientprogram.c:80)
==22302==    by 0x4FD0BC5: doRemoteOpen (remote_driver.c:974)
==22302==    by 0x4FD2E45: remoteConnectOpen (remote_driver.c:1220)
==22302==    by 0x4F6C9F9: do_open (libvirt.c:1147)
==22302==    by 0x4F6F238: virConnectOpen (libvirt.c:1317)
==22302==    by 0x400973: main (in /home/songwang/libvirt/mytest)
==22302== 
...........

4[root@localhost libvirt]# ps -eo rss,cmd |grep mytest
 6220 ./mytest
  956 grep --color=auto mytest

5.[root@localhost libvirt]# vim /usr/share/doc/libvirt-docs-1.2.8/html/api.html 
...
269         <p>Note: functions returning vir*Ptr (like the virDomainLookup functions)
270     allocate memory which needs to be freed by the caller by the corresponding
271     vir*Free function (e.g. virDomainFree for a virDomainPtr object).
272     </p>
....

Comment 17 errata-xmlrpc 2015-03-05 07:24:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html