Bug 994731
Summary: | Documentation for virDomainLookupBy* should mention caller's responsibility to free virDomainPtr | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | mnguyen | ||||
Component: | libvirt | Assignee: | Ján Tomko <jtomko> | ||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 7.1 | CC: | ajia, bsarathy, dyuan, eblake, mnguyen, mzhan, rbalakri, sowang, weizhan | ||||
Target Milestone: | rc | ||||||
Target Release: | 7.1 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | libvirt-1.2.7-1.el7 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2015-03-05 07:24:10 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
(In reply to mnguyen from comment #0) > > Version-Release number of selected component (if applicable): > ------------------------------------------------------------ > Linux 2.6.18-164.15.1.el5 What's needed here is the version of libvirt, not the kernel, that you tested. Furthermore, have you considered upgrading to RHEL 6? There have been lots of memory leaks plugged in the meantime, and it may be easier for you to upgrade than to take the risk that libvirt developers can figure out which particular leak you are seeing, and determine whether it can even be backported without risking regressions on the backport. The libvirt info: Name : libvirt Arch : x86_64 Version : 0.6.3 Release : 20.1.el5_4 Size : 7.1 M Sorry it is not possible for us to upgrade to RHEL 6 at this point since we are on the verge of a major release of our product based on RHEL 5. I will try to do some test on RHEL 6, but just for experimentation only. Thanks! Even trying libvirt-0.8.2-29.el5 from RHEL 5.10 is better than sticking with the ancient history of libvirt-0.6.3-20.1.el5_4 from RHEL 5.4; while you may be unable to upgrade to RHEL 6, it should certainly be easier to upgrade to the latest RHEL 5. (In reply to mnguyen from comment #2) > Sorry it is not possible for us to upgrade to RHEL 6 at this point since we > are on the verge of a major release of our product based on RHEL 5. > > I will try to do some test on RHEL 6, but just for experimentation only. Please test against 5.10 also and let us know the result. I did a test using the attached program in RHEL6 environment. Here is the libvirt info: Name : libvirt Arch : x86_64 Version : 0.10.2 Release : 18.el6 Size : 5.6 M Repo : installed From repo : anaconda-CentOS-201303020151.x86_64 Summary : Library providing a simple virtualization API URL : http://libvirt.org/ License : LGPLv2+ Description : Libvirt is a C toolkit to interact with the virtualization capabilities : of recent versions of Linux (and other OSes). The main package includes : the libvirtd server exporting the virtualization support. ------------------------------------------ The results using the original attached program: 1. The call to virConnectClose() returned a "1". I have no idea why there was still a reference to the connection in the logic. Has something changed in libvirt API in RHEL 6? 2. The while loop did not go very far. After around 20 iterations with no problem, I then got an error message: libvir: XML-RPC error : Cannot recv data: Connection reset by peer ------------------------------------------- So there is something changed in the way virConnectOpen and virConnectClose work in RHEL6. I decided to remove the virConnectOpen and virConnectClose from the while loop, so basically the while loop now only tests 2 functions virDomainLookupByName and virDomainGetInfo. Here is the modified code: int main(int argc, char **argv) { virConnectPtr vp; virDomainPtr dom = NULL; virDomainInfo info; int ret; vp = virConnectOpen(NULL); virInitialize(); while (1) { dom = virDomainLookupByName(vp, "mytestdomain"); if (!dom) { printf("Fail look up\n"); continue; } memset(&info, 0, sizeof(virDomainInfo)); virDomainGetInfo(dom, &info); printf("state=%d\n", info.state); printf("maxMem=%d\n", info.maxMem); printf("memory=%d\n", info.memory); printf("nrVirtCpu=%d\n", info.nrVirtCpu); printf("cpuTime=%lld\n", info.cpuTime); } virConnectClose(vp); // never get here anyway return 0; } Using the above modified program, I got major memory leak too. Since the libvirt API does not specify that the caller must free the return object from virDomainLookupByName, let me know if I need to free it by some other procedure. Thus, if RHEL6 still has memory leak issue, should I try the libvirt-0.8.2-29.el5 from RHEL 5.10? (In reply to mnguyen from comment #5) > Using the above modified program, I got major memory leak too. > Since the libvirt API does not specify that the caller must free the return > object from virDomainLookupByName, let me know if I need to free it by some > other procedure. You do need to free it by calling virDomainFree when you are done with it, and indeed http://libvirt.org/html/libvirt-libvirt.html#virDomainLookupByName does not mention that. Thanks for reporting this, and this BZ will track getting the docs updated. Pretty much ALL of the libvirt API that return a vir*Ptr require the caller to later call vir*Free when they are done with that object. It's not just DomainLookupByName that has this convention (ie. if we update the docs to mention that returned objects need to be freed, we need to update a LOT of functions; the alternative is documenting the conventions once up front). But the conventions have been unchanged since old releases. What DID change between your older version and the RHEL6 virConnectClose is that we got smarter about reference counting, where a connection is no longer closed if you leak a reference (in RHEL5, once the connection closes, your attempts to use the leaked reference will segfault; in RHEL6, the connection remains open until all references are gone so you can't segfault). The reason your loop stopped after 20 connect/close pairs is that the default limit on open connections is 20, to prevent a denial of service (ie. libvirtd is protecting itself from your leak); you can raise the limit on number of concurrent connections, but better is to fix your code to avoid the leak in the first place. YESSS!! Calling virDomainFree() eliminated the memory leak problem in the test program. This is tested in the original RHEL5.4 environment. So we could close this bug after the document is updated to reflect the need to document the need to call virDomainFree. Thank you for all your supports, Minh Nguyen (In reply to Eric Blake from comment #7) > Pretty much ALL of the libvirt API that return a vir*Ptr require the caller > to later call vir*Free when they are done with that object. It's not just > DomainLookupByName that has this convention (ie. if we update the docs to > mention that returned objects need to be freed, we need to update a LOT of > functions; the alternative is documenting the conventions once up front). I don't disagree, but I wasn't able to find a concrete statement to this effect, so I agree and we ought to have an API fundamentals section that documents all the conventions like this (maybe one exists, I just didn't find it when I looked). Proposed upstream patch: https://www.redhat.com/archives/libvir-list/2014-July/msg00364.html (In reply to Dave Allan from comment #9) > (In reply to Eric Blake from comment #7) > > Pretty much ALL of the libvirt API that return a vir*Ptr require the caller > > to later call vir*Free when they are done with that object. It's not just > > DomainLookupByName that has this convention (ie. if we update the docs to > > mention that returned objects need to be freed, we need to update a LOT of > > functions; the alternative is documenting the conventions once up front). > > I don't disagree, but I wasn't able to find a concrete statement to this > effect, so I agree and we ought to have an API fundamentals section that > documents all the conventions like this (maybe one exists, I just didn't > find it when I looked). The proposed patch in comment 10 only adds an up-front notice; I think we want both (an up-front notice, AND per-function notices) I have sent an additional patch adding per-function notices: https://www.redhat.com/archives/libvir-list/2014-July/msg00426.html Both patches are now pushed upstream: commit 3701b519847c391e6d0755239f7e89e9362d15f5 Author: Ján Tomko <jtomko> CommitDate: 2014-07-09 09:22:20 +0200 Document the need to free vir*Ptr objects https://bugzilla.redhat.com/show_bug.cgi?id=994731 commit 3d8d18f673aecfe6d452a173cba64ec37a67d76c Author: Ján Tomko <jtomko> CommitDate: 2014-07-09 09:22:20 +0200 Document the need to free vir*Ptr objects per-function Another patch for https://bugzilla.redhat.com/show_bug.cgi?id=994731 git describe: v1.2.6-89-g3d8d18f 1.[root@localhost libvirt]# rpm -q libvirt libvirt-1.2.8-10.el7.x86_64 2.[root@localhost libvirt]# gcc -o mytest test.c -lvirt -I/usr/include/libvirt 3.[root@localhost libvirt]# valgrind --tool=memcheck --leak-check=full /home/songwang/libvirt/mytest ........ ==22302== 1,663 (24 direct, 1,639 indirect) bytes in 1 blocks are definitely lost in loss record 197 of 226 ==22302== at 0x4C2BB0A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==22302== by 0x4E9256F: virReallocN (viralloc.c:245) ==22302== by 0x4E92639: virExpandN (viralloc.c:294) ==22302== by 0x4FDEF55: virNetClientAddProgram (virnetclient.c:881) ==22302== by 0x4FD0C57: doRemoteOpen (remote_driver.c:995) ==22302== by 0x4FD2E45: remoteConnectOpen (remote_driver.c:1220) ==22302== by 0x4F6C9F9: do_open (libvirt.c:1147) ==22302== by 0x4F6F238: virConnectOpen (libvirt.c:1317) ==22302== by 0x400973: main (in /home/songwang/libvirt/mytest) ==22302== ==22302== 8,315 (240 direct, 8,075 indirect) bytes in 5 blocks are definitely lost in loss record 212 of 226 ==22302== at 0x4C2B934: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==22302== by 0x4E929C4: virAllocVar (viralloc.c:560) ==22302== by 0x4ED4495: virObjectNew (virobject.c:193) ==22302== by 0x4FDF343: virNetClientProgramNew (virnetclientprogram.c:80) ==22302== by 0x4FD0BC5: doRemoteOpen (remote_driver.c:974) ==22302== by 0x4FD2E45: remoteConnectOpen (remote_driver.c:1220) ==22302== by 0x4F6C9F9: do_open (libvirt.c:1147) ==22302== by 0x4F6F238: virConnectOpen (libvirt.c:1317) ==22302== by 0x400973: main (in /home/songwang/libvirt/mytest) ==22302== ........... 4[root@localhost libvirt]# ps -eo rss,cmd |grep mytest 6220 ./mytest 956 grep --color=auto mytest 5.[root@localhost libvirt]# vim /usr/share/doc/libvirt-docs-1.2.8/html/api.html ... 269 <p>Note: functions returning vir*Ptr (like the virDomainLookup functions) 270 allocate memory which needs to be freed by the caller by the corresponding 271 vir*Free function (e.g. virDomainFree for a virDomainPtr object). 272 </p> .... Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html |
Created attachment 784139 [details] Test program in C Description of problem: ----------------------- The attached program (mytest.c) shows memory usage growing over time. Using valgrind, memory leak info is as follows: # valgrind --tool=memcheck --leak-check=full /tmp/mytest ==15689== Memcheck, a memory error detector. ==15689== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al. ==15689== Using LibVEX rev 1658, a library for dynamic binary translation. ==15689== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP. ==15689== Using valgrind-3.2.1, a dynamic binary instrumentation framework. ==15689== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al. ==15689== For more details, rerun with: -v ==15689== ==15689== ==15689== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 1) ==15689== malloc/free: in use at exit: 69,027,717 bytes in 103,327 blocks. ==15689== malloc/free: 194,324 allocs, 90,997 frees, 11,196,969,836 bytes allocated. ==15689== For counts of detected errors, rerun with: -v ==15689== searching for pointers to 103,327 not-freed blocks. ==15689== checked 1,717,752 bytes. ==15689== ==15689== 32 bytes in 1 blocks are possibly lost in loss record 4 of 39 ==15689== at 0x4A05809: malloc (vg_replace_malloc.c:149) ==15689== by 0x332C479431: strdup (in /lib64/libc-2.5.so) ==15689== by 0x332F01669C: virHashAddEntry (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F0215E0: virGetDomain (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F0471E2: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F04C3A3: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F0292DC: virDomainLookupByName (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x40097D: main (in /tmp/mytest) ==15689== ==15689== ==15689== 1,316,679 (70,400 direct, 1,246,279 indirect) bytes in 110 blocks are definitely lost in loss record 31 of 39 ==15689== at 0x4A04B32: calloc (vg_replace_malloc.c:279) ==15689== by 0x332F0186FB: virAllocN (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F01674E: virHashCreate (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F0212AA: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F02B5D0: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x40096D: main (in /tmp/mytest) ==15689== ==15689== ==15689== 66,776,786 (268,776 direct, 66,508,010 indirect) bytes in 5,873 blocks are definitely lost in loss record 35 of 39 ==15689== at 0x4A04B32: calloc (vg_replace_malloc.c:279) ==15689== by 0x332F01871D: virAlloc (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F01672C: virHashCreate (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F0212AA: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F02B5D0: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x40096D: main (in /tmp/mytest) ==15689== ==15689== ==15689== 285,568 bytes in 45 blocks are possibly lost in loss record 36 of 39 ==15689== at 0x4A04B32: calloc (vg_replace_malloc.c:279) ==15689== by 0x332F0186FB: virAllocN (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F01674E: virHashCreate (in /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F021332: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x332F02B5D0: (within /usr/lib64/libvirt.so.0.6.3) ==15689== by 0x40096D: main (in /tmp/mytest) ==15689== ==15689== LEAK SUMMARY: ==15689== definitely lost: 339,176 bytes in 5,983 blocks. ==15689== indirectly lost: 67,754,289 bytes in 96,866 blocks. ==15689== possibly lost: 285,600 bytes in 46 blocks. ==15689== still reachable: 648,652 bytes in 432 blocks. ==15689== suppressed: 0 bytes in 0 blocks. ==15689== Reachable blocks (those to which a pointer was found) are not shown. ==15689== To see them, rerun with: --show-reachable=yes Version-Release number of selected component (if applicable): ------------------------------------------------------------ Linux 2.6.18-164.15.1.el5 How reproducible: ----------------- 100% Steps to Reproduce: ------------------- 1. gcc -o mytest mytest.c -lvirt -I/usr/include/libvirt 2. Set up a domain named "mytestdomain" to be used in the test program. 3. Run the mytest program 4. ps -eo rss,cmd | grep mytest Actual results: --------------- The rss memory continues growing. Expected results: ---------------- The rss memory usage should be relatively flat. Additional info: