Description of problem: If you run the attached test program, memory usage grows without bounds. Version-Release number of selected component (if applicable): 0.2.3-4.el5 How reproducible: 100% Steps to Reproduce: 1. gcc -o vtest test.c -lvirt -I/usr/include/libvirt -Wall 2. ./vtest 3. ps auwwx | grep vtest Actual results: Unbound memory usage Expected results: Flat memory usage Additional info: I found this using valgrind initially. This eventually breaks fence_xvmd, which is used for virtual machine fencing.
Created attachment 160260 [details] Reproducer.
Running with a debugging memory allocator: 0x2aaaaaf07e48 (1600 bytes) allocation trace: 0x2aaaaaac2fd1 0x2aaaaaf077f0 (1600 bytes) allocation trace: 0x2aaaaaac2fd1 ... GDB says: (gdb) x/i 0x2aaaaaac2fd1 0x2aaaaaac2fd1 <__virGetNetwork+545>: test %rax,%rax
There's also a pile of 8, 16, and 32-byte bits left over from regcomp() it looks like.
My error - 8 and, 24, and 32-byte chunks are not leaked over time, only 16-byte chunks it seems. Still trying to figure out where.
It's consistent; over two runs of two different time periods, I leaked 96 1600-byte blocks and 96 16-byte blocks. I'm still tracking down where the 16-byte blocks are leaked from.
Seems to be: (gdb) x/i 0x2aaaaaac2fae 0x2aaaaaac2fae <__virGetNetwork+510>: test %rax,%rax
Backtrace for the 1600 byte alloc #2 0x00002aaaaacee6aa in calloc (sz=1, nmemb=1600) at alloc.c:958 #3 0x00002aaaaaac2fd1 in virHashCreate (size=50) at hash.c:92 #4 0x00002aaaaaadbd48 in xenXMOpen (conn=<value optimized out>, name=0x5e7d <Address 0x5e7d out of bounds>, flags=19) at xm_internal.c:486 #5 0x00002aaaaaaca73c in xenUnifiedOpen (conn=0x2aaaaaef9cf0, name=0x2aaaaaae0014 "Xen", flags=0) at xen_unified.c:123 #6 0x00002aaaaaac1ac2 in do_open (name=0x0, flags=0) at libvirt.c:304 The stack frame actually points here: #3 0x00002aaaaaac2fd1 in virHashCreate (size=50) at hash.c:92 92 table->table = calloc(1, size * sizeof(virHashEntry)); So, xenXMOpen() is creating it (looks like).
Ah, maybe this is the problem: /* * Free the config files in the cache if this is the * last connection */ int xenXMClose(virConnectPtr conn ATTRIBUTE_UNUSED) { if (!nconnections--) { virHashFree(nameConfigMap, NULL); nameConfigMap = NULL; virHashFree(configCache, xenXMConfigFree); configCache = NULL; } return (0); } Postdecrement operator? Should maybe be: if (!--nconnections)
I hate when people mix operators like that :-( I will double check tomorrow morning. Thanks for the heads-up ! Concerning RHEL-5, I doubt we will be able to get all the ACKs to fix the leak, respinning libvirt and associated QA might be too much, Daniel
In addition to the problem mentioned in #10, there's a second bug in the error case. If libvirt fails to open a connection to Xen it will also leak memory: ==6696== 18,400 bytes in 100 blocks are definitely lost in loss record 20 of 20 ==6696== at 0x4A05809: malloc (vg_replace_malloc.c:149) ==6696== by 0x34818106AA: xenUnifiedOpen (xen_unified.c:96) ==6696== by 0x3481807AC1: do_open (libvirt.c:304) ==6696== by 0x400652: main (in /home/berrange/demo) ==6696== It allocates private data priv = malloc (sizeof *priv); And then later has an failure path which doesn't free 'priv': /* If as root, then all drivers must succeed. If non-root, then only proxy must succeed */ if (!priv->opened[i] && (getuid() == 0 || i == proxy_offset)) { for (j = 0; j < i; ++j) if (priv->opened[j]) drivers[j]->close (conn); return VIR_DRV_OPEN_ERROR; } }
Created attachment 160269 [details] Fix postdecrement which should be predecrement.
That patch definitely helps. I don't know if it solves the one the other Dan found ;)
WRT to comment #10, latest upstream has already changed that fragement to int xenXMClose(virConnectPtr conn ATTRIBUTE_UNUSED) { nconnections--; if (nconnections <= 0) { virHashFree(nameConfigMap, NULL); nameConfigMap = NULL; virHashFree(configCache, xenXMConfigFree); configCache = NULL; } return (0); }
==30951== 3,172 (336 direct, 2,836 indirect) bytes in 2 blocks are definitely lost in loss record 60 of 67 ==30951== at 0x4A04B32: calloc (vg_replace_malloc.c:279) ==30951== by 0x4EA8004: virGetConnect (hash.c:662) ==30951== by 0x4EA6A76: do_open (libvirt.c:297) ==30951== by 0x403840: xvmd_loop (fence_xvmd.c:543) ==30951== by 0x403FF8: main (fence_xvmd.c:782) Also seems to be a bit here. I'll see if I can track it down.
The upstream version looks more like the rest of the libvirt code base; I'm obsoleting my patch.
Scratch comment #16; it looks like my bad.
The fix in the patch was applied upstream on 04-Jul-07 , so this is part of 0.3.0 and 0.3.1 latest upstream releases. Daniel
900151 build (dist-5E-qu-candidate, RHEL-5:libvirt-0_2_3-5_el5) libvirt-0.2.3-5.el5 has been built with the patch. For QA I guess that having a machine with Xen installed and running the reproducer in comment #1 , and watching the memory consuption of the program stay flat over time should be sufficient to assert the bug is fixed. Daniel
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2007-0643.html