Created attachment 412398 [details]
output of Valgrind when libvirtd was run under Valgrind

Description of problem:
The leak is fairly slow, but becomes problematic on systems that have been running for a long time. For example, on one customer system that had been running for ~2 months with ~25 VMs, libvirtd was using ~2.3GiB of memory (over half of the total memory allocated to the dom0).

Snippet from the output of Valgrind when libvirtd was run under it:

valgrind -v --leak-check=full --show-reachable=yes --log-file=libvirtd.memcheck /usr/sbin/libvirtd --daemon

==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss record 417 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)

Version-Release number of selected component (if applicable):
libvirt-0.6.3-20.1.el5_4

How reproducible:
Consistently

Steps to Reproduce:
1. Run libvirtd on a dom0 long enough (at least a week).
2. Make sure the dom0 has numerous guests.
3. Check the memory usage of libvirtd with "ps auxwww | grep libvirtd".

Actual results:
libvirtd slowly leaks memory

Expected results:
libvirtd should not leak memory

Additional info:
1) There are quite a few fixes for memory leaks in the upstream code which aren't included in RHEL 5.
For example (upstream fix: 7be5c26d746643b5ba889d62212a615050fed772):

 virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
     virDomainPtr ret;
-    virDomainPtr olddomain;
<snip>
-    /* XXX wtf.com is this line for - it appears to be a memory leak */
-    if (!(olddomain = virGetDomain(conn, def->name, entry->def->uuid)))
-        goto error;

virGetDomain() allocates the domain object, and nothing here ever released it.

2) I've also attached the output of Valgrind when libvirtd was run under Valgrind (libvirtd.memcheck).
Here are Dan Berrange's comments on the two biggest offenders in the valgrind output.

On 06/08/2010 04:53 AM, Daniel P. Berrange wrote:
> On Mon, Jun 07, 2010 at 11:02:21PM -0400, Laine Stump wrote:
>> Before I dig into this, do either of the following memory leaks look
>> familiar to any libvirt guys? (as subject says, they're from 0.6.3 in
>> RHEL 5.5)
>>
>> ==3876== 357,840 bytes in 8,946 blocks are definitely lost in loss
>> record 416 of 417
>> ==3876==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
>> ==3876==    by 0x346A2019D9: read_message (xs.c:768)
>> ==3876==    by 0x346A201B4B: read_thread (xs.c:824)
>> ==3876==    by 0x346760673C: start_thread (pthread_create.c:301)
>> ==3876==    by 0x3466ED3D1C: clone (in /lib64/libc-2.5.so)
>
> That's a XenStore bug and I'm not sure it's easily fixable. When you
> register for a watch notification with xenstore it spawns a background
> thread for that. When you close your xenstore handle it uses the pure
> evil pthread_cancel() to kill that thread. Memory cleanup? What's
> that? We would need to do something with cancellation handlers or
> rewrite the code to not use pthread_cancel().
>
> You'll leak one record for each libvirt Xen connection you open &
> close.
>
>> ==3876==
>> ==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss
>> record 417 of 417
>> ==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
>> ==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
>> ==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
>> ==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
>> ==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
>> ==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
>> ==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
>> ==3876==    by 0x413DE8: main (qemud.c:2956)
>
> I'm not sure what this is caused by offhand. I might say that it was a
> result of the virConnectPtr ref counting not being right, but if that
> were the case I'd expect to see valgrind report that 'virConnectPtr' was
> leaked too, but it doesn't. So it must be some other issue.
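The direction Dan suggests for the XenStore leak (cancellation handlers) can be sketched generically. This is a minimal POSIX example with hypothetical names, not the actual xs.c code: pthread_cleanup_push() registers a handler that runs if the thread is cancelled, so heap memory the thread owns can still be freed.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int cleanup_ran = 0;

/* Runs when the thread is cancelled while the handler is pushed;
 * frees the buffer the thread was using. */
static void free_buffer(void *arg)
{
    free(arg);
    cleanup_ran = 1;
}

static void *reader_thread(void *arg)
{
    (void)arg;
    char *buf = malloc(4096);       /* like read_message()'s allocation */
    if (buf == NULL)
        return NULL;

    pthread_cleanup_push(free_buffer, buf);
    for (;;)
        sleep(1);                   /* sleep() is a cancellation point */
    pthread_cleanup_pop(1);         /* not reached when cancelled */
    return NULL;
}

/* Demonstrates that the handler runs on pthread_cancel(). */
int demo_cancel_with_cleanup(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, reader_thread, NULL) != 0)
        return -1;
    usleep(100000);                 /* let the thread push its handler */
    pthread_cancel(tid);
    pthread_join(tid, NULL);        /* join synchronizes cleanup_ran */
    return cleanup_ran ? 0 : -1;
}
```

Without the pushed handler, cancelling the thread mid-loop would leak `buf` on every connection close, which matches the per-connection leak Dan describes.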
These ones are also noticeable:

==3876== 262,168 bytes in 1 blocks are indirectly lost in loss record 414 of 417
==3876==    at 0x4A05140: calloc (vg_replace_malloc.c:418)
==3876==    by 0x384F01921D: virAlloc (memory.c:100)
==3876==    by 0x410B20: qemudDispatchClientEvent (qemud.c:1741)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==
==3876== 262,496 (8 direct, 262,488 indirect) bytes in 1 blocks are definitely lost in loss record 415 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x40FA16: qemudRunLoop (qemud.c:2206)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==
Note that Bug 606919, which was cloned from this bug, is in MODIFIED state. The modified xen userspace package xen-3.0.3-114.el5 will eliminate the first of the leaks in comment 5. I'm looking into the other leak in comment 5 now. The two leaks noted in comment 6 don't seem as concerning, since each only occurs once.
I'm unable to reproduce the second of the two leaks in comment 5 on my RHEL5 system (with libvirt-0.6.3-33.el5_5.1 and xen-3.0.3-105.el5_5.4). Can you provide more information on what you're doing on the system to produce the leak? (On my setup, I run libvirtd under valgrind as described above, then start up a few guests and leave virt-manager running overnight; it calls libvirt several times per second.)
Maybe I'm missing something obvious, but here:

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i = 0; i < list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list);
}

isn't a VIR_FREE(list->doms); missing??
I guess my time would have been better spent examining the code rather than trying to reproduce first. That is definitely the problem. Thanks, Paolo!
Fix built into libvirt-0.6.3-37.el5
Fixed in libvirt-0.8.2-1.el5
Verified the bug on RHEL5.6-Server-x86_64-Xen, RHEL5.6-Client-x86_64-Xen, and RHEL5.6-Server-ia64-Xen.

# rpm -q libvirt
libvirt-0.8.2-8.el5

Steps:
1. Open two terminals.
2. In one terminal, run "while true; do echo connect; done | virsh", which repeatedly connects to and disconnects from libvirtd. The connect/disconnect loop is not the best way to trigger this leak, but it seems to be the easiest one, and it shows up quite quickly.
3. In the other terminal, run "top -d1 -p $(pidof libvirtd)" to watch the memory consumption of libvirtd, noting the value in the RES column.

With the new package the memory remains more or less steady; it sometimes stays unchanged for a while or even goes down. So this bug is fixed. Moving to VERIFIED.
Verified on RHEL5u6-Client-i386-xen and it passed:

kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html