Bug 590073
Summary: | Memory leak in libvirtd | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Nandini Chandra <nachandr> | ||||
Component: | libvirt | Assignee: | Laine Stump <laine> | ||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 5.4 | CC: | alexander, chris.lober, hbrock, herrold, james.brown, jdenemar, juzhang, jwest, jyang, mjenner, pbonzini, plyons, samuel.kielek, slords, smayhew, tao, virt-maint, xen-maint, xhu, ydu | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | libvirt-0.8.2-1.el5 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 606919 | Environment: | |||||
Last Closed: | 2011-01-13 23:12:06 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 593339, 606919, 619711 | ||||||
Here are Dan Berrange's comments on the two biggest offenders in the valgrind output.

On 06/08/2010 04:53 AM, Daniel P. Berrange wrote:
> On Mon, Jun 07, 2010 at 11:02:21PM -0400, Laine Stump wrote:
>> Before I dig into this, do either of the following memory leaks look
>> familiar to any libvirt guys? (as subject says, they're from 0.6.3 in
>> RHEL 5.5)
>>
>> ==3876== 357,840 bytes in 8,946 blocks are definitely lost in loss
>> record 416 of 417
>> ==3876==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
>> ==3876==    by 0x346A2019D9: read_message (xs.c:768)
>> ==3876==    by 0x346A201B4B: read_thread (xs.c:824)
>> ==3876==    by 0x346760673C: start_thread (pthread_create.c:301)
>> ==3876==    by 0x3466ED3D1C: clone (in /lib64/libc-2.5.so)
>
> That's a XenStore bug and I'm not sure it's easily fixable. When you
> register for a watch notification with xenstore it spawns a background
> thread for that. When you close your xenstore handle it uses the pure
> evil pthread_cancel() to kill that thread. Memory cleanup? What's
> that? Would need to do something with cancellation handlers or rewrite
> the code to not use pthread_cancel().
>
> You'll leak one record for each libvirt Xen connection you open &
> close.
>
>> ==3876==
>> ==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss
>> record 417 of 417
>> ==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
>> ==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
>> ==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
>> ==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
>> ==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
>> ==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
>> ==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
>> ==3876==    by 0x413DE8: main (qemud.c:2956)
>
> I'm not sure what this is caused by offhand. I might say that it was a
> result of the virConnectPtr ref counting not being right, but if that
> were the case I'd expect to see valgrind report that 'virConnectPtr' was
> leaked too, but it doesn't. So it must be some other issue.

These ones are also noticeable:

==3876== 262,168 bytes in 1 blocks are indirectly lost in loss record 414 of 417
==3876==    at 0x4A05140: calloc (vg_replace_malloc.c:418)
==3876==    by 0x384F01921D: virAlloc (memory.c:100)
==3876==    by 0x410B20: qemudDispatchClientEvent (qemud.c:1741)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==
==3876== 262,496 (8 direct, 262,488 indirect) bytes in 1 blocks are definitely lost in loss record 415 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x40FA16: qemudRunLoop (qemud.c:2206)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==

Note that Bug 606919, which was cloned from this bug, is in MODIFIED state. The modified xen userspace package xen-3.0.3-114.el5 will eliminate the first of the leaks in comment 5. I'm looking into the other leak in comment 5 now.

The two leaks noted in comment 6 don't seem as concerning, since each only occurs once.

I'm unable to reproduce the second of the two leaks in comment 5 on my RHEL5 system (with libvirt-0.6.3-33.el5_5.1 and xen-3.0.3-105.el5_5.4). Can you provide more information on what you're doing on the system to produce the leak? (On my setup, I run libvirtd under valgrind as described above, then start up a few guests and leave virt-manager running overnight; it calls libvirt several times per second.)
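On the XenStore leak quoted above: Dan mentions cancellation handlers as one possible way to stop a cancelled reader thread from leaking its buffer. The following is only a minimal sketch of that general POSIX technique, not the actual xs.c code. The names reader_thread, free_message and run_and_cancel_reader are hypothetical, and pause() merely stands in for the blocking read that the real thread would perform.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Counts cleanup-handler invocations so the effect can be observed. */
static int cleanup_calls = 0;

/* Cleanup handler: releases the buffer owned by the cancelled thread. */
static void free_message(void *buf)
{
    assert(buf != NULL);   /* the handler is only pushed with a valid buffer */
    free(buf);
    cleanup_calls++;
}

/* Hypothetical reader thread (assumption: not the real xs.c read_thread). */
static void *reader_thread(void *arg)
{
    (void)arg;
    for (;;) {
        char *msg = malloc(64);
        if (msg == NULL)
            return NULL;
        /* Register the handler before reaching any cancellation point. */
        pthread_cleanup_push(free_message, msg);
        pause();                 /* cancellation point; stand-in for a blocking read */
        pthread_cleanup_pop(1);  /* normal path: run the handler and free msg */
    }
    return NULL;
}

/* Starts the reader, cancels it, and reports how often the handler ran. */
int run_and_cancel_reader(void)
{
    pthread_t tid;
    void *result = NULL;

    cleanup_calls = 0;
    if (pthread_create(&tid, NULL, reader_thread, NULL) != 0)
        return -1;
    pthread_cancel(tid);   /* deferred: acted on at the thread's next cancellation point */
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED ? cleanup_calls : -1;
}
```

Without the pthread_cleanup_push()/pthread_cleanup_pop() pair, the buffer allocated just before the blocking call would be lost each time the thread is cancelled, which matches the "one record per connection" pattern described in the quoted mail.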
Maybe I'm missing something obvious, but here:

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i=0; i<list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list);
}

isn't a VIR_FREE(list->doms); missing??

I guess my time would have been better spent examining the code rather than trying to reproduce first. That is definitely the problem. Thanks, Paolo!

Fix built into libvirt-0.6.3-37.el5

Fixed in libvirt-0.8.2-1.el5

Verified the bug on RHEL5.6-Server-x86_64-Xen, RHEL5.6-Client-x86_64-Xen and RHEL5.6-Server-ia64-Xen.

# rpm -q libvirt
libvirt-0.8.2-8.el5

Steps:
1. Open two terminals.
2. On one terminal, run "while true; do echo connect; done | virsh", which will repeatedly connect to and disconnect from libvirtd. The connect/disconnect loop is not the best way to trigger this leak, but it seems to be the easiest one and it shows the leak quite fast.
3. On the other terminal, run "top -d1 -p $(pidof libvirtd)" to watch the memory consumption of libvirtd, and take note of the value in the RES column.

With the new package the memory remains more or less steady; it sometimes stays unchanged for a while or even goes down. So this bug is fixed. Moving to VERIFIED.

Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html
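For readers who want to see the effect of the missing VIR_FREE(list->doms) in isolation, here is a small self-contained model of the list-free routine. It is a sketch under assumptions: dom_info, dom_info_list, counted_free and demo_free_count are hypothetical stand-ins, plain malloc/realloc/free replace libvirt's VIR_ALLOC/VIR_REALLOC_N/VIR_FREE macros, and the counter exists only so the number of released blocks can be checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for libvirt's domain-info types. */
typedef struct {
    int   id;
    char *name;
} dom_info;

typedef struct {
    int        count;
    dom_info **doms;   /* pointer array, grown with realloc() on every add */
} dom_info_list;

static int free_calls = 0;

/* free() wrapper that counts non-NULL releases. */
static void counted_free(void *p)
{
    if (p != NULL)
        free_calls++;
    free(p);
}

/* Append one entry, growing the pointer array by one slot. */
static int dom_info_list_add(dom_info_list *list, int id, const char *name)
{
    assert(list != NULL && name != NULL);

    size_t len = strlen(name) + 1;
    dom_info *info = malloc(sizeof(*info));
    char *copy = malloc(len);
    dom_info **grown = realloc(list->doms,
                               (size_t)(list->count + 1) * sizeof(*grown));
    if (grown != NULL)
        list->doms = grown;   /* the (possibly moved) array stays owned by the list */
    if (info == NULL || copy == NULL || grown == NULL) {
        free(info);
        free(copy);
        return -1;
    }
    memcpy(copy, name, len);
    info->id = id;
    info->name = copy;
    list->doms[list->count++] = info;
    return 0;
}

/* Release every entry, then the pointer array, then the list itself. */
static void dom_info_list_free(dom_info_list *list)
{
    if (list == NULL)
        return;
    for (int i = 0; i < list->count; i++) {
        counted_free(list->doms[i]->name);
        counted_free(list->doms[i]);
    }
    counted_free(list->doms);   /* the release that the function quoted above lacks */
    counted_free(list);
}

/* Builds a list of ndoms entries, frees it, and returns the blocks released. */
int demo_free_count(int ndoms)
{
    dom_info_list *list = calloc(1, sizeof(*list));
    if (list == NULL)
        return -1;
    for (int i = 0; i < ndoms; i++) {
        if (dom_info_list_add(list, i, "guest") != 0) {
            dom_info_list_free(list);
            return -1;
        }
    }
    free_calls = 0;
    dom_info_list_free(list);
    return free_calls;
}
```

In this model a list with N entries owns 2N + 2 blocks (N names, N entries, the pointer array and the list). Dropping the counted_free(list->doms) line leaves the array behind on every free, which is consistent with the realloc-allocated blocks that valgrind reports under xenUnifiedAddDomainInfo.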
Created attachment 412398
output of Valgrind when libvirtd was run under Valgrind

Description of problem:
The leak is pretty slow, but becomes problematic for systems that have been running a long time. For example, on one of the customer's systems that had been running for ~2 months with ~25 VMs, libvirtd was using ~2.3GiB of memory (over half of the total memory allocated to the dom0).

Snippet from the output of Valgrind when libvirtd was run under Valgrind:

valgrind -v --leak-check=full --show-reachable=yes --log-file=libvirtd.memcheck /usr/sbin/libvirtd --daemon

==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss record 417 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)

Version-Release number of selected component (if applicable):
libvirt-0.6.3-20.1.el5_4

How reproducible:
Consistently

Steps to Reproduce:
1. Run libvirtd on a dom0 for long enough (at least a week).
2. Make sure the dom0 has numerous guests.
3. Check the memory usage of libvirtd using ps auxwww|grep libvirtd

Actual results:
libvirtd slowly leaks memory

Expected results:
libvirtd should not leak memory

Additional info:
1) There are quite a few fixes for memory leaks in the upstream code which aren't included in RHEL 5. For example (upstream fix: 7be5c26d746643b5ba889d62212a615050fed772):

 virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
     virDomainPtr ret;
-    virDomainPtr olddomain;
 <snip>
-    /* XXX wtf.com is this line for - it appears to be amemory leak */
-    if (!(olddomain = virGetDomain(conn, def->name, entry->def->uuid)))
-        goto error;

virGetDomain() allocates the domain object, and the removed code never releases it.

2) I've also attached the output of Valgrind when libvirtd was run under Valgrind (libvirtd.memcheck).
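The pattern behind the upstream fix in item 1) can be modelled without libvirt: a lookup hands back an object the caller owns, and the caller drops the pointer without releasing it. The sketch below is an illustration only. domain_obj, domain_get, domain_unref, define_old, define_new and leaked_after are hypothetical names, and the assumption that the rest of the function obtains and returns a second object is mine, not something shown in the snippet above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a domain handle. */
typedef struct {
    int refs;
} domain_obj;

static int live_objects = 0;   /* objects allocated and not yet freed */

/* Allocates an object and hands the caller one reference. */
static domain_obj *domain_get(void)
{
    domain_obj *d = calloc(1, sizeof(*d));
    if (d != NULL) {
        d->refs = 1;
        live_objects++;
    }
    return d;
}

/* Drops one reference and frees the object when none remain. */
static void domain_unref(domain_obj *d)
{
    if (d == NULL)
        return;
    assert(d->refs > 0);
    if (--d->refs == 0) {
        free(d);
        live_objects--;
    }
}

/* Old shape: the extra lookup result is only tested, never released.
 * The leak here is deliberate, to demonstrate the pattern. */
static domain_obj *define_old(void)
{
    domain_obj *olddomain = domain_get();
    if (olddomain == NULL)
        return NULL;
    return domain_get();   /* olddomain is dropped here: one object leaks per call */
}

/* New shape: the unused lookup is gone, so nothing is left behind. */
static domain_obj *define_new(void)
{
    return domain_get();
}

/* Runs `calls` define operations and returns how many objects stayed alive. */
int leaked_after(int calls, int use_old_code)
{
    int before = live_objects;
    for (int i = 0; i < calls; i++) {
        domain_obj *ret = use_old_code ? define_old() : define_new();
        domain_unref(ret);   /* the caller releases what it was given */
    }
    return live_objects - before;
}
```

Each call through the old shape leaves one object alive even though the caller releases everything it was handed, which is why this kind of leak grows with the number of operations rather than showing up once at startup.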