Bug 590073 - Memory leak in libvirtd
Summary: Memory leak in libvirtd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Laine Stump
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 593339 606919 619711
 
Reported: 2010-05-07 16:42 UTC by Nandini Chandra
Modified: 2018-10-27 13:42 UTC (History)
20 users

Fixed In Version: libvirt-0.8.2-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 606919 (view as bug list)
Environment:
Last Closed: 2011-01-13 23:12:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
output of Valgrind when libvirtd was run under Valgrind (340.18 KB, application/octet-stream)
2010-05-07 16:42 UTC, Nandini Chandra


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:0060 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-01-12 17:22:30 UTC

Description Nandini Chandra 2010-05-07 16:42:02 UTC
Created attachment 412398 [details]
output of Valgrind when libvirtd was run under Valgrind

Description of problem:
The leak is pretty slow, but becomes problematic for systems that have been running a long time. For example, on one of the customer's systems that had been running for ~2 months with ~25 VMs, libvirtd was using ~2.3 GiB of memory (over half of the total memory allocated to the dom0).

Snippet from the output of Valgrind when libvirtd was run under Valgrind:
valgrind -v --leak-check=full --show-reachable=yes --log-file=libvirtd.memcheck /usr/sbin/libvirtd --daemon
==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss record 417 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo  (xen_unified.c:1688)
==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
==3876==    by 0x384F0775FD: xenStoreWatchEvent       (xs_internal.c:1303)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)

Version-Release number of selected component (if applicable):
libvirt-0.6.3-20.1.el5_4


How reproducible:
Consistently


Steps to Reproduce:
1. Run libvirtd on a dom0 long enough (at least a week).
2. Make sure the dom0 has numerous guests.
3. Verify libvirtd's memory usage with:
ps auxwww | grep libvirtd
  
Actual results:
libvirtd slowly leaks memory

Expected results:
libvirtd should not leak memory

Additional info:
1) There are quite a few fixes for memory leaks in the upstream code which aren't included in RHEL 5.
For example:
(upstream fix: 7be5c26d746643b5ba889d62212a615050fed772)
virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
    virDomainPtr ret;
-    virDomainPtr olddomain;
<snip>
-        /* XXX wtf.com is this line for - it appears to be amemory leak */
-        if (!(olddomain = virGetDomain(conn, def->name, entry->def->uuid)))
-            goto error;

virGetDomain() allocates the domain.

2) I've also attached the output from running libvirtd under Valgrind (libvirtd.memcheck).

Comment 5 Laine Stump 2010-06-22 16:58:17 UTC
Here are Dan Berrange's comments on the two biggest offenders in the valgrind output.

On 06/08/2010 04:53 AM, Daniel P. Berrange wrote:

> On Mon, Jun 07, 2010 at 11:02:21PM -0400, Laine Stump wrote:
>> Before I dig into this, do either of the following memory leaks look 
>> familiar to any libvirt guys? (as subject says, they're from 0.6.3 in 
>> RHEL 5.5)
>>
>> ==3876== 357,840 bytes in 8,946 blocks are definitely lost in loss 
>> record 416 of 417
>> ==3876==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
>> ==3876==    by 0x346A2019D9: read_message (xs.c:768)
>> ==3876==    by 0x346A201B4B: read_thread (xs.c:824)
>> ==3876==    by 0x346760673C: start_thread (pthread_create.c:301)
>> ==3876==    by 0x3466ED3D1C: clone (in /lib64/libc-2.5.so)
>
> That's a XenStore bug and I'm not sure it's easily fixable. When you
> register for a watch notification with xenstore it spawns a background
> thread for that. When you close your xenstore handle it uses the pure
> evil pthread_cancel() to kill that thread. Memory cleanup? What's
> that? Would need to do something with cancellation handlers or rewrite
> the code to not use pthread_cancel().
>
> You'll leak one record for each libvirt Xen connection you open &
> close
>
>> ==3876==
>> ==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss 
>> record 417 of 417
>> ==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
>> ==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
>> ==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
>> ==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
>> ==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
>> ==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
>> ==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
>> ==3876==    by 0x413DE8: main (qemud.c:2956)
>
> I'm not sure what this is caused by offhand. I might say that it was a
> result of the virConnectPtr ref counting not being right, but if that
> were the case I'd expect to see valgrind report that 'virConnectPtr' was
> leaked too, but it doesn't. So it must be some other issue.

Comment 6 Paolo Bonzini 2010-06-23 11:39:52 UTC
These ones are also noticeable:

==3876== 262,168 bytes in 1 blocks are indirectly lost in loss record 414 of 417
==3876==    at 0x4A05140: calloc (vg_replace_malloc.c:418)
==3876==    by 0x384F01921D: virAlloc (memory.c:100)
==3876==    by 0x410B20: qemudDispatchClientEvent (qemud.c:1741)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876== 

==3876== 262,496 (8 direct, 262,488 indirect) bytes in 1 blocks are definitely lost in loss record 415 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x40FA16: qemudRunLoop (qemud.c:2206)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==

Comment 7 Laine Stump 2010-07-19 16:01:01 UTC
Note that Bug 606919, which was cloned from this bug, is in MODIFIED state. The modified xen userspace package xen-3.0.3-114.el5 will eliminate the first of the leaks in comment 5.

I'm looking into the other leak in comment 5 now.

The two leaks noted in comment 6 don't seem as concerning, since each only occurs once.

Comment 8 Laine Stump 2010-07-23 20:30:29 UTC
I'm unable to reproduce the second of the 2 leaks in Comment 5 on my RHEL5 system (with libvirt-0.6.3-33.el5_5.1 and xen-3.0.3-105.el5_5.4). Can you provide more information on what you're doing on the system to produce the leak?

(On my setup, I run libvirtd under valgrind as described above, then start up a few guests and leave virt-manager running overnight; it calls libvirt several times per second.)

Comment 10 Paolo Bonzini 2010-07-29 11:59:48 UTC
Maybe I'm missing something obvious, but here:

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i=0; i<list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list);
}

isn't a VIR_FREE(list->doms); missing??

Comment 11 Laine Stump 2010-07-29 13:07:38 UTC
I guess my time would have been better spent examining the code rather than trying to reproduce first.

That is definitely the problem. Thanks, Paolo!

Comment 13 Jiri Denemark 2010-07-29 18:55:45 UTC
Fix built into libvirt-0.6.3-37.el5

Comment 15 Jiri Denemark 2010-09-02 11:58:40 UTC
Fixed in libvirt-0.8.2-1.el5

Comment 18 yanbing du 2010-10-27 08:21:42 UTC
Verified the bug on RHEL5.6-Server-x86_64-Xen, RHEL5.6-Client-x86_64-Xen and RHEL5.6-Server-ia64-Xen.
# rpm -q libvirt 
libvirt-0.8.2-8.el5
Steps:
1. Open two terminals.
2. In one terminal, run "while true; do echo connect; done | virsh",
which will repeatedly connect to and disconnect from libvirtd.
The connect/disconnect loop is not the best way to trigger this leak, but it
seems to be the easiest one and shows results quite fast.
3. In the other terminal, run "top -d1 -p $(pidof libvirtd)" to watch
libvirtd's memory consumption, taking note of the RES column.

With the new package the memory remains more-or-less steady; it sometimes stays
unchanged for a while or even goes down. So this bug is fixed. Moving to VERIFIED.

Comment 19 xhu 2010-10-29 06:59:32 UTC
Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5

Comment 21 errata-xmlrpc 2011-01-13 23:12:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html

