Bug 590073

Summary: Memory leak in libvirtd
Product: Red Hat Enterprise Linux 5
Reporter: Nandini Chandra <nachandr>
Component: libvirt
Assignee: Laine Stump <laine>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Priority: urgent
Version: 5.4
CC: alexander, chris.lober, hbrock, herrold, james.brown, jdenemar, juzhang, jwest, jyang, mjenner, pbonzini, plyons, samuel.kielek, slords, smayhew, tao, virt-maint, xen-maint, xhu, ydu
Target Milestone: rc
Keywords: ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-0.8.2-1.el5
Doc Type: Bug Fix
Cloned to: 606919
Bug Blocks: 593339, 606919, 619711
Last Closed: 2011-01-13 23:12:06 UTC
Attachments:
output of Valgrind when libvirtd was run under Valgrind

Description Nandini Chandra 2010-05-07 16:42:02 UTC
Created attachment 412398 [details]
output of Valgrind when libvirtd was run under Valgrind

Description of problem:
The leak is fairly slow, but it becomes problematic on systems that have been running for a long time. For example, on one customer system that had been up for roughly two months with about 25 VMs, libvirtd was using about 2.3 GiB of memory (over half of the total memory allocated to the dom0).

Snippet from the output of Valgrind when libvirtd was run under Valgrind:
valgrind -v --leak-check=full --show-reachable=yes --log-file=libvirtd.memcheck /usr/sbin/libvirtd --daemon
==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss record 417 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo  (xen_unified.c:1688)
==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
==3876==    by 0x384F0775FD: xenStoreWatchEvent       (xs_internal.c:1303)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)

Version-Release number of selected component (if applicable):
libvirt-0.6.3-20.1.el5_4


How reproducible:
Consistently


Steps to Reproduce:
1. Run libvirtd on a dom0 long enough (at least a week).
2. Make sure the dom0 has numerous guests.
3. Check the memory usage of libvirtd with:
ps auxwww | grep libvirtd
  
Actual results:
libvirtd slowly leaks memory

Expected results:
libvirtd should not leak memory

Additional info:
1) There are quite a few memory-leak fixes in the upstream code that aren't included in RHEL 5.
For example:
(upstream fix: 7be5c26d746643b5ba889d62212a615050fed772)
virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
    virDomainPtr ret;
-    virDomainPtr olddomain;
<snip>
-        /* XXX wtf.com is this line for - it appears to be a memory leak */
-        if (!(olddomain = virGetDomain(conn, def->name, entry->def->uuid)))
-            goto error;

virGetDomain() allocates the returned domain object, so the removed lines leaked one domain object on every call (see the illustrative sketch after these notes).

2) I've also attached the full output of running libvirtd under Valgrind (libvirtd.memcheck).
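
To illustrate the ownership rule behind the upstream fix in note 1: every domain object libvirt hands out is allocated and must be released by the caller, or it leaks. Below is a minimal, illustrative sketch using libvirt's public client API ("example-guest" is a placeholder name; the removed RHEL 5 code reached the same kind of object through the internal virGetDomain() helper):

/* Illustrative only, not the libvirtd code: a virDomainPtr obtained from
 * libvirt must be released with virDomainFree(), otherwise it leaks --
 * the same rule the removed olddomain lookup violated. */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly(NULL);
    if (conn == NULL)
        return 1;

    /* "example-guest" is a placeholder domain name */
    virDomainPtr dom = virDomainLookupByName(conn, "example-guest");
    if (dom != NULL) {
        printf("found %s\n", virDomainGetName(dom));
        virDomainFree(dom);   /* without this call the object is leaked */
    }

    virConnectClose(conn);
    return 0;
}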

Comment 5 Laine Stump 2010-06-22 16:58:17 UTC
Here are Dan Berrange's comments on the two biggest offenders in the valgrind output.

On 06/08/2010 04:53 AM, Daniel P. Berrange wrote:

> On Mon, Jun 07, 2010 at 11:02:21PM -0400, Laine Stump wrote:
>> Before I dig into this, do either of the following memory leaks look 
>> familiar to any libvirt guys? (as subject says, they're from 0.6.3 in 
>> RHEL 5.5)
>>
>> ==3876== 357,840 bytes in 8,946 blocks are definitely lost in loss 
>> record 416 of 417
>> ==3876==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
>> ==3876==    by 0x346A2019D9: read_message (xs.c:768)
>> ==3876==    by 0x346A201B4B: read_thread (xs.c:824)
>> ==3876==    by 0x346760673C: start_thread (pthread_create.c:301)
>> ==3876==    by 0x3466ED3D1C: clone (in /lib64/libc-2.5.so)
>
> That's a XenStore bug and I'm not sure it's easily fixable. When you
> register for a watch notification with xenstore it spawns a background
> thread for that. When you close your xenstore handle it uses the pure
> evil pthread_cancel() to kill that thread. Memory cleanup? What's
> that? We'd need to do something with cancellation handlers or rewrite
> the code to not use pthread_cancel().
>
> You'll leak one record for each libvirt Xen connection you open and
> close.
>
>> ==3876==
>> ==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss 
>> record 417 of 417
>> ==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
>> ==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
>> ==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
>> ==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
>> ==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
>> ==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
>> ==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
>> ==3876==    by 0x413DE8: main (qemud.c:2956)
>
> I'm not sure what this is caused by offhand. I might say that it was a
> result of the virConnectPtr ref counting not being right, but if that
> were the case I'd expect to see valgrind report that 'virConnectPtr' was
> leaked too, but it doesn't. So it must be some other issue.
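
For reference, the "cancellation handlers" mentioned above are the usual way to keep a heap buffer from leaking when a thread is killed with pthread_cancel(). A self-contained sketch of that pattern (illustrative only, not the xenstore code; build with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Cleanup handler: runs on a normal pop *and* when the thread is cancelled. */
static void free_buffer(void *arg)
{
    free(arg);
}

static void *reader_thread(void *arg)
{
    (void)arg;
    for (;;) {
        char *msg = malloc(1024);          /* stands in for read_message() */
        if (msg == NULL)
            break;
        pthread_cleanup_push(free_buffer, msg);
        sleep(1);                          /* cancellation point while "reading" */
        pthread_cleanup_pop(1);            /* pop and run the handler: frees msg */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, reader_thread, NULL) != 0)
        return 1;
    sleep(2);
    pthread_cancel(tid);                   /* msg is still freed by the handler */
    pthread_join(tid, NULL);
    puts("no buffer leaked across pthread_cancel()");
    return 0;
}

Without the cleanup handler, a cancellation while the thread is blocked would leak whatever msg points to, which is the per-connection leak described above.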

Comment 6 Paolo Bonzini 2010-06-23 11:39:52 UTC
These are also noticeable:

==3876== 262,168 bytes in 1 blocks are indirectly lost in loss record 414 of 417
==3876==    at 0x4A05140: calloc (vg_replace_malloc.c:418)
==3876==    by 0x384F01921D: virAlloc (memory.c:100)
==3876==    by 0x410B20: qemudDispatchClientEvent (qemud.c:1741)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876== 

==3876== 262,496 (8 direct, 262,488 indirect) bytes in 1 blocks are definitely lost in loss record 415 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x40FA16: qemudRunLoop (qemud.c:2206)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==

Comment 7 Laine Stump 2010-07-19 16:01:01 UTC
Note that Bug 606919, which was cloned from this bug, is in MODIFIED state. The modified xen userspace package xen-3.0.3-114.el5 will eliminate the first of the leaks in comment 5.

I'm looking into the other leak in comment 5 now.

The two leaks noted in comment 6 don't seem as concerning, since each only occurs once.

Comment 8 Laine Stump 2010-07-23 20:30:29 UTC
I'm unable to reproduce the second of the 2 leaks in Comment 5 on my RHEL5 system (with libvirt-0.6.3-33.el5_5.1 and xen-3.0.3-105.el5_5.4). Can you provide more information on what you're doing on the system to produce the leak?

(On my setup, I run libvirtd under valgrind as described above, then start up a few guests and leave virt-manager running overnight; it calls libvirt several times per second.)

Comment 10 Paolo Bonzini 2010-07-29 11:59:48 UTC
Maybe I'm missing something obvious, but here:

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i=0; i<list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list);
}

isn't a VIR_FREE(list->doms); missing??
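
For reference, the corrected function would presumably just add that one call before freeing the list itself (a sketch against the snippet quoted above, not the committed patch):

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i = 0; i < list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list->doms);   /* free the array itself, not just its entries */
    VIR_FREE(list);
}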

Comment 11 Laine Stump 2010-07-29 13:07:38 UTC
I guess my time would have been better spent examining the code rather than trying to reproduce first.

That is definitely the problem. Thanks, Paolo!

Comment 13 Jiri Denemark 2010-07-29 18:55:45 UTC
Fix built into libvirt-0.6.3-37.el5

Comment 15 Jiri Denemark 2010-09-02 11:58:40 UTC
Fixed in libvirt-0.8.2-1.el5

Comment 18 yanbing du 2010-10-27 08:21:42 UTC
Verified the bug on RHEL5.6-Server-x86_64-Xen, RHEL5.6-Client-x86_64-Xen, and RHEL5.6-Server-ia64-Xen.
# rpm -q libvirt 
libvirt-0.8.2-8.el5
Steps:
1. Open two terminals.
2. In one terminal, run "while true; do echo connect; done | virsh",
which repeatedly connects to and disconnects from libvirtd.
The connect/disconnect loop is not the best way to trigger this leak, but it
is the easiest one and shows the effect quickly.
3. In the other terminal, run "top -d1 -p $(pidof libvirtd)" to watch the memory
consumption of libvirtd, taking note of the value in the RES column.

With the new package the memory usage remains more or less steady; it sometimes
stays unchanged for a while or even goes down. So this bug is fixed. Moving to VERIFIED.

Comment 19 xhu 2010-10-29 06:59:32 UTC
Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5

Comment 21 errata-xmlrpc 2011-01-13 23:12:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html