Created attachment 412398 [details]
output of Valgrind when libvirtd was run under Valgrind

Description of problem:
The leak is fairly slow, but becomes problematic on systems that have been running for a long time. For example, on one customer system that had been running for ~2 months with ~25 VMs, libvirtd was using ~2.3GiB of memory (over half of the total memory allocated to the dom0).

Snippet from the output of Valgrind when libvirtd was run under it:

valgrind -v --leak-check=full --show-reachable=yes --log-file=libvirtd.memcheck /usr/sbin/libvirtd --daemon

==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss record 417 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)

Version-Release number of selected component (if applicable):
libvirt-0.6.3-20.1.el5_4

How reproducible:
Consistently

Steps to Reproduce:
1. Run libvirtd on a dom0 long enough (at least a week).
2. Make sure the dom0 has numerous guests.
3. Check the memory usage of libvirtd with "ps auxwww | grep libvirtd".

Actual results:
libvirtd slowly leaks memory

Expected results:
libvirtd should not leak memory

Additional info:
1) There are quite a few fixes for memory leaks in the upstream code which aren't included in RHEL 5.
For example (upstream fix: 7be5c26d746643b5ba889d62212a615050fed772):

 virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
     virDomainPtr ret;
-    virDomainPtr olddomain;
<snip>
-    /* XXX wtf.com is this line for - it appears to be a memory leak */
-    if (!(olddomain = virGetDomain(conn, def->name, entry->def->uuid)))
-        goto error;

virGetDomain() allocates the domain object, and nothing here ever released it.

2) I've also attached the output of Valgrind when libvirtd was run under Valgrind (libvirtd.memcheck).
Here are Dan Berrange's comments on the two biggest offenders in the valgrind output.

On 06/08/2010 04:53 AM, Daniel P. Berrange wrote:
> On Mon, Jun 07, 2010 at 11:02:21PM -0400, Laine Stump wrote:
>> Before I dig into this, do either of the following memory leaks look
>> familiar to any libvirt guys? (as subject says, they're from 0.6.3 in
>> RHEL 5.5)
>>
>> ==3876== 357,840 bytes in 8,946 blocks are definitely lost in loss
>> record 416 of 417
>> ==3876==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
>> ==3876==    by 0x346A2019D9: read_message (xs.c:768)
>> ==3876==    by 0x346A201B4B: read_thread (xs.c:824)
>> ==3876==    by 0x346760673C: start_thread (pthread_create.c:301)
>> ==3876==    by 0x3466ED3D1C: clone (in /lib64/libc-2.5.so)
>
> That's a XenStore bug and I'm not sure it's easily fixable. When you
> register for a watch notification with xenstore it spawns a background
> thread for that. When you close your xenstore handle it uses the pure
> evil pthread_cancel() to kill that thread. Memory cleanup? What's
> that? We would need to do something with cancellation handlers or
> rewrite the code to not use pthread_cancel().
>
> You'll leak one record for each libvirt Xen connection you open &
> close.
>
>> ==3876==
>> ==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss
>> record 417 of 417
>> ==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
>> ==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
>> ==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
>> ==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
>> ==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
>> ==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
>> ==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
>> ==3876==    by 0x413DE8: main (qemud.c:2956)
>
> I'm not sure what this is caused by offhand. I might say that it was a
> result of the virConnectPtr ref counting not being right, but if that
> were the case I'd expect to see valgrind report that 'virConnectPtr' was
> leaked too, but it doesn't. So it must be some other issue.
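The direction Dan suggests for the XenStore leak (cancellation handlers) can be sketched generically. This is a minimal POSIX example with hypothetical names, not the actual xs.c code: pthread_cleanup_push() registers a handler that runs if the thread is cancelled, so heap memory the thread owns can still be freed.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int cleanup_ran = 0;

/* Runs when the thread is cancelled while the handler is pushed;
 * frees the buffer the thread was using. */
static void free_buffer(void *arg)
{
    free(arg);
    cleanup_ran = 1;
}

static void *reader_thread(void *arg)
{
    (void)arg;
    char *buf = malloc(4096);       /* like read_message()'s allocation */
    if (buf == NULL)
        return NULL;

    pthread_cleanup_push(free_buffer, buf);
    for (;;)
        sleep(1);                   /* sleep() is a cancellation point */
    pthread_cleanup_pop(1);         /* not reached when cancelled */
    return NULL;
}

/* Demonstrates that the handler runs on pthread_cancel(). */
int demo_cancel_with_cleanup(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, reader_thread, NULL) != 0)
        return -1;
    usleep(100000);                 /* let the thread push its handler */
    pthread_cancel(tid);
    pthread_join(tid, NULL);        /* join synchronizes cleanup_ran */
    return cleanup_ran ? 0 : -1;
}
```

Without the pushed handler, cancelling the thread mid-loop would leak `buf` on every connection close, which matches the per-connection leak Dan describes.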
These ones are also noticeable:

==3876== 262,168 bytes in 1 blocks are indirectly lost in loss record 414 of 417
==3876==    at 0x4A05140: calloc (vg_replace_malloc.c:418)
==3876==    by 0x384F01921D: virAlloc (memory.c:100)
==3876==    by 0x410B20: qemudDispatchClientEvent (qemud.c:1741)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==
==3876== 262,496 (8 direct, 262,488 indirect) bytes in 1 blocks are definitely lost in loss record 415 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x40FA16: qemudRunLoop (qemud.c:2206)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==
Note that Bug 606919, which was cloned from this bug, is in MODIFIED state. The modified xen userspace package xen-3.0.3-114.el5 will eliminate the first of the leaks in comment 5. I'm looking into the other leak in comment 5 now. The two leaks noted in comment 6 don't seem as concerning, since each only occurs once.
I'm unable to reproduce the second of the two leaks in comment 5 on my RHEL5 system (with libvirt-0.6.3-33.el5_5.1 and xen-3.0.3-105.el5_5.4). Can you provide more information on what you're doing on the system to produce the leak? (On my setup, I run libvirtd under valgrind as described above, then start up a few guests and leave virt-manager running overnight; it calls libvirt several times per second.)
Maybe I'm missing something obvious, but here:

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i = 0; i < list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list);
}

isn't a VIR_FREE(list->doms); missing??
I guess my time would have been better spent examining the code rather than trying to reproduce first. That is definitely the problem. Thanks, Paolo!
Fix built into libvirt-0.6.3-37.el5
Fixed in libvirt-0.8.2-1.el5
Verified the bug on RHEL5.6-Server-x86_64-Xen, RHEL5.6-Client-x86_64-Xen, and RHEL5.6-Server-ia64-Xen.

# rpm -q libvirt
libvirt-0.8.2-8.el5

Steps:
1. Open two terminals.
2. In one terminal, run "while true; do echo connect; done | virsh", which repeatedly connects to and disconnects from libvirtd. The connect/disconnect loop is not the best way to trigger this leak, but it seems to be the easiest one, and it shows up quite quickly.
3. In the other terminal, run "top -d1 -p $(pidof libvirtd)" to watch the memory consumption of libvirtd, noting the value in the RES column.

With the new package the memory remains more or less steady; it sometimes stays unchanged for a while or even goes down. So this bug is fixed. Moving to VERIFIED.
Verified on RHEL5u6-Client-i386-xen and it passed:

kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html