Bug 250146 - memory leak: virConnectClose() doesn't free all the memory it should
Summary: memory leak: virConnectClose() doesn't free all the memory it should
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.1
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: Daniel Veillard
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-07-30 18:04 UTC by Lon Hohberger
Modified: 2009-12-14 21:12 UTC (History)
4 users (show)

Fixed In Version: RHEA-2007-0643
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 17:27:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Reproducer. (367 bytes, text/x-csrc)
2007-07-30 18:04 UTC, Lon Hohberger
no flags Details
Fix postdecrement which should be predecrement. (402 bytes, patch)
2007-07-30 19:53 UTC, Lon Hohberger
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2007:0643 0 normal SHIPPED_LIVE libvirt enhancement update 2007-11-08 14:15:22 UTC

Description Lon Hohberger 2007-07-30 18:04:51 UTC
Description of problem:

If you run the attached test program, memory usage grows without bounds.

Version-Release number of selected component (if applicable): 0.2.3-4.el5

How reproducible: 100%

Steps to Reproduce:
1. gcc -o vtest test.c -lvirt -I/usr/include/libvirt -Wall
2. ./vtest
3. ps auwwx | grep vtest
  
Actual results: Unbound memory usage

Expected results: Flat memory usage

Additional info: I found this using valgrind initially.  This eventually breaks
fence_xvmd, which is used for virtual machine fencing.

Comment 1 Lon Hohberger 2007-07-30 18:04:51 UTC
Created attachment 160260 [details]
Reproducer.

Comment 3 Lon Hohberger 2007-07-30 19:04:11 UTC
Running with a debugging memory allocator:

  0x2aaaaaf07e48 (1600 bytes) allocation trace:
        0x2aaaaaac2fd1
  0x2aaaaaf077f0 (1600 bytes) allocation trace:
        0x2aaaaaac2fd1
  ...

GDB says:

(gdb) x/i 0x2aaaaaac2fd1
0x2aaaaaac2fd1 <__virGetNetwork+545>:   test   %rax,%rax



Comment 4 Lon Hohberger 2007-07-30 19:05:20 UTC
There's also a pile of 8, 16, and 32-byte bits left over from regcomp() it looks
like.

Comment 5 Lon Hohberger 2007-07-30 19:10:53 UTC
My error - 8 and, 24, and 32-byte chunks are not leaked over time, only 16-byte
chunks it seems.  Still trying to figure out where.

Comment 6 Lon Hohberger 2007-07-30 19:13:39 UTC
It's consistent; over two runs of two different time periods, I leaked 96
1600-byte blocks and 96 16-byte blocks.  I'm still tracking down where the
16-byte blocks are leaked from.

Comment 7 Lon Hohberger 2007-07-30 19:17:18 UTC
Seems to be:

(gdb) x/i 0x2aaaaaac2fae
0x2aaaaaac2fae <__virGetNetwork+510>:   test   %rax,%rax


Comment 8 Lon Hohberger 2007-07-30 19:34:38 UTC
Backtrace for the 1600 byte alloc

#2  0x00002aaaaacee6aa in calloc (sz=1, nmemb=1600) at alloc.c:958
#3  0x00002aaaaaac2fd1 in virHashCreate (size=50) at hash.c:92
#4  0x00002aaaaaadbd48 in xenXMOpen (conn=<value optimized out>, name=0x5e7d
<Address 0x5e7d out of bounds>, flags=19) at xm_internal.c:486
#5  0x00002aaaaaaca73c in xenUnifiedOpen (conn=0x2aaaaaef9cf0,
name=0x2aaaaaae0014 "Xen", flags=0) at xen_unified.c:123
#6  0x00002aaaaaac1ac2 in do_open (name=0x0, flags=0) at libvirt.c:304

The stack frame actually points here:
#3  0x00002aaaaaac2fd1 in virHashCreate (size=50) at hash.c:92
92              table->table = calloc(1, size * sizeof(virHashEntry));


So, xenXMOpen() is creating it (looks like).



Comment 9 Lon Hohberger 2007-07-30 19:35:28 UTC
Ah, maybe this is the problem:


/*
 * Free the config files in the cache if this is the
 * last connection
 */
int xenXMClose(virConnectPtr conn ATTRIBUTE_UNUSED) {
    if (!nconnections--) {
        virHashFree(nameConfigMap, NULL);
        nameConfigMap = NULL;
        virHashFree(configCache, xenXMConfigFree);
        configCache = NULL;
    }
    return (0);
}


Postdecrement operator?  

Should maybe be: 
  
   if (!--nconnections)


Comment 10 Daniel Veillard 2007-07-30 19:45:47 UTC
I hate when people mix operators like that :-(
I will double check tomorrow morning. Thanks for the heads-up !
Concerning RHEL-5, I doubt we will be able to get all the ACKs to fix the
leak, respinning libvirt and associated QA might be too much,

Daniel

Comment 11 Daniel Berrangé 2007-07-30 19:52:17 UTC
In addition to the problem mentioned in #10, there's a second bug in the error
case. If libvirt fails to open a connection to Xen it will also leak memory:

==6696== 18,400 bytes in 100 blocks are definitely lost in loss record 20 of 20
==6696==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==6696==    by 0x34818106AA: xenUnifiedOpen (xen_unified.c:96)
==6696==    by 0x3481807AC1: do_open (libvirt.c:304)
==6696==    by 0x400652: main (in /home/berrange/demo)
==6696== 

It allocates private data

    priv = malloc (sizeof *priv);


And then later has an failure path which doesn't free 'priv':

        /* If as root, then all drivers must succeed.
           If non-root, then only proxy must succeed */
        if (!priv->opened[i] && (getuid() == 0 || i == proxy_offset)) {
            for (j = 0; j < i; ++j)
                if (priv->opened[j]) drivers[j]->close (conn);
            return VIR_DRV_OPEN_ERROR;
        }
    }



Comment 12 Lon Hohberger 2007-07-30 19:53:58 UTC
Created attachment 160269 [details]
Fix postdecrement which should be predecrement.

Comment 13 Lon Hohberger 2007-07-30 19:54:59 UTC
That patch definitely helps.  I don't know if it solves the one the other Dan
found ;)


Comment 14 Daniel Berrangé 2007-07-30 19:57:24 UTC
WRT to comment #10, latest upstream has already changed that fragement to

int xenXMClose(virConnectPtr conn ATTRIBUTE_UNUSED) {
    nconnections--;
    if (nconnections <= 0) {
        virHashFree(nameConfigMap, NULL);
        nameConfigMap = NULL;
        virHashFree(configCache, xenXMConfigFree);
        configCache = NULL;
    }
    return (0);
}

Comment 16 Lon Hohberger 2007-07-30 20:29:27 UTC
==30951== 3,172 (336 direct, 2,836 indirect) bytes in 2 blocks are definitely
lost in loss record 60 of 67
==30951==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==30951==    by 0x4EA8004: virGetConnect (hash.c:662)
==30951==    by 0x4EA6A76: do_open (libvirt.c:297)
==30951==    by 0x403840: xvmd_loop (fence_xvmd.c:543)
==30951==    by 0x403FF8: main (fence_xvmd.c:782)

Also seems to be a bit here.  I'll see if I can track it down.

Comment 17 Lon Hohberger 2007-07-30 20:32:13 UTC
The upstream version looks more like the rest of the libvirt code base; I'm
obsoleting my patch.

Comment 18 Lon Hohberger 2007-07-30 20:36:17 UTC
Scratch comment #16; it looks like my bad.

Comment 26 Daniel Veillard 2007-07-31 13:42:41 UTC
The fix in the patch was applied upstream on 04-Jul-07 , so
this is part of 0.3.0 and 0.3.1 latest upstream releases.

Daniel

Comment 28 Daniel Veillard 2007-07-31 20:32:30 UTC
900151 build (dist-5E-qu-candidate, RHEL-5:libvirt-0_2_3-5_el5)

libvirt-0.2.3-5.el5 has been built with the patch.
For QA I guess that having a machine with Xen installed and 
running the reproducer in comment #1 , and watching the 
memory consuption of the program stay flat over time should
be sufficient to assert the bug is fixed.

Daniel


Comment 32 errata-xmlrpc 2007-11-07 17:27:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0643.html



Note You need to log in before you can comment on or make changes to this bug.