Bug 250146 - memory leak: virConnectClose() doesn't free all the memory it should
memory leak: virConnectClose() doesn't free all the memory it should
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt (Show other bugs)
5.1
All Linux
medium Severity high
: ---
: ---
Assigned To: Daniel Veillard
Virtualization Bugs
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2007-07-30 14:04 EDT by Lon Hohberger
Modified: 2009-12-14 16:12 EST (History)
4 users (show)

See Also:
Fixed In Version: RHEA-2007-0643
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-07 12:27:01 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Reproducer. (367 bytes, text/x-csrc)
2007-07-30 14:04 EDT, Lon Hohberger
no flags Details
Fix postdecrement which should be predecrement. (402 bytes, patch)
2007-07-30 15:53 EDT, Lon Hohberger
no flags Details | Diff

  None (edit)
Description Lon Hohberger 2007-07-30 14:04:51 EDT
Description of problem:

If you run the attached test program, memory usage grows without bounds.

Version-Release number of selected component (if applicable): 0.2.3-4.el5

How reproducible: 100%

Steps to Reproduce:
1. gcc -o vtest test.c -lvirt -I/usr/include/libvirt -Wall
2. ./vtest
3. ps auwwx | grep vtest
  
Actual results: Unbound memory usage

Expected results: Flat memory usage

Additional info: I found this using valgrind initially.  This eventually breaks
fence_xvmd, which is used for virtual machine fencing.
Comment 1 Lon Hohberger 2007-07-30 14:04:51 EDT
Created attachment 160260 [details]
Reproducer.
Comment 3 Lon Hohberger 2007-07-30 15:04:11 EDT
Running with a debugging memory allocator:

  0x2aaaaaf07e48 (1600 bytes) allocation trace:
        0x2aaaaaac2fd1
  0x2aaaaaf077f0 (1600 bytes) allocation trace:
        0x2aaaaaac2fd1
  ...

GDB says:

(gdb) x/i 0x2aaaaaac2fd1
0x2aaaaaac2fd1 <__virGetNetwork+545>:   test   %rax,%rax

Comment 4 Lon Hohberger 2007-07-30 15:05:20 EDT
There's also a pile of 8, 16, and 32-byte bits left over from regcomp() it looks
like.
Comment 5 Lon Hohberger 2007-07-30 15:10:53 EDT
My error - 8 and, 24, and 32-byte chunks are not leaked over time, only 16-byte
chunks it seems.  Still trying to figure out where.
Comment 6 Lon Hohberger 2007-07-30 15:13:39 EDT
It's consistent; over two runs of two different time periods, I leaked 96
1600-byte blocks and 96 16-byte blocks.  I'm still tracking down where the
16-byte blocks are leaked from.
Comment 7 Lon Hohberger 2007-07-30 15:17:18 EDT
Seems to be:

(gdb) x/i 0x2aaaaaac2fae
0x2aaaaaac2fae <__virGetNetwork+510>:   test   %rax,%rax
Comment 8 Lon Hohberger 2007-07-30 15:34:38 EDT
Backtrace for the 1600 byte alloc

#2  0x00002aaaaacee6aa in calloc (sz=1, nmemb=1600) at alloc.c:958
#3  0x00002aaaaaac2fd1 in virHashCreate (size=50) at hash.c:92
#4  0x00002aaaaaadbd48 in xenXMOpen (conn=<value optimized out>, name=0x5e7d
<Address 0x5e7d out of bounds>, flags=19) at xm_internal.c:486
#5  0x00002aaaaaaca73c in xenUnifiedOpen (conn=0x2aaaaaef9cf0,
name=0x2aaaaaae0014 "Xen", flags=0) at xen_unified.c:123
#6  0x00002aaaaaac1ac2 in do_open (name=0x0, flags=0) at libvirt.c:304

The stack frame actually points here:
#3  0x00002aaaaaac2fd1 in virHashCreate (size=50) at hash.c:92
92              table->table = calloc(1, size * sizeof(virHashEntry));


So, xenXMOpen() is creating it (looks like).

Comment 9 Lon Hohberger 2007-07-30 15:35:28 EDT
Ah, maybe this is the problem:


/*
 * Free the config files in the cache if this is the
 * last connection
 */
int xenXMClose(virConnectPtr conn ATTRIBUTE_UNUSED) {
    if (!nconnections--) {
        virHashFree(nameConfigMap, NULL);
        nameConfigMap = NULL;
        virHashFree(configCache, xenXMConfigFree);
        configCache = NULL;
    }
    return (0);
}


Postdecrement operator?  

Should maybe be: 
  
   if (!--nconnections)
Comment 10 Daniel Veillard 2007-07-30 15:45:47 EDT
I hate when people mix operators like that :-(
I will double check tomorrow morning. Thanks for the heads-up !
Concerning RHEL-5, I doubt we will be able to get all the ACKs to fix the
leak, respinning libvirt and associated QA might be too much,

Daniel
Comment 11 Daniel Berrange 2007-07-30 15:52:17 EDT
In addition to the problem mentioned in #10, there's a second bug in the error
case. If libvirt fails to open a connection to Xen it will also leak memory:

==6696== 18,400 bytes in 100 blocks are definitely lost in loss record 20 of 20
==6696==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==6696==    by 0x34818106AA: xenUnifiedOpen (xen_unified.c:96)
==6696==    by 0x3481807AC1: do_open (libvirt.c:304)
==6696==    by 0x400652: main (in /home/berrange/demo)
==6696== 

It allocates private data

    priv = malloc (sizeof *priv);


And then later has an failure path which doesn't free 'priv':

        /* If as root, then all drivers must succeed.
           If non-root, then only proxy must succeed */
        if (!priv->opened[i] && (getuid() == 0 || i == proxy_offset)) {
            for (j = 0; j < i; ++j)
                if (priv->opened[j]) drivers[j]->close (conn);
            return VIR_DRV_OPEN_ERROR;
        }
    }

Comment 12 Lon Hohberger 2007-07-30 15:53:58 EDT
Created attachment 160269 [details]
Fix postdecrement which should be predecrement.
Comment 13 Lon Hohberger 2007-07-30 15:54:59 EDT
That patch definitely helps.  I don't know if it solves the one the other Dan
found ;)
Comment 14 Daniel Berrange 2007-07-30 15:57:24 EDT
WRT to comment #10, latest upstream has already changed that fragement to

int xenXMClose(virConnectPtr conn ATTRIBUTE_UNUSED) {
    nconnections--;
    if (nconnections <= 0) {
        virHashFree(nameConfigMap, NULL);
        nameConfigMap = NULL;
        virHashFree(configCache, xenXMConfigFree);
        configCache = NULL;
    }
    return (0);
}
Comment 16 Lon Hohberger 2007-07-30 16:29:27 EDT
==30951== 3,172 (336 direct, 2,836 indirect) bytes in 2 blocks are definitely
lost in loss record 60 of 67
==30951==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==30951==    by 0x4EA8004: virGetConnect (hash.c:662)
==30951==    by 0x4EA6A76: do_open (libvirt.c:297)
==30951==    by 0x403840: xvmd_loop (fence_xvmd.c:543)
==30951==    by 0x403FF8: main (fence_xvmd.c:782)

Also seems to be a bit here.  I'll see if I can track it down.
Comment 17 Lon Hohberger 2007-07-30 16:32:13 EDT
The upstream version looks more like the rest of the libvirt code base; I'm
obsoleting my patch.
Comment 18 Lon Hohberger 2007-07-30 16:36:17 EDT
Scratch comment #16; it looks like my bad.
Comment 26 Daniel Veillard 2007-07-31 09:42:41 EDT
The fix in the patch was applied upstream on 04-Jul-07 , so
this is part of 0.3.0 and 0.3.1 latest upstream releases.

Daniel
Comment 28 Daniel Veillard 2007-07-31 16:32:30 EDT
900151 build (dist-5E-qu-candidate, RHEL-5:libvirt-0_2_3-5_el5)

libvirt-0.2.3-5.el5 has been built with the patch.
For QA I guess that having a machine with Xen installed and 
running the reproducer in comment #1 , and watching the 
memory consuption of the program stay flat over time should
be sufficient to assert the bug is fixed.

Daniel
Comment 32 errata-xmlrpc 2007-11-07 12:27:01 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0643.html

Note You need to log in before you can comment on or make changes to this bug.