Bug 802851 - memory leaks/dangling pointers caused by virDomainDetachDeviceConfig (virsh detach-*)
memory leaks/dangling pointers caused by virDomainDetachDeviceConfig (virsh d...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt (Show other bugs)
6.2
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Laine Stump
Virtualization Bugs
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-13 12:11 EDT by Laine Stump
Modified: 2012-06-20 02:50 EDT (History)
9 users (show)

See Also:
Fixed In Version: libvirt-0.9.10-6.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 02:50:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
detach-disk.log (32.15 KB, text/plain)
2012-06-06 22:31 EDT, EricLee
no flags Details
detach-interface.log (22.54 KB, text/plain)
2012-06-06 22:31 EDT, EricLee
no flags Details
detach-device.log (30.04 KB, text/plain)
2012-06-06 22:32 EDT, EricLee
no flags Details

  None (edit)
Description Laine Stump 2012-03-13 12:11:50 EDT
virsh detach-device, virsh detach-disk, and virsh detach-interface (i.e. the virDomainDetachDeviceFlags() API) could leak memory in the following conditions:

1) Any time the "--persistent" (VIR_DOMAIN_AFFECT_CONFIG) flag was specified.

2) In some cases when an error was encountered during the detach,

Case (2) is least likely to happen, but had the most dangerous consequences, as it would result in a pointer to freed memory being left in the domain's device list.

The following three patches solve the issues (and their log messages give thorough explanations of the problems):

commit 8845d293757ff0deff17f65d555222ad6d462bfd
commit b59e59845f4ba59c68e17cbd0e41954bec75d333
commit f985773d06ed17cd699907ac8b271864c8b67870


commit f985773d06ed17cd699907ac8b271864c8b67870
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 6 18:06:14 2012 -0500

    util: eliminate device object leaks related to virDomain*Remove*()
    
    There are several functions in domain_conf.c that remove a device
    object from the domain's list of that object type, but don't free the
    object or return it to the caller to free. In many cases this isn't a
    problem because the caller already had a pointer to the object and
    frees it afterward, but in several cases the removed object was just
    left floating around with no references to it.
    
    In particular, the function qemuDomainDetachDeviceConfig() calls
    functions to locate and remove net (virDomainNetRemoveByMac), disk
    (virDomainDiskRemoveByName()), and lease (virDomainLeaseRemove())
    devices, but neither it nor its caller qemuDomainModifyDeviceConfig()
    ever obtain a pointer to the device being removed, much less free it.
    
    This patch modifies the following "remove" functions to return a
    pointer to the device object being removed from the domain device
    arrays, to give the caller the option of freeing the device object
    using that pointer if needed. In places where the object was
    previously leaked, it is now freed:
    
      virDomainDiskRemove
      virDomainDiskRemoveByName
      virDomainNetRemove
      virDomainNetRemoveByMac
      virDomainHostdevRemove
      virDomainLeaseRemove
      virDomainLeaseRemoveAt
    
    The functions that had been leaking:
    
      libxlDomainDetachConfig - leaked a virDomainDiskDef
      qemuDomainDetachDeviceConfig - could leak a virDomainDiskDef,
                                a virDomainNetDef, or a
                                virDomainLeaseDef
      qemuDomainDetachLease   - leaked a virDomainLeaseDef

commit b59e59845f4ba59c68e17cbd0e41954bec75d333
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 6 20:43:22 2012 -0500

    qemu: don't 'remove' hostdev objects from domain if operation fails
    
    There were certain paths through the hostdev detach code that could
    lead to the lower level function failing (and not removing the object
    from the domain's hostdevs list), but the higher level function
    free'ing the hostdev object anyway. This would leave a stale
    hostdevdef pointer in the list, which would surely cause a problem
    eventually.
    
    This patch relocates virDomainHostdevRemove from the lower level
    functions qemuDomainDetachThisHostDevice and
    qemuDomainDetachHostPciDevice, to their caller
    qemuDomainDetachThisHostDevice, placing it just before the call to
    virDomainHostdevDefFree. This makes it easy to verify that either both
    operations are done, or neither.
    
    NB: The "dangling pointer" part of this problem was introduced in
    commit 13d5a6, so it is not present in libvirt versions prior to
    0.9.9. Earlier versions would return failure in certain cases even
    though the the device object was removed/deleted, but the removal and
    deletion operations would always both happen or neither.

commit 8845d293757ff0deff17f65d555222ad6d462bfd
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 6 21:06:30 2012 -0500

    util: make virDomainLeaseDefFree global
    
    It will be used in a different file in an upcoming patch.
Comment 1 Huang Wenlong 2012-03-15 01:11:02 EDT
Hi, Laine 

I test libvirt-0.9.10-5.el6.x86_64 get en error but  I am not sure whether this bug can track this issue .

When I run virsh command  "detach-disk"  with wrong arguments , it gets lots of error and crash 


# virsh detach-disk  vdb  guest

error: failed to get domain 'vdb'
*** glibc detected *** virsh: munmap_chunk(): invalid pointer: 0x00007fff500c1478 ***
======= Backtrace: =========
/lib64/libc.so.6[0x30ca0750c6]
/usr/lib64/libvirt.so.0(virFree+0x29)[0x3af164e659]
virsh[0x41afca]
virsh[0x413ca7]
virsh[0x4242fc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x30ca01ecdd]
virsh[0x40a309]
======= Memory map: ========
00400000-0044b000 r-xp 00000000 fd:00 1065467                            /usr/bin/virsh
0064a000-00654000 rw-p 0004a000 fd:00 1065467                            /usr/bin/virsh
01a20000-01aab000 rw-p 00000000 00:00 0                                  [heap]
30c9800000-30c9820000 r-xp 00000000 fd:00 2102358                        /lib64/ld-2.12.so
30c9a1f000-30c9a20000 r--p 0001f000 fd:00 2102358                        /lib64/ld-2.12.so
30c9a20000-30c9a21000 rw-p 00020000 fd:00 2102358                        /lib64/ld-2.12.so
30c9a21000-30c9a22000 rw-p 00000000 00:00 0
30c9c00000-30c9c02000 r-xp 00000000 fd:00 2102367
 ...
2.# virsh list
 Id    Name                           State
----------------------------------------------------
 15    r                              running



Wenlong
Comment 2 Alex Jia 2012-03-15 03:22:59 EDT
(In reply to comment #1)
> # virsh detach-disk  vdb  guest
> 
> error: failed to get domain 'vdb'
> *** glibc detected *** virsh: munmap_chunk(): invalid pointer:

It's a different issues with Laine's fixed, the issue is introduced in commit 42accf1, and I have committed a patch to fix it:

https://www.redhat.com/archives/libvir-list/2012-March/msg00655.html
Comment 3 Huang Wenlong 2012-03-15 03:30:11 EDT
As comment 2  I remove the needinfo  and file a new bug to track my issue  Bug  803591 

thanks Alex.
Comment 4 Laine Stump 2012-03-16 10:26:33 EDT
Backports of the three above patches have been posted (as one 3-part series) to rhvirt-patches for inclusion in the RHEL build of libvirt:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-March/msg01437.html
Comment 8 Laine Stump 2012-04-02 11:24:21 EDT
Note that the method used to verify this BZ is invalid - the memory leaks were in libvirtd, not virsh, so the way to test is to run libvirtd under valgrind, then run the virsh commands, then ctl-C out of the valgrind run of libvirtd and check the output.
Comment 9 EricLee 2012-06-06 01:48:37 EDT
To comment 7 and comment 8:

I have reproduce the bug with packages:
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-276.el6.x86_64
libvirt-0.9.10-21.el6.x86_64

Steps:
# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]

# valgrind --leak-check=full libvirtd >& /root/valgrind.log 2>&1

In another terminal:

# virsh detach-disk test vdb --persistent
Disk detached successfully
(test has two disk of "vda" and "vdb")

Ctl-C for the valgrind command in the first terminal.

# cat /root/valgrind.log
.......
==5696== LEAK SUMMARY:
==5696==    definitely lost: 176 bytes in 1 blocks
==5696==    indirectly lost: 130 bytes in 15 blocks
==5696==      possibly lost: 4,048 bytes in 11 blocks
==5696==    still reachable: 835,198 bytes in 9,499 blocks
==5696==         suppressed: 0 bytes in 0 blocks
==5696== Reachable blocks (those to which a pointer was found) are not shown.
==5696== To see them, rerun with: --leak-check=full --show-reachable=yes

And alse for #virsh detach-device disk.xml --persistent has memory leak.

Should we reopen the bug?
Comment 10 Alex Jia 2012-06-06 03:03:00 EDT
(In reply to comment #9)
> To comment 7 and comment 8:
> 
> # valgrind --leak-check=full libvirtd >& /root/valgrind.log 2>&1
> .......
> ==5696== LEAK SUMMARY:
> ==5696==    definitely lost: 176 bytes in 1 blocks
Please append -v option into valgrind then show details for 'definitely lost'
Comment 11 EricLee 2012-06-06 05:35:21 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > To comment 7 and comment 8:
> > 
> > # valgrind --leak-check=full libvirtd >& /root/valgrind.log 2>&1
> > .......
> > ==5696== LEAK SUMMARY:
> > ==5696==    definitely lost: 176 bytes in 1 blocks
> Please append -v option into valgrind then show details for 'definitely lost'

I have retested for the bug with packages:
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-276.el6.x86_64
libvirt-0.9.10-21.el6.x86_64

Steps:
1. for detach-disk:
# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]

# valgrind --leak-check=full -v libvirtd >& /root/detach-disk.log 2>&1

In another terminal:

# virsh detach-disk test vdb --persistent
Disk detached successfully
(test has two disk of "vda" and "vdb")

Ctl-C for the valgrind command in the first terminal.

# cat /root/detach-disk.log
....
==6402== 23 bytes in 1 blocks are definitely lost in loss record 280 of 808
==6402==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==6402==    by 0x3694F00C97: __vasprintf_chk (in /lib64/libc-2.12.so)
==6402==    by 0x36ADA5B4C3: virVasprintf (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x36ADA5B567: virAsprintf (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x4464FA: ??? (in /usr/sbin/libvirtd)
==6402==    by 0x36ADAD142F: virStateInitialize (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x421140: ??? (in /usr/sbin/libvirtd)
==6402==    by 0x36ADA58D78: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x3695207850: start_thread (in /lib64/libpthread-2.12.so)
==6402==    by 0x3694EE767C: clone (in /lib64/libc-2.12.so)
==6402== 
==6402== 306 (176 direct, 130 indirect) bytes in 1 blocks are definitely lost in loss record 705 of 808
==6402==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==6402==    by 0x36ADA4F1FD: virAlloc (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x36ADA467F1: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x36ADA46FCB: virCgroupForDomain (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x484D74: ??? (in /usr/sbin/libvirtd)
==6402==    by 0x459FC2: ??? (in /usr/sbin/libvirtd)
==6402==    by 0x36ADAD4965: virDomainDetachDeviceFlags (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x42CB08: ??? (in /usr/sbin/libvirtd)
==6402==    by 0x36ADB17874: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x36ADB18B00: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x36ADA5945B: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6402==    by 0x36ADA58D78: ??? (in /usr/lib64/libvirt.so.0.9.10)
.....
==6402== LEAK SUMMARY:
==6402==    definitely lost: 199 bytes in 2 blocks
==6402==    indirectly lost: 130 bytes in 15 blocks
==6402==      possibly lost: 4,048 bytes in 11 blocks
==6402==    still reachable: 835,287 bytes in 9,502 blocks
==6402==         suppressed: 0 bytes in 0 blocks
==6402== Reachable blocks (those to which a pointer was found) are not shown.
==6402== To see them, rerun with: --leak-check=full --show-reachable=yes
......

2. for detach-interface:
# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]

# valgrind --leak-check=full -v libvirtd >& /root/detach-interface.log 2>&1

In another terminal:

# virsh detach-interface test network  --persistent
Interface detached successfully
(test only has ont interface)

Ctl-C for the valgrind command in the first terminal.

# cat /root/detach-interface.log
....
==6662== 23 bytes in 1 blocks are definitely lost in loss record 269 of 786
==6662==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==6662==    by 0x3694F00C97: __vasprintf_chk (in /lib64/libc-2.12.so)
==6662==    by 0x36ADA5B4C3: virVasprintf (in /usr/lib64/libvirt.so.0.9.10)
==6662==    by 0x36ADA5B567: virAsprintf (in /usr/lib64/libvirt.so.0.9.10)
==6662==    by 0x4464FA: ??? (in /usr/sbin/libvirtd)
==6662==    by 0x36ADAD142F: virStateInitialize (in /usr/lib64/libvirt.so.0.9.10)
==6662==    by 0x421140: ??? (in /usr/sbin/libvirtd)
==6662==    by 0x36ADA58D78: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6662==    by 0x3695207850: start_thread (in /lib64/libpthread-2.12.so)
==6662==    by 0x3694EE767C: clone (in /lib64/libc-2.12.so)
....
==6662== LEAK SUMMARY:
==6662==    definitely lost: 23 bytes in 1 blocks
==6662==    indirectly lost: 0 bytes in 0 blocks
==6662==      possibly lost: 4,048 bytes in 11 blocks
==6662==    still reachable: 832,912 bytes in 9,458 blocks
==6662==         suppressed: 0 bytes in 0 blocks
==6662== Reachable blocks (those to which a pointer was found) are not shown.
==6662== To see them, rerun with: --leak-check=full --show-reachable=yes
....

3. for detach-device:
# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]

# valgrind --leak-check=full -v libvirtd >& /root/detach-device.log 2>&1

In another terminal:

# virsh detach-device test disk.xml --persistent
Device detached successfully

Ctl-C for the valgrind command in the first terminal.

# cat /root/detach-device.log
....
==6917== 23 bytes in 1 blocks are definitely lost in loss record 260 of 774
==6917==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==6917==    by 0x3694F00C97: __vasprintf_chk (in /lib64/libc-2.12.so)
==6917==    by 0x36ADA5B4C3: virVasprintf (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x36ADA5B567: virAsprintf (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x4464FA: ??? (in /usr/sbin/libvirtd)
==6917==    by 0x36ADAD142F: virStateInitialize (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x421140: ??? (in /usr/sbin/libvirtd)
==6917==    by 0x36ADA58D78: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x3695207850: start_thread (in /lib64/libpthread-2.12.so)
==6917==    by 0x3694EE767C: clone (in /lib64/libc-2.12.so)
==6917== 
==6917== 306 (176 direct, 130 indirect) bytes in 1 blocks are definitely lost in loss record 674 of 774
==6917==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==6917==    by 0x36ADA4F1FD: virAlloc (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x36ADA467F1: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x36ADA46FCB: virCgroupForDomain (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x484D74: ??? (in /usr/sbin/libvirtd)
==6917==    by 0x459FC2: ??? (in /usr/sbin/libvirtd)
==6917==    by 0x36ADAD4965: virDomainDetachDeviceFlags (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x42CB08: ??? (in /usr/sbin/libvirtd)
==6917==    by 0x36ADB17874: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x36ADB18B00: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x36ADA5945B: ??? (in /usr/lib64/libvirt.so.0.9.10)
==6917==    by 0x36ADA58D78: ??? (in /usr/lib64/libvirt.so.0.9.10)
....
==6917== LEAK SUMMARY:
==6917==    definitely lost: 199 bytes in 2 blocks
==6917==    indirectly lost: 130 bytes in 15 blocks
==6917==      possibly lost: 4,048 bytes in 11 blocks
==6917==    still reachable: 831,401 bytes in 9,424 blocks
==6917==         suppressed: 0 bytes in 0 blocks
==6917== Reachable blocks (those to which a pointer was found) are not shown.
==6917== To see them, rerun with: --leak-check=full --show-reachable=yes
....

So I think it is a new memory leak of "virDomainDetachDeviceFlags", and I will file a new bug to track it.
Comment 12 Laine Stump 2012-06-06 13:37:37 EDT
I can't see exactly what is leaking, because you don't have full debuginfo installed for the libvirt package. You should run the following command:

   debuginfo-install libvirt

which will install full debugging line number / symbol info so that the entire call stack will fully specify function name as well as filename:line#.

Even without that information, I can see that this is not related to the leaks that were fixed for this bug (rather than a leak of a device object, it is a leak of some part of a virCgroup). So yes, you should file a new bug report.
Comment 13 EricLee 2012-06-06 22:30:07 EDT
(In reply to comment #12)
> I can't see exactly what is leaking, because you don't have full debuginfo
> installed for the libvirt package. You should run the following command:
> 
>    debuginfo-install libvirt
> 
> which will install full debugging line number / symbol info so that the
> entire call stack will fully specify function name as well as filename:line#.
> 
> Even without that information, I can see that this is not related to the
> leaks that were fixed for this bug (rather than a leak of a device object,
> it is a leak of some part of a virCgroup). So yes, you should file a new bug
> report.

Thanks for replying.

I will attach the valgrind log after debuginfo-install with the name of virsh command for those three commands.
Please see next three comments.

From those logs, we think leak reason is not this bug related, but similar with bug 769517's comment 9 and comment 11. So do not need file a new one.

Thanks
EricLee
Comment 14 EricLee 2012-06-06 22:31:13 EDT
Created attachment 590048 [details]
detach-disk.log
Comment 15 EricLee 2012-06-06 22:31:58 EDT
Created attachment 590050 [details]
detach-interface.log
Comment 16 EricLee 2012-06-06 22:32:34 EDT
Created attachment 590051 [details]
detach-device.log
Comment 18 errata-xmlrpc 2012-06-20 02:50:07 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html

Note You need to log in before you can comment on or make changes to this bug.