Bug 1143800 - Libvirtd was killed after several cycles creating/deleting external disk snapshot with glusterfs backend
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1093594
Blocks:
 
Reported: 2014-09-18 03:41 UTC by Shanzhi Yu
Modified: 2016-01-21 09:32 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 05:52:32 UTC
Target Upstream Version:


Attachments (Terms of Use)
memoryleak (2.44 MB, text/plain)
2014-09-18 03:44 UTC, Shanzhi Yu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2202 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2015-11-19 08:17:58 UTC

Description Shanzhi Yu 2014-09-18 03:41:33 UTC
Description of problem:

Libvirtd was killed after several cycles creating/deleting external disk snapshot with glusterfs backend


Version-Release number of selected component (if applicable):

libvirt-1.2.8-2.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Prepare a guest XML whose disk source is backed by the gluster server
# cat rh7-g.xml


2. Prepare four snapshot files

# cat s1.xml
<domainsnapshot>
<name>s1</name>
<disks>
<disk name='vda' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7-qcow2.s1'>
<host name='10.66.x.xxx'/>
</source>
</disk>
</disks>
</domainsnapshot>

# for i in s2 s3 s4; do sed -e s/s1/$i/ s1.xml > $i.xml; done

Creating the four snapshots produces the backing chain:
rhel7-qcow2.s4 -> rhel7-qcow2.s3 -> rhel7-qcow2.s2 -> rhel7-qcow2.s1 -> rhel7-qcow2.img

3. Start libvirtd in one terminal 

# free -m 
             total       used       free     shared    buffers     cached
Mem:          7461        283       7177          4          0         80
-/+ buffers/cache:        203       7258
Swap:         1023        194        829


# valgrind --leak-check=full libvirtd &> memoryleak

 
(After several iterations of step 4, the shell in this terminal eventually prints:)

Killed

4. Run the "define guest -> start guest -> sleep 30 -> create four external disk snapshots -> undefine guest -> destroy guest" cycle

Libvirtd was killed on the fourth or fifth iteration:

# red='\e[0;31m';NC='\e[0m' ;for num in $(seq 1 100);do echo -e "${red}Try the $num time:${NC}";virsh define rh7-g.xml ;virsh start  rh7-g;sleep 30;for i in s1 s2 s3 s4;do virsh snapshot-create rh7-g  $i.xml --reuse-external --disk-only;done;virsh undefine rh7-g;virsh destroy rh7-g;done
Try the 1 time:
Domain rh7-g defined from rh7-g.xml

Domain rh7-g started

Domain snapshot s1 created from 's1.xml'
Domain snapshot s2 created from 's2.xml'
Domain snapshot s3 created from 's3.xml'
Domain snapshot s4 created from 's4.xml'
Domain rh7-g has been undefined

Domain rh7-g destroyed

Try the 2 time:
Domain rh7-g defined from rh7-g.xml
...


Try the 4 time:
Domain rh7-g defined from rh7-g.xml

Domain rh7-g started

Domain snapshot s1 created from 's1.xml'
Domain snapshot s2 created from 's2.xml'
Domain snapshot s3 created from 's3.xml'
2014-09-18 03:33:59.419+0000: 3838: info : libvirt version: 1.2.8, package: 2.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-10-09:58:42, x86-019.build.eng.bos.redhat.com)
2014-09-18 03:33:59.419+0000: 3838: warning : virKeepAliveTimerInternal:143 : No response from client 0x7ff428a4bc60 after 6 keepalive messages in 36 seconds
2014-09-18 03:34:01.185+0000: 3837: warning : virKeepAliveTimerInternal:143 : No response from client 0x7ff428a4bc60 after 6 keepalive messages in 37 seconds
error: internal error: received hangup / error event on socket
error: Failed to reconnect to the hypervisor
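As a side note, the growth can also be observed without valgrind by sampling the daemon's resident set between cycles. A minimal sketch; reading /proc is Linux-specific, and polling libvirtd via pidof is an assumption for illustration, not something from the report:

```shell
#!/bin/sh
# Sketch: report a process's resident-set size (kB) by reading /proc.
# Linux-specific; not part of the original reproduction steps.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Example: sample the current shell. In the reproducer one would sample
# $(pidof libvirtd) once per define/start/snapshot/destroy cycle; a
# steadily rising number confirms the leak without valgrind's overhead.
echo "RSS: $(rss_kb $$) kB"
```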

I will attach the full valgrind log (memoryleak).

Actual results:

Libvirtd's memory use grows without bound and the daemon is eventually killed (out of memory) after several snapshot create/delete cycles.

Expected results:

Libvirtd keeps a stable memory footprint and survives repeated snapshot cycles.

Additional info:

A brief valgrind trace:

==28646== Memcheck, a memory error detector
==28646== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==28646== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==28646== Command: libvirtd
==28646== 
==28646== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints
==28646==    This could cause spurious value errors to appear.
==28646==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==28711== 
==28711== HEAP SUMMARY:
==28711==     in use at exit: 92,591,877 bytes in 21,517 blocks
==28711==   total heap usage: 131,390 allocs, 109,873 frees, 251,304,470 bytes allocated
==28711== 
==28711== LEAK SUMMARY:
==28711==    definitely lost: 9,169 bytes in 76 blocks
==28711==    indirectly lost: 8,924 bytes in 40 blocks
==28711==      possibly lost: 44,796,701 bytes in 420 blocks
==28711==    still reachable: 47,777,083 bytes in 20,981 blocks
==28711==         suppressed: 0 bytes in 0 blocks
==28711== Rerun with --leak-check=full to see details of leaked memory
==28711== 
==28711== For counts of detected and suppressed errors, rerun with: -v
==28711== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
==28646== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints
==28646==    This could cause spurious value errors to appear.
==28646==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==28899== 
==28899== HEAP SUMMARY:
==28899==     in use at exit: 1,391,222,243 bytes in 31,333 blocks
==28899==   total heap usage: 173,904 allocs, 142,571 frees, 1,560,240,340 bytes allocated
==28899== 
==28899== LEAK SUMMARY:
==28899==    definitely lost: 179,496 bytes in 1,013 blocks
==28899==    indirectly lost: 92,414 bytes in 369 blocks
==28899==      possibly lost: 940,368,330 bytes in 6,365 blocks
==28899==    still reachable: 450,582,003 bytes in 23,586 blocks
==28899==         suppressed: 0 bytes in 0 blocks
==28899== Rerun with --leak-check=full to see details of leaked memory
==28899== 
==28899== For counts of detected and suppressed errors, rerun with: -v
==28899== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
==28900== could not unlink /tmp/vgdb-pipe-from-vgdb-to-28900-by-root-on-shyu_test_pc
==28900== could not unlink /tmp/vgdb-pipe-to-vgdb-from-28900-by-root-on-shyu_test_pc
==28900== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-28900-by-root-on-shyu_test_pc
==28646== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints
==28646==    This could cause spurious value errors to appear.
==28646==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==29140== 
==29140== HEAP SUMMARY:
==29140==     in use at exit: 2,734,695,264 bytes in 41,767 blocks
==29140==   total heap usage: 222,782 allocs, 181,015 frees, 2,922,844,432 bytes allocated
==29140== 
==29140== LEAK SUMMARY:
==29140==    definitely lost: 458,454 bytes in 2,125 blocks
==29140==    indirectly lost: 358,294,985 bytes in 3,078 blocks
==29140==      possibly lost: 1,701,563,207 bytes in 11,492 blocks
==29140==    still reachable: 674,378,618 bytes in 25,072 blocks
==29140==         suppressed: 0 bytes in 0 blocks
==29140== Rerun with --leak-check=full to see details of leaked memory
==29140== 
==29140== For counts of detected and suppressed errors, rerun with: -v
==29140== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
==29332== 
==29332== HEAP SUMMARY:
==29332==     in use at exit: 4,033,802,861 bytes in 58,023 blocks
==29332==   total heap usage: 818,469 allocs, 760,446 frees, 4,414,285,420 bytes allocated
==29332== 
==29332== LEAK SUMMARY:
==29332==    definitely lost: 608,119 bytes in 3,234 blocks
==29332==    indirectly lost: 761,091,420 bytes in 6,014 blocks
==29332==      possibly lost: 1,970,413,033 bytes in 13,256 blocks
==29332==    still reachable: 1,301,690,289 bytes in 35,519 blocks
==29332==         suppressed: 0 bytes in 0 blocks
==29332== Rerun with --leak-check=full to see details of leaked memory
==29332== 
==29332== For counts of detected and suppressed errors, rerun with: -v
==29332== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
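The "definitely lost" totals above grow with every daemon restart (9,169 -> 179,496 -> 458,454 -> 608,119 bytes), which is what points at a per-cycle leak. A small helper can pull those figures out of a log like the attached one; a sketch assuming the log format shown above (the helper name is made up):

```shell
#!/bin/sh
# Sketch: print the "definitely lost" byte counts from a valgrind log,
# one per LEAK SUMMARY, so growth across restarts is easy to compare.
# Assumes "==PID==    definitely lost: N bytes in M blocks" lines as
# shown above; the function name is hypothetical.
definitely_lost() {
    grep 'definitely lost:' "$1" \
        | sed -e 's/.*definitely lost:[[:space:]]*//' \
              -e 's/ bytes.*//' \
              -e 's/,//g'
}
```

Run against the excerpt above, `definitely_lost memoryleak` would print one number per restart, making the roughly linear growth per cycle obvious at a glance.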

Comment 1 Shanzhi Yu 2014-09-18 03:44:23 UTC
Created attachment 938739 [details]
memoryleak

Comment 2 Peter Krempa 2014-09-26 08:02:15 UTC
The issue is that glfs_fini() leaks the memory allocated by glfs_new(). This is a known issue in gluster: https://bugzilla.redhat.com/show_bug.cgi?id=1093594

Comment 4 Peter Krempa 2015-03-02 14:03:04 UTC
Looks like the issue (https://bugzilla.redhat.com/show_bug.cgi?id=1093594) will be fixed in libgfapi soon. Moving to ON_QA. Once the libgfapi package is fixed, the memory leak should disappear.

Comment 5 Yang Yang 2015-09-06 09:58:51 UTC
Verified on libvirt-1.2.17-6.el7.x86_64 and glusterfs-server-3.7.3-1.el7.x86_64

Steps
1. Prepare a running guest with the following disk XML
# vim virt-tests-vm1.xml
 <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
        <host name='10.66.xx.xx'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

2. Prepare four snapshot files

# cat s1.xml
<domainsnapshot>
<name>s1</name>
<disks>
<disk name='vda' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7-qcow2.s1'>
<host name='10.66.x.xxx'/>
</source>
</disk>
</disks>
</domainsnapshot>

#for i in s2 s3 s4;do sed -e s/s1/$i/ s1.xml > $i.xml ; done
# for i in s1 s2 s3 s4; do virsh snapshot-create virt-tests-vm1 $i.xml --disk-only --no-metadata; done
This creates the backing chain:
rhel7-qcow2.s4 -> rhel7-qcow2.s3 -> rhel7-qcow2.s2 -> rhel7-qcow2.s1 -> rhel7-qcow2.img

3. Run the snapshot create/delete cycle 30 times
# red='\e[0;31m';NC='\e[0m' ;for num in $(seq 1 100);do echo -e "${red}Try the $num time:${NC}";virsh define virt-tests-vm1.xml ;virsh start  virt-tests-vm1;sleep 30;for i in s1 s2 s3 s4;do virsh snapshot-create virt-tests-vm1  $i.xml --reuse-external --disk-only --no-metadata;done;virsh undefine virt-tests-vm1;virsh destroy virt-tests-vm1;done

Try the 1 time:
Domain virt-tests-vm1 defined from virt-tests-vm1.xml

Domain virt-tests-vm1 started

Domain snapshot s1 created from 's1.xml'
Domain snapshot s2 created from 's2.xml'
Domain snapshot s3 created from 's3.xml'
Domain snapshot s4 created from 's4.xml'
Domain virt-tests-vm1 has been undefined

Domain virt-tests-vm1 destroyed
.............
Try the 30 time:
Domain virt-tests-vm1 defined from virt-tests-vm1.xml

Domain virt-tests-vm1 started

Domain snapshot s1 created from 's1.xml'
Domain snapshot s2 created from 's2.xml'
Domain snapshot s3 created from 's3.xml'
Domain snapshot s4 created from 's4.xml'
Domain virt-tests-vm1 has been undefined

Domain virt-tests-vm1 destroyed

Libvirtd was not killed.

Comment 7 errata-xmlrpc 2015-11-19 05:52:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

