Bug 1409773

Summary: libgfapi leaks memory after glfs_fini
Product: Red Hat Enterprise Linux 7 Reporter: Han Han <hhan>
Component: glusterfs Assignee: Niels de Vos <ndevos>
Status: NEW --- QA Contact: Sweta Anandpara <sanandpa>
Severity: high Docs Contact: Milan Navratil <mnavrati>
Priority: unspecified    
Version: 7.3 CC: amukherj, chhu, dyuan, ndevos, pasik, pkrempa, sabose, xuzhang, yisun
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Memory leaks occur when certain applications fail to exit after unloading the Gluster libraries
Gluster consists of many internal components and different translators that implement functions and features. The `gfapi` access method was added to integrate Gluster tightly with applications. However, not all components and translators are designed to be unloaded in running applications. As a consequence, programs that do not exit after unloading the Gluster libraries are unable to release some of the memory allocations that are performed internally by Gluster. To reduce the amount of leaked memory, avoid calling the `glfs_init()` and `glfs_fini()` functions from applications whenever possible. To release the leaked memory, you must restart long-running applications.
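One way an application might cut down on `glfs_init()` and `glfs_fini()` calls is to share a single connection per volume and tear it down only when the last user is done. The following is a minimal sketch of that idea, not the gfapi API: `ConnectionCache` and its `connect` and `disconnect` callbacks are hypothetical names, and the callbacks stand in for whatever code in the application wraps `glfs_init()` and `glfs_fini()`.

```python
import threading
from typing import Any, Callable, Dict, List, Tuple


class ConnectionCache:
    """Share one connection per (server, volume) using reference counts."""

    def __init__(self,
                 connect: Callable[[str, str], Any],
                 disconnect: Callable[[Any], None]) -> None:
        # Assumption: connect() wraps the application's glfs_init() path and
        # disconnect() wraps its glfs_fini() path. Both are hypothetical.
        self._connect = connect
        self._disconnect = disconnect
        self._lock = threading.Lock()
        # Maps (server, volume) to [handle, reference count].
        self._entries: Dict[Tuple[str, str], List[Any]] = {}

    def acquire(self, server: str, volume: str) -> Any:
        """Return the shared handle, connecting only on first use."""
        key = (server, volume)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [self._connect(server, volume), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, server: str, volume: str) -> None:
        """Drop one reference; disconnect when nobody uses the handle."""
        key = (server, volume)
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] == 0:
                del self._entries[key]
                self._disconnect(entry[0])
```

With this pattern, two overlapping users of `gluster-vol1` trigger one connect and one disconnect instead of two of each. A long-running application could go further and keep the last handle open until it exits, at the cost of holding the connection.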
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1196020    
Bug Blocks: 1449577    
Attachments:
Description Flags
The log of valgrind none

Description Han Han 2017-01-03 10:29:16 UTC
Created attachment 1236823
The log of valgrind

Description of problem:
As subject

Version-Release number of selected component (if applicable):
libvirt-2.0.0-10.el7_3.2.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
glusterfs-3.8.4-10.el7.x86_64
kernel-3.10.0-514.6.1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running VM based on glusterfs back-end
# mount -t glusterfs xx.xx.xx.xx:/gluster-vol1 /var/tmp/gls
# virsh define gls.xml
Domain gls defined from gls.xml

# virsh start gls
Domain gls started

2. Create an external snapshot based on glusterfs
# virsh snapshot-create gls s1.xml --disk-only
Domain snapshot snap1-gluster created from 's1.xml'

# virsh snapshot-list gls
 Name                 Creation Time             State
------------------------------------------------------------
 snap1-gluster        2017-01-03 03:46:41 -0500 disk-snapshot

# cat s1.xml
<domainsnapshot>
<name>snap1-gluster</name>
<disks>
<disk name='vda' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/gls.s1'>
<host name='xx.xx.xx.xx'/>
</source>
</disk>
</disks>
</domainsnapshot>

3. Attach a disk and create external snapshots on glusterfs
# qemu-img create -f qcow2 /var/tmp/gls/vdb.qcow2 100M
Formatting '/var/tmp/gls/vdb.qcow2', fmt=qcow2 size=104857600 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
# virsh attach-device gls vdb-net.xml
Device attached successfully

# virsh snapshot-create gls s2.xml --disk-only
Domain snapshot snap2-gluster created from 's2.xml'
# virsh snapshot-list gls
 Name                 Creation Time             State
------------------------------------------------------------
 snap1-gluster        2017-01-03 03:46:41 -0500 disk-snapshot
 snap2-gluster        2017-01-03 03:46:54 -0500 disk-snapshot
# cat s2.xml
<domainsnapshot>
<name>snap2-gluster</name>
<disks>
<disk name='vda' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/gls.s2'>
<host name='xx.xx.xx.xx'/>
</source>
</disk>
<disk name='vdb' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/gls-TT.s2'>
<host name='xx.xx.xx.xx'/>
</source>
</disk>
</disks>
</domainsnapshot>

4. Destroy and start the VM, attach a disk, and create external snapshots, one on local storage and one on glusterfs
# virsh destroy gls
Domain gls destroyed

# virsh start gls
Domain gls started

# sleep 10
# qemu-img create -f qcow2 /tmp/gls-ll.qcow2 100M
Formatting '/tmp/gls-ll.qcow2', fmt=qcow2 size=104857600 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
# virsh attach-device gls vdb-local.xml
Device attached successfully

# virsh snapshot-create gls s3.xml --disk-only
Domain snapshot snap created from 's3.xml'
# virsh snapshot-list gls
 Name                 Creation Time             State
------------------------------------------------------------
 snap                 2017-01-03 03:47:44 -0500 disk-snapshot
 snap1-gluster        2017-01-03 03:46:41 -0500 disk-snapshot
 snap2-gluster        2017-01-03 03:46:54 -0500 disk-snapshot

# cat s3.xml
<domainsnapshot>
<name>snap</name>
<disks>
<disk name='vda'>
<source file='/tmp/gls.s3'/>
</disk>
<disk name='vdb' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/gls-ll.s3'>
<host name='xx.xx.xx.xx'/>
</source>
</disk>
</disks>
</domainsnapshot>

5. Destroy and start the VM, then create an external snapshot with memory
# virsh destroy gls
Domain gls destroyed

# virsh start gls
Domain gls started

# sleep 10
# virsh snapshot-create gls ss1.xml
Domain snapshot snap1-mem created from 'ss1.xml'
# virsh snapshot-list gls
 Name                 Creation Time             State
------------------------------------------------------------
 snap                 2017-01-03 03:47:44 -0500 disk-snapshot
 snap1-gluster        2017-01-03 03:46:41 -0500 disk-snapshot
 snap1-mem            2017-01-03 03:48:24 -0500 running
 snap2-gluster        2017-01-03 03:46:54 -0500 disk-snapshot
# cat ss1.xml
<domainsnapshot>
<name>snap1-mem</name>
<memory snapshot='external' file='/tmp/gls-mem.img'/>
<disks>
<disk name='vda' type='network'>
<driver type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/gls.ss1'>
<host name='xx.xx.xx.xx'/>
</source>
</disk>
</disks>
</domainsnapshot>

6. Destroy VM
# virsh destroy gls
Domain gls destroyed

Actual results:
After all these operations, libvirtd occupies over 20% of memory (about 2 GB) and never frees it.
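To put a number on the growth, the resident set size of libvirtd can be sampled before and after the steps. The sketch below assumes a Linux `/proc` filesystem and reads the `VmRSS` field of `/proc/<pid>/status`; the helper names `parse_vmrss_kib` and `rss_kib` are made up for illustration.

```python
import re
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from /proc/<pid>/status content."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def rss_kib(pid: int) -> int:
    """Read the current resident set size of a process, in kB."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling `rss_kib()` with the libvirtd PID once before step 1 and once after step 6 gives the two values to compare.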

Expected results:
No memory leak

Additional info:
Run the above steps with valgrind monitoring libvirtd:
# valgrind --leak-check=full --trace-children=no --child-silent-after-fork=yes --log-file=val.log libvirtd 
==4455== Memcheck, a memory error detector
==4455== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==4455== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==4455== Command: libvirtd
==4455== Parent PID: 29376
==4455== 
==4455== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints.
==4455==    This could cause spurious value errors to appear.
==4455==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==4455== 
==4455== HEAP SUMMARY:
==4455==     in use at exit: 813,802,703 bytes in 21,486 blocks
==4455==   total heap usage: 514,985 allocs, 493,499 frees, 3,627,381,515 bytes allocated
==4455== 

The full log is in the attachment.
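The HEAP SUMMARY figures above can be cross-checked with a few lines of Python. This is a rough sketch that only parses the two summary lines shown; `parse_heap_summary` is a hypothetical helper, not part of valgrind. Note that "in use at exit" also counts memory that is still reachable, so it is an upper bound on the leak rather than the leak itself.

```python
import re


def parse_heap_summary(log_text: str) -> dict:
    """Pull the counters out of a valgrind Memcheck HEAP SUMMARY."""
    in_use = re.search(
        r"in use at exit: ([\d,]+) bytes in ([\d,]+) blocks", log_text)
    totals = re.search(
        r"total heap usage: ([\d,]+) allocs, ([\d,]+) frees, "
        r"([\d,]+) bytes allocated", log_text)
    if in_use is None or totals is None:
        raise ValueError("HEAP SUMMARY lines not found")

    def to_int(text: str) -> int:
        # valgrind prints thousands separators, e.g. 813,802,703
        return int(text.replace(",", ""))

    return {
        "in_use_bytes": to_int(in_use.group(1)),
        "in_use_blocks": to_int(in_use.group(2)),
        "allocs": to_int(totals.group(1)),
        "frees": to_int(totals.group(2)),
        "allocated_bytes": to_int(totals.group(3)),
    }


def unfreed_allocations(summary: dict) -> int:
    """Allocations that were never matched by a free."""
    return summary["allocs"] - summary["frees"]
```

For the log above, 514,985 allocs minus 493,499 frees leaves 21,486 unfreed allocations, which matches the block count in use at exit, and 813,802,703 bytes is roughly 776 MiB.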

Comment 1 Han Han 2017-01-03 10:32:52 UTC
The bug can be reproduced on libvirt-2.0.0-10.el7.x86_64

Comment 2 Peter Krempa 2017-01-03 11:20:36 UTC
The leak is in the gluster library.