Bug 1009355

Summary: glusterfs memory leak when running fio test with native gluster backend
Product: Red Hat Enterprise Linux 6
Component: glusterfs
Version: 6.5
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Xiaomei Gao <xigao>
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: areis, chayang, gpo+redhat, juzhang, michen, mkenneth, mzhan, qzhang, rbalakri, rpacheco, virt-maint, wquan, yama
Type: Bug
Last Closed: 2017-12-06 11:58:02 UTC

Description Xiaomei Gao 2013-09-18 09:28:55 UTC
Description of problem:
While testing glusterfs performance with the FUSE-bypass (native gluster://) backend, there is a memory leak on the host. The same test works fine with a FUSE mount. The issue occurs with both the virtio-blk and the virtio-scsi driver.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.401.el6.x86_64
kernel-2.6.32-418.el6.x86_64
glusterfs-3.4.0.24rhs-1.el6rhs.x86_64

How reproducible:
1/4

Steps to Reproduce:
1. Testbed (a command-level sketch of this setup and the test commands follows the steps below):
   - Hardware: 1 client (4 CPUs * 8 GB); 2 servers (8 CPUs * 16 GB);
               private network is 1-Gbit
   - Setup: 1 Gluster volume made up of 1 brick (on SSD) from each server;
            single replication enabled
   - Client KVM image: 2 VCPUs * 4 GB RAM; cache=none; aio=threads

2. Create image with fuse bypass on gluster client.
   # /usr/bin/qemu-img create -f raw gluster://192.168.0.17:24007/gv1/storage2.raw 40G

3. Boot guest with data disk.
   # /usr/libexec/qemu-kvm \
     -drive file='/home/RHEL-Server-6.5-64.raw',if=none,id=virtio-scsi0-id0,media=disk,cache=none,snapshot=off,format=raw,aio=threads \
    -device scsi-hd,drive=virtio-scsi0-id0 \
    -drive file='gluster://192.168.0.17:24007/gv1/storage2.raw',if=none,id=virtio-scsi2-id1,media=disk,cache=none,snapshot=off,format=raw,aio=threads \
    -m 4096 \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    ...

4. In guest
   # i=`/bin/ls /dev/[vs]db` &&  mkfs.ext4 $i -F > /dev/null; partprobe; umount /mnt; mount $i /mnt && echo 3 > /proc/sys/vm/drop_caches && sleep 3
   # fio --rw=%s --bs=%s --iodepth=%s --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline
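
   The volume setup in step 1 and the fio command in step 4 are shown above in templated form; the following is a hedged sketch of concrete commands, for illustration only. The hostnames (gserver1/gserver2), brick paths, and the specific fio parameters are assumptions, not values taken from this report.
   - Assumed volume creation (one SSD brick per server, replica 2; server.allow-insecure may additionally be needed for libgfapi clients):
     # gluster volume create gv1 replica 2 gserver1:/bricks/ssd1/gv1 gserver2:/bricks/ssd1/gv1
     # gluster volume start gv1
     # gluster volume set gv1 server.allow-insecure on
   - Sanity check that qemu can reach the image over the native gluster backend:
     # /usr/bin/qemu-img info gluster://192.168.0.17:24007/gv1/storage2.raw
   - Example fio invocation with the %s placeholders filled in (the rw/bs/iodepth/filename values are illustrative):
     # fio --rw=randwrite --bs=4k --iodepth=32 --runtime=1m --direct=1 --filename=/mnt/fio_test --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline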

Actual results:
- Before running job, host memory shows:
  # free -m
             total       used       free     shared    buffers     cached
  Mem:        7615        177       7437          0          8         34
  -/+ buffers/cache:        134       7480
  Swap:       2047          0       2047

- The whole job runs for about one hour. After roughly 30 minutes, the free memory on the host has dropped to ~200 MB, and eventually the host hangs.

- Please refer to the log:
  http://kvm-perf.englab.nay.redhat.com/results/3510-autotest/dell-op780-06.qe.lab.eng.nay.redhat.com/debug/client.0.log
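
- To catch the decline before the host hangs, the host memory and the qemu-kvm resident size can be sampled during the run, for example (the 60-second interval is arbitrary):
  # watch -n 60 'free -m; ps -C qemu-kvm -o pid,rss,vsz,cmd'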



Expected results:
There is no memory leak.

Additional info:

Comment 4 Xiaomei Gao 2013-11-18 13:52:13 UTC
(In reply to Asias He from comment #2)
> Xiaomei, I cannot reproduce this with gluster 3.4.0.34 on my test machine.
> Could you test against the latest gluster package?

I could still reproduce the issue on the latest version.
- Host version
  kernel-2.6.32-431.el6.x86_64
  qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64
  glusterfs-libs-3.4.0.36rhs-1.el6.x86_64
  glusterfs-api-3.4.0.36rhs-1.el6.x86_64
  glusterfs-3.4.0.36rhs-1.el6.x86_64

- Guest version
  kernel-2.6.32-431.el6.x86_64

- Before running fio test
[root@dell-op780-06 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7615        424       7190          0          6         44
-/+ buffers/cache:        372       7242
Swap:         2047          0       2047

- After running fio test
[root@dell-op780-06 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7615       7514        100          0          0         17
-/+ buffers/cache:       7496        119
Swap:         2047        528       1519
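
- To attribute the growth to the qemu-kvm process rather than to kernel caches, its resident size could be compared before and after the run, for example (assuming a single qemu-kvm process on the host):
  # grep VmRSS /proc/$(pidof qemu-kvm)/status
  # pmap -x $(pidof qemu-kvm) | tail -1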

Comment 5 Xiaomei Gao 2013-11-20 11:08:30 UTC
Qemu-kvm sometimes core dumps. Please check the following info.
(gdb) bt full
#0  0x00007fda2b3d278a in _int_free (av=0x7fda2b6e9e80, p=0x7fda41b2a460, have_lock=0) at malloc.c:5005
        size = 16777344
        fb = <value optimized out>
        nextchunk = 0x7fda42b2a4e0
        nextsize = 2097168
        nextinuse = <value optimized out>
        prevsize = <value optimized out>
        bck = 0x7fda2fc48000
        fwd = 0x7fda296e9ed8
        errstr = 0x0
        locked = 1
#1  0x00007fda2af22402 in synctask_destroy (task=0x7fda2fc5c010) at syncop.c:148
No locals.
#2  0x00007fda2af227d0 in syncenv_processor (thdata=0x7fda2f9cbb40) at syncop.c:389
        env = 0x7fda2f9ca4c0
        proc = 0x7fda2f9cbb40
        task = <value optimized out>
#3  0x00007fda2ddf19d1 in start_thread (arg=0x7fda0cb3c700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7fda0cb3c700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140574492706560, 4723731760081897745, 140575051375456, 140574492707264, 0, 3, 
                -4739466426402109167, -4739393486156363503}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#4  0x00007fda2b442b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
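
A sketch of how such a backtrace can be captured, assuming core dumps are enabled in the shell that starts qemu-kvm and the matching debuginfo packages are installed (the core path below is a placeholder):
# ulimit -c unlimited
# debuginfo-install qemu-kvm glusterfs
# gdb /usr/libexec/qemu-kvm /path/to/core
(gdb) bt full
(gdb) thread apply all bt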

Comment 7 Jeff Cody 2014-04-15 13:36:59 UTC
This is almost certainly an issue in libglusterfs rather than in qemu itself (both the leak and the crash in comment #5).

For the issue described in comment #5, that sounds like bug #1010638.
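
One rough way to narrow this down on the libglusterfs/libgfapi side (an assumption, not something done in this report) is to run a gluster-backed qemu-img operation under valgrind and look for leaks attributed to libglusterfs:

# valgrind --leak-check=full /usr/bin/qemu-img info gluster://192.168.0.17:24007/gv1/storage2.raw

Note that this only exercises the open/close path; the leak described here shows up under sustained guest I/O, so it may not reproduce the full problem.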

Comment 10 Jan Kurik 2017-12-06 11:58:02 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/