Bug 1009355 - glusterfs memory leak when do fio test with native gluster backend
Status: NEW
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glusterfs
Version: 6.5
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Bug Updates Notification Mailing List
storage-qa-internal@redhat.com
Depends On:
Blocks:
Reported: 2013-09-18 05:28 EDT by Xiaomei Gao
Modified: 2017-09-14 08:31 EDT (History)
13 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Xiaomei Gao 2013-09-18 05:28:55 EDT
Description of problem:
During glusterfs performance testing with FUSE bypass (the native gluster:// backend), there is a memory leak on the host. The same workload works well with a FUSE mount. The issue occurs with both the virtio-blk and virtio-scsi drivers.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.401.el6.x86_64
kernel-2.6.32-418.el6.x86_64
glusterfs-3.4.0.24rhs-1.el6rhs.x86_64

How reproducible:
1/4

Steps to Reproduce:
1. Testbed:
   - Hardware: 1 client (4 CPUs, 8 GB RAM); 2 servers (8 CPUs, 16 GB RAM);
               private network is 1 Gbit
   - Setup: 1 Gluster volume made up of 1 brick (on SSD) from each server;
            single replication enabled
   - Client KVM image: 2 VCPUs, 4 GB RAM; cache=none; aio=threads

2. Create image with fuse bypass on gluster client.
   #/usr/bin/qemu-img create -f raw gluster://192.168.0.17:24007/gv1/storage2.raw 40G

3. Boot guest with data disk.
   # /usr/libexec/qemu-kvm \
     -drive file='/home/RHEL-Server-6.5-64.raw',if=none,id=virtio-scsi0-id0,media=disk,cache=none,snapshot=off,format=raw,aio=threads \
    -device scsi-hd,drive=virtio-scsi0-id0 \
    -drive file='gluster://192.168.0.17:24007/gv1/storage2.raw',if=none,id=virtio-scsi2-id1,media=disk,cache=none,snapshot=off,format=raw,aio=threads \
    -m 4096 \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    ...

4. In guest
   # i=`/bin/ls /dev/[vs]db` &&  mkfs.ext4 $i -F > /dev/null; partprobe; umount /mnt; mount $i /mnt && echo 3 > /proc/sys/vm/drop_caches && sleep 3
   # fio --rw=%s --bs=%s --iodepth=%s --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline
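The fio line above is a template from the test harness; the %s fields are filled in per run and are not preserved in this report. A minimal sketch of how concrete values slot in (the rw/bs/iodepth/filename choices here are hypothetical, not taken from the original log):

```shell
# Hypothetical expansion of the templated fio command above; the rw, bs,
# iodepth, and filename values are illustrative, not the ones actually used.
build_fio_cmd() {
    rw=$1; bs=$2; iodepth=$3; filename=$4
    printf '%s' "fio --rw=$rw --bs=$bs --iodepth=$iodepth --runtime=1m \
--direct=1 --filename=/mnt/$filename --name=job1 --ioengine=libaio \
--thread --group_reporting --numjobs=16 --size=512MB --time_based \
--ioscheduler=deadline"
}

build_fio_cmd randwrite 4k 32 testfile
```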

Actual results:
- Before running job, host memory shows:
  # free -m
               total       used       free     shared    buffers     cached
  Mem:          7615        177       7437          0          8         34
  -/+ buffers/cache:        134       7480
  Swap:         2047          0       2047

- The whole job runs for about one hour; after roughly 30 minutes, free memory on the host drops to ~200 MB and the host eventually hangs.

- Please refer to the log:
  http://kvm-perf.englab.nay.redhat.com/results/3510-autotest/dell-op780-06.qe.lab.eng.nay.redhat.com/debug/client.0.log
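One way to capture the decline as it happens (an assumed monitoring aid, not something run in the original report) is to sample host MemFree and the qemu-kvm resident set until the process exits:

```shell
# Sketch of a monitoring loop (assumed triage aid, not from the report):
# sample host MemFree and a process's VmRSS once per interval while the
# process is alive.  Intended use: monitor_rss "$(pgrep -o qemu-kvm)" 60
monitor_rss() {
    pid=$1; interval=$2
    while kill -0 "$pid" 2>/dev/null; do
        grep MemFree: /proc/meminfo          # host-wide free memory, kB
        grep VmRSS: "/proc/$pid/status"      # resident size of the process, kB
        sleep "$interval"
    done
}
```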



Expected results:
There is no memory leak.

Additional info:
Comment 4 Xiaomei Gao 2013-11-18 08:52:13 EST
(In reply to Asias He from comment #2)
> Xiaomei, I cannot reproduce this with gluster 3.4.0.34 on my test machine.
> Could you test against the latest gluster package?

I could still reproduce the issue on the latest version.
- Host version
  kernel-2.6.32-431.el6.x86_64
  qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64
  glusterfs-libs-3.4.0.36rhs-1.el6.x86_64
  glusterfs-api-3.4.0.36rhs-1.el6.x86_64
  glusterfs-3.4.0.36rhs-1.el6.x86_64

- Guest version
  kernel-2.6.32-431.el6.x86_64

- Before running fio test
[root@dell-op780-06 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7615        424       7190          0          6         44
-/+ buffers/cache:        372       7242
Swap:         2047          0       2047

- After running fio test
[root@dell-op780-06 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7615       7514        100          0          0         17
-/+ buffers/cache:       7496        119
Swap:         2047        528       1519
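For scale: free memory fell from 7190 MB to 100 MB and 528 MB of swap was consumed, while the guest was given only 4 GB of RAM, so on a rough view (ignoring page cache and other host processes) about 3.5 GB beyond the guest allocation is unaccounted for. The arithmetic, with values copied from the snapshots above:

```shell
# Quick arithmetic on the two free(1) snapshots above (values from the report).
before=7190      # MemFree before the run, MB
after=100        # MemFree after the run, MB
swap_used=528    # swap consumed during the run, MB
guest_ram=4096   # -m 4096 given to the guest, MB
echo $((before - after + swap_used - guest_ram))   # MB unaccounted for: 3522
```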
Comment 5 Xiaomei Gao 2013-11-20 06:08:30 EST
Qemu-kvm sometimes core dumps. Please check the following info.
(gdb) bt full
#0  0x00007fda2b3d278a in _int_free (av=0x7fda2b6e9e80, p=0x7fda41b2a460, have_lock=0) at malloc.c:5005
        size = 16777344
        fb = <value optimized out>
        nextchunk = 0x7fda42b2a4e0
        nextsize = 2097168
        nextinuse = <value optimized out>
        prevsize = <value optimized out>
        bck = 0x7fda2fc48000
        fwd = 0x7fda296e9ed8
        errstr = 0x0
        locked = 1
#1  0x00007fda2af22402 in synctask_destroy (task=0x7fda2fc5c010) at syncop.c:148
No locals.
#2  0x00007fda2af227d0 in syncenv_processor (thdata=0x7fda2f9cbb40) at syncop.c:389
        env = 0x7fda2f9ca4c0
        proc = 0x7fda2f9cbb40
        task = <value optimized out>
#3  0x00007fda2ddf19d1 in start_thread (arg=0x7fda0cb3c700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7fda0cb3c700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140574492706560, 4723731760081897745, 140575051375456, 140574492707264, 0, 3, 
                -4739466426402109167, -4739393486156363503}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#4  0x00007fda2b442b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Comment 7 Jeff Cody 2014-04-15 09:36:59 EDT
This is almost certainly an issue in libglusterfs rather than in qemu itself (both the leak and the crash in comment #5).

The crash described in comment #5 sounds like bug #1010638.
