Bug 1010638

Summary: qemu-kvm crash during fio test with gluster native mode
Product: Red Hat Enterprise Linux 7
Component: glusterfs
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Target Milestone: rc
Keywords: Triaged
Reporter: Xiaomei Gao <xigao>
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: areis, chrisw, juzhang, virt-maint, wquan, yama
Type: Bug
Last Closed: 2020-12-15 07:27:30 UTC

Description Xiaomei Gao 2013-09-22 08:30:42 UTC
Description of problem:
While testing the performance of gluster native mode, qemu-kvm core dumps, apparently due to a memory leak; this looks similar to Bug 1009355 on RHEL 6.

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-3.el7.x86_64
kernel-3.10.0-19.el7.x86_64
glusterfs-fuse-3.4.0.15rhs-1.el7.x86_64
glusterfs-3.4.0.15rhs-1.el7.x86_64

How reproducible:
1/4

Steps to Reproduce:
1. Testbed:
   - Hardware: 1 client (4 CPUs * 8 GB); 2 servers (8 CPUs * 16 GB);
               private network is 1 Gbit
   - Setup: 1 Gluster volume made up of 1 brick (on SSD) from each server;
            single replication enabled
   - Client KVM image: 2 VCPUs * 4 GB RAM; cache=none; aio=threads

2. Create image with gluster native mode.
   # /usr/bin/qemu-img create -f qcow2 gluster://192.168.0.17:24007/gv1/storage2.qcow2 40G

3. Boot guest with data disk.
   # /usr/libexec/qemu-kvm \
    ...
    -drive file='/home/RHEL-Server-7.0-64.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=off,format=qcow2,aio=threads \
    -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,bootindex=0 \
    -drive file='gluster://192.168.0.17:24007/vol/storage2.qcow2',index=2,if=none,id=drive-virtio-disk2,media=disk,cache=none,snapshot=off,format=qcow2,aio=threads \
    -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,bootindex=1 \
    -m 4096 \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    ...

4. In guest
   # i=`/bin/ls /dev/[vs]db` &&  mkfs.ext4 $i -F > /dev/null; partprobe; umount /mnt; mount $i /mnt && echo 3 > /proc/sys/vm/drop_caches && sleep 3
   # fio --rw=%s --bs=%s --iodepth=%s --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline
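
   (The %s fields are filled in by the test harness; a concrete instantiation, using one combination from the parameter sweep described in comment 7 and a hypothetical target file, would be:)
   # fio --rw=randwrite --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/fio_test --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline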

Actual results:
Qemu-kvm core dump.
1. (gdb) bt
#0  0x00007f4bb5322999 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4bb53240a8 in __GI_abort () at abort.c:90
#2  0x00007f4bb5363147 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f4bb546b888 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f4bb5368f47 in malloc_printerr (action=<optimized out>, str=0x7f4bb5468f78 "corrupted double-linked list", ptr=<optimized out>)
    at malloc.c:4951
#4  0x00007f4bb536a57a in _int_free (av=0x7f4bb56a7760 <main_arena>, p=0x7f4bc891b8d0, have_lock=0) at malloc.c:3939
#5  0x00007f4bb4e39d62 in synctask_destroy () from /lib64/libglusterfs.so.0
#6  0x00007f4bb4e3ab60 in syncenv_processor () from /lib64/libglusterfs.so.0
#7  0x00007f4bb829bde3 in start_thread (arg=0x7f4a7dffb700) at pthread_create.c:308
#8  0x00007f4bb53e30dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

2. Please refer to the following log:
http://kvm-perf.englab.nay.redhat.com/results/3527-autotest/dell-op780-06.qe.lab.eng.nay.redhat.com/debug/client.0.log

Expected results:
Guest works well.

Additional info:
# cat /proc/cpuinfo on client
processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 2660.196
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 5320.39
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 2 Xiaomei Gao 2013-09-23 02:12:45 UTC
Hit a different core dump backtrace; not sure if it is the same issue.
(gdb) bt
#0  0x00007f79ec757b6a in object_dynamic_cast_assert (obj=obj@entry=0x7f79ed3c3e18 <ioport_read_table+1138488>, 
    typename=typename@entry=0x7f79ec8e4c2a "virtio-device", file=file@entry=0x7f79ec8f8a00 "/builddir/build/BUILD/qemu-1.5.3/hw/block/virtio-blk.c", 
    line=line@entry=45, func=func@entry=0x7f79ec8f8b10 <__func__.27992> "virtio_blk_req_complete") at qom/object.c:438
#1  0x00007f79ec7af9a5 in virtio_blk_req_complete (req=req@entry=0x7f79f148bdf0, status=status@entry=0)
    at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:45
#2  0x00007f79ec7afbb1 in virtio_blk_rw_complete (opaque=0x7f79f148bdf0, ret=<optimized out>) at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:85
#3  0x00007f79ec634912 in bdrv_co_em_bh (opaque=0x7f79f4bc8830) at block.c:4137
#4  0x00007f79ec624f9a in aio_bh_poll (ctx=ctx@entry=0x7f79ef2d3a00) at async.c:70
#5  0x00007f79ec624b68 in aio_poll (ctx=0x7f79ef2d3a00, blocking=blocking@entry=false) at aio-posix.c:185
#6  0x00007f79ec624e90 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at async.c:167
#7  0x00007f79ebc9de06 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#8  0x00007f79ec72077a in glib_pollfds_poll () at main-loop.c:187
#9  os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#10 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#11 0x00007f79ec620241 in main_loop () at vl.c:2060
#12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4451

Comment 3 Paolo Bonzini 2013-11-22 15:00:23 UTC
Please try to reproduce after doing "export MALLOC_PERTURB_=42" in the shell.
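
For context, MALLOC_PERTURB_ makes glibc fill newly allocated and freed heap memory with a known byte pattern, so silent use-after-free or uninitialized reads tend to turn into immediate, reproducible crashes. A minimal sketch of applying it to this setup (qemu command line abbreviated as in the description):
   # export MALLOC_PERTURB_=42
   # /usr/libexec/qemu-kvm ...   (same command line as step 3 of the description)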

Comment 4 juzhang 2013-11-25 01:48:15 UTC
(In reply to Paolo Bonzini from comment #3)
> Please try to reproduce after doing "export MALLOC_PERTURB_=42" in the shell.

Hi Xiaomei,

Would you please reply to comment 3?

Best Regards,
Junyi

Comment 5 Xiaomei Gao 2013-11-25 14:33:56 UTC
(In reply to Paolo Bonzini from comment #3)
> Please try to reproduce after doing "export MALLOC_PERTURB_=42" in the shell.

- Tried it on qemu-kvm-1.5.3-19.el7.x86_64 and glusterfs-3.4.0.40rhs-2.el7.x86_64 with "export MALLOC_PERTURB_=42" set in the shell.

- I didn't hit the memory leak that caused the qemu-kvm core dump, but qemu reported the warning "main-loop: WARNING: I/O thread spun for 1000 iterations", and the guest reported the following:
[   30.712106] Bridge firewalling registered
[   43.071494] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: (null)
[   49.316813] nr_pdflush_threads exported in /proc is scheduled for removal
[   49.320518] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case.  If you have one, please send an email to linux-mm.
[  142.512487] hrtimer: interrupt took 3901691 ns

Comment 6 Jeff Cody 2013-12-03 15:12:08 UTC
I've been unable to reproduce this bug.  I've left it running overnight with 16 threads, up through 64 threads.  Using MALLOC_PERTURB_ has not had any effect for me.

I noticed the reproduction rate is 1/4 - has this been seen more than once?  I.e., can it be reproduced easily?  If so, could you try it with glusterfs 3.4.1 instead of 3.4.0, and see if it still occurs?

Comment 7 Xiaomei Gao 2013-12-04 05:27:39 UTC
(In reply to Jeff Cody from comment #6)
> I've been unable to reproduce this bug.  I've left it running overnight with
> 16 threads, up through 64 threads.  Using MALLOC_PERTURB_ has not had any
> effect for me.

We test it with all permutations of the following parameters, which takes about one hour in total, but we hit the issue at around the 20th~30th minute (see the sketch after the parameter list below).
rw = "read write randread randwrite randrw"
bs = "4k 16k 64k 256k"
iodepth = "1 8 64"
threads = "16"
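
The harness loops over these values roughly as follows (a sketch, not the actual autotest job; the target file name is hypothetical):
   for rw in read write randread randwrite randrw; do
     for bs in 4k 16k 64k 256k; do
       for iodepth in 1 8 64; do
         fio --rw=$rw --bs=$bs --iodepth=$iodepth --runtime=1m --direct=1 \
             --filename=/mnt/fio_test --name=job1 --ioengine=libaio --thread \
             --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline
       done
     done
   done
(5 * 4 * 3 = 60 one-minute runs, roughly one hour in total, consistent with the run time mentioned above.)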

> I noticed the reproduction rate is 1/4 - has this been seen more than once?
> I.e., can it be reproduced easily?  If so, could you try it with glusterfs
> 3.4.1 instead of 3.4.0, and see if it still occurs?

It happened more than once. After running for about 20~30 minutes, we could see only ~200M of free memory left on the host, which finally leads to the qemu core dump. Sure, we will test against glusterfs 3.4.1 and see if the issue still happens.

Comment 8 Xiaomei Gao 2013-12-05 12:14:38 UTC
(In reply to Jeff Cody from comment #6)
> I've been unable to reproduce this bug.  I've left it running overnight with
> 16 threads, up through 64 threads.  Using MALLOC_PERTURB_ has not had any
> effect for me.
> 
> I noticed the reproduction rate is 1/4 - has this been seen more than once?
> I.e., can it be reproduced easily?  If so, could you try it with glusterfs
> 3.4.1 instead of 3.4.0, and see if it still occurs?

I couldn't find glusterfs 3.4.1 in the Brew web interface. Could you help build one, or suggest another way to get it?

Thanks 
Xiaomei

Comment 9 Jeff Cody 2013-12-10 05:23:51 UTC
(In reply to Xiaomei Gao from comment #8)
> (In reply to Jeff Cody from comment #6)
> > I've been unable to reproduce this bug.  I've left it running overnight with
> > 16 threads, up through 64 threads.  Using MALLOC_PERTURB_ has not had any
> > effect for me.
> > 
> > I noticed the reproduction rate is 1/4 - has this been seen more than once?
> > I.e., can it be reproduced easily?  If so, could you try it with glusterfs
> > 3.4.1 instead of 3.4.0, and see if it still occurs?
> 
> I couldn't find glusterfs 3.4.1 in the Brew web interface. Could you help
> build one, or suggest another way to get it?
> 
> Thanks 
> Xiaomei

Hi Xiaomei,

I've been able to reproduce this now with gluster v3.4.0 and 3.4.1, so I don't need any testing with 3.4.1.  The key is the memory leak - as fio keeps running, more memory is leaked by qemu.
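
One way to watch that growth while fio runs (a sketch; the sampling interval and log file are arbitrary) is to periodically record the qemu-kvm resident set size together with the host's free memory:
   while true; do
       date; ps -C qemu-kvm -o pid=,rss=,vsz=; free -m | grep '^Mem'
       sleep 60
   done >> qemu-mem.log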

Comment 10 Anand Avati 2013-12-11 06:57:20 UTC
None of the above backtraces indicate that the crash is because of a memory leak. There may be a memory leak - but it doesn't look like the cause of the crash. For example:

#3  0x00007f4bb5368f47 in malloc_printerr (action=<optimized out>, str=0x7f4bb5468f78 "corrupted double-linked list", ptr=<optimized out>)
    at malloc.c:4951

does not look like a crash from a memory leak. Similarly,

#0  0x00007f79ec757b6a in object_dynamic_cast_assert (obj=obj@entry=0x7f79ed3c3e18 <ioport_read_table+1138488>, 
    typename=typename@entry=0x7f79ec8e4c2a "virtio-device", file=file@entry=0x7f79ec8f8a00 "/builddir/build/BUILD/qemu-1.5.3/hw/block/virtio-blk.c", 
    line=line@entry=45, func=func@entry=0x7f79ec8f8b10 <__func__.27992> "virtio_blk_req_complete") at qom/object.c:438

does not look like a memory leak related crash either. These look like memory corruption issues (unclear yet if the issue is on the gluster side or qemu side). Am I missing something?

Comment 16 Ademar Reis 2014-03-21 21:56:09 UTC
We can't tell for sure if the crash is caused by a leak or not, so I'm changing the summary.

Comment 19 juzhang 2014-03-25 01:41:48 UTC
Hi Xiaomei,

Can you reply to Jeff's comment?

Best Regards,
Junyi

Comment 21 Jeff Cody 2014-03-27 00:28:47 UTC
Using the QE machine from Xiaomei, I was able to reproduce this with autotest.

I turned on core dumps and examined the resulting core.  The backtrace was similar to the one seen in comment #0:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f90d7fff700 (LWP 12903))]
#0  0x00007f9204568989 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f9204568989 in raise () from /lib64/libc.so.6
#1  0x00007f920456a098 in abort () from /lib64/libc.so.6
#2  0x00007f92045a90e7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f92045aefb7 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f92045b05ea in _int_free () from /lib64/libc.so.6
#5  0x00007f9204079942 in synctask_destroy () from /lib64/libglusterfs.so.0
#6  0x00007f920407a770 in syncenv_processor () from /lib64/libglusterfs.so.0
#7  0x00007f920791ddf3 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f92046293cd in clone () from /lib64/libc.so.6


(gdb) thread 4
[Switching to thread 4 (Thread 0x7f920989ba40 (LWP 12216))]
#0  0x00007f92046638c2 in _dl_addr () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f92046638c2 in _dl_addr () from /lib64/libc.so.6
#1  0x00007f920463d915 in backtrace_symbols_fd () from /lib64/libc.so.6
#2  0x00007f92045a913f in __libc_message () from /lib64/libc.so.6
#3  0x00007f92045aefb7 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f92045b2062 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f92045b3efa in calloc () from /lib64/libc.so.6
#6  0x00007f920407a249 in synctask_new () from /lib64/libglusterfs.so.0
#7  0x00007f920730175b in glfs_pwritev_async () from /lib64/libgfapi.so.0
#8  0x00007f9209999b56 in qemu_gluster_aio_rw (qiov=0x7f920cc08130, qiov=0x7f920cc08130, write=1, opaque=<optimized out>, cb=<optimized out>, nb_sectors=32, sector_num=18397760, bs=<optimized out>) at block/gluster.c:566
#9  qemu_gluster_aio_writev (bs=<optimized out>, sector_num=18397760, qiov=0x7f920cc08130, nb_sectors=32, cb=<optimized out>, opaque=<optimized out>) at block/gluster.c:608
#10 0x00007f920998de8a in bdrv_co_io_em (bs=0x7f920abbf0c0, sector_num=18397760, nb_sectors=32, iov=0x7f920cc08130, is_write=<optimized out>) at block.c:4616
#11 0x00007f9209992b4a in bdrv_aligned_pwritev (flags=0, qiov=0x7f920cc08130, bytes=16384, offset=<optimized out>, req=0x7f920fe51730, bs=0x7f920abbf0c0) at block.c:3018
#12 bdrv_co_do_pwritev (bs=0x7f920abbf0c0, offset=<optimized out>, bytes=16384, qiov=<optimized out>, flags=(unknown: 0)) at block.c:3147
#13 0x00007f9209992b4a in bdrv_aligned_pwritev (flags=0, qiov=0x7f920cc08130, bytes=16384, offset=<optimized out>, req=0x7f920fe51870, bs=0x7f920abbae70) at block.c:3018
#14 bdrv_co_do_pwritev (bs=0x7f920abbae70, offset=<optimized out>, bytes=16384, qiov=<optimized out>, flags=(unknown: 0)) at block.c:3147
#15 0x00007f92099935d4 in bdrv_co_do_writev (flags=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=<optimized out>) at block.c:3171
#16 bdrv_co_do_rw (opaque=0x7f920daf9580) at block.c:4470
#17 0x00007f92099c701a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:118
#18 0x00007f920457a570 in ?? () from /lib64/libc.so.6
#19 0x00007fff59d6d630 in ?? ()
#20 0x0000000000000000 in ?? ()


(gdb) frame 8
#8  0x00007f9209999b56 in qemu_gluster_aio_rw (qiov=0x7f920cc08130, qiov=0x7f920cc08130, write=1, opaque=<optimized out>, cb=<optimized out>, nb_sectors=32, sector_num=18397760, bs=<optimized out>) at block/gluster.c:566
566             ret = glfs_pwritev_async(s->fd, qiov->iov, qiov->niov, offset, 0,
(gdb) print s->fd
$11 = (struct glfs_fd *) 0x7f920abec890
(gdb) print *qiov
$12 = {iov = 0x7f920cc04128, niov = 4, nalloc = -1, size = 16384}
(gdb) print offset
$13 = 9419653120
(gdb) print *acb
$14 = {common = {aiocb_info = 0x7f9209eb0040 <gluster_aiocb_info>, bs = 0x7f920abbf0c0, cb = 0x7f920998de10 <bdrv_co_io_em_complete>, opaque = 0x7f920fe51650}, size = 16384, ret = 0, finished = 0x0, bh = 0x0}
(gdb)
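
For reference, the printed values are internally consistent: nb_sectors = 32, and 32 * 512 bytes = 16384 bytes, matching both qiov->size and acb->size; likewise sector_num 18397760 * 512 = 9419653120, matching the printed offset. So the write request itself looks well-formed, which points at corruption elsewhere.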

From the backtraces seen above (and in comment #0), this looks to be a glusterfs issue, with some memory corruption occurring across threads in libglusterfs.

Note for reproducing:  I had some runs where the autotest ran to completion (2-3 hours?) without failure.  I was able to get 2 failures on the QE machine, and those occurred around the 45-60 minute mark.
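
For anyone re-running this, one way to turn on core dumps for the qemu-kvm process is roughly the following (a sketch; the core_pattern path is an assumption, and abrt may override it on RHEL 7):
   ulimit -c unlimited
   echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
   # start qemu-kvm (or the autotest run) from the same shell so the ulimit applies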

Comment 22 Ludek Smid 2014-06-26 09:05:09 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 23 Ludek Smid 2014-06-26 11:16:16 UTC
The comment above is incorrect. The correct version is below.
I'm sorry for any inconvenience.
---------------------------------------------------------------

This request was NOT resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you need
to escalate this bug.

Comment 26 RHEL Program Management 2020-12-15 07:27:30 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.