Bug 1032418
| Summary: | qemu-kvm core dump when mirroring block using glusterfs:native backend | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Xu Han <xuhan> |
| Component: | glusterfs | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | chrisw, juzhang, pbonzini, virt-maint, xfu, xuhan |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1034398 (view as bug list) | Environment: | |
| Last Closed: | 2020-12-15 07:28:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1034398 | | |
Tested a few times and found that once the block device is mirrored using the glusterfs:native backend, qemu-kvm may core dump, so I modified the summary.

Glusterfs server information:

```
Volume Name: gv0
Type: Distribute
Volume ID: e04a57d0-614c-4bf7-8458-46d0cb544483
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.5.134:/home/brick1

# mount
...
/dev/mapper/VolGroup-lv_home on /home type ext4 (rw)
...

# fdisk -l
...
Disk /dev/mapper/VolGroup-lv_home: 437.6 GB, 437553987584 bytes
255 heads, 63 sectors/track, 53196 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000
...
```

xuhan, I didn't understand if this is related to block-job-pause or not. Can you install full debug information and get a new backtrace from gdb? Thanks!

Hi Paolo,

You are right, it is not related to block-job-pause. I forgot to note that in comment 2, sorry. Here is the backtrace.

```
(gdb) bt
#0  0x00007ffff31edc59 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007ffff5a3ad8b in memcpy (__len=65536, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  iov_copy (scnt=52, src=0x7fffe4061dc0, dcnt=<optimized out>, dst=0x55555708cbe0) at ../../libglusterfs/src/common-utils.h:406
#3  glfs_preadv_async_cbk (frame=0x7fffea767a98, cookie=0x7fffe400e230, this=<optimized out>, op_ret=<optimized out>, op_errno=0, iovec=0x7fffe4061dc0, count=52, stbuf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at glfs-fops.c:630
#4  0x00007fffe39dd839 in io_stats_readv_cbk (frame=0x7fffea940cc4, cookie=<optimized out>, this=<optimized out>, op_ret=6815744, op_errno=0, vector=0x7fffe4061dc0, count=52, buf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at io-stats.c:1343
#5  0x00007fffe3bf06a6 in mdc_readv_cbk (frame=0x7fffea93f33c, cookie=<optimized out>, this=<optimized out>, op_ret=6815744, op_errno=<optimized out>, vector=<optimized out>, count=52, stbuf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at md-cache.c:1446
#6  0x00007ffff2bc7a7e in default_readv_cbk (frame=0x7fffea94af5c, cookie=<optimized out>, this=<optimized out>, op_ret=6815744, op_errno=0, vector=0x7fffe4061dc0, count=52, stbuf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at defaults.c:203
#7  0x00007fffe8032637 in ioc_frame_unwind (frame=<optimized out>) at page.c:878
#8  ioc_frame_return (frame=<optimized out>) at page.c:921
#9  0x00007fffe803282f in ioc_waitq_return (waitq=waitq@entry=0x55555690faa0) at page.c:402
#10 0x00007fffe8032ddc in ioc_fault_cbk (frame=0x7fffea772784, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, vector=<optimized out>, count=1, stbuf=0x7fffe405e038, iobref=0x7fffe4060240, xdata=0x0) at page.c:530
#11 0x00007fffe8240c2b in ra_frame_unwind (frame=<optimized out>) at page.c:441
#12 0x00007fffe8240d82 in ra_frame_return (frame=<optimized out>) at page.c:476
#13 0x00007fffe8240e2f in ra_waitq_return (waitq=waitq@entry=0x555557d77f30) at page.c:125
#14 0x00007fffe824122e in ra_fault_cbk (frame=0x7fffea739dbc, cookie=<optimized out>, this=0x7fffe4009e40, op_ret=131072, op_errno=0, vector=0x7fffe9f0c720, count=1, stbuf=0x7fffe9f0c820, iobref=0x7fffe405e420, xdata=0x0) at page.c:224
#15 0x00007ffff2bc7a7e in default_readv_cbk (frame=0x7fffea948054, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, vector=0x7fffe9f0c720, count=1, stbuf=0x7fffe9f0c820, iobref=0x7fffe405e420, xdata=0x0) at defaults.c:203
#16 0x00007fffe8685954 in dht_readv_cbk (frame=0x7fffea948100, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, vector=<optimized out>, count=1, stbuf=0x7fffe9f0c820, iobref=0x7fffe405e420, xdata=0x0) at dht-inode-read.c:418
#17 0x00007fffe88c0dc8 in client3_3_readv_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fffea94870c) at client-rpc-fops.c:2682
#18 0x00007ffff582c030 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fffe401ce80, pollin=pollin@entry=0x7fffe40429e0) at rpc-clnt.c:771
#19 0x00007ffff582c2a9 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fffe401ceb0, event=<optimized out>, data=0x7fffe40429e0) at rpc-clnt.c:903
#20 0x00007ffff5828823 in rpc_transport_notify (this=this@entry=0x7fffe40418e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fffe40429e0) at rpc-transport.c:499
#21 0x00007fffe9501af1 in socket_event_poll_in (this=this@entry=0x7fffe40418e0) at socket.c:2119
#22 0x00007fffe9504314 in socket_event_handler (fd=<optimized out>, idx=1, data=data@entry=0x7fffe40418e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2229
#23 0x00007ffff2c08f0a in event_dispatch_epoll_handler (i=<optimized out>, events=0x7fffe40008e0, event_pool=0x55555653a5e0) at event-epoll.c:384
#24 event_dispatch_epoll (event_pool=0x55555653a5e0) at event-epoll.c:445
#25 0x00007ffff5a38c24 in glfs_poller (data=<optimized out>) at glfs.c:385
#26 0x00007ffff6055de3 in start_thread () from /lib64/libpthread.so.0
#27 0x00007ffff319c16d in clone () from /lib64/libc.so.6
```

Thanks,
xuhan

BTW, I have only hit this issue on a Windows 2012 R2 guest so far; I tested 4 times on a RHEL 7 guest and could not reproduce it. virtio driver version: virtio-win-1.6.7-2.el7.noarch

I think this is a glusterfs bug. Can you try the following:

```
gluster volume set <name> performance.io-cache off
gluster volume set <name> performance.read-ahead off
```

and check if the issue still happens? BTW, is there a way to limit the maximum IO size from qemu on the block driver?

There isn't yet a way to limit the maximum IO size, as requests come directly from the guest. We could introduce one, but so far there has been no reason to. Why would Gluster like to have that?

No real reason; we could have provided a "quick fix" by setting the maximum read IO size in the gluster block driver to 2 MB. In any case that wouldn't be the "right" fix either. I'm sending out a proper fix for this soon anyway.

With 'performance.io-cache' and 'performance.read-ahead' set to off, I tested 3 times and have not hit this issue.

```
gluster> volume info gv0

Volume Name: gv0
Type: Distribute
Volume ID: e04a57d0-614c-4bf7-8458-46d0cb544483
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.5.134:/home/brick1
Options Reconfigured:
performance.read-ahead: off
performance.io-cache: off
```

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
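For reference, the backtrace above shows iov_copy() copying a 52-element source vector into a single destination buffer inside glfs_preadv_async_cbk(); the crash pattern suggests the reply carried more data than the caller's buffer was sized for. The snippet below is a minimal, self-contained illustration of that failure mode, not glusterfs source code: the function names, buffer sizes, and the missing bounds check are all assumptions made for the example.

```c
/* Illustrative only: shows how copying a scatter-gather list into a
 * fixed-size destination without bounding the total length overflows,
 * mirroring the iov_copy() call in the backtrace. Not glusterfs code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Unsafe copy: trusts the source vector's total length. */
static size_t iov_copy_unchecked(const struct iovec *src, int scnt, void *dst)
{
    size_t copied = 0;
    for (int i = 0; i < scnt; i++) {
        memcpy((char *)dst + copied, src[i].iov_base, src[i].iov_len);
        copied += src[i].iov_len;   /* total may exceed the destination's capacity */
    }
    return copied;
}

/* Safe variant: stop at the destination's capacity. */
static size_t iov_copy_bounded(const struct iovec *src, int scnt,
                               void *dst, size_t dst_len)
{
    size_t copied = 0;
    for (int i = 0; i < scnt && copied < dst_len; i++) {
        size_t n = src[i].iov_len;
        if (n > dst_len - copied)
            n = dst_len - copied;   /* clamp instead of overflowing */
        memcpy((char *)dst + copied, src[i].iov_base, n);
        copied += n;
    }
    return copied;
}

int main(void)
{
    enum { DST_SIZE = 64 * 1024, CHUNKS = 52, CHUNK = 128 * 1024 };
    struct iovec src[CHUNKS];
    for (int i = 0; i < CHUNKS; i++) {
        src[i].iov_base = calloc(1, CHUNK);
        src[i].iov_len  = CHUNK;    /* 52 * 128 KiB is far more than 64 KiB */
    }
    char *dst = malloc(DST_SIZE);

    /* iov_copy_unchecked(src, CHUNKS, dst);  -- would corrupt the heap */
    size_t n = iov_copy_bounded(src, CHUNKS, dst, DST_SIZE);
    printf("copied %zu of %d bytes safely\n", n, DST_SIZE);

    free(dst);
    for (int i = 0; i < CHUNKS; i++)
        free(src[i].iov_base);
    return 0;
}
```

This reading would also be consistent with the workaround above: with performance.io-cache and performance.read-ahead disabled, the read path presumably no longer returns replies larger than the buffer handed to glfs_preadv_async(), so the copy stays within bounds.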
Description of problem:
qemu-kvm core dumps when executing 'block-job-pause' using the glusterfs:native backend.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-19.el7.x86_64
glusterfs-3.4.0.40rhs-2.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot the guest using the glusterfs:native backend.

```
# /usr/libexec/qemu-kvm -M pc -cpu Penryn -enable-kvm -m 4096 -smp 4,socket=1,cores=4,threads=1 -name rhel7 -nodefaults -nodefconfig -drive file=gluster://10.66.5.134/gv0/win2012.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-scsi-pci,id=virtio-disk0 -device scsi-hd,bus=virtio-disk0.0,drive=drive-virtio-disk0,id=scsi-hd -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :10 -vga qxl -global qxl-vga.vram_size=67108864 -monitor stdio -boot menu=on -netdev tap,id=netdev0,vhost=on,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device virtio-net-pci,mac=ce:71:f6:64:8f:18,netdev=netdev0,id=net0 -global qxl-vga.revision=3 -qmp tcp:0:5555,server,nowait
```

2. Start mirroring the block device via QMP.

```
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-disk0", "target": "sn-1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" }}
```

3. Execute 'block-job-pause' via QMP.

```
{"execute": "block-job-pause", "arguments": { "device": "drive-virtio-disk0"}}
```

Actual results:
After step 3, qemu-kvm core dumps:

```
Program received signal SIGSEGV, Segmentation fault.
```

Expected results:
The block job should be paused and resumed with no error.

Additional info:

```
(gdb) bt
#0  0x00007ffff31e5d10 in __memcpy_ssse3 () from /lib64/libc.so.6
#1  0x00007ffff5a3ad8b in glfs_preadv_async_cbk () from /lib64/libgfapi.so.0
#2  0x00007fffe39dd839 in io_stats_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/debug/io-stats.so
#3  0x00007fffe3bf06a6 in mdc_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/md-cache.so
#4  0x00007ffff2bc7a7e in default_readv_cbk () from /lib64/libglusterfs.so.0
#5  0x00007fffe8032637 in ioc_frame_return () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/io-cache.so
#6  0x00007fffe803282f in ioc_waitq_return () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/io-cache.so
#7  0x00007fffe8032ddc in ioc_fault_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/io-cache.so
#8  0x00007fffe8240c2b in ra_frame_unwind () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/read-ahead.so
#9  0x00007fffe8240e2f in ra_waitq_return () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/read-ahead.so
#10 0x00007fffe824122e in ra_fault_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/read-ahead.so
#11 0x00007ffff2bc7a7e in default_readv_cbk () from /lib64/libglusterfs.so.0
#12 0x00007fffe8685954 in dht_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/cluster/distribute.so
#13 0x00007fffe88c0dc8 in client3_3_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/protocol/client.so
#14 0x00007ffff582c030 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#15 0x00007ffff582c2a9 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#16 0x00007ffff5828823 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#17 0x00007fffe9501af1 in socket_event_poll_in () from /usr/lib64/glusterfs/3.4.0.40rhs/rpc-transport/socket.so
#18 0x00007fffe9504314 in socket_event_handler () from /usr/lib64/glusterfs/3.4.0.40rhs/rpc-transport/socket.so
#19 0x00007ffff2c08f0a in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#20 0x00007ffff5a38c24 in glfs_poller () from /lib64/libgfapi.so.0
#21 0x00007ffff6055de3 in start_thread () from /lib64/libpthread.so.0
#22 0x00007ffff319c16d in clone () from /lib64/libc.so.6
```
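Steps 2 and 3 can also be driven from a small program instead of typing JSON into the QMP socket by hand. The sketch below assumes qemu was started with `-qmp tcp:0:5555,server,nowait` as in step 1 and is reachable on 127.0.0.1; the `qmp_capabilities` negotiation and the two command payloads are taken from the report, while the helper name, the crude one-read-per-reply handling, and the lack of JSON parsing are scaffolding for the example only.

```c
/* Sketch of driving steps 2-3 over the QMP TCP socket from step 1.
 * Assumes qemu listens on 127.0.0.1:5555 (-qmp tcp:0:5555,server,nowait).
 * QMP requires a "qmp_capabilities" handshake before other commands. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one command and echo whatever qemu answers (no JSON parsing). */
static void qmp_send(int fd, const char *json)
{
    char buf[4096];
    ssize_t n;

    write(fd, json, strlen(json));
    write(fd, "\n", 1);
    n = read(fd, buf, sizeof(buf) - 1);   /* crude: one read per reply */
    if (n > 0) {
        buf[n] = '\0';
        printf("<- %s\n", buf);
    }
}

int main(void)
{
    struct sockaddr_in addr = { 0 };
    char greeting[4096];
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    addr.sin_family = AF_INET;
    addr.sin_port = htons(5555);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* qemu sends a greeting banner first; read and discard it. */
    read(fd, greeting, sizeof(greeting) - 1);

    qmp_send(fd, "{ \"execute\": \"qmp_capabilities\" }");

    /* Step 2: start the mirror job (payload copied from the report). */
    qmp_send(fd,
        "{ \"execute\": \"drive-mirror\", \"arguments\": {"
        " \"device\": \"drive-virtio-disk0\", \"target\": \"sn-1\","
        " \"format\": \"qcow2\", \"mode\": \"absolute-paths\","
        " \"sync\": \"full\", \"speed\": 1000000000,"
        " \"on-source-error\": \"stop\", \"on-target-error\": \"stop\" }}");

    /* Step 3: pause the job; with the affected backend this is where
     * the reporter originally saw qemu-kvm crash. */
    qmp_send(fd,
        "{ \"execute\": \"block-job-pause\","
        " \"arguments\": { \"device\": \"drive-virtio-disk0\" }}");

    close(fd);
    return 0;
}
```

Note that, per the later comments, the crash turned out not to depend on block-job-pause itself; the mirroring read path alone can trigger it.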