Bug 1032418 - qemu-kvm core dump when mirroring block using glusterfs:native backend
Summary: qemu-kvm core dump when mirroring block using glusterfs:native backend
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glusterfs
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1034398
 
Reported: 2013-11-20 07:08 UTC by Xu Han
Modified: 2020-12-15 07:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1034398
Environment:
Last Closed: 2020-12-15 07:28:33 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Xu Han 2013-11-20 07:08:40 UTC
Description of problem:
qemu-kvm core dumps when executing 'block-job-pause' using the glusterfs:native backend.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-19.el7.x86_64
glusterfs-3.4.0.40rhs-2.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. boot guest using glusterfs:native backend.
# /usr/libexec/qemu-kvm -M pc -cpu Penryn -enable-kvm -m 4096 -smp 4,socket=1,cores=4,threads=1 -name rhel7 -nodefaults -nodefconfig -drive file=gluster://10.66.5.134/gv0/win2012.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-scsi-pci,id=virtio-disk0 -device scsi-hd,bus=virtio-disk0.0,drive=drive-virtio-disk0,id=scsi-hd -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :10 -vga qxl -global qxl-vga.vram_size=67108864 -monitor stdio -boot menu=on -netdev tap,id=netdev0,vhost=on,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device virtio-net-pci,mac=ce:71:f6:64:8f:18,netdev=netdev0,id=net0 -global qxl-vga.revision=3 -qmp tcp:0:5555,server,nowait

2. start mirroring block via qmp.
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-disk0", "target": "sn-1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" }}

3. execute 'block-job-pause' via qmp.
{"execute": "block-job-pause", "arguments": { "device": "drive-virtio-disk0"}}
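For repeated testing, steps 2 and 3 can be driven over the QMP socket opened by '-qmp tcp:0:5555,server,nowait' in step 1. A minimal sketch using only the Python standard library (the helper names and the host/port default are mine, not part of the original report):

```python
import json
import socket

def qmp_cmd(execute, **arguments):
    """Build a QMP command dict of the shape sent in steps 2 and 3."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return cmd

def run_repro(host="127.0.0.1", port=5555):
    """Drive the reproduction over the QMP socket from step 1."""
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw")
        f.readline()  # discard the QMP greeting banner
        commands = [
            qmp_cmd("qmp_capabilities"),  # leave capabilities negotiation mode
            qmp_cmd("drive-mirror",
                    device="drive-virtio-disk0", target="sn-1",
                    format="qcow2", mode="absolute-paths", sync="full",
                    speed=1000000000,
                    **{"on-source-error": "stop", "on-target-error": "stop"}),
            qmp_cmd("block-job-pause", device="drive-virtio-disk0"),
        ]
        for cmd in commands:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            # A missing/empty response here indicates qemu-kvm crashed.
            print(f.readline().strip())
```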


Actual results:
After step 3, qemu-kvm core dumps:
Program received signal SIGSEGV, Segmentation fault.

Expected results:
The block job should be paused and resumed without error.

Additional info:
(gdb) bt
#0  0x00007ffff31e5d10 in __memcpy_ssse3 () from /lib64/libc.so.6
#1  0x00007ffff5a3ad8b in glfs_preadv_async_cbk () from /lib64/libgfapi.so.0
#2  0x00007fffe39dd839 in io_stats_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/debug/io-stats.so
#3  0x00007fffe3bf06a6 in mdc_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/md-cache.so
#4  0x00007ffff2bc7a7e in default_readv_cbk () from /lib64/libglusterfs.so.0
#5  0x00007fffe8032637 in ioc_frame_return () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/io-cache.so
#6  0x00007fffe803282f in ioc_waitq_return () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/io-cache.so
#7  0x00007fffe8032ddc in ioc_fault_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/io-cache.so
#8  0x00007fffe8240c2b in ra_frame_unwind () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/read-ahead.so
#9  0x00007fffe8240e2f in ra_waitq_return () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/read-ahead.so
#10 0x00007fffe824122e in ra_fault_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/performance/read-ahead.so
#11 0x00007ffff2bc7a7e in default_readv_cbk () from /lib64/libglusterfs.so.0
#12 0x00007fffe8685954 in dht_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/cluster/distribute.so
#13 0x00007fffe88c0dc8 in client3_3_readv_cbk () from /usr/lib64/glusterfs/3.4.0.40rhs/xlator/protocol/client.so
#14 0x00007ffff582c030 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#15 0x00007ffff582c2a9 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#16 0x00007ffff5828823 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#17 0x00007fffe9501af1 in socket_event_poll_in () from /usr/lib64/glusterfs/3.4.0.40rhs/rpc-transport/socket.so
#18 0x00007fffe9504314 in socket_event_handler () from /usr/lib64/glusterfs/3.4.0.40rhs/rpc-transport/socket.so
#19 0x00007ffff2c08f0a in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#20 0x00007ffff5a38c24 in glfs_poller () from /lib64/libgfapi.so.0
#21 0x00007ffff6055de3 in start_thread () from /lib64/libpthread.so.0
#22 0x00007ffff319c16d in clone () from /lib64/libc.so.6

Comment 2 Xu Han 2013-11-20 08:58:22 UTC
Tested a few times and found that qemu-kvm may core dump whenever block mirroring uses the glusterfs:native backend. So I modified the summary.

Comment 3 Xu Han 2013-11-22 07:16:43 UTC
Glusterfs server information:
Volume Name: gv0
Type: Distribute
Volume ID: e04a57d0-614c-4bf7-8458-46d0cb544483
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.5.134:/home/brick1

# mount
...
/dev/mapper/VolGroup-lv_home on /home type ext4 (rw)
...

# fdisk -l
...
Disk /dev/mapper/VolGroup-lv_home: 437.6 GB, 437553987584 bytes
255 heads, 63 sectors/track, 53196 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000
...

Comment 4 Paolo Bonzini 2013-11-22 14:45:30 UTC
xuhan, I didn't understand if this is related to block-job-pause or not.

Can you install full debug information and get a new backtrace from gdb?

Thanks!

Comment 5 Xu Han 2013-11-25 02:59:02 UTC
Hi Paolo,

You are right, it is not related to block-job-pause. I forgot to note that in comment 2, sorry.

And here is the backtrace.
(gdb) bt
#0  0x00007ffff31edc59 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007ffff5a3ad8b in memcpy (__len=65536, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  iov_copy (scnt=52, src=0x7fffe4061dc0, dcnt=<optimized out>, dst=0x55555708cbe0) at ../../libglusterfs/src/common-utils.h:406
#3  glfs_preadv_async_cbk (frame=0x7fffea767a98, cookie=0x7fffe400e230, this=<optimized out>, op_ret=<optimized out>, op_errno=0, 
    iovec=0x7fffe4061dc0, count=52, stbuf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at glfs-fops.c:630
#4  0x00007fffe39dd839 in io_stats_readv_cbk (frame=0x7fffea940cc4, cookie=<optimized out>, this=<optimized out>, op_ret=6815744, op_errno=0, 
    vector=0x7fffe4061dc0, count=52, buf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at io-stats.c:1343
#5  0x00007fffe3bf06a6 in mdc_readv_cbk (frame=0x7fffea93f33c, cookie=<optimized out>, this=<optimized out>, op_ret=6815744, 
    op_errno=<optimized out>, vector=<optimized out>, count=52, stbuf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at md-cache.c:1446
#6  0x00007ffff2bc7a7e in default_readv_cbk (frame=0x7fffea94af5c, cookie=<optimized out>, this=<optimized out>, op_ret=6815744, op_errno=0, 
    vector=0x7fffe4061dc0, count=52, stbuf=0x7fffe9f0c340, iobref=0x7fffe4061d30, xdata=0x0) at defaults.c:203
#7  0x00007fffe8032637 in ioc_frame_unwind (frame=<optimized out>) at page.c:878
#8  ioc_frame_return (frame=<optimized out>) at page.c:921
#9  0x00007fffe803282f in ioc_waitq_return (waitq=waitq@entry=0x55555690faa0) at page.c:402
#10 0x00007fffe8032ddc in ioc_fault_cbk (frame=0x7fffea772784, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, 
    op_errno=<optimized out>, vector=<optimized out>, count=1, stbuf=0x7fffe405e038, iobref=0x7fffe4060240, xdata=0x0) at page.c:530
#11 0x00007fffe8240c2b in ra_frame_unwind (frame=<optimized out>) at page.c:441
#12 0x00007fffe8240d82 in ra_frame_return (frame=<optimized out>) at page.c:476
#13 0x00007fffe8240e2f in ra_waitq_return (waitq=waitq@entry=0x555557d77f30) at page.c:125
#14 0x00007fffe824122e in ra_fault_cbk (frame=0x7fffea739dbc, cookie=<optimized out>, this=0x7fffe4009e40, op_ret=131072, op_errno=0, 
    vector=0x7fffe9f0c720, count=1, stbuf=0x7fffe9f0c820, iobref=0x7fffe405e420, xdata=0x0) at page.c:224
#15 0x00007ffff2bc7a7e in default_readv_cbk (frame=0x7fffea948054, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, 
    vector=0x7fffe9f0c720, count=1, stbuf=0x7fffe9f0c820, iobref=0x7fffe405e420, xdata=0x0) at defaults.c:203
#16 0x00007fffe8685954 in dht_readv_cbk (frame=0x7fffea948100, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, 
    vector=<optimized out>, count=1, stbuf=0x7fffe9f0c820, iobref=0x7fffe405e420, xdata=0x0) at dht-inode-read.c:418
#17 0x00007fffe88c0dc8 in client3_3_readv_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fffea94870c)
    at client-rpc-fops.c:2682
#18 0x00007ffff582c030 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fffe401ce80, pollin=pollin@entry=0x7fffe40429e0) at rpc-clnt.c:771
#19 0x00007ffff582c2a9 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fffe401ceb0, event=<optimized out>, data=0x7fffe40429e0)
    at rpc-clnt.c:903
#20 0x00007ffff5828823 in rpc_transport_notify (this=this@entry=0x7fffe40418e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, 
    data=data@entry=0x7fffe40429e0) at rpc-transport.c:499
#21 0x00007fffe9501af1 in socket_event_poll_in (this=this@entry=0x7fffe40418e0) at socket.c:2119
#22 0x00007fffe9504314 in socket_event_handler (fd=<optimized out>, idx=1, data=data@entry=0x7fffe40418e0, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2229
#23 0x00007ffff2c08f0a in event_dispatch_epoll_handler (i=<optimized out>, events=0x7fffe40008e0, event_pool=0x55555653a5e0) at event-epoll.c:384
#24 event_dispatch_epoll (event_pool=0x55555653a5e0) at event-epoll.c:445
#25 0x00007ffff5a38c24 in glfs_poller (data=<optimized out>) at glfs.c:385
#26 0x00007ffff6055de3 in start_thread () from /lib64/libpthread.so.0
#27 0x00007ffff319c16d in clone () from /lib64/libc.so.6

Thanks,
xuhan
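Frames #0-#2 of this backtrace point at iov_copy() in libglusterfs: glfs_preadv_async_cbk() receives 52 source iovecs back from the translator stack and memcpy's them into the caller's destination buffers, which crashes if the translators hand back more data than the destination holds. A Python analogue of that scatter-gather copy, with the bounds check the C memcpy path lacks (illustrative only; this is not glusterfs source, and the names and shapes are assumptions):

```python
def iov_copy(dst, src):
    """Copy a list of source buffers into fixed-size destination bytearrays,
    mimicking an iovec-to-iovec copy. Returns total bytes copied; raises if
    the source overruns the destination (the C memcpy instead writes past
    the buffer, matching the SIGSEGV in frame #0)."""
    total = sum(len(s) for s in src)
    room = sum(len(d) for d in dst)
    if total > room:
        raise OverflowError(f"src {total} bytes > dst {room} bytes")
    di, off = 0, 0  # current destination buffer and offset within it
    for s in src:
        view = memoryview(s)
        while view:
            space = len(dst[di]) - off
            n = min(space, len(view))
            dst[di][off:off + n] = view[:n]
            view = view[n:]
            off += n
            if off == len(dst[di]):  # destination buffer full, advance
                di, off = di + 1, 0
    return total
```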

Comment 6 Xu Han 2013-11-25 05:14:08 UTC
BTW, I have only hit this issue on a Windows 2012 R2 guest so far; tested 4 times on a RHEL 7 guest and could not reproduce it.

virtio drivers version:
virtio-win-1.6.7-2.el7.noarch

Comment 7 Anand Avati 2013-11-25 05:25:33 UTC
I think this is a glusterfs bug. Can you try the following:

gluster volume set <name> performance.io-cache off
gluster volume set <name> performance.read-ahead off

and check if the issue still happens?

BTW, is there a way to limit the maximum IO size from qemu on the block driver?

Comment 8 Paolo Bonzini 2013-11-25 09:58:57 UTC
There isn't yet a way to limit the maximum IO size, as requests come directly from the guest.  We could introduce one, but so far there has been no reason to.  Why would Gluster want that?

Comment 9 Anand Avati 2013-11-25 18:18:15 UTC
No real reason; we could have provided a "quick fix" by capping the max read IO size in the gluster block driver at 2 MB. In any case that wouldn't be the "right" fix either. I'm sending out a proper fix for this soon anyway.

Comment 10 Xu Han 2013-11-26 07:49:28 UTC
With 'performance.io-cache' and 'performance.read-ahead' set to off, I tested 3 times and have not hit this issue.

gluster> volume info gv0
Volume Name: gv0
Type: Distribute
Volume ID: e04a57d0-614c-4bf7-8458-46d0cb544483
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.5.134:/home/brick1
Options Reconfigured:
performance.read-ahead: off
performance.io-cache: off

Comment 11 RHEL Program Management 2014-03-22 06:30:17 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 14 RHEL Program Management 2020-12-15 07:28:33 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

