Created attachment 1536231 [details]
gdb backtrace

Description of problem:
qemu-io -c 'write' to a raw file on a GlusterFS volume (accessed via libgfapi) results in a core dump.

Version-Release number of selected component (if applicable):
qemu-kvm-3.1.0-15.module+el8+2792+e33e01a0
kernel-4.18.0-67.el8
Gluster server: glusterfs-server-3.12.2-43.el7rhgs

How reproducible:
3/3

Steps to Reproduce:
# qemu-img create -f raw gluster://10.73.196.181/vol0/base.img 20G
Formatting 'gluster://10.73.196.181/vol0/base.img', fmt=raw size=21474836480
[2019-02-19 06:22:40.145943] E [MSGID: 108006] [afr-common.c:5040:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

# qemu-img info gluster://10.73.196.181/vol0/base.img
[2019-02-19 06:22:58.892622] E [MSGID: 108006] [afr-common.c:5040:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
image: gluster://10.73.196.181/vol0/base.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

# qemu-io -f raw -c 'write -P 1 0 1.5G' gluster://10.73.196.181/vol0/base.img
[2019-02-19 06:23:48.179933] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x12e)[0x7efd1f6f45ae] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1c1)[0x7efd1f4b5151] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0x12)[0x7efd1f4b5282] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8b)[0x7efd1f4b699b] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2f0)[0x7efd1f4b7510] ))))) 0-vol0-client-1: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-19 06:23:48.178013 (xid=0x11)
[2019-02-19 06:23:48.180435] E [MSGID: 108006] [afr-common.c:5040:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-02-19 06:23:48.181049] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x12e)[0x7efd1f6f45ae] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1c1)[0x7efd1f4b5151] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0x12)[0x7efd1f4b5282] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8b)[0x7efd1f4b699b] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2f0)[0x7efd1f4b7510] ))))) 0-vol0-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-19 06:23:48.177942 (xid=0x12)
[2019-02-19 06:23:48.181120] E [MSGID: 114031] [client-rpc-fops.c:1562:client3_3_finodelk_cbk] 0-vol0-client-0: remote operation failed [Transport endpoint is not connected]
[2019-02-19 06:23:48.181158] E [MSGID: 114031] [client-rpc-fops.c:1562:client3_3_finodelk_cbk] 0-vol0-client-1: remote operation failed [Transport endpoint is not connected]
[2019-02-19 06:23:48.188170] E [inode.c:485:__inode_unref] (-->/lib64/libglusterfs.so.0(fd_unref+0x197) [0x7efd1f716d77] -->/lib64/libglusterfs.so.0(inode_unref+0x25) [0x7efd1f702845] -->/lib64/libglusterfs.so.0(+0x3e12d) [0x7efd1f70212d] ) 0-: Assertion failed: inode->ref
Segmentation fault (core dumped)

Actual results:
As above: qemu-io hits "Assertion failed: inode->ref" and segfaults (core dumped) during the 1.5G write.

Expected results:
No core dump; the data is written to the libgfapi-backed file successfully.

Additional info:
The repeated "E [MSGID: 108006] [afr-common.c:5040:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." messages are most likely caused by bz #1609220 and are not part of this issue.
This bug is caused by an issue in the GlusterFS server (more details in BZ1691320) that is triggered when a single write operation is >= 1 GiB. As a workaround until the server is fixed, and also to guarantee compatibility with old server versions, I could limit the maximum transfer size in the gluster driver in QEMU (e.g. to 512 MiB or 1023 MiB).
Patch is merged upstream and will be released with QEMU v4.0:

commit de23e72bb7515888fdea2a58c58a2e02370123bd
Author: Stefano Garzarella <sgarzare>
Date:   Thu Mar 28 11:52:27 2019 +0100

    block/gluster: limit the transfer size to 512 MiB

    Several versions of GlusterFS (3.12? -> 6.0.1) fail when the
    transfer size is greater or equal to 1024 MiB, so we are
    limiting the transfer size to 512 MiB to avoid this rare issue.

    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1691320
    Signed-off-by: Stefano Garzarella <sgarzare>
    Reviewed-by: Niels de Vos <ndevos>
    Signed-off-by: Kevin Wolf <kwolf>
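For reference, the change is conceptually small: a QEMU block driver can advertise a per-request limit through its .bdrv_refresh_limits callback, and the generic block layer then splits any larger guest request before it reaches the driver. The fragment below is only an illustrative sketch in the spirit of the commit above; the function name and headers follow the usual QEMU conventions and are not copied verbatim from the patch.

/*
 * Illustrative sketch in the spirit of commit de23e72bb751
 * ("block/gluster: limit the transfer size to 512 MiB"); not the
 * verbatim patch.  Intended as a fragment of block/gluster.c.
 */
#include "qemu/osdep.h"
#include "qemu/units.h"
#include "block/block_int.h"

/* Affected GlusterFS servers fail a single write of >= 1024 MiB over
 * libgfapi, so cap every request the gluster driver issues at 512 MiB. */
#define GLUSTER_MAX_TRANSFER (512 * MiB)

static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
{
    /* The generic block layer splits any request larger than
     * bl.max_transfer, so a 1.5G qemu-io write reaches the server
     * as three 512 MiB writes instead of one oversized request. */
    bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
}

Each gluster BlockDriver variant (tcp/unix/rdma) would then point .bdrv_refresh_limits at such a callback, so the only user-visible effect is that large I/O is split, not rejected.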
Verified this bug as below:

Tested with:
kernel-4.18.0-95.el8
qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
Gluster server: glusterfs-server-3.12.2-43.el7rhgs.x86_64

# gluster volume info vol
Volume Name: vol
Type: Distribute
Volume ID: 6b299573-ffed-49c6-8b0f-68602c389085
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gluster-virt-qe-01.lab.eng.pek2.redhat.com:/data/brick1/gv2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

Steps:
# qemu-img create -f raw gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/base.img 20G

# qemu-img info gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/base.img
image: gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/base.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

# qemu-io -f raw -c 'write -P 1 0 1.5G' gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/base.img
wrote 1610612736/1610612736 bytes at offset 0
1.500 GiB, 1 ops; 0:00:33.97 (45.212 MiB/sec and 0.0294 ops/sec)

# qemu-img info gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/base.img
image: gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/base.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 1.5G

Additional info:
Also tested with data sizes 0.2G, 0.5G, 0.9G, 1G and 1.9G; all work normally.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723