Bug 1451191

Summary: qemu-img: block/gluster.c:1307: find_allocation: Assertion `offs >= start' failed.
Product: Red Hat Enterprise Linux 7 Reporter: Suqin Huang <shuang>
Component: qemu-kvm-rhevAssignee: Jeff Cody <jcody>
Status: CLOSED ERRATA QA Contact: Suqin Huang <shuang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4CC: aliang, chayang, coli, juzhang, knoel, michen, mrezanin, qzhang, shuang, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-8.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:38:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Suqin Huang 2017-05-16 05:46:01 UTC
Description of problem:
qemu-img: block/gluster.c:1307: find_allocation: Assertion `offs >= start' failed. when Convert the image on gluster server.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-3.el7.x86_64

How reproducible:
100% 

Steps to Reproduce:
1.
qemu-img convert -f qcow2 -O raw gluster://bootp-73-199-200.lab.eng.pek2.redhat.com:0/gv0/rhel74-64-virtio.qcow2 convert.img


2.
3.

Actual results:
qemu-img: block/gluster.c:1307: find_allocation: Assertion `offs >= start' failed and core dump

Expected results:


Additional info:
1. gluster log:
[2017-05-16 05:23:39.056896] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 696e7465-6c2d-6533-3132-32352d382d32 (0) coming up
[2017-05-16 05:23:39.056943] I [MSGID: 114020] [client.c:2356:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
[2017-05-16 05:23:39.060411] I [MSGID: 114020] [client.c:2356:notify] 0-gv0-client-1: parent translators are ready, attempting connect on transport
[2017-05-16 05:23:39.062864] I [rpc-clnt.c:1999:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
[2017-05-16 05:23:39.065748] I [rpc-clnt.c:1999:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
[2017-05-16 05:23:39.067621] I [MSGID: 114057] [client-handshake.c:1439:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-16 05:23:39.068830] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume '/data/brick1/gv0'.
[2017-05-16 05:23:39.068847] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-16 05:23:39.068894] I [MSGID: 108005] [afr-common.c:4659:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0' came back up; going online.
[2017-05-16 05:23:39.069815] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
[2017-05-16 05:23:39.070667] I [MSGID: 114057] [client-handshake.c:1439:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-16 05:23:39.071812] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/data/brick1/gv0'.
[2017-05-16 05:23:39.071828] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-16 05:23:39.082302] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
[2017-05-16 05:23:39.085735] I [MSGID: 104041] [glfs-resolve.c:885:__glfs_active_subvol] 0-gv0: switched to graph 696e7465-6c2d-6533-3132-32352d382d32 (0)
[2017-05-16 05:23:39.089087] W [MSGID: 114031] [client-rpc-fops.c:2211:client3_3_seek_cbk] 0-gv0-client-0: remote operation failed [No such device or address]

2. qemu-img: block/gluster.c:1307: find_allocation: Assertion `offs >= start' failed.


3. call trace

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `qemu-img convert -f qcow2 -O raw gluster://bootp-73-199-200.lab.eng.pek2.redhat'.
Program terminated with signal 6, Aborted.
#0  0x00007f2f7128d1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f2f7128d1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f2f7128e8c8 in __GI_abort () at abort.c:90
#2  0x00007f2f71286146 in __assert_fail_base (fmt=0x7f2f713d73a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f2f87d7601b "offs >= start", 
    file=file@entry=0x7f2f87d7b658 "block/gluster.c", line=line@entry=1307, function=function@entry=0x7f2f87d7bdd0 <__PRETTY_FUNCTION__.24790> "find_allocation") at assert.c:92
#3  0x00007f2f712861f2 in __GI___assert_fail (assertion=assertion@entry=0x7f2f87d7601b "offs >= start", file=file@entry=0x7f2f87d7b658 "block/gluster.c", line=line@entry=1307, 
    function=function@entry=0x7f2f87d7bdd0 <__PRETTY_FUNCTION__.24790> "find_allocation") at assert.c:101
#4  0x00007f2f87cfdf42 in find_allocation (bs=0x7f2f88d31400, hole=<synthetic pointer>, data=<synthetic pointer>, start=2358575104) at block/gluster.c:1307
#5  qemu_gluster_co_get_block_status (bs=0x7f2f88d31400, sector_num=<optimized out>, nb_sectors=1408, pnum=0x7f2f5d0dbd6c, file=0x7f2f5d0dbd70) at block/gluster.c:1368
#6  0x00007f2f87ce5c96 in bdrv_co_get_block_status (bs=0x7f2f88d31400, sector_num=sector_num@entry=4606592, nb_sectors=1408, pnum=pnum@entry=0x7f2f5d0dbd6c, 
    file=file@entry=0x7f2f5d0dbd70) at block/io.c:1786
#7  0x00007f2f87ce5df1 in bdrv_co_get_block_status (bs=bs@entry=0x7f2f88d2e000, sector_num=sector_num@entry=7733248, nb_sectors=<optimized out>, nb_sectors@entry=4194303, 
    pnum=pnum@entry=0x7f2f5d0dbe8c, file=file@entry=0x7f2f5d0dbe90) at block/io.c:1820
#8  0x00007f2f87ce5e8b in bdrv_co_get_block_status_above (file=0x7f2f5d0dbe90, pnum=0x7f2f5d0dbe8c, nb_sectors=4194303, sector_num=7733248, base=0x0, bs=<optimized out>)
    at block/io.c:1856
#9  bdrv_get_block_status_above_co_entry (opaque=opaque@entry=0x7f2f5d0dbe00) at block/io.c:1872
#10 0x00007f2f87ce5b90 in bdrv_get_block_status_above (bs=0x7f2f88d2e000, base=<optimized out>, sector_num=<optimized out>, nb_sectors=<optimized out>, pnum=<optimized out>, 
    file=<optimized out>) at block/io.c:1904
#11 0x00007f2f87c9b8e2 in convert_iteration_sectors (s=0x7ffe4be4f730, sector_num=7733248) at qemu-img.c:1557
#12 0x00007f2f87c9cadc in convert_co_do_copy (opaque=0x7ffe4be4f730) at qemu-img.c:1766
#13 0x00007f2f87d63aba in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#14 0x00007f2f7129ecf0 in ?? () from /usr/lib64/libc-2.17.so
#15 0x00007ffe4be4ee80 in ?? ()
#16 0x0000000000000000 in ?? ()

Comment 2 Jeff Cody 2017-05-16 16:57:14 UTC
Here is the code that is asserting:

    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;          /* D1 and (H3 or H4) */
    }
    assert(offs >= start);


I think this means that glfs_lseek() is returning bad data.  SEEK_HOLE means "Adjust the file offset to the next hole in the file greater than or equal to offset", i.e. for this call it means adjust the file offset to the next hole >= start.

So the call to glfs_lseek() should either return error (offs < 0), or offs >= start.  I can't think of any valid way glfs_lseek() could return offs < start && offs >= 0.

That said, if versions of gluster already released are returning invalid glfs_lseek() results, we might need a more graceful way to handle it in QEMU.

Comment 3 Jeff Cody 2017-05-16 17:06:15 UTC
What version of libgfapi is installed on the host, and what version of gluster is the gluster server running?

Comment 4 Jeff Cody 2017-05-16 19:04:49 UTC
Also, could you make rhel74-64-virtio.qcow2 available?  I have been unable to reproduce this with either gluster 3.8.11 or 3.11.

Comment 6 Jeff Cody 2017-05-22 21:12:14 UTC
I am able to reproduce this locally now, with source images > ~6GB.  What is being returned is not offs == 0, but (offs > 0 && offs < start), which seems to be an invalid lseek return.

This is a duplicate of BZ #1425293 (see comment #13 on that BZ), however I am leaving this BZ open for QEMU, because we will need some sort of workaround for gluster versions already released.

Comment 7 Miroslav Rezanina 2017-06-06 08:54:36 UTC
Fix included in qemu-kvm-rhev-2.9.0-8.el7

Comment 9 Suqin Huang 2017-06-07 01:23:21 UTC
Result: PASS
convert the image successfully and boot up image successfully

Package: qemu-kvm-rhev-2.9.0-8.el7.x86_64

Steps: 
1. Convert image on glusterfs server
qemu-img convert -f qcow2 -O raw gluster://bootp-73-199-200.lab.eng.pek2.redhat.com:0/gv0/rhel74-64-virtio.qcow2 convert.img

2. boot up convert.img image

Comment 10 Suqin Huang 2017-06-07 01:24:02 UTC
According to comment9, update bug status to VERIFIED

Comment 12 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392