Bug 1277922 - Aborted (core dumped) for "qemu-kvm: block.c:3332: bdrv_aligned_pwritev: Assertion `!qiov || bytes == qiov->size' failed." after 'lvextend' the guest primary image and 'cont' the installing guest for the 2nd time [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: All
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Fam Zheng
QA Contact: FuXiangChun
Keywords: Regression, ZStream
Duplicates: 1253200 1281520
Depends On:
Blocks: RHV4.1PPC 1281520 1283987 1288337
 
Reported: 2015-11-04 06:08 EST by Gu Nini
Modified: 2016-11-07 15:51 EST (History)
27 users

See Also:
Fixed In Version: qemu-kvm-rhev-2.5.0-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Cloned To: 1283987
Environment:
Last Closed: 2016-11-07 15:51:58 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
knoel: needinfo? (snagar)


Attachments
Debug_info_for_the_core_file-11032015 (3.67 KB, text/plain)
2015-11-04 06:08 EST, Gu Nini
no flags
Reproducer script (1.55 KB, application/x-shellscript)
2015-11-11 18:03 EST, David Gibson
no flags


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2158101 None None None 2016-02-10 23:59 EST

Description Gu Nini 2015-11-04 06:08:12 EST
Created attachment 1089558 [details]
Debug_info_for_the_core_file-11032015

Description of problem:
If a KVM guest is installed on an LVM volume, then after running 'lvextend' on the guest's primary image and issuing 'cont' to the installing guest for the 2nd time, the guest aborts ('Aborted (core dumped)') with "qemu-kvm: block.c:3332: bdrv_aligned_pwritev: Assertion `!qiov || bytes == qiov->size' failed."

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-327.el7.ppc64le
Guest kernel: 3.10.0-327.el7.ppc64
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-31.el7.ppc64le
SLOF: SLOF-20150313-5.gitc89b0df.el7.noarch

How reproducible:
6/10

Steps to Reproduce:
1. Create pv/vg and a 1GB lv on host:
# pvcreate /dev/sdc
# vgcreate vg /dev/sdc
# lvcreate -n lv1 -L 1G vg

2. Create qemu image on the lv with a large virtual size on the host:
# qemu-img create -f qcow2 /dev/vg/lv1 80G

3. Boot up and install a guest on the lv:

/usr/libexec/qemu-kvm -name lvtest -machine pseries,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 8,sockets=1,cores=4,threads=2 -uuid 95346a10-1828-403a-a610-ac5a52a29411 -no-user-config -nodefaults -monitor stdio -rtc base=utc,clock=host -no-shutdown -boot strict=on -device nec-usb-xhci,id=xhci -device spapr-vscsi,id=scsi0,reg=0x1000 -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/dev/vg/lv1 -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=02,bootindex=1 -drive file=/home/RHEL-7.2-20151030.0-Server-ppc64-dvd1.iso,if=none,id=drive-scsi0-0-0-0,readonly=on,format=raw,cache=none -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:11,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,reg=0x30000000 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -vnc 0:1 -device VGA,id=video0,bus=pci.0,addr=0x1 -msg timestamp=on
char device redirected to /dev/pts/3 (label charserial0)
QEMU 2.3.0 monitor - type 'help' for more information
(qemu) info status
VM status: running
(qemu) info status
VM status: paused (io-error)
(qemu) cont

4. Check the guest's running status in HMP with 'info status' as in step 3; when it is in 'paused (io-error)' status, extend the LV on the host with the following command and then 'cont' the guest in HMP:
# lvextend -L +512M /dev/vg/lv1

5. Repeat step 4 once more so that the guest installation can finish with enough disk space.


Actual results:
In step 5, the guest 'Aborted (core dumped)' just after the 'cont' command is issued:

(qemu) cont
(qemu) info stqemu-kvm: util/iov.c:319: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
Aborted (core dumped)


Expected results:
No core dump; the guest installation finishes.

Additional info:
The verified BZ 1247042 on x86_64 shows the same type of 'Aborted (core dumped)'.
There is the following test case in Polarion:
https://polarion.engineering.redhat.com/polarion/#/project/RedHatEnterpriseLinux7/workitem?id=RHEL7-6929
Comment 2 Gu Nini 2015-11-04 06:24:47 EST
Adding another piece of 'Additional info':
Sometimes the bug did not occur after step 5, but the guest installation could not finish even after a long time (once more than 18 hours), so I just quit the guest.
Comment 3 David Gibson 2015-11-04 19:38:50 EST
Way too late for 7.2, moving to 7.2.z.
Comment 4 David Gibson 2015-11-04 19:46:59 EST
A couple of questions to narrow down where the problem could be.

  1) Are you able to reproduce this problem with an x86 host and guest?

  2) Are you able to reproduce this problem if you use virtio-blk or virtio-scsi instead of spapr-vscsi?

This one is pretty odd; looking at the code, I really can't see how that assertion could fail.
Comment 5 Gu Nini 2015-11-06 07:28:28 EST
(In reply to Gu Nini from comment #0)
> 
> Actual results:
> In step5, the guest 'Aborted (core dumped)' just after issue cmd 'cont':
> 
> (qemu) cont
> (qemu) info stqemu-kvm: util/iov.c:319: qemu_iovec_concat_iov: Assertion
> `soffset == 0' failed.
> Aborted (core dumped)
> 

Please note the core dump is actually of the following type; I made a mistake above:

(qemu) cont
(qemu) qemu-kvm: block.c:3332: bdrv_aligned_pwritev: Assertion `!qiov || bytes == qiov->size' failed.
Aborted (core dumped)



(In reply to David Gibson from comment #4)
> A couple of questions to narrow down where the problem could be.
> 
>   1) Are you able to reproduce this problem with an z86 host and guest?

On a ppc64le host, the problem could be reproduced on LVs created both on a host physical disk and on an iSCSI block disk, while it could not be reproduced on an iSCSI file disk.


On an x86_64 host, since I could not find an iSCSI server that provides a block disk, I used a file disk and failed to reproduce the bug; I will retry once an iSCSI server with a block disk is available. However, in 1 of 5 runs I hit the following type of core dump, which had also occurred once in an earlier ppc64le test:

(qemu) cont
(qemu) qemu-kvm: util/iov.c:319: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
Aborted (core dumped)

It's a pity the core file was lost, since autotest was running on the same host.

> 
>   2) Are you able to reproduce this problem if you use virtio-blk or
> virtio-scsi instead of spapr-vscsi?

Please note that the guest's primary disk in this bug is in fact virtio-blk.
Comment 6 Laurent Vivier 2015-11-06 09:42:40 EST
I'm really surprised by the use of qemu-img on a Logical Volume (without FS).

Extending the size of the LV is like extending the size of a qcow2 file while a guest is using it.

Kevin, can we really do that (if "yes", I'm really impressed...) ?
Comment 7 David Gibson 2015-11-08 18:54:47 EST
Laurent,

I believe extending the LV should be OK (though reducing would not be).  IIUC, the way qcow2 works, it allocates space from the front of the file/device.

More to the point, I'm pretty sure putting a qcow2 image directly on an LV which gets extended on ENOSPC is something that RHEV does all the time, so it's an important case.
Comment 8 David Gibson 2015-11-10 01:44:15 EST
I have a reproducer script on ibm-p8-virt-02, in ~dwg/bz1277922.sh.  You need to manually run through the installer prompts, and run the lvextend when it hits the enospc pause, but it does the VG/LV setup and so forth for you.

I've hit a couple of different assertion failures so far, but haven't gotten far with analyzing them yet.
Comment 9 Laurent Vivier 2015-11-10 04:42:53 EST
I've tested this procedure on an x86_64 system with upstream qemu and a real Logical Volume, and I get the following error (and crash) after the first lvextend and continue:

qemu-system-x86_64: /home/lvivier/Projects/qemu/util/iov.c:331: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.

upstream qemu is:

ce27861 configure: Don't disable optimization for non-fortify builds
Comment 10 Kevin Wolf 2015-11-10 05:02:53 EST
(In reply to Laurent Vivier from comment #6)
> I'm really surprised by the use of qemu-img on a Logical Volume (without FS).
> 
> Extending the size of the LV is like extending the size of a qcow2 file
> while a guest is using it.
> 
> Kevin, can we really do that (if "yes", I'm really impressed...) ?

Yes. And sorry if you can't sleep well any more now. :-)

At first I was surprised to learn about such use in RHEV, but by now I've mostly
just accepted it as a part of reality. As long as the management tool makes sure
that the LV is always large enough, there is nothing preventing the use of qcow2
on a block device.
Comment 11 Gu Nini 2015-11-10 05:17:06 EST
(In reply to Laurent Vivier from comment #9)
> I've tested this procedure on an x86_64 system with upstream qemu and real
> Logical Volume, and I've the following error (and crash) after the first
> lvextend and continue:
> 
> qemu-system-x86_64: /home/lvivier/Projects/qemu/util/iov.c:331:
> qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
> 
> upstream qemu is:
> 
> ce27861 configure: Don't disable optimization for non-fortify builds

Laurent,

The crash you met is of the same type as the one I met in comment #5; I have met it on both x86_64 and ppc64le hosts. However, since it is not of the same type as the one in the original bug, and it always occurs after the first lvextend/guest 'cont' (in comment #5 I also met it there) WHILE the one in this bug occurs after the second time, are you sure it has the same cause as this bug? If not, we should not change the hardware field to All. BTW, I only met the crash in this bug on ppc64le.
Comment 12 Laurent Vivier 2015-11-10 05:24:04 EST
on x86_64, I have also some crashes without assertion:

Program received signal SIGSEGV, Segmentation fault.
0x00007f25b57a93aa in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
(gdb) bt
#0  0x00007f25b57a93aa in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
#1  0x00007f25b57a96bf in tcmalloc::ThreadCache::Scavenge() ()
   from /lib64/libtcmalloc.so.4
#2  0x00007f25b57bb58f in tc_deletearray () from /lib64/libtcmalloc.so.4
#3  0x00007f25b529579f in g_free () from /lib64/libglib-2.0.so.0
#4  0x0000560f9d049559 in virtio_blk_free_request (req=0x560fa17a6000)
    at /home/lvivier/Projects/qemu/hw/block/virtio-blk.c:45
#5  virtio_blk_rw_complete (opaque=<optimized out>, ret=0)
    at /home/lvivier/Projects/qemu/hw/block/virtio-blk.c:121
#6  0x0000560f9d281eee in bdrv_co_complete (acb=0x560fa17e7730)
    at /home/lvivier/Projects/qemu/block/io.c:2097
#7  0x0000560f9d2cca4a in coroutine_trampoline (i0=<optimized out>, 
    i1=<optimized out>)
    at /home/lvivier/Projects/qemu/util/coroutine-ucontext.c:80
#8  0x00007f25b4aaffa0 in ?? () from /lib64/libc.so.6
#9  0x00007ffc70141c10 in ?? ()
#10 0x0000000000000000 in ?? ()
Comment 13 Laurent Vivier 2015-11-10 05:28:55 EST
(In reply to Gu Nini from comment #11)
> (In reply to Laurent Vivier from comment #9)
> > I've tested this procedure on an x86_64 system with upstream qemu and real
> > Logical Volume, and I've the following error (and crash) after the first
> > lvextend and continue:
> > 
> > qemu-system-x86_64: /home/lvivier/Projects/qemu/util/iov.c:331:
> > qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
> > 
> > upstream qemu is:
> > 
> > ce27861 configure: Don't disable optimization for non-fortify builds
> 
> Laurent,
> 
> The crash you met is in the same type as I met in commnet #5 , I have met it
> both in x86_64 and ppc64le hosts; however, since it's not in the same type
> as that in the original bug, and it always occurs after the first
> lvextend/guest_cont(in comment #5, I also met it here) WHILE that in the bug
> occurs after the second time, are you sure it's the same cause as that in
> the bug? If no, we should not change haredware to all. BTW, I only met the
> crash in the bug on ppc64le.

You're right, it's not the same, but I nevertheless have a crash on x86_64 (comment 12).
Feel free to revert the hardware field to ppc64le if you think that is better.
Comment 14 Gu Nini 2015-11-10 06:12:16 EST
(In reply to Laurent Vivier from comment #12)
> on x86_64, I have also some crashes without assertion:
> 

I had also met the segmentation fault (core dump) issue in both x86_64 and ppc64le tests; again, it seems to be a different problem...

(In reply to Laurent Vivier from comment #13)
......
> 
> You're right it's not the same but I have anyway a crash on x86_64 (comment
> 12).
> Feel free to revert the hardware to ppc64le if you think it is better.

OK, we can leave it as it is and wait for deeper investigation.

Thanks.
Comment 15 Laurent Vivier 2015-11-10 09:16:54 EST
(In reply to Gu Nini from comment #14)
> (In reply to Laurent Vivier from comment #12)
> > on x86_64, I have also some crashes without assertion:
> > 
> 
> I also had met the Segmentation fault (core dump) issue on both x86_64 and
> ppc64le test, again it seems a different problem...
> 
> (In reply to Laurent Vivier from comment #13)
> ......
> > 
> > You're right it's not the same but I have anyway a crash on x86_64 (comment
> > 12).
> > Feel free to revert the hardware to ppc64le if you think it is better.
> 
> Ok, we can leave as it is and wait for deeper investigation.

I've been able to reproduce the same kind of error on x86_64 with upstream qemu:

qemu-system-x86_64: /home/lvivier/Projects/qemu/block/io.c:1123: bdrv_aligned_pwritev: Assertion `!qiov || bytes == qiov->size' failed.

Previously, the following options were missing on my command line:

-drive ... snapshot=off,aio=native ...

backtrace is:

Program received signal SIGABRT, Aborted.
0x00007f480678f9c8 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f480678f9c8 in raise () from /lib64/libc.so.6
#1  0x00007f480679165a in abort () from /lib64/libc.so.6
#2  0x00007f4806788187 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f4806788232 in __assert_fail () from /lib64/libc.so.6
#4  0x0000559376a7dcc6 in bdrv_aligned_pwritev (bs=bs@entry=0x559377ee2000, 
    req=req@entry=0x55937dc77ca0, offset=offset@entry=1648361472, 
    bytes=bytes@entry=8192, qiov=0x55937dc77db0, flags=flags@entry=0)
    at /home/lvivier/Projects/qemu/block/io.c:1123
#5  0x0000559376a7e70d in bdrv_co_do_pwritev (bs=0x559377ee2000, 
    offset=<optimized out>, bytes=8192, qiov=qiov@entry=0x55937dc77db0, 
    flags=flags@entry=(unknown: 0))
    at /home/lvivier/Projects/qemu/block/io.c:1370
#6  0x0000559376a7f318 in bdrv_co_do_writev (flags=(unknown: 0), 
    qiov=0x55937dc77db0, nb_sectors=<optimized out>, 
    sector_num=<optimized out>, bs=<optimized out>)
    at /home/lvivier/Projects/qemu/block/io.c:1394
#7  bdrv_co_writev (bs=<optimized out>, sector_num=<optimized out>, 
    nb_sectors=<optimized out>, qiov=qiov@entry=0x55937dc77db0)
    at /home/lvivier/Projects/qemu/block/io.c:1403
#8  0x0000559376a5a965 in qcow2_co_writev (bs=0x559377ed0800, 
    sector_num=63810432, remaining_sectors=16, qiov=0x559379f88038)
    at /home/lvivier/Projects/qemu/block/qcow2.c:1584
#9  0x0000559376a7dc6d in bdrv_aligned_pwritev (bs=bs@entry=0x559377ed0800, 
---Type <return> to continue, or q <return> to quit---
    req=req@entry=0x55937dc77f10, offset=offset@entry=32670433280, 
    bytes=bytes@entry=516096, qiov=0x559379f88038, flags=<optimized out>, 
    flags@entry=0) at /home/lvivier/Projects/qemu/block/io.c:1148
#10 0x0000559376a7e70d in bdrv_co_do_pwritev (bs=0x559377ed0800, 
    offset=<optimized out>, bytes=516096, qiov=0x559379f88038, 
    flags=(unknown: 0)) at /home/lvivier/Projects/qemu/block/io.c:1370
#11 0x0000559376a7f3c0 in bdrv_co_do_writev (flags=<optimized out>, 
    qiov=<optimized out>, nb_sectors=<optimized out>, 
    sector_num=<optimized out>, bs=<optimized out>)
    at /home/lvivier/Projects/qemu/block/io.c:1394
#12 bdrv_co_do_rw (opaque=0x559377e57e30)
    at /home/lvivier/Projects/qemu/block/io.c:2132
#13 0x0000559376ac7a4a in coroutine_trampoline (i0=<optimized out>, 
    i1=<optimized out>)
    at /home/lvivier/Projects/qemu/util/coroutine-ucontext.c:80
#14 0x00007f48067a3fa0 in ?? () from /lib64/libc.so.6
#15 0x00007ffe0049f150 in ?? ()
#16 0x0000000000000000 in ?? ()
Comment 16 David Gibson 2015-11-10 22:38:05 EST
Nini,

I've seen several different assertion failures with no changes in setup.  I think the exact crash is non-deterministic, rather than it being a difference between platforms.

Laurent,

Good start on tracking this down.

I've now tried reproducing the problem with both spapr-vscsi and virtio-scsi disks, and wasn't able to. That's another data point supporting the theory that the virtio-blk patch caused this problem.
Comment 17 David Gibson 2015-11-10 23:43:44 EST
Since this doesn't appear to be Power specific, I'm reassigning to the normal pool.
Comment 18 David Gibson 2015-11-11 01:09:26 EST
Here's some deeper analysis of a single failure:

(gdb) bt
#0  0x00003fffb6c6e578 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00003fffb6c706fc in __GI_abort () at abort.c:90
#2  0x00003fffb7a988c4 in g_assertion_message (domain=<optimized out>, file=0x2772af20 "qom/object.c", 
    line=<optimized out>, func=<optimized out>, message=0x28514260 "assertion failed: (obj->ref > 0)")
    at gtestutils.c:2292
#3  0x00003fffb7a989cc in g_assertion_message_expr (domain=0x0, file=0x2772af20 "qom/object.c", 
    line=<optimized out>, func=0x2772ae90 <__FUNCTION__.29928> "object_unref", expr=<optimized out>)
    at gtestutils.c:2307
#4  0x00000000275dec44 in object_unref (obj=0x28510540) at qom/object.c:716
#5  0x000000002737064c in memory_region_unref (mr=<optimized out>) at /usr/src/debug/qemu-2.3.0/memory.c:1369
#6  0x000000002737315c in memory_region_del_subregion (mr=0x28544000, subregion=0x28544200)
    at /usr/src/debug/qemu-2.3.0/memory.c:1802
#7  0x0000000027373288 in memory_region_finalize (obj=<optimized out>) at /usr/src/debug/qemu-2.3.0/memory.c:1327
#8  0x00000000275ded3c in object_deinit (type=0x28680300, obj=<optimized out>) at qom/object.c:399
#9  object_finalize (data=0x28544000) at qom/object.c:413
#10 object_unref (obj=0x28544000) at qom/object.c:720
#11 0x00000000275decac in object_property_del_all (obj=0x28510540) at qom/object.c:367
#12 object_finalize (data=0x28510540) at qom/object.c:412
#13 object_unref (obj=0x28510540) at qom/object.c:720
#14 0x000000002737064c in memory_region_unref (mr=<optimized out>) at /usr/src/debug/qemu-2.3.0/memory.c:1369
#15 0x00000000273166a0 in address_space_unmap (as=<optimized out>, buffer=<optimized out>, len=<optimized out>, 
    is_write=<optimized out>, access_len=16) at /usr/src/debug/qemu-2.3.0/exec.c:2652
#16 0x00000000273aacf0 in virtqueue_fill (vq=0x29670000, elem=0x297fa010, len=<optimized out>, idx=0)
    at /usr/src/debug/qemu-2.3.0/hw/virtio/virtio.c:266
#17 0x00000000273aafe4 in virtqueue_push (vq=0x29670000, elem=<optimized out>, len=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/virtio/virtio.c:294
#18 0x000000002738408c in virtio_blk_complete_request (req=0x297fa000, status=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/block/virtio-blk.c:58
#19 0x0000000027384774 in virtio_blk_req_complete (status=0 '\000', req=0x297fa000)
    at /usr/src/debug/qemu-2.3.0/hw/block/virtio-blk.c:64
#20 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at /usr/src/debug/qemu-2.3.0/hw/block/virtio-blk.c:119
#21 0x00000000276212a4 in bdrv_co_em_bh (opaque=0x2971a340) at block.c:4981
#22 0x0000000027616e64 in aio_bh_poll (ctx=0x28760840) at async.c:85
#23 0x000000002762cf3c in aio_dispatch_clients (ctx=<optimized out>, client_mask=<optimized out>) at aio-posix.c:139
#24 0x0000000027616b90 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
    user_data=<optimized out>) at async.c:219
#25 0x00003fffb7a626d0 in g_main_dispatch (context=0x28610240) at gmain.c:3109
#26 g_main_context_dispatch (context=0x28610240) at gmain.c:3708
#27 0x000000002762b02c in glib_pollfds_poll () at main-loop.c:209
#28 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:254
#29 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:503
#30 0x000000002730c630 in main_loop () at vl.c:1818
#31 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4394

Looking into what's going in the memory_region_unref at #14:

(gdb) frame 15
#15 0x00000000273166a0 in address_space_unmap (as=<optimized out>, buffer=<optimized out>, len=<optimized out>, 
    is_write=<optimized out>, access_len=16) at /usr/src/debug/qemu-2.3.0/exec.c:2652
2652	        memory_region_unref(mr);
(gdb) p mr->name
$30 = 0x28507090 "ppc_spapr.ram"


That's the main RAM memory region, so it looks like something is going wrong with the refcounting of regions mapped into the virtqueue.
Comment 20 David Gibson 2015-11-11 18:03 EST
Created attachment 1092912 [details]
Reproducer script

Here's the script I'm using to reproduce on ibm-p8-virt-02.

Obviously it will need adjustment for local environment and further adjustment to run on an x86 host.
Comment 21 David Gibson 2015-11-11 18:04:36 EST
Realised the information Laurent found from a bisect didn't make it into the BZ itself.

From: Laurent Vivier <lvivier@redhat.com>
To: David Gibson <dgibson@redhat.com>
Subject: BZ1277922 ("Assertion `!qiov || bytes == qiov->size'")
Date: Tue, 10 Nov 2015 19:03:16 +0100

David,

As I've been able to reproduce the problem on x86_64, I've tried to
bisect (on upstream, v2.2 was good, v2.3 was bad)

I have a commit id, but we can't be sure it is the right one, as I marked a commit as "bad" if QEMU crashed (for any reason) during the install and as "good" if not (on a single try).


commit 95f7142abc86a916682bd735aecd90172ffa0d30
Author: Peter Lieven <pl@kamp.de>
Date:   Mon Feb 2 14:52:21 2015 +0100

    virtio-blk: introduce multiread

    this patch finally introduces multiread support to virtio-blk. While
    multiwrite support was there for a long time, read support was missing.

    The complete merge logic is moved into virtio-blk.c which has
    been the only user of request merging ever since. This is required
    to be able to merge chunks of requests and immediately invoke callbacks
    for those requests. Secondly, this is required to switch to
    direct invocation of coroutines which is planned at a later stage.

    The following benchmarks show the performance of running fio with
    4 worker threads on a local ram disk. The numbers show the average
    of 10 test runs after 1 run as warmup phase.

                  |        4k        |       64k        |        4k
    MB/s          | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
    --------------+--------+---------+--------+---------+--------+--------
    master        | 1221   | 1187    | 4178   | 4114    | 1745   | 1213
    multiread     | 1829   | 1189    | 4639   | 4110    | 1894   | 1216

    Signed-off-by: Peter Lieven <pl@kamp.de>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Comment 22 Kevin Wolf 2015-11-12 05:00:39 EST
As you posted an entirely different backtrace that doesn't crash in the block
layer, I'm not sure what Laurent actually ended up testing. But if we assume
that it's the  assertion failure in bdrv_aligned_pwritev() as he posted in
comment 15, the result suggests that the problem (which is that the request
size and the size of the passed qiov don't match) is in virtio-blk.

However, looking at the virtio-blk code actually issuing the requests, I see
that this is already explicitly asserted there for the num_reqs > 1 case:

    assert(nb_sectors == qiov->size / BDRV_SECTOR_SIZE);

For the other case, nb_sectors is already initialised with the right value and
never changed:

    int nb_sectors = mrb->reqs[start]->qiov.size / BDRV_SECTOR_SIZE;

Looking closer at the backtrace, the failing request doesn't directly come from
virtio-blk anyway, but from qcow2. There qiov is reset and then filled in the
non-encrypted case with the following code (the encrypted one is obviously
correct, too, but it's not our case):

    qemu_iovec_reset(&hd_qiov);
    qemu_iovec_concat(&hd_qiov, qiov, bytes_done, cur_nr_sectors * 512);

After this, hd_qiov must be consistent with cur_nr_sectors if the
qiov passed to qcow2 was large enough (which was asserted in the first
bdrv_aligned_pwritev() call in the chain).

The only code I can see that touches qiov between here and there is the
alignment code in bdrv_co_do_pwritev(), but I don't expect it to be used in
our case.


Laurent, if this is reproducible for you, can you check whether the alignment
code triggers for you, and also dump the qiov when the assertion fails? We
probably both want the qiov of the failing bdrv_co_do_pwritev() and the one
that was passed to qcow2 a few calls down the stack trace.

So far I only see that we have a 8192 byte requests, but not what size qiov
ended up. We have remaining_sectors=16 in qcow2_co_writev(), though, so bytes
looks like a plausible value at least.
Comment 23 David Gibson 2015-11-12 05:52:46 EST
Kevin,

Sorry, I forgot to summarize this for you earlier.  We've seen a number of different aborts / assertion failures with a similar reproducer; it doesn't seem to be deterministic (even with guest SMP disabled, although it seems to trigger a bit more rarely then).

There are several factors pointing towards virtio-blk, though none of them are conclusive: 1) the asserts have usually had virtio-blk functions *somewhere* in the backtrace, 2) we haven't been able to reproduce with virtio-scsi or spapr-vscsi disks instead of virtio-blk, and 3) Laurent did a bisect (on x86_64) to a virtio-blk patch (see comment 21).  The last isn't entirely conclusive because the problem isn't 100% reproducible, although for me the reproducer script I've attached (on ppc64le) seems to trip it on >50% of attempts.

Because of the variety of crash symptoms, I'm suspecting internal memory corruption in qemu (perhaps a buffer overrun).

Unfortunately some console problems meant I've lost the details, but I did have one run where one of the structures seemed to have been overwritten with some 0s followed by some repeated 0xfffffffc values, which also suggests some sort of memory overwrite (or maybe use-after-free).
Comment 24 Laurent Vivier 2015-11-12 06:06:20 EST
Kevin,

I have bisected on x86_64.

the problem can take different forms, the last I have are:

* qemu-system-x86_64: /home/lvivier/Projects/qemu/block/accounting.c:41: block_acct_done: Assertion `cookie->type < BLOCK_MAX_IOTYPE'

or

* Program received signal SIGSEGV, Segmentation fault.
0x00007fb7e43c0000 in ?? ()
(gdb) bt
#0  0x00007fb7e43c0000 in ?? ()
#1  0x0000562d1d01ebfe in virtio_blk_req_complete (status=0 '\000',
    req=0x562d23469330) at /home/lvivier/Projects/qemu/hw/block/virtio-blk.c:63
#2  virtio_blk_rw_complete (opaque=<optimized out>, ret=0)
    at /home/lvivier/Projects/qemu/hw/block/virtio-blk.c:110
...

As David said, I think also we have a memory corruption in the block sub-system.

We can imagine something like this: on the io-error, the "req" is freed, but as we add space on the disk and ask to continue, it is used anyway.
Comment 25 Laurent Vivier 2015-11-12 10:20:31 EST
With valgrind, the following errors come with commit 95f7142 (virtio-blk: introduce multiread):

==2867== Invalid read of size 4
==2867==    at 0x1F0160: virtio_blk_rw_complete (virtio-blk.c:95)
==2867==    by 0x3D291D: bdrv_co_em_bh (block.c:4744)
==2867==    by 0x3CBED6: aio_bh_poll (async.c:84)
==2867==    by 0x3DA94F: aio_dispatch (aio-posix.c:137)
==2867==    by 0x3CBD5D: aio_ctx_dispatch (async.c:223)
==2867==    by 0x64A5A89: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4400.1)
==2867==    by 0x3D9582: glib_pollfds_poll (main-loop.c:200)
==2867==    by 0x3D9582: os_host_main_loop_wait (main-loop.c:245)
==2867==    by 0x3D9582: main_loop_wait (main-loop.c:494)
==2867==    by 0x1A4F6B: main_loop (vl.c:1794)
==2867==    by 0x1A4F6B: main (vl.c:4350)
==2867==  Address 0x14725174 is 49,220 bytes inside a block of size 49,272 free'd
==2867==    at 0x4C29D6A: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==2867==    by 0x64AB79E: g_free (in /usr/lib64/libglib-2.0.so.0.4400.1)
==2867==    by 0x64C344C: g_slice_free1 (in /usr/lib64/libglib-2.0.so.0.4400.1)
==2867==    by 0x1F01F9: virtio_blk_free_request (virtio-blk.c:44)
==2867==    by 0x1F01F9: virtio_blk_rw_complete (virtio-blk.c:112)
==2867==    by 0x3D291D: bdrv_co_em_bh (block.c:4744)
==2867==    by 0x3CBED6: aio_bh_poll (async.c:84)
==2867==    by 0x3DA94F: aio_dispatch (aio-posix.c:137)
==2867==    by 0x3CBD5D: aio_ctx_dispatch (async.c:223)
==2867==    by 0x64A5A89: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4400.1)
==2867==    by 0x3D9582: glib_pollfds_poll (main-loop.c:200)
==2867==    by 0x3D9582: os_host_main_loop_wait (main-loop.c:245)
==2867==    by 0x3D9582: main_loop_wait (main-loop.c:494)
==2867==    by 0x1A4F6B: main_loop (vl.c:1794)
==2867==    by 0x1A4F6B: main (vl.c:4350)

[line numbers are ones from commit 95f7142]

virtio-blk.c:

     86 static void virtio_blk_rw_complete(void *opaque, int ret)
     87 {
     88     VirtIOBlockReq *next = opaque;
     89 
     90     while (next) {
     91         VirtIOBlockReq *req = next;
     92         next = req->mr_next;
     93         trace_virtio_blk_rw_complete(req, ret);
     94 
     95         if (req->qiov.nalloc != -1) {
     96             /* If nalloc is != 1 req->qiov is a local copy of the original
     97              * external iovec. It was allocated in submit_merged_requests
     98              * to be able to merge requests. */
     99             qemu_iovec_destroy(&req->qiov);
    100         }
    101 
    102         if (ret) {
    103             int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
    104             bool is_read = !(p & VIRTIO_BLK_T_OUT);
    105             if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
    106                 continue;
    107             }
    108         }
    109 
    110         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
    111         block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
    112         virtio_blk_free_request(req);
    113     }
    114 }

On line 95 we try to use a "req" which was released previously (in an earlier pass through the loop) on line 112.
Comment 26 Laurent Vivier 2015-11-12 14:12:08 EST
On x86, disabling requests merging seems to workaround the problem:

--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -405,6 +405,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
         VirtIOBlockReq *req = mrb->reqs[i];
         if (num_reqs > 0) {
             bool merge = true;
+merge = false;
 
             /* merge would exceed maximum number of IOVs */
             if (niov + req->qiov.niov > IOV_MAX) {
Comment 27 David Gibson 2015-11-12 23:36:27 EST
The valgrind information is a good clue, but I don't think the problem is right there.

At line 95 req should already have been advanced to the next thing in the chain, which should still be valid (line 91).

So that suggests that the mr_next chain is somehow ending up with a cycle.
Comment 28 Fam Zheng 2015-11-12 23:44:00 EST
The problem is that at the time of io error, req was also added to the dma restart queue, so there is a double completion.

I'm working on a fix right now.
Comment 29 Fam Zheng 2015-11-13 04:55:19 EST
A fix was posted to upstream list:

http://marc.info/?l=qemu-devel&m=144740716414450&w=2
Comment 34 Paolo Bonzini 2015-11-20 08:29:51 EST
*** Bug 1253200 has been marked as a duplicate of this bug. ***
Comment 35 Thomas Huth 2015-11-22 23:59:06 EST
*** Bug 1281520 has been marked as a duplicate of this bug. ***
Comment 39 FuXiangChun 2016-09-12 05:51:36 EDT
Following comment 0, I reproduced the bug with qemu-kvm-rhev-2.3.0-31.

(qemu) cli: line 1:  9040 Killed                  /usr/libexec/qemu-kvm -name lvtest -machine pc,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=2,cores=2,threads=1 -uuid 95346a10-1828-403a-a610-ac5a52a29411 -no-user-config -nodefaults -monitor stdio -rtc base=utc,clock=host -no-shutdown -boot dc -device nec-usb-xhci,id=xhci -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/dev/xfu_vg/lv1 -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=06,bootindex=1 -cdrom RHEL-7.3-20160908.3-Server-x86_64-dvd1.iso -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:11,bus=pci.0,addr=0x9 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -vnc :1 -msg timestamp=on -vga qxl

Verified the bug with qemu-kvm-rhev-2.6.0-23.
Comment 41 errata-xmlrpc 2016-11-07 15:51:58 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html
