Bug 1449609

Summary: qemu coredump when dd on multiple usb-storage devices concurrently in guest
Product: Red Hat Enterprise Linux 7 Reporter: yduan
Component: qemu-kvm-rhevAssignee: Gerd Hoffmann <kraxel>
Status: CLOSED ERRATA QA Contact: Gu Nini <ngu>
Severity: high Docs Contact:
Priority: high    
Version: 7.4CC: chayang, coli, hachen, jinzhao, juzhang, knoel, kraxel, lmiksik, mrezanin, ngu, ovasik, virt-maint, xfu, yduan
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-11 00:19:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1473046    
Attachments:
Description Flags
Full_CMD none

Description yduan 2017-05-10 11:13:52 UTC
Created attachment 1277600 [details]
Full_CMD

Description of problem:
qemu coredump when dd on multiple usb-storage devices concurrently in guest

Version-Release number of selected component (if applicable):
Host:
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.9.0-3.el7.x86_64
# uname -r
3.10.0-664.el7.x86_64
Guest:
3.10.0-663.el7.x86_64

How reproducible:
1/2

Steps to Reproduce:
1.Boot a VM with multiple usb-storage devices:
/usr/libexec/qemu-kvm \
 ......
 -device usb-ehci,id=usb1 \
 -drive file=images/stg1.qcow2,format=qcow2,id=drive_disk1,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device usb-storage,drive=drive_disk1,id=device_disk1 \
 ......
 -drive file=images/stg6.qcow2,format=qcow2,id=drive_disk6,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device usb-storage,drive=drive_disk6,id=device_disk6 \
 -device usb-ehci,id=usb2 \
 -drive file=images/stg7.qcow2,format=qcow2,id=drive_disk7,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device usb-storage,drive=drive_disk7,id=device_disk7 \
 ......
 -drive file=images/stg12.qcow2,format=qcow2,id=drive_disk12,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device usb-storage,drive=drive_disk12,id=device_disk12 \
 -device usb-ehci,id=usb3 \
 -drive file=images/stg13.qcow2,format=qcow2,id=drive_disk13,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device usb-storage,drive=drive_disk13,id=device_disk13 \
 ......
 -drive file=images/stg16.qcow2,format=qcow2,id=drive_disk16,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device usb-storage,drive=drive_disk16,id=device_disk16 \

 Full command line is as attachment.

2."fdisk -l" in the guest:
# fdisk -l

Disk /dev/vda: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000cdb05

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048     2099199     1048576   83  Linux
/dev/vda2         2099200    41943039    19921920   8e  Linux LVM

Disk /dev/mapper/rhel_bootp--73--199--233-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/rhel_bootp--73--199--233-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sda: 209 MB, 209715200 bytes, 409600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


......


Disk /dev/sdp: 209 MB, 209715200 bytes, 409600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

3. # sh dd
# cat dd
dd if=/dev/zero of=/dev/sda bs=1M count=100 &
dd if=/dev/zero of=/dev/sdb bs=1M count=100 &
dd if=/dev/zero of=/dev/sdc bs=1M count=100 &
dd if=/dev/zero of=/dev/sdd bs=1M count=100 &
dd if=/dev/zero of=/dev/sde bs=1M count=100 &
dd if=/dev/zero of=/dev/sdf bs=1M count=100 &
dd if=/dev/zero of=/dev/sdg bs=1M count=100 &
dd if=/dev/zero of=/dev/sdh bs=1M count=100 &
dd if=/dev/zero of=/dev/sdi bs=1M count=100 &
dd if=/dev/zero of=/dev/sdj bs=1M count=100 &
dd if=/dev/zero of=/dev/sdk bs=1M count=100 &
dd if=/dev/zero of=/dev/sdl bs=1M count=100 &
dd if=/dev/zero of=/dev/sdm bs=1M count=100 &
dd if=/dev/zero of=/dev/sdn bs=1M count=100 &
dd if=/dev/zero of=/dev/sdo bs=1M count=100 &
dd if=/dev/zero of=/dev/sdp bs=1M count=100 &

Actual results:
qemu-kvm: hw/usb/core.c:492: usb_cancel_packet: Assertion `usb_packet_is_inflight(p)' failed.
max_usb.sh: line 60: 16187 Aborted                 (core dumped)

Expected results:
There should be no coredump.

Additional info:
# gdb core.16187
(gdb) bt
#0  0x00007f2dd6afd1d7 in raise () from /lib64/libc.so.6
#1  0x00007f2dd6afe8c8 in abort () from /lib64/libc.so.6
#2  0x00007f2dd6af6146 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f2dd6af61f2 in __assert_fail () from /lib64/libc.so.6
#4  0x000055bbd4c8250d in usb_cancel_packet (p=0x55bbd7dbcb80) at hw/usb/core.c:492
#5  0x000055bbd4c8a658 in ehci_free_packet (p=0x55bbd7dbcb40) at hw/usb/hcd-ehci.c:522
#6  0x000055bbd4c8a875 in ehci_cancel_queue (q=q@entry=0x55bbd7d7ceb0) at hw/usb/hcd-ehci.c:580
#7  0x000055bbd4c8a8cb in ehci_free_queue (q=0x55bbd7d7ceb0, warn=0x0) at hw/usb/hcd-ehci.c:607
#8  0x000055bbd4c8ac0a in ehci_queues_rip_unseen (ehci=ehci@entry=0x55bbddde09f0, async=1) at hw/usb/hcd-ehci.c:656
#9  0x000055bbd4c8c2b7 in ehci_advance_async_state (ehci=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2144
#10 0x000055bbd4c8c59c in ehci_frame_timer (opaque=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2299
#11 0x000055bbd4db3ea1 in aio_bh_call (bh=0x55bbddaf1150) at util/async.c:90
#12 aio_bh_poll (ctx=ctx@entry=0x55bbd7d85700) at util/async.c:118
#13 0x000055bbd4db6f44 in aio_poll (ctx=0x55bbd7d85700, blocking=blocking@entry=true) at util/aio-posix.c:682
#14 0x000055bbd4d465d1 in bdrv_aio_cancel (acb=0x55bbdbb0c800) at block/io.c:2129
#15 0x000055bbd4c8a658 in ehci_free_packet (p=0x55bbd7dbcb40) at hw/usb/hcd-ehci.c:522
#16 0x000055bbd4c8a875 in ehci_cancel_queue (q=q@entry=0x55bbd7d7ceb0) at hw/usb/hcd-ehci.c:580
#17 0x000055bbd4c8a8cb in ehci_free_queue (q=0x55bbd7d7ceb0, warn=0x0) at hw/usb/hcd-ehci.c:607
#18 0x000055bbd4c8ac0a in ehci_queues_rip_unseen (ehci=ehci@entry=0x55bbddde09f0, async=1) at hw/usb/hcd-ehci.c:656
#19 0x000055bbd4c8c2b7 in ehci_advance_async_state (ehci=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2144
#20 0x000055bbd4c8c59c in ehci_frame_timer (opaque=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2299
#21 0x000055bbd4db3ea1 in aio_bh_call (bh=0x55bbddaf1150) at util/async.c:90
#22 aio_bh_poll (ctx=ctx@entry=0x55bbd7d85700) at util/async.c:118
#23 0x000055bbd4db6bb0 in aio_dispatch (ctx=0x55bbd7d85700) at util/aio-posix.c:429
#24 0x000055bbd4db3d7e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#25 0x00007f2dd83abd7a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#26 0x000055bbd4db5e6c in glib_pollfds_poll () at util/main-loop.c:213
#27 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#28 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
#29 0x000055bbd4aa701c in main_loop () at vl.c:1898
#30 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4720



(gdb) bt full
#0  0x00007f2dd6afd1d7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f2dd6afe8c8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f2dd6af6146 in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f2dd6af61f2 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x000055bbd4c8250d in usb_cancel_packet (p=0x55bbd7dbcb80) at hw/usb/core.c:492
        callback = false
        __PRETTY_FUNCTION__ = "usb_cancel_packet"
#5  0x000055bbd4c8a658 in ehci_free_packet (p=0x55bbd7dbcb40) at hw/usb/hcd-ehci.c:522
No locals.
#6  0x000055bbd4c8a875 in ehci_cancel_queue (q=q@entry=0x55bbd7d7ceb0) at hw/usb/hcd-ehci.c:580
        p = <optimized out>
        packets = 0
#7  0x000055bbd4c8a8cb in ehci_free_queue (q=0x55bbd7d7ceb0, warn=0x0) at hw/usb/hcd-ehci.c:607
        head = 0x55bbddde1140
        cancelled = <optimized out>
#8  0x000055bbd4c8ac0a in ehci_queues_rip_unseen (ehci=ehci@entry=0x55bbddde09f0, async=1) at hw/usb/hcd-ehci.c:656
        head = 0x55bbddde1140
        q = <optimized out>
        tmp = 0x55bbd7d7c7f0
#9  0x000055bbd4c8c2b7 in ehci_advance_async_state (ehci=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2144
        ehci = 0x55bbddde09f0
#10 0x000055bbd4c8c59c in ehci_frame_timer (opaque=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2299
        ehci = 0x55bbddde09f0
        need_timer = 1
        expire_time = <optimized out>
        t_now = 3074643889872
        ns_elapsed = <optimized out>
        uframes = 6
        skipped_uframes = <optimized out>
        i = <optimized out>
#11 0x000055bbd4db3ea1 in aio_bh_call (bh=0x55bbddaf1150) at util/async.c:90
No locals.
#12 aio_bh_poll (ctx=ctx@entry=0x55bbd7d85700) at util/async.c:118
        bh = <optimized out>
        bhp = <optimized out>
        next = 0x55bbddaf00d0
---Type <return> to continue, or q <return> to quit---
        ret = 1
        deleted = false
#13 0x000055bbd4db6f44 in aio_poll (ctx=0x55bbd7d85700, blocking=blocking@entry=true) at util/aio-posix.c:682
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = <optimized out>
        start = 0
        __PRETTY_FUNCTION__ = "aio_poll"
#14 0x000055bbd4d465d1 in bdrv_aio_cancel (acb=0x55bbdbb0c800) at block/io.c:2129
        __PRETTY_FUNCTION__ = "bdrv_aio_cancel"
#15 0x000055bbd4c8a658 in ehci_free_packet (p=0x55bbd7dbcb40) at hw/usb/hcd-ehci.c:522
No locals.
#16 0x000055bbd4c8a875 in ehci_cancel_queue (q=q@entry=0x55bbd7d7ceb0) at hw/usb/hcd-ehci.c:580
        p = <optimized out>
        packets = 0
#17 0x000055bbd4c8a8cb in ehci_free_queue (q=0x55bbd7d7ceb0, warn=0x0) at hw/usb/hcd-ehci.c:607
        head = 0x55bbddde1140
        cancelled = <optimized out>
#18 0x000055bbd4c8ac0a in ehci_queues_rip_unseen (ehci=ehci@entry=0x55bbddde09f0, async=1) at hw/usb/hcd-ehci.c:656
        head = 0x55bbddde1140
        q = <optimized out>
        tmp = 0x55bbd7d7c7f0
#19 0x000055bbd4c8c2b7 in ehci_advance_async_state (ehci=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2144
        ehci = 0x55bbddde09f0
#20 0x000055bbd4c8c59c in ehci_frame_timer (opaque=0x55bbddde09f0) at hw/usb/hcd-ehci.c:2299
        ehci = 0x55bbddde09f0
        need_timer = 1
        expire_time = <optimized out>
        t_now = 3074643076714
        ns_elapsed = <optimized out>
        uframes = 1
        skipped_uframes = <optimized out>
        i = <optimized out>
#21 0x000055bbd4db3ea1 in aio_bh_call (bh=0x55bbddaf1150) at util/async.c:90
No locals.
#22 aio_bh_poll (ctx=ctx@entry=0x55bbd7d85700) at util/async.c:118
        bh = <optimized out>
        bhp = <optimized out>
---Type <return> to continue, or q <return> to quit---
        next = 0x55bbddaf00d0
        ret = 1
        deleted = false
#23 0x000055bbd4db6bb0 in aio_dispatch (ctx=0x55bbd7d85700) at util/aio-posix.c:429
No locals.
#24 0x000055bbd4db3d7e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
        ctx = <optimized out>
#25 0x00007f2dd83abd7a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
No symbol table info available.
#26 0x000055bbd4db5e6c in glib_pollfds_poll () at util/main-loop.c:213
        context = 0x55bbd7de4a50
        pfds = <optimized out>
#27 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
        context = 0x55bbd7de4a50
        ret = 1
        spin_counter = 0
#28 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
        ret = 1
        timeout = 4294967295
        timeout_ns = <optimized out>
#29 0x000055bbd4aa701c in main_loop () at vl.c:1898
No locals.
#30 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4720
        i = <optimized out>
        snapshot = <optimized out>
        linux_boot = <optimized out>
        initrd_filename = <optimized out>
        kernel_filename = <optimized out>
        kernel_cmdline = <optimized out>
        boot_order = <optimized out>
        boot_once = 0x0
        cyls = <optimized out>
        heads = <optimized out>
        secs = <optimized out>
        translation = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        hda_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = <optimized out>
---Type <return> to continue, or q <return> to quit---
        olist = <optimized out>
        optind = 113
        optarg = 0x7ffc77472497 "usb-storage,drive=drive_disk16,id=device_disk16"
        loadvm = <optimized out>
        machine_class = 0x0
        cpu_model = <optimized out>
        vga_model = 0x7ffc774717be "qxl"
        qtest_chrdev = <optimized out>
        qtest_log = <optimized out>
        pid_file = <optimized out>
        incoming = <optimized out>
        defconfig = <optimized out>
        userconfig = <optimized out>
        nographic = <optimized out>
        display_type = <optimized out>
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = <optimized out>
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffc774707c0}
        __func__ = "main"
        __FUNCTION__ = "main"

Comment 4 Gerd Hoffmann 2017-05-11 14:22:29 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13179248
please test

Comment 5 hachen 2017-05-19 09:30:44 UTC
I tried 5 times on
kernel-3.10.0-656.el7.x86_64
qemu-kvm-rhev-2.9.0-5.el7.x86_64.
I could not reproduce this bug.

Sorry Gerd,I did not save the build you provided in comment 4.
Could you please provide another build that fixed this issue?
Many thanks.

Comment 6 Gerd Hoffmann 2017-05-19 10:28:34 UTC
It's a race, so there is no reliable reproducer.

new scratch build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13238187

Comment 7 hachen 2017-05-22 05:00:52 UTC
I tested 5 times on 
kernel-3.10.0-656.el7.x86_64
qemu-kvm-rhev-2.9.0-3.el7.bz1449609.1.x86_64 (provided in the comment 6)

It works fine. I didn't encounter the qemu dump issue.

And I tired 3 times on the reporter's machine, try to reproduce it.
But it did not happen.

Comment 8 Gerd Hoffmann 2017-05-22 06:24:10 UTC
> And I tired 3 times on the reporter's machine, try to reproduce it.
> But it did not happen.

Well, it's a race ...

You can try to put some load to the test machine, that might make it more likely to happen.

Comment 9 hachen 2017-05-22 06:28:37 UTC
(In reply to Gerd Hoffmann from comment #8)
> > And I tired 3 times on the reporter's machine, try to reproduce it.
> > But it did not happen.
> 
> Well, it's a race ...
> 
> You can try to put some load to the test machine, that might make it more
> likely to happen.

I understand that.
Since there is no reliable reproducer, I will not to try to reproduce it.
The comment 7 is just to show that the fix works, at least in my 5 tests.
Thanks.

Comment 10 Gerd Hoffmann 2017-05-22 08:30:22 UTC
submitted upstream
https://patchwork.ozlabs.org/patch/764642/

Comment 11 CongLi 2017-05-22 09:28:27 UTC
(In reply to Gerd Hoffmann from comment #6)
> It's a race, so there is no reliable reproducer.
> 
> new scratch build:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13238187

Hi Yanbin,

Could you please try to test this bug with the build above?

Thanks.

Comment 12 yduan 2017-05-23 04:30:58 UTC
Hi,

Can not reproduce this bug no matter with qemu-kvm-rhev-2.9.0-3.el7.x86_64 or qemu-kvm-rhev-2.9.0-3.el7.bz1449609.1.x86_64.

I met this problem once.

Thanks.

Comment 13 Gerd Hoffmann 2017-06-06 14:00:28 UTC
upstream commit 3bfecee2cb71f21cd39d6183f18b446c01917573
(also pick 3ae7eb88c472799f17beb653f4bc3fb8b124c72e while being at it).

Comment 14 Miroslav Rezanina 2017-06-08 16:27:22 UTC
Fix included in qemu-kvm-rhev-2.9.0-9.el7

Comment 18 Gerd Hoffmann 2017-06-12 08:38:52 UTC
https://patchwork.ozlabs.org/patch/774447/

Comment 20 Gerd Hoffmann 2017-07-12 13:21:21 UTC
(In reply to Gerd Hoffmann from comment #18)
> https://patchwork.ozlabs.org/patch/774447/

commit ad3c5412f2704672bb212bb82035c9b1a72db782

Comment 25 errata-xmlrpc 2018-04-11 00:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104