Bug 1684342 - [data plane] Qemu core dump for 'virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed' when creating a snapshot with blockdev-create
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Kevin Wolf
QA Contact: aihua liang
URL:
Whiteboard:
Depends On: 1683937
Blocks:
 
Reported: 2019-03-01 02:25 UTC by aihua liang
Modified: 2020-03-04 16:01 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1683937
Environment:
Last Closed: 2019-05-13 10:56:31 UTC
Type: ---
Target Upstream Version:
Embargoed:



Description aihua liang 2019-03-01 02:25:34 UTC
+++ This bug was initially created as a clone of Bug #1683937 +++

Description of problem:
After creating a snapshot of a live guest image (data plane enabled) with blockdev-create/blockdev-add, specifying the image as the backing file via 'backing-file'/'backing', qemu core dumped:

# ./vm12.sh rhel76be.qcow2 
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) 
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-3.1.0/hw/scsi/virtio-scsi.c:246: virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed.
./vm12.sh: line 28: 68870 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox off -machine pseries -nodefaults -vga std -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -smp 8,cores=2,threads=2,sockets=2 -object iothread,id=iothread1 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6,iothread=iothread1 -blockdev node-name=file0,driver=file,filename=$1 -blockdev node-name=disk0,driver=qcow2,file=file0 -device scsi-hd,drive=disk0,id=image0 -device virtio-net-pci,mac=9a:78:79:7a:7b:6c,id=id8e5D72,netdev=idrYUYaH -netdev tap,id=idrYUYaH,vhost=on -m 1024 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -device usb-kbd,id=usb-kbd1,bus=usb1.0,port=2 -vnc :10 -rtc base=localtime,clock=host,driftfix=slew -boot menu=off,strict=off,order=cdn -enable-kvm -monitor stdio
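
For reference, the failing check lives in hw/scsi/virtio-scsi.c. The following is a sketch of it, paraphrased from the QEMU 3.1 sources (the exact guard conditions may differ between versions):

    /* Sketch of virtio_scsi_ctx_check(), paraphrased from QEMU 3.1 sources.
     * With dataplane started, all request processing is pinned to the
     * iothread's AioContext (s->ctx), so the device's BlockBackend is
     * expected to live in that same context. */
    static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
    {
        if (s->dataplane_started && d && blk_is_available(d->conf.blk)) {
            /* This is the assertion that fires: the BlockBackend has been
             * moved to another AioContext while the device still runs in
             * the iothread. */
            assert(blk_get_aio_context(d->conf.blk) == s->ctx);
        }
    }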

The script used to create the snapshot is as follows:
# cat imagecreate_scratch.sh
#!/bin/bash
echo -e "{'execute':'qmp_capabilities'} {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/home/ngu/inc1.qcow2','size':32212254720},'job-id':'job1'}}" | nc -U /var/tmp/avocado1
sleep 2
echo -e "{'execute':'qmp_capabilities'} {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'iinc11','filename':'/home/ngu/inc1.qcow2'}}" | nc -U /var/tmp/avocado1
sleep 2
echo -e "{'execute':'qmp_capabilities'} {'execute':'blockdev-create', 'arguments': {'job-id': 'job2', 'options': {'driver': 'qcow2', 'file': 'iinc11', 'size': 32212254720,'backing-fmt':'qcow2','backing-file':'rhel76be.qcow2'}}}" | nc -U /var/tmp/avocado1
sleep 5
echo -e "{'execute':'qmp_capabilities'} {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'tmp','file':'iinc11','backing':'disk0'}}" | nc -U /var/tmp/avocado1
sleep 5
echo -e "{'execute':'qmp_capabilities'} {'execute':'job-dismiss','arguments':{'id':'job1'}}" | nc -U /var/tmp/avocado1
sleep 2
echo -e "{'execute':'qmp_capabilities'} {'execute':'job-dismiss','arguments':{'id':'job2'}}" | nc -U /var/tmp/avocado1


Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-71.el8.ppc64le
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot up a guest with a system disk (node-name disk0, data plane enabled).
2. Create a scratch disk as a snapshot of disk0 with blockdev-create/blockdev-add (as the script imagecreate_scratch.sh above shows).
3. After the snapshot creation finishes, qemu core dumps; sometimes a system_reset is needed to trigger the core dump.

Actual results:
In step 3, qemu core dumps.

Expected results:
No core dump should occur.

Additional info:
Both a system disk and a data disk trigger the bug.

--- Additional comment from Gu Nini on 2019-02-28 08:39 UTC ---

Issue reproduced on x86 platform.

Comment 2 aihua liang 2019-03-01 02:29:08 UTC
Tested on qemu-kvm-rhev-2.12.0-23.el7.x86_64; the same issue is hit there.

Comment 3 aihua liang 2019-03-01 02:40:21 UTC
Coredump info on RHEL7:
  (gdb) bt full
#0  0x00007f3aec1b6207 in raise () at /lib64/libc.so.6
#1  0x00007f3aec1b78f8 in abort () at /lib64/libc.so.6
#2  0x00007f3aec1af026 in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007f3aec1af0d2 in  () at /lib64/libc.so.6
#4  0x000055e924d99a91 in virtio_scsi_ctx_check (s=<optimized out>, s=<optimized out>, d=0x55e928c32280)
    at /usr/src/debug/qemu-2.12.0/hw/scsi/virtio-scsi.c:246
#5  0x000055e924e34cc6 in virtio_scsi_handle_cmd_vq (s=<optimized out>, s=<optimized out>, d=0x55e928c32280)
    at /usr/src/debug/qemu-2.12.0/hw/scsi/virtio-scsi.c:246
        vs = 0x55e929690170
        rc = <optimized out>
        req = 0x55e928f4f2c0
        next = <optimized out>
        ret = <optimized out>
        progress = true
        reqs = {tqh_first = 0x0, tqh_last = 0x7f3ae42a96d0}
#6  0x000055e924e34cc6 in virtio_scsi_handle_cmd_vq (req=0x55e928f4f2c0, s=0x55e929690170)
    at /usr/src/debug/qemu-2.12.0/hw/scsi/virtio-scsi.c:559
        vs = 0x55e929690170
        rc = <optimized out>
        req = 0x55e928f4f2c0
        next = <optimized out>
        ret = <optimized out>
        progress = true
        reqs = {tqh_first = 0x0, tqh_last = 0x7f3ae42a96d0}
#7  0x000055e924e34cc6 in virtio_scsi_handle_cmd_vq (s=s@entry=0x55e929690170, vq=vq@entry=0x55e929698100)
    at /usr/src/debug/qemu-2.12.0/hw/scsi/virtio-scsi.c:599
        req = 0x55e928f4f2c0
        next = <optimized out>
        ret = <optimized out>
        progress = true
        reqs = {tqh_first = 0x0, tqh_last = 0x7f3ae42a96d0}
#8  0x000055e924e357da in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0x55e929698100)
    at /usr/src/debug/qemu-2.12.0/hw/scsi/virtio-scsi-dataplane.c:60
        progress = <optimized out>
        s = 0x55e929690170
#9  0x000055e924e438f6 in virtio_queue_host_notifier_aio_poll (vq=0x55e929698100)
    at /usr/src/debug/qemu-2.12.0/hw/virtio/virtio.c:1512
        n = 0x55e929698168
        vq = 0x55e929698100
        progress = <optimized out>
#10 0x000055e924e438f6 in virtio_queue_host_notifier_aio_poll (opaque=0x55e929698168)
    at /usr/src/debug/qemu-2.12.0/hw/virtio/virtio.c:2427
        n = 0x55e929698168
        vq = 0x55e929698100
        progress = <optimized out>
#11 0x000055e9250fe72e in run_poll_handlers_once (ctx=ctx@entry=0x55e927e89900, timeout=timeout@entry=0x7f3ae42a97d8)
    at util/aio-posix.c:501
        progress = false
        node = 0x55e928074780
#12 0x000055e9250ff594 in aio_poll (timeout=0x7f3ae42a97d8, max_ns=16000, ctx=0x55e927e89900) at util/aio-posix.c:539
        progress = <optimized out>
        start_time = 255621519877114
        elapsed_time = <optimized out>
        max_ns = 16000
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = -1
        start = 255621519875089
        __PRETTY_FUNCTION__ = "aio_poll"
#13 0x000055e9250ff594 in aio_poll (timeout=0x7f3ae42a97d8, ctx=0x55e927e89900) at util/aio-posix.c:574
        max_ns = 16000
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = -1
        start = 255621519875089
        __PRETTY_FUNCTION__ = "aio_poll"
#14 0x000055e9250ff594 in aio_poll (ctx=0x55e927e89900, blocking=blocking@entry=true) at util/aio-posix.c:615
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = -1
        start = 255621519875089
        __PRETTY_FUNCTION__ = "aio_poll"
#15 0x000055e924ece8ae in iothread_run (opaque=0x55e927ea7ce0) at iothread.c:64
        iothread = 0x55e927ea7ce0
#16 0x00007f3aec554dd5 in start_thread () at /lib64/libpthread.so.0
#17 0x00007f3aec27dead in clone () at /lib64/libc.so.6

Comment 4 aihua liang 2019-03-01 02:47:26 UTC
Steps to reproduce:
  1. Start the guest in paused state (-S), with data plane enabled on the scsi disk.
      /usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190123-032240-rOoB4cgD,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190123-032240-rOoB4cgD,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id8Ec4Bn  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190123-032240-rOoB4cgD,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190123-032240-rOoB4cgD,path=/var/tmp/seabios-20190123-032240-rOoB4cgD,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190123-032240-rOoB4cgD,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -object iothread,id=iothread0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3,iothread=iothread0 \
    -blockdev driver=file,node-name=file_base,filename=/root/sn1,auto-read-only=on \
    -blockdev driver=qcow2,file=file_base,node-name=drive_image1,auto-read-only=on \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:39:3a:3b:3c:3d,id=id1JNQsL,vectors=4,netdev=idVpZZ6A,bus=pci.0,addr=0x4  \
    -netdev tap,id=idVpZZ6A,vhost=on \
    -m 4096  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'Penryn',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait

 2. Create a snapshot image on top of drive_image1:
    {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
    {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
    {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480,'backing-file':'/home/kvm_autotest_root/images/rhel77-64-virtio.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
    {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i','backing':'drive_image1'}}
    {'execute':'job-dismiss','arguments':{'id':'job1'}}
    {'execute':'job-dismiss','arguments':{'id':'job2'}}

 3. Continue the vm:
   (qemu) cont

 After step 3, qemu core dumps with:
   'virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed'

Additional info:
   virtio-blk works fine.

Comment 5 Kevin Wolf 2019-04-23 13:21:45 UTC
This is what happens:

1. The virtio device first puts the nodes 'drive_image1' and 'file_base' into the AioContext of the iothread.
2. blockdev-add creates a new node 'sn1' in the main AioContext.
3. 'drive_image1' is added as a backing file to 'sn1'. Because a backing file must be in the same AioContext as its overlay, this pulls 'drive_image1' back into the main AioContext (see bdrv_open_backing_file()).
4. When processing the next request, virtio-scsi notices that the block node is not in the iothread AioContext any more.

What should happen in step 3 is that the overlay is instead attached to the iothread AioContext. If there are conflicting requirements with respect to the AioContext a node must be in, attaching it should fail.
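
A minimal sketch of the step 3 behaviour (illustrative pseudocode only, not the literal QEMU source; the real logic sits in bdrv_open_backing_file() and the AioContext handling around it):

    /* Illustrative sketch only -- not the actual QEMU code. When the
     * overlay 'sn1' (main AioContext) gains 'drive_image1' (iothread
     * AioContext) as its backing node, the backing node is moved into
     * the overlay's context, because backing and overlay must share
     * one AioContext. */
    static void attach_backing_sketch(BlockDriverState *overlay,
                                      BlockDriverState *backing)
    {
        /* Pulls 'drive_image1' back into the main AioContext ... */
        bdrv_set_aio_context(backing, bdrv_get_aio_context(overlay));
        /* ... while virtio-scsi still expects it in the iothread's
         * context, so the next request trips the assertion shown above.
         * The proposed behaviour is the inverse: move the overlay into
         * the backing node's context, or fail the attach when the
         * requirements conflict. */
    }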

