Bug 1844343 - Core dump when hot-unplugging a disk that is doing I/O
Summary: Core dump when hot-unplugging a disk that is doing I/O
Keywords:
Status: CLOSED DUPLICATE of bug 1812399
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.3
Assignee: Sergio Lopez
QA Contact: qing.wang
URL:
Whiteboard:
Duplicates: 1869994
Depends On: 1812399 1888131
Blocks: 1844983 1897525
 
Reported: 2020-06-05 07:22 UTC by qing.wang
Modified: 2020-11-26 01:13 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1844983 1897525
Environment:
Last Closed: 2020-11-13 12:11:21 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
Red Hat Knowledge Base (Solution) 5341841 (last updated 2020-09-21 05:57:00 UTC)

Description qing.wang 2020-06-05 07:22:23 UTC
Description of problem:

QEMU crashes when hot-unplugging a disk that is doing I/O.

02:45:13 INFO | Start to unplug devices "stg0" by monitor qmpmonitor1.
02:45:13 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'device_del' 
02:45:13 DEBUG| Send command: {'execute': 'device_del', 'arguments': {'id': 'stg0'}, 'id': 'NPwoGmu6'}
02:45:13 INFO | [qemu output] qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/hw/scsi/virtio-scsi.c:250: virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed.
02:45:36 WARNI| registers is not alive. Can't query the avocado-vt-vm1 status
02:45:42 DEBUG| Attempting to log into 'avocado-vt-vm1' (timeout 360s)
02:45:42 INFO | [qemu output] /tmp/aexpect_UtyR1AXR/aexpect-rly3turi.sh: line 1: 268982 Aborted                 (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine q35 -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 -nodefaults -device VGA,bus=pcie.0,addr=0x2 -m 30720 -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2 -cpu 'Skylake-Server',+kvm_pv_unhalt -chardev socket,server,path=/var/tmp/avocado_6j4emk2s/monitor-qmpmonitor1-20200605-023351-cxpRnbd9,id=qmp_id_qmpmonitor1,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,server,path=/var/tmp/avocado_6j4emk2s/monitor-catch_monitor-20200605-023351-cxpRnbd9,id=qmp_id_catch_monitor,nowait -mon chardev=qmp_id_catch_monitor,mode=control -device pvpanic,ioport=0x505,id=id0yBSp5 -chardev socket,server,path=/var/tmp/avocado_6j4emk2s/serial-serial0-20200605-023351-cxpRnbd9,id=chardev_serial0,nowait -device isa-serial,id=serial0,chardev=chardev_serial0 -chardev socket,id=seabioslog_id_20200605-023351-cxpRnbd9,path=/var/tmp/avocado_6j4emk2s/seabios-20200605-023351-cxpRnbd9,server,nowait -device isa-debugcon,chardev=seabioslog_id_20200605-023351-cxpRnbd9,iobase=0x402 -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -object iothread,id=iothread0 -object iothread,id=iothread1 -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel830-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 -device virtio-net-pci,mac=9a:03:3d:56:61:b2,id=idYARjM3,netdev=idhVJiQ3,bus=pcie-root-port-3,addr=0x0 -netdev tap,id=idhVJiQ3,vhost=on,vhostfd=21,fd=6 -vnc :0 -rtc base=utc,clock=host,driftfix=slew -boot menu=off,order=cdn,once=c,strict=off -enable-kvm -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5
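
The backtrace below was presumably taken by opening the core in gdb. A minimal sketch of doing that (assuming an lz4-compressed core as in the uploads referenced later in this bug; the filename is illustrative, and the matching qemu-kvm debuginfo package must be installed for symbols):

  lz4 -d core.qemu-kvm.0.<hash>.268982.<timestamp>.lz4 core.qemu-kvm
  gdb -batch -ex 'bt' /usr/libexec/qemu-kvm core.qemu-kvm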


#0  0x00007fa9476d370f in raise () at /lib64/libc.so.6
#1  0x00007fa9476bdb25 in abort () at /lib64/libc.so.6
#2  0x00007fa9476bd9f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007fa9476cbcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000561e59c587c8 in virtio_scsi_ctx_check
    (d=0x561e5c347970, s=<optimized out>, s=<optimized out>)
    at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:250
#5  0x0000561e59c587c8 in virtio_scsi_ctx_check
    (s=0x561e5d049f70, s=0x561e5d049f70, d=0x561e5c347970)
    at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:247
#6  0x0000561e59c587c8 in virtio_scsi_handle_cmd_req_prepare
    (req=0x7fa930012ad0, s=0x561e5d049f70)
    at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:569
#7  0x0000561e59c587c8 in virtio_scsi_handle_cmd_vq
    (s=s@entry=0x561e5d049f70, vq=vq@entry=0x7fa91c6a5140)
    at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi.c:612
#8  0x0000561e59c5948e in virtio_scsi_data_plane_handle_cmd
    (vdev=<optimized out>, vq=0x7fa91c6a5140)
    at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/scsi/virtio-scsi-dataplane.c:60
#9  0x0000561e59c66fbe in virtio_queue_notify_aio_vq (vq=<optimized out>) at /usr/src/debug/qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64/hw/virtio/virtio.c:2243
#10 0x0000561e59f2a046 in run_poll_handlers_once (ctx=ctx@entry=0x561e5be1bb70, timeout=timeout@entry=0x7fa93f70e658) at util/aio-posix.c:517
#11 0x0000561e59f2afba in run_poll_handlers (timeout=0x7fa93f70e658, max_ns=4000, ctx=0x561e5be1bb70) at util/aio-posix.c:562
#12 0x0000561e59f2afba in try_poll_mode (timeout=0x7fa93f70e658, ctx=0x561e5be1bb70) at util/aio-posix.c:597
#13 0x0000561e59f2afba in aio_poll (ctx=0x561e5be1bb70, blocking=blocking@entry=true) at util/aio-posix.c:639
#14 0x0000561e59d057f4 in iothread_run (opaque=0x561e5be02b60) at iothread.c:75
#15 0x0000561e59f2cd84 in qemu_thread_start (args=0x561e5be1c080) at util/qemu-thread-posix.c:519
#16 0x00007fa947a662de in start_thread () at /lib64/libpthread.so.0
#17 0x00007fa947797e83 in clone () at /lib64/libc.so.6
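
Frames #4/#5 point at the assertion at hw/scsi/virtio-scsi.c:250. For context, in qemu 4.2 that helper looks roughly like this (paraphrased from the source; only the assert expression is verbatim from the abort message above):

static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
{
    if (s->dataplane_started && d && blk_is_available(d->conf.blk)) {
        /* The disk's BlockBackend must still be attached to this
         * dataplane's AioContext; if a concurrent unplug has already
         * moved it back to the main loop context, this fires. */
        assert(blk_get_aio_context(d->conf.blk) == s->ctx);
    }
}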

Version-Release number of selected component (if applicable):


How reproducible:
Happens in roughly 70% of automation runs.

Steps to Reproduce:
1. Boot the VM:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720  \
    -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',+kvm_pv_unhalt \
    -chardev socket,server,path=/var/tmp/avocado_6j4emk2s/monitor-qmpmonitor1-20200605-023351-cxpRnbd9,id=qmp_id_qmpmonitor1,nowait  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server,path=/var/tmp/avocado_6j4emk2s/monitor-catch_monitor-20200605-023351-cxpRnbd9,id=qmp_id_catch_monitor,nowait  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id0yBSp5 \
    -chardev socket,server,path=/var/tmp/avocado_6j4emk2s/serial-serial0-20200605-023351-cxpRnbd9,id=chardev_serial0,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200605-023351-cxpRnbd9,path=/var/tmp/avocado_6j4emk2s/seabios-20200605-023351-cxpRnbd9,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200605-023351-cxpRnbd9,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel830-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:03:3d:56:61:b2,id=idYARjM3,netdev=idhVJiQ3,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idhVJiQ3,vhost=on,vhostfd=21,fd=6  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

2. Hotplug one disk:
 {"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "aio": "threads", "filename": "/home/kvm_autotest_root/images/storage0.qcow2", "cache": {"direct": true, "no-flush": false}}, "id": "bfCXHvne"}
 {"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "qcow2", "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}, "id": "i2JYBkct"}
 {"execute": "device_add", "arguments": {"driver": "scsi-hd", "id": "stg0", "drive": "drive_stg0", "write-cache": "on", "bus": "virtio_scsi_pci0.0"}, "id": "ODAFuMrY"}

3. Run an I/O test against the new disk inside the guest (the test file lives on a partition of the hotplugged disk mounted at /mnt/sdb1):
 /home/iozone_inst/src/current/iozone -az -g 10G -y 32k -i 0 -i 1 -I -f /mnt/sdb1/iozone_test

4. Hot-unplug the disk:

 {"execute": "device_del", "arguments": {"id": "stg0"}, "id": "O6HOCEti"}

5. Remove the blockdev nodes:
 {"execute": "blockdev-del", "arguments": {"node-name": "drive_stg0"}, "id": "xVEiOfoF"}
 {"execute": "blockdev-del", "arguments": {"node-name": "file_stg0"}, "id": "wlFMohtd"}
 
6. Repeat steps 2-5 50 times. A scripted sketch of the loop follows below.
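
A rough sketch of the step 2-5 loop driven over the QMP socket (an illustration only, not the actual avocado-vt test: the socket path is the one from the command line above, the sleeps are arbitrary stand-ins for the guest-side iozone run, and a robust script would wait for the DEVICE_DELETED event rather than sleeping after device_del):

#!/bin/sh
SOCK=/var/tmp/avocado_6j4emk2s/monitor-qmpmonitor1-20200605-023351-cxpRnbd9
for i in $(seq 1 50); do
    {
        # QMP requires the capabilities handshake before any command.
        echo '{"execute": "qmp_capabilities"}'
        sleep 1
        echo '{"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "aio": "threads", "filename": "/home/kvm_autotest_root/images/storage0.qcow2", "cache": {"direct": true, "no-flush": false}}}'
        echo '{"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "qcow2", "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}}'
        echo '{"execute": "device_add", "arguments": {"driver": "scsi-hd", "id": "stg0", "drive": "drive_stg0", "write-cache": "on", "bus": "virtio_scsi_pci0.0"}}'
        sleep 10    # window in which the guest runs iozone on the new disk
        echo '{"execute": "device_del", "arguments": {"id": "stg0"}}'
        sleep 2     # crude stand-in for waiting on DEVICE_DELETED
        echo '{"execute": "blockdev-del", "arguments": {"node-name": "drive_stg0"}}'
        echo '{"execute": "blockdev-del", "arguments": {"node-name": "file_stg0"}}'
        sleep 1
    } | nc -U "$SOCK" >/dev/null
done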

Actual results:
The crash happens at step 4: device_del arrives while the guest I/O from step 3 is still in flight, and qemu aborts on the virtio_scsi_ctx_check assertion.

Expected results:
No crash; the disk is unplugged cleanly.

Additional info:
automation script:
python3 ConfigTest.py --testcase=hotplug_unplug_during_io_repeat.default.q35 --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --nicmodel=virtio_net --imageformat=qcow2 --machines=q35 --customsparams="image_aio=threads" --clone=no

Comment 2 qing.wang 2020-06-05 09:24:07 UTC
Hit the same issue on:
{'kvm_version': '4.18.0-193.13.el8.x86_64', 'qemu_version': 'qemu-kvm-4.2.0-19.module+el8.3.0+6478+69f490bb.x86_64'}

Comment 4 qing.wang 2020-06-09 05:59:30 UTC
Did not hit this issue on:
3.10.0-1127.el7.x86_64
qemu-kvm-common-rhev-2.12.0-44.el7_8.2.x86_64

Comment 6 Sergio Lopez 2020-06-09 07:34:26 UTC
I'm getting a "403 Forbidden" when trying to download the coredump from http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/1844343/core.qemu-kvm.0.a044351966c04920bece765d4fd501e7.268982.1591339513000000.lz4.

Could you please give me access or upload it to some other space?

Thanks,
Sergio.

Comment 7 Sergio Lopez 2020-06-11 08:51:03 UTC
I've reproduced the issue and started a discussion about the root cause upstream:

 - https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg00634.html

Comment 8 Sergio Lopez 2020-06-29 13:43:44 UTC
The root cause is that virtio-scsi may access a SCSI disk that is not actually available anymore. Maxim sent a patchset that addresses this issue:

https://patchew.org/QEMU/20200511160951.8733-1-mlevitsk@redhat.com/

I tried this in my test environment (where I reproduced the issue before), and it seems to fix the problem.
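
To illustrate the window (a paraphrased sketch of the qemu 4.2 code path from the backtraces above, not Maxim's actual patch): the iothread handles a cmd virtqueue kick and still finds the SCSI device, because the unplug running concurrently in the main thread has not finished yet, but the device's BlockBackend has already been moved back to the main loop AioContext, so the context check asserts:

    /* iothread, virtio_scsi_handle_cmd_req_prepare() (sketch) */
    d = virtio_scsi_device_find(vdev, req->req.cmd.lun);
    if (!d) {
        /* clean miss: would be answered with VIRTIO_SCSI_S_BAD_TARGET */
    }
    virtio_scsi_ctx_check(s, d);   /* unplug in flight: d is still found,
                                      but blk has left s->ctx, so the
                                      assert aborts */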

Comment 10 John Ferlan 2020-06-30 15:58:45 UTC
(In reply to Sergio Lopez from comment #8)
> The root cause is that virtio-scsi may access a SCSI disk that is not
> actually available anymore. Maxim sent a patchset that addresses this issue:
> 
> https://patchew.org/QEMU/20200511160951.8733-1-mlevitsk@redhat.com/
> 
> I tried this in my test environment (where I reproduced the issue before),
> and it seems to fix the problem.

See bug 1812399 for reference; we might be able to close this as a duplicate of it. Markus has also done a lot of work upstream recently on device realization timing, i.e. on when a device is really ready for use.

Comment 11 Sergio Lopez 2020-06-30 16:19:46 UTC
(In reply to John Ferlan from comment #10)
> (In reply to Sergio Lopez from comment #8)
> > The root cause is that virtio-scsi may access a SCSI disk that is not
> > actually available anymore. Maxim sent a patchset that addresses this issue:
> > 
> > https://patchew.org/QEMU/20200511160951.8733-1-mlevitsk@redhat.com/
> > 
> > I tried this in my test environment (where I reproduced the issue before),
> > and it seems to fix the problem.
> 
> See bug 1812399 for reference; we might be able to close this as a duplicate
> of it. Markus has also done a lot of work upstream recently on device
> realization timing, i.e. on when a device is really ready for use.

I'd say the root cause is the same, but the reproducers are different and trigger the issue from different code paths, so perhaps it's worth keeping both BZs.

Comment 13 qing.wang 2020-07-02 09:44:54 UTC
Hit the same issue on qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64:

#0  0x00007f5ee90d77ff in raise () at /lib64/libc.so.6
#1  0x00007f5ee90c1c35 in abort () at /lib64/libc.so.6
#2  0x00007f5ee90c1b09 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007f5ee90cfde6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000556962420748 in virtio_scsi_ctx_check
    (d=0x556964205d80, s=<optimized out>, s=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/hw/scsi/virtio-scsi.c:250
#5  0x0000556962420748 in virtio_scsi_ctx_check
    (s=0x55696500eec0, s=0x55696500eec0, d=0x556964205d80)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/hw/scsi/virtio-scsi.c:247
#6  0x0000556962420748 in virtio_scsi_handle_cmd_req_prepare
    (req=0x7f5ed403cae0, s=0x55696500eec0)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/hw/scsi/virtio-scsi.c:569
#7  0x0000556962420748 in virtio_scsi_handle_cmd_vq
    (s=s@entry=0x55696500eec0, vq=vq@entry=0x7f5ee002d140)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/hw/scsi/virtio-scsi.c:612
#8  0x000055696242149e in virtio_scsi_data_plane_handle_cmd
    (vdev=<optimized out>, vq=0x7f5ee002d140)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/hw/scsi/virtio-scsi-dataplane.c:60
#9  0x000055696242f0ae in virtio_queue_notify_aio_vq (vq=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/hw/virtio/virtio.c:2324
#10 0x0000556962700b09 in aio_dispatch_handler
    (ctx=ctx@entry=0x556963dbf510, node=0x7f5b30006e50)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/util/aio-posix.c:328
#11 0x00005569627015bc in aio_dispatch_ready_handlers
    (ready_list=<optimized out>, ctx=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/util/aio-posix.c:358
#12 0x00005569627015bc in aio_poll (ctx=0x556963dbf510, blocking=blocking@entry=true)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/util/aio-posix.c:653
#13 0x00005569624cf544 in iothread_run (opaque=0x556963ce6360)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/iothread.c:75
#14 0x0000556962703db4 in qemu_thread_start (args=0x556963dc3570)
    at /usr/src/debug/qemu-kvm-5.0.0-0.module+el8.3.0+6612+6b86f0c9.x86_64/util/qemu-thread-posix.c:519
#15 0x00007f5ee947014a in start_thread () at /lib64/libpthread.so.0
#16 0x00007f5ee919d123 in clone () at /lib64/libc.so.6

http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/1844343/core.qemu-kvm.0.84149c9a23614ecd9afae03eb38e4c7a.371720.1593679868000000.lz4

Comment 14 CongLi 2020-09-21 05:57:00 UTC
*** Bug 1869994 has been marked as a duplicate of this bug. ***

Comment 22 John Ferlan 2020-11-13 12:11:21 UTC
Since the patches for bug 1812399 have been shown by comment 21 to resolve this issue, I am closing this bug as a duplicate of that one.

It's perhaps notable that bug 1888131 was cloned so that the fix would be included in the first z-stream batch for RHEL 8.3.0.z, which gets the resolution to the customer sooner than the RHEL-AV 8.3.1 release would. I'll leave it up to Marina to adjust the customer tracker.

*** This bug has been marked as a duplicate of bug 1812399 ***

