Description of problem:
QEMU crashes when iothreads are enabled and the guest is resumed after a disk extension, on a virtio-blk disk using aio=native.

Version-Release number of selected component (if applicable):
host:
  kvm_version:  4.18.0-175.el8.x86_64
  qemu_version: qemu-kvm-core-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64
guest: 4.18.0-177.el8.x86_64

How reproducible:
70%

Steps to Reproduce:
1) Make sure that /tmp is tmpfs
$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel)

2) Create a raw image for the loop device
$ ./qemu-img create -f raw /tmp/test.raw 50M

3) Create a loop device from the test image and make it writable for my user
# losetup /dev/loop6 /tmp/test.raw
# chmod 666 /dev/loop6

4) Create the qcow2 image on the loop device (its virtual size is larger than the raw file, so writes will eventually run out of space)
$ ./qemu-img create -f qcow2 /dev/loop6 500M

5) Start a guest that uses this empty disk with an iothread and aio=native

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pc \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 2G \
    -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2 \
    -cpu 'Opteron_G5',+kvm_pv_unhalt \
    -device pvpanic,ioport=0x505,id=idJenAbB \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -object iothread,id=iothread2 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -blockdev node-name=file_stg1,driver=host_device,aio=native,filename=/dev/loop6,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg1 \
    -device virtio-blk-pci,id=stg1,drive=drive_stg1,write-cache=on,rerror=stop,werror=stop,serial=TARGET_DISK0,bus=pci.0,addr=0x5,iothread=iothread1 \
    -device virtio-net-pci,mac=9a:bb:e1:81:7e:f5,id=id1XDlV6,netdev=idgQKvAZ,bus=pci.0,addr=0x6 \
    -netdev tap,id=idgQKvAZ,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :5 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio

6) Boot the VM and execute in the guest:
# dd if=/dev/urandom of=/dev/vda oflag=direct bs=500M
The write hits an io-error and the guest pauses:
(qemu) info status
VM status: paused (io-error)

7) Increase the available space by 100M
$ ./qemu-img resize -f raw /tmp/test.raw 150M
Image resized.
# losetup -c /dev/loop6

8) Continue the guest
(qemu) info status
VM status: paused (io-error)
(qemu) cont

9) Repeat steps 7 and 8 with small size increases until the disk is large enough to hold the I/O request:
$ ./qemu-img resize -f raw /tmp/test.raw 250M && losetup -c /dev/loop6
$ ./qemu-img resize -f raw /tmp/test.raw 350M && losetup -c /dev/loop6
$ ./qemu-img resize -f raw /tmp/test.raw 450M && losetup -c /dev/loop6
$ ./qemu-img resize -f raw /tmp/test.raw 550M && losetup -c /dev/loop6

Actual results:
QEMU coredumps.

Expected results:
After step 9, the guest returns to running status.

Additional info:
The same operation does not hit the issue on aio=threads disks.
The same operation does not hit the issue with iothreads disabled.
The same operation does not hit the issue on the slow train, qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.
The same issue is found on qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae and on older versions such as 5220.

Reproduce by automation script:
python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk --guestname=RHEL.8.2 --driveformat=virtio_scsi --platform=x86_64 --clone=no --customsparams="iothread_scheme=roundrobin\n iothreads = iothread0 iothread1 iothread2\nimage_iothread = AUTO" --nrepeat=5
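For context on why "cont" is the step that triggers the crash: with rerror=stop,werror=stop, virtio-blk does not complete a failed request; it parks it on an internal list and pauses the VM, and the parked requests are replayed when the VM resumes. A simplified sketch of that error path, paraphrased from hw/block/virtio-blk.c (not the verbatim QEMU 4.2 source; field names approximate):

/* Sketch: virtio-blk's reaction to an I/O error such as the ENOSPC
 * (ret=-28) visible in the backtraces below, with werror=stop. */
static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
                                      bool is_read)
{
    VirtIOBlock *s = req->dev;
    BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);

    if (action == BLOCK_ERROR_ACTION_STOP) {
        /* Do not complete the request; park it on s->rq so it can be
         * resubmitted when the VM is resumed with "cont". */
        req->next = s->rq;
        s->rq = req;
    }
    /* For BLOCK_ERROR_ACTION_STOP this also pauses the VM, which is the
     * "paused (io-error)" status seen in step 6. */
    blk_error_action(s->blk, action, is_read, error);
    return action != BLOCK_ERROR_ACTION_IGNORE;
}

The crash happens while those parked requests are being replayed after "cont", as the backtraces in the next comment show.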
First backtrace (virtio-blk completion path):

#0 0x00007f7d345f770f in raise () at /lib64/libc.so.6
#1 0x00007f7d345e1b25 in abort () at /lib64/libc.so.6
#2 0x00007f7d345e19f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3 0x00007f7d345efcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4 0x000055ba2eeacd14 in blk_get_aio_context (blk=0x55ba31a49f60) at block/block-backend.c:1904
#5 0x000055ba2eeacd14 in blk_get_aio_context (blk=0x55ba31a49f60) at block/block-backend.c:1898
#6 0x000055ba2ec50c62 in virtio_blk_rw_complete (opaque=0x7f7d18030a90, ret=-28) at /usr/src/debug/qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64/hw/block/virtio-blk.c:121
#7 0x000055ba2eeab2ce in blk_aio_complete (acb=0x55ba313a0fa0) at block/block-backend.c:1339
#8 0x000055ba2ef565a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#9 0x00007f7d3460d2e0 in __start_context () at /lib64/libc.so.6
#10 0x00007f7d27ffdc40 in ()
#11 0x0000000000000000 in ()

Second backtrace (qcow2 write path):

#0 0x00007fd1ae62770f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fd1b3ac7f00 (LWP 22721))]
(gdb) bt
#0 0x00007fd1ae62770f in raise () at /lib64/libc.so.6
#1 0x00007fd1ae611b25 in abort () at /lib64/libc.so.6
#2 0x00007fd1ae6119f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3 0x00007fd1ae61fcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4 0x000055f150f3484c in qemu_co_mutex_unlock (mutex=mutex@entry=0x55f151a692c0) at util/qemu-coroutine-lock.c:285
#5 0x000055f150e6ae02 in qcow2_co_pwritev_task (l2meta=<optimized out>, qiov_offset=<optimized out>, qiov=0x7fd19c3b6078, bytes=<optimized out>, offset=<optimized out>, file_cluster_offset=<optimized out>, bs=0x55f151a61740) at block/qcow2.c:2458
#6 0x000055f150e6ae02 in qcow2_co_pwritev_task_entry (task=<optimized out>) at block/qcow2.c:2471
#7 0x000055f150eacd71 in aio_task_co (opaque=0x55f152bb28a0) at block/aio_task.c:45
#8 0x000055f150f355a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#9 0x00007fd1ae63d2e0 in __start_context () at /lib64/libc.so.6
#10 0x00007fd1a659cc40 in ()
#11 0x0000000000000000 in ()

Coredumps: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1812765/
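The abort() in frames #2-#4 of the first backtrace is an assertion failing inside blk_get_aio_context() (block/block-backend.c:1898/1904). In QEMU 4.2 the function is roughly the following (a sketch, not the verbatim source):

AioContext *blk_get_aio_context(BlockBackend *blk)
{
    BlockDriverState *bs = blk_bs(blk);

    if (bs) {
        AioContext *ctx = bdrv_get_aio_context(bs);
        /* This is the assertion that fires: the parked request is being
         * replayed from the main loop after "cont", but the BlockBackend
         * is still owned by the iothread's AioContext, so the two
         * contexts disagree. */
        assert(ctx == blk->ctx);
    }
    return blk->ctx;
}

virtio_blk_rw_complete() (frame #6) calls blk_get_aio_context() to acquire the device's context before completing the request, so the mismatch aborts the process instead of completing the retried write.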
Hit the same issue on 4.18.0-193.2.1.el8_2.x86_64 with qemu-kvm-common-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64.

[8.2.1-AV] -1- Q35 + Seabios + 8.3 + Luks + Virtio_scsi + Local + aio_threads
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qlogs/%5b8.2.1-AV%5d-1-Q35+Seabios+8.3+Luks+Virtio_scsi+Local+aio_threads/test-results/150-Host_RHEL.m8.u2.product_av.luks.virtio_scsi.up.virtio_net.Guest.RHEL.8.3.0.x86_64.io-github-autotest-qemu.disk_extension.aio_native.with_virtio_blk.q35/debug.log

workspace/bin/python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk.q35 --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --nicmodel=virtio_net --imageformat=luks --machines=q35 --customsparams="image_aio=threads" --clone=no

#0 0x00007f1f1d93c70f in raise () at /lib64/libc.so.6
#1 0x00007f1f1d926b25 in abort () at /lib64/libc.so.6
#2 0x00007f1f1d9269f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3 0x00007f1f1d934cc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4 0x0000563130845b44 in blk_get_aio_context (blk=0x56313471f240) at block/block-backend.c:1968
#5 0x0000563130845b44 in blk_get_aio_context (blk=0x56313471f240) at block/block-backend.c:1962
#6 0x00005631305e90f2 in virtio_blk_rw_complete (opaque=0x7f1f0c127410, ret=0) at /usr/src/debug/qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64/hw/block/virtio-blk.c:121
#7 0x00005631308440de in blk_aio_complete (acb=0x5631331007a0) at block/block-backend.c:1375
#8 0x00005631308ef1b3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#9 0x00007f1f1d9522e0 in __start_context () at /lib64/libc.so.6
#10 0x00007f1f15976c40 in ()
#11 0x0000000000000000 in ()

coredump file: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1812765/core.qemu-kvm.0.0b04ebff4afa49869e0f6a53233abb01.482258.1589822534000000.lz4
This issue is also reproducible with upstream code. I've just posted a patch addressing it:
- https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00304.html

Sergio.
v2: https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg00667.html

The v2 series was accepted; setting ITR = 8.2.1. To be backported once the patches are merged upstream.
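For reference, the gist of the accepted series (as I read it from the list discussion; this is a sketch with approximate names, not the verbatim patch): on VM resume, schedule the bottom half that replays the parked requests in the BlockBackend's AioContext instead of the main loop's, so the replayed requests complete in the iothread that owns the disk and the assertion in blk_get_aio_context() holds.

/* Sketch of the fix's approach (approximate, not the verbatim patch):
 * run the request-restart bottom half in the disk's own AioContext. */
static void virtio_blk_dma_restart_cb(void *opaque, int running,
                                      RunState state)
{
    VirtIOBlock *s = opaque;

    if (!running) {
        return;
    }

    /* Previously the restart BH ran in the main loop's context; with an
     * iothread attached, completions then saw a mismatched AioContext.
     * Scheduling it in blk_get_aio_context(s->blk) keeps the whole
     * replay inside the iothread. */
    aio_bh_schedule_oneshot(blk_get_aio_context(s->blk),
                            virtio_blk_dma_restart_bh, s);
}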
Verified on qemu-kvm-common-4.2.0-28.module+el8.2.1+7211+16dfe810.x86_64 with:

python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --platform=x86_64 --clone=no --iothread_scheme=roundrobin --nr_iothreads=2 --nrepeat=20

python ConfigTest.py --testcase=disk_extension --guestname=RHEL.8.3.0 --driveformat=virtio_scsi,virtio_blk --platform=x86_64 --clone=no --iothread_scheme=roundrobin --nr_iothreads=2 --nrepeat=10

python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --platform=x86_64 --clone=no --iothread_scheme=roundrobin --nr_iothreads=2 --nrepeat=10 --customsparams="qemu_force_use_drive_expression = yes"

No issue found with the above commands.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172