Bug 1812765 - qemu with iothreads enabled crashes on resume after enospc pause for disk extension
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: 8.0
Assignee: Sergio Lopez
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-12 06:04 UTC by qing.wang
Modified: 2020-07-28 07:13 UTC (History)
10 users

Fixed In Version: qemu-kvm-4.2.0-28.module+el8.2.1+7211+16dfe810
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-28 07:12:15 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3172 0 None None None 2020-07-28 07:13:29 UTC

Description qing.wang 2020-03-12 06:04:24 UTC
Description of problem:

QEMU crashes when iothreads are enabled and the guest is resumed after a disk extension, on a virtio-blk disk with aio=native.

Version-Release number of selected component (if applicable):
host:
{'kvm_version': '4.18.0-175.el8.x86_64', 'qemu_version': 'qemu-kvm-core-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64'}

guest:
4.18.0-177.el8.x86_64

How reproducible:
70%

Steps to Reproduce:
1) Make sure that /tmp is tmpfs
$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel)

2) Create a raw image for the loop device
$ ./qemu-img create -f raw /tmp/test.raw 50M


3) Create a loop device from the test image and make it writable for my user
# losetup /dev/loop6 /tmp/test.raw
# chmod 666 /dev/loop6

4) Create the qcow2 overlay
$ ./qemu-img create -f qcow2 /dev/loop6 500M


5) Start a guest that uses this empty disk with an iothread and aio=native
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 2G  \
    -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
    -cpu 'Opteron_G5',+kvm_pv_unhalt  \
    \
    -device pvpanic,ioport=0x505,id=idJenAbB \
    \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -object iothread,id=iothread2 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -blockdev node-name=file_stg1,driver=host_device,aio=native,filename=/dev/loop6,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg1 \
    -device virtio-blk-pci,id=stg1,drive=drive_stg1,write-cache=on,rerror=stop,werror=stop,serial=TARGET_DISK0,bus=pci.0,addr=0x5,iothread=iothread1 \
    -device virtio-net-pci,mac=9a:bb:e1:81:7e:f5,id=id1XDlV6,netdev=idgQKvAZ,bus=pci.0,addr=0x6  \
    -netdev tap,id=idgQKvAZ,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :5  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm -monitor stdio \

6) Boot the VM and execute
(guest) # dd if=/dev/urandom of=/dev/vda oflag=direct bs=500M

The dd command hits an I/O error and the guest enters paused status:
(qemu) info status
VM status: paused (io-error)

7) Increase the available space by 100M
$ ./qemu-img resize -f raw /tmp/test.raw 150M
Image resized.
# losetup -c /dev/loop6

8) Continue the guest
(qemu) info status
VM status: paused (io-error)
(qemu) cont

9) Repeat steps 7 and 8 with further 100M increases until the disk size is greater than the I/O request size.
./qemu-img resize -f raw /tmp/test.raw 250M && losetup -c /dev/loop6
./qemu-img resize -f raw /tmp/test.raw 350M && losetup -c /dev/loop6
./qemu-img resize -f raw /tmp/test.raw 450M && losetup -c /dev/loop6
./qemu-img resize -f raw /tmp/test.raw 550M && losetup -c /dev/loop6
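Steps 7-9 amount to growing the 50M backing raw image in 100M steps, re-reading it into the loop device, and continuing the guest each time until the image can hold the whole 500M write. A minimal dry-run sketch of that loop (the grow_until helper is hypothetical and only echoes the host commands and the HMP command rather than executing them):

```shell
# Dry-run sketch of steps 7-9: print the resize/refresh/continue sequence
# that grows the backing image in 100M steps until it exceeds the 500M
# I/O request. grow_until is a hypothetical helper, not part of the
# reproducer scripts; paths and device names match the steps above.
grow_until() {
    size=$1; target=$2; step=$3
    while [ "$size" -le "$target" ]; do
        size=$((size + step))
        echo "qemu-img resize -f raw /tmp/test.raw ${size}M"
        echo "losetup -c /dev/loop6"
        echo "(qemu) cont"   # issue 'cont' on the HMP monitor after each resize
    done
}
grow_until 50 500 100
```

With the original sizes (50M image, 500M request, 100M steps) this yields five resize/refresh/continue rounds, ending at 550M; the crash typically occurs on one of the intermediate `cont` commands while the request still does not fit.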

Actual results:
QEMU crashes and dumps core.

Expected results:
After step 9, the guest returns to running status.

Additional info:

The same operations do not trigger the issue on aio=threads disks.
The same operations do not trigger the issue when iothreads are disabled.
The same operations do not trigger the issue on slow-train qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.

The same issue is found on qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae
and older versions such as 5220

Reproducible with the automation script:
python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk --guestname=RHEL.8.2 --driveformat=virtio_scsi --platform=x86_64 --clone=no --customsparams="iothread_scheme=roundrobin\n iothreads = iothread0 iothread1 iothread2\nimage_iothread = AUTO" --nrepeat=5

Comment 1 qing.wang 2020-03-12 06:18:57 UTC
#0  0x00007f7d345f770f in raise () at /lib64/libc.so.6
#1  0x00007f7d345e1b25 in abort () at /lib64/libc.so.6
#2  0x00007f7d345e19f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007f7d345efcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x000055ba2eeacd14 in blk_get_aio_context (blk=0x55ba31a49f60) at block/block-backend.c:1904
#5  0x000055ba2eeacd14 in blk_get_aio_context (blk=0x55ba31a49f60) at block/block-backend.c:1898
#6  0x000055ba2ec50c62 in virtio_blk_rw_complete (opaque=0x7f7d18030a90, ret=-28)
    at /usr/src/debug/qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64/hw/block/virtio-blk.c:121
#7  0x000055ba2eeab2ce in blk_aio_complete (acb=0x55ba313a0fa0) at block/block-backend.c:1339
#8  0x000055ba2ef565a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:115
#9  0x00007f7d3460d2e0 in __start_context () at /lib64/libc.so.6
#10 0x00007f7d27ffdc40 in  ()
#11 0x0000000000000000 in  ()


#0  0x00007fd1ae62770f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fd1b3ac7f00 (LWP 22721))]
(gdb) bt
#0  0x00007fd1ae62770f in raise () at /lib64/libc.so.6
#1  0x00007fd1ae611b25 in abort () at /lib64/libc.so.6
#2  0x00007fd1ae6119f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007fd1ae61fcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x000055f150f3484c in qemu_co_mutex_unlock (mutex=mutex@entry=0x55f151a692c0)
    at util/qemu-coroutine-lock.c:285
#5  0x000055f150e6ae02 in qcow2_co_pwritev_task
    (l2meta=<optimized out>, qiov_offset=<optimized out>, qiov=0x7fd19c3b6078, bytes=<optimized out>, offset=<optimized out>, file_cluster_offset=<optimized out>, bs=0x55f151a61740)
    at block/qcow2.c:2458
#6  0x000055f150e6ae02 in qcow2_co_pwritev_task_entry (task=<optimized out>)
    at block/qcow2.c:2471
#7  0x000055f150eacd71 in aio_task_co (opaque=0x55f152bb28a0) at block/aio_task.c:45
#8  0x000055f150f355a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:115
#9  0x00007fd1ae63d2e0 in __start_context () at /lib64/libc.so.6
#10 0x00007fd1a659cc40 in  ()
#11 0x0000000000000000 in  ()

http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1812765/

Comment 2 qing.wang 2020-05-21 09:50:11 UTC
Hit same issue on

4.18.0-193.2.1.el8_2.x86_64
qemu-kvm-common-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64

[8.2.1-AV] -1- Q35 + Seabios + 8.3 + Luks + Virtio_scsi + Local + aio_threads

http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qlogs/%5b8.2.1-AV%5d-1-Q35+Seabios+8.3+Luks+Virtio_scsi+Local+aio_threads/test-results/150-Host_RHEL.m8.u2.product_av.luks.virtio_scsi.up.virtio_net.Guest.RHEL.8.3.0.x86_64.io-github-autotest-qemu.disk_extension.aio_native.with_virtio_blk.q35/debug.log

workspace/bin/python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk.q35 --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --nicmodel=virtio_net --imageformat=luks --machines=q35 --customsparams="image_aio=threads" --clone=no


#0  0x00007f1f1d93c70f in raise () at /lib64/libc.so.6
#1  0x00007f1f1d926b25 in abort () at /lib64/libc.so.6
#2  0x00007f1f1d9269f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007f1f1d934cc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000563130845b44 in blk_get_aio_context (blk=0x56313471f240)
    at block/block-backend.c:1968
#5  0x0000563130845b44 in blk_get_aio_context (blk=0x56313471f240)
    at block/block-backend.c:1962
#6  0x00005631305e90f2 in virtio_blk_rw_complete (opaque=0x7f1f0c127410, ret=0)
    at /usr/src/debug/qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64/hw/block/virtio-blk.c:121
#7  0x00005631308440de in blk_aio_complete (acb=0x5631331007a0) at block/block-backend.c:1375
#8  0x00005631308ef1b3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:115
#9  0x00007f1f1d9522e0 in __start_context () at /lib64/libc.so.6
#10 0x00007f1f15976c40 in  ()
#11 0x0000000000000000 in  ()


coredump file:http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1812765/core.qemu-kvm.0.0b04ebff4afa49869e0f6a53233abb01.482258.1589822534000000.lz4

Comment 3 Sergio Lopez 2020-06-02 08:18:26 UTC
This issue is also reproducible with upstream code. I've just posted a patch addressing it:

 - https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00304.html

Sergio.

Comment 4 John Ferlan 2020-06-09 11:59:21 UTC
v2: https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg00667.html was accepted, setting ITR = 8.2.1.  To be backported once patches are pulled upstream.

Comment 16 qing.wang 2020-06-29 06:06:40 UTC
Verified on qemu-kvm-common-4.2.0-28.module+el8.2.1+7211+16dfe810.x86_64. 

python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --platform=x86_64 --clone=no --iothread_scheme=roundrobin --nr_iothreads=2  --nrepeat=20

python ConfigTest.py --testcase=disk_extension --guestname=RHEL.8.3.0 --driveformat=virtio_scsi,virtio_blk --platform=x86_64 --clone=no --iothread_scheme=roundrobin --nr_iothreads=2  --nrepeat=10

python ConfigTest.py --testcase=disk_extension.aio_native.with_virtio_blk --guestname=RHEL.8.3.0 --driveformat=virtio_scsi --platform=x86_64 --clone=no --iothread_scheme=roundrobin --nr_iothreads=2  --nrepeat=10  --customsparams="qemu_force_use_drive_expression = yes"

No issue found with above commands.

Comment 18 errata-xmlrpc 2020-07-28 07:12:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3172

