Bug 2046659

Summary: qemu crash with qemu_mutex_unlock_impl after execute blockdev-reopen with iothread
Product: Red Hat Enterprise Linux 9
Reporter: qing.wang <qinwang>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
qemu-kvm sub component: virtio-blk,scsi
QA Contact: qing.wang <qinwang>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: coli, jinzhao, juzhang, kkiwi, kwolf, lijin, mrezanin, qzhang, virt-maint, xuwei, yduan
Version: 9.0
Keywords: Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-9.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2067118 (view as bug list)
Environment:
Last Closed: 2022-05-17 12:25:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2067118

Description qing.wang 2022-01-27 06:49:59 UTC
Description of problem:
QEMU crashes after executing blockdev-reopen with an iothread.

qemu: qemu_mutex_unlock_impl: Operation not permitted

                #0  0x00007f3fc651d7fc __pthread_kill_implementation (libc.so.6 + 0x8f7fc)
                #1  0x00007f3fc64d0676 raise (libc.so.6 + 0x42676)
                #2  0x00007f3fc64ba7d3 abort (libc.so.6 + 0x2c7d3)
                #3  0x0000564ec7fc6da2 qemu_mutex_unlock_impl (qemu-kvm + 0x966da2)
                #4  0x0000564ec7e3329f bdrv_subtree_drained_end (qemu-kvm + 0x7d329f)
                #5  0x00007f3fc6808ba0 g_slist_foreach (libglib-2.0.so.0 + 0x6dba0)
                #6  0x00007f3fc680db6f g_slist_free_full (libglib-2.0.so.0 + 0x72b6f)
                #7  0x0000564ec7debf7c qmp_blockdev_reopen (qemu-kvm + 0x78bf7c)
                #8  0x0000564ec7f0ce9f qmp_marshal_blockdev_reopen (qemu-kvm + 0x8ace9f)
                #9  0x0000564ec7fb72a7 do_qmp_dispatch_bh (qemu-kvm + 0x9572a7)
                #10 0x0000564ec7fc2cd1 aio_dispatch (qemu-kvm + 0x962cd1)
                #11 0x0000564ec7fdcf02 aio_ctx_dispatch (qemu-kvm + 0x97cf02)
                #12 0x00007f3fc67efd4f g_main_context_dispatch (libglib-2.0.so.0 + 0x54d4f)
                #13 0x0000564ec7fe9bc3 main_loop_wait (qemu-kvm + 0x989bc3)
                #14 0x0000564ec7c63d57 qemu_main_loop (qemu-kvm + 0x603d57)
                #15 0x0000564ec79bc6f2 main (qemu-kvm + 0x35c6f2)
                #16 0x00007f3fc64bb560 __libc_start_call_main (libc.so.6 + 0x2d560)
                #17 0x00007f3fc64bb60c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d60c)
                #18 0x0000564ec79bbdb5 _start (qemu-kvm + 0x35bdb5)

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 9.0 Beta (Plow)
5.14.0-47.el9.x86_64
qemu-kvm-6.2.0-5.el9.x86_64
seabios-bin-1.15.0-1.el9.noarch
edk2-ovmf-20210527gite1999b264f1f-8.el9.noarch
virtio-win-prewhql-0.1-215.iso

Same issue also found on
qemu-kvm-6.1.0-8.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Create images
 qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg1.qcow2 11G
 qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg2.qcow2 12G

2.boot VM and enable iothread
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 8G \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
    -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
    \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -object throttle-group,x-iops-total=40,id=group1 \
    -object throttle-group,x-iops-total=50,id=group2 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/stg1.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=qcow2_stg1,driver=qcow2,file=file_stg1,read-only=off,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg1,driver=throttle,throttle-group=group1,file=qcow2_stg1 \
    -device scsi-hd,id=stg1,drive=drive_stg1,write-cache=on,serial=TARGET_DISK1 \
    -blockdev node-name=file_stg2,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/stg2.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=qcow2_stg2,driver=qcow2,file=file_stg2,read-only=off,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg2,driver=throttle,throttle-group=group1,file=qcow2_stg2 \
    -device scsi-hd,id=stg2,drive=drive_stg2,write-cache=on,serial=TARGET_DISK2 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:5c:50:fc:6e:9d,id=idsubcb0,netdev=idJyDQVf,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idJyDQVf,vhost=on  \
    -vnc :5  \
    -monitor stdio \
    -qmp tcp:0:5955,server,nowait  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

3.execute blockdev-reopen to change different throttle group
{"execute": "qmp_capabilities", "id": "zAbD4B7e"}
{"execute": "blockdev-reopen", "arguments": {"options": [{"driver": "throttle", "node-name": "drive_stg2", "file": "qcow2_stg2", "throttle-group": "group2"}]}, "id": "05L43QcB"}
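For convenience, the same two QMP messages can also be sent from a short Python script against the `-qmp tcp:0:5955` socket configured in step 2. This is a minimal sketch (no error handling; host, port, and the JSON payloads come from this reproducer):

```python
import json
import socket

def qmp_commands():
    """Build the two QMP messages from step 3, verbatim."""
    return [
        {"execute": "qmp_capabilities", "id": "zAbD4B7e"},
        {"execute": "blockdev-reopen",
         "arguments": {"options": [{
             "driver": "throttle",
             "node-name": "drive_stg2",
             "file": "qcow2_stg2",
             "throttle-group": "group2"}]},
         "id": "05L43QcB"},
    ]

def send_qmp(host="127.0.0.1", port=5955):
    """Connect to the QMP TCP socket, consume the greeting, then send
    each command as line-delimited JSON and print the server replies."""
    with socket.create_connection((host, port)) as s:
        f = s.makefile("rw", encoding="utf-8")
        print(f.readline().strip())          # QMP greeting banner
        for cmd in qmp_commands():
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            print(f.readline().strip())
```

Call `send_qmp()` while the VM from step 2 is running; with the affected qemu-kvm builds, the process aborts before a reply to the second command arrives.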


Actual results:
After step 3, QEMU crashes.

Expected results:
QEMU should not crash.

Additional info:
  The issue does not reproduce when the iothread is removed from the virtio-scsi controller, e.g.:
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \

This issue was not found on
Red Hat Enterprise Linux release 8.5 (Ootpa)
4.18.0-348.el8.x86_64
qemu-kvm-6.0.0-33.module+el8.5.0+13740+349232b6.2.x86_64

But that version uses a different (experimental) interface:
{'execute': 'x-blockdev-reopen', 'arguments': {'driver': 'throttle', 'node-name': 'drive_stg2', 'file': 'qcow2_stg2', 'throttle-group': 'group2'}, 'id': 'hQKhdJ8l'}
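The difference between the two interfaces is in the argument shape: the experimental x-blockdev-reopen on qemu-kvm 6.0 took a single BlockdevOptions object directly as its arguments, while the stable blockdev-reopen takes a list under "options" so several nodes can be reopened atomically. A hypothetical helper illustrating the translation (the function name is illustrative, not a QEMU API):

```python
def to_stable_reopen(legacy_arguments):
    """Wrap a legacy x-blockdev-reopen payload (one BlockdevOptions
    object) into the stable blockdev-reopen form, which expects a
    list of options objects under the "options" key."""
    return {"execute": "blockdev-reopen",
            "arguments": {"options": [legacy_arguments]}}

# The RHEL 8.5 command above, rewritten for the stable interface:
msg = to_stable_reopen({"driver": "throttle", "node-name": "drive_stg2",
                        "file": "qcow2_stg2", "throttle-group": "group2"})
```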


Automated test case:
 python3 ConfigTest.py --testcase=throttle_operation_test.group_move.to_normal --platform=x86_64 --guestname=RHEL.9.0.0 --driveformat=virtio_scsi  --clone=no  --iothread_scheme=roundrobin --nr_iothreads=2

Comment 2 Klaus Heinrich Kiwi 2022-01-27 17:47:59 UTC
@kwolf can you take a look?

Comment 3 Klaus Heinrich Kiwi 2022-01-31 15:38:18 UTC
(In reply to Klaus Heinrich Kiwi from comment #2)
> @kwolf can you take a look?

Forgot to assign on my first try

Comment 4 Kevin Wolf 2022-02-01 20:25:53 UTC
I can reproduce this. The reproducer can actually be simplified as follows:

$ qemu-img create -f raw /tmp/test.img 64M
        
$ qemu-system-x86_64 \
    -object iothread,id=thread0 \
    -object throttle-group,x-iops-total=40,id=tg0 \
    -blockdev file,filename=/tmp/test.img,node-name=file \
    -blockdev throttle,file=file,throttle-group=tg0,node-name=disk \
    -device virtio-scsi-pci,iothread=thread0 \
    -device scsi-hd,drive=disk \
    -qmp stdio <<EOF
{"execute":"qmp_capabilities"}
{"execute": "blockdev-reopen", "arguments": {"options": [
    {"driver": "throttle", "node-name": "disk", "file": "file",
    "throttle-group": "tg0"}]}}
EOF

The problem seems to be that qmp_blockdev_reopen() calls bdrv_subtree_drained_end() without taking the AioContext lock for the BlockDriverState.

Comment 5 Kevin Wolf 2022-02-03 14:27:38 UTC
Patches sent upstream:
https://lists.gnu.org/archive/html/qemu-block/2022-02/msg00120.html

Comment 7 Yanan Fu 2022-02-21 03:25:55 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 11 qing.wang 2022-02-22 07:31:54 UTC
Passed testing on

Red Hat Enterprise Linux release 9.0 Beta (Plow)
5.14.0-56.el9.x86_64
qemu-kvm-6.2.0-9.el9.x86_64
seabios-bin-1.15.0-1.el9.noarch
edk2-ovmf-20220126gitbb1bba3d77-2.el9.noarch
virtio-win-prewhql-0.1-215.iso

Comment 13 errata-xmlrpc 2022-05-17 12:25:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307