Bug 2190368

Summary: QEMU hangs when running block jobs on multiple disks (all bound to the same iothread)
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
Sub component: Block Jobs
Version: 9.3
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Keywords: CustomerScenariosInitiative, Regression, Triaged
Target Milestone: rc
Reporter: aihua liang <aliang>
Assignee: Kevin Wolf <kwolf>
QA Contact: aihua liang <aliang>
CC: coli, jinzhao, juzhang, vgoyal, virt-maint
Last Closed: 2023-06-19 07:53:07 UTC

Description aihua liang 2023-04-28 07:14:37 UTC
Description of problem:
QEMU hangs when running block jobs on multiple disks (all bound to the same iothread).

Version-Release number of selected component (if applicable):
kernel version: 5.14.0-300.el9.x86_64
qemu-kvm version: qemu-kvm-8.0.0-1.el9

How reproducible:
100%

Steps to Reproduce:
1. Start the guest with all disks bound to the same iothread:
  /usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio-ovmf_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 30720 \
     -object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
     -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
     -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
     -chardev socket,server=on,wait=off,path=/var/tmp/monitor-qmpmonitor1-20230427-224336-J7MQPHZS,id=qmp_id_qmpmonitor1  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,server=on,wait=off,path=/var/tmp/monitor-catch_monitor-20230427-224336-J7MQPHZS,id=qmp_id_catch_monitor  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idj0cRks"}' \
     -chardev socket,server=on,wait=off,path=/var/tmp/serial-serial0-20230427-224336-J7MQPHZS,id=chardev_serial0 \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230427-224336-J7MQPHZS,path=/var/tmp/seabios-20230427-224336-J7MQPHZS,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230427-224336-J7MQPHZS,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -object '{"qom-type": "iothread", "id": "iothread0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -blockdev '{"node-name": "file_data1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/root/avocado/data/avocado-vt/data1.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_data1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_data1"}' \
     -device '{"driver": "scsi-hd", "id": "data1", "drive": "drive_data1", "write-cache": "on", "serial": "DATA_DISK1"}' \
     -blockdev '{"node-name": "file_data2", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/root/avocado/data/avocado-vt/data2.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_data2", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_data2"}' \
     -device '{"driver": "scsi-hd", "id": "data2", "drive": "drive_data2", "write-cache": "on", "serial": "DATA_DISK2"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:8d:92:84:2d:3a", "id": "id10C3uU", "netdev": "idQFxlLo", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=idQFxlLo,vhost=on  \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
     -monitor stdio \
     -qmp tcp:0:3000,server=on,wait=off \

2. In the guest, create a filesystem on each data disk and dd a file onto it:
   (guest)#mkfs.ext4 /dev/sdb
          #mkdir /mnt/a
          #mount /dev/sdb /mnt/a
          #dd if=/dev/urandom of=/mnt/a/test bs=1M count=1000 oflag=direct

          #mkfs.ext4 /dev/sdc
          #mkdir /mnt/b
          #mount /dev/sdc /mnt/b
          #dd if=/dev/urandom of=/mnt/b/test bs=1M count=1000 oflag=direct

3. Create mirror target nodes, then run blockdev-mirror on multiple disks:
    #!/bin/bash
let i=1
exec 3<>/dev/tcp/localhost/3000
echo -e "{'execute':'qmp_capabilities' }" >&3
sleep 3
while [ $i -lt 3 ]
do
        echo -e "{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':2147483648},'job-id':'job1'}}" >&3
        read response <&3
        echo "$i: $response"
        sleep 5
        echo -e "{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'file_mirror$i','filename':'/root/sn$i'}}" >&3
        read response <&3
        echo "$i: $response"
        sleep 3
        echo -e "{'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'file_mirror$i','size':2147483648},'job-id':'job2'}}" >&3
        read response <&3
        echo "$i: $response"
        sleep 5
        echo -e "{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'drive_mirror$i','file':'file_mirror$i'}}" >&3
        read response <&3
        echo "$i: $response"
        sleep 3
        echo -e "{'execute':'job-dismiss','arguments':{'id':'job1'}}" >&3
        read response <&3
        echo "$i: $response"
        sleep 1
        echo -e "{'execute':'job-dismiss','arguments':{'id':'job2'}}" >&3
        sleep 3
        read response <&3
        echo $response
let i=$i+1
done
sleep 2
echo -e "{'execute':'blockdev-mirror','arguments':{'sync': 'full', 'speed': 20000000, 'device': 'drive_data1', 'target': 'drive_mirror1', 'job-id': 'drive_data1_zeZy'}}" >&3
read response <&3
echo $response
sleep 1
echo -e "{'execute':'blockdev-mirror','arguments':{'sync': 'full', 'device': 'drive_data2', 'target': 'drive_mirror2', 'job-id': 'drive_data2_yadK'}}" >&3
read response <&3
echo $response
sleep 2
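
To check whether the mirror jobs are still making progress or QEMU has stopped responding altogether, the job state can be polled over QMP. A minimal sketch, assuming it is run against the same QMP socket (localhost:3000) after the script above has exited; if QEMU is hung, the reads below simply block:

#!/bin/bash
exec 3<>/dev/tcp/localhost/3000
read greeting <&3                                 # QMP greeting
echo -e "{'execute':'qmp_capabilities'}" >&3
read caps <&3                                     # {"return": {}}
let n=1
while [ $n -le 3 ]
do
        echo -e "{'execute':'query-jobs'}" >&3
        # Asynchronous QMP events (e.g. JOB_STATUS_CHANGE) may interleave with
        # the reply, so a single read per command is only an approximation.
        read response <&3
        echo "poll $n: $response"
        sleep 5
let n=$n+1
done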

Actual results:
QEMU hangs when mirroring multiple disks.
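
If additional debugging data is needed when the hang reproduces, thread backtraces of the qemu-kvm process are typically the most useful thing to collect. A minimal sketch using gdb (the pidof lookup assumes a single qemu-kvm process on the host):

   # attach to the hung process and dump backtraces of all threads
   gdb -p "$(pidof qemu-kvm)" -batch -ex 'thread apply all bt'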

Expected results:
Mirroring multiple disks should succeed.

Additional info:
Both virtio_blk and virtio_scsi hit this issue.
The issue is not hit when the disks are bound to different iothreads (a sketch of that configuration follows below).
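
For reference, a sketch of the non-reproducing variant in which the second data disk gets its own controller and iothread (hypothetical IDs; it reuses the spare pcie_extra_root_port_0 from the command line in step 1 and would replace the existing data2 scsi-hd definition):

     -object '{"qom-type": "iothread", "id": "iothread1"}' \
     -device '{"id": "virtio_scsi_pci1", "driver": "virtio-scsi-pci", "bus": "pcie_extra_root_port_0", "addr": "0x0", "iothread": "iothread1"}' \
     -device '{"driver": "scsi-hd", "id": "data2", "drive": "drive_data2", "bus": "virtio_scsi_pci1.0", "write-cache": "on", "serial": "DATA_DISK2"}' \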

Comment 2 aihua liang 2023-04-28 07:36:13 UTC
Tested on qemu-kvm-7.2.0-14.el9_2; the issue is not hit there, so this is a regression.

Comment 6 aihua liang 2023-06-19 07:53:07 UTC
Tested on qemu-kvm-8.0.0-5.el9 with the cases below, 100 iterations in total; the issue no longer occurs.
 (097/100) repeat25.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_stream_multiple_blocks.q35: PASS (216.91 s)
 (098/100) repeat25.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_inc_backup.multi_data_disks.q35: PASS (160.92 s)
 (099/100) repeat25.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_mirror_multiple_blocks.q35: PASS (316.20 s)
 (100/100) repeat25.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_commit.multi_data_disks.q35: PASS (118.26 s)
RESULTS    : PASS 100 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2023-06-18T21.53-3fbbc3e/results.html

The regression tests were also run and all pass.

So the bug status will be set to "CLOSED CURRENTRELEASE".