Bug 2215192

Summary: qemu crash on virtio_blk_set_status: Assertion `!s->dataplane_started' failed when hotplug/unplug virtio disks repeatedly [RHEL-8]
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: virtio-blk,scsi)
Version: 9.4
Status: CLOSED MIGRATED
Severity: high
Priority: high
Reporter: qing.wang <qinwang>
Assignee: Stefan Hajnoczi <stefanha>
QA Contact: qing.wang <qinwang>
CC: aliang, chayang, coli, jinzhao, juzhang, kwolf, lijin, qizhu, stefanha, virt-maint, xuwei, ymankad, zhenyzha
Keywords: CustomerScenariosInitiative, MigratedToJIRA, Reopened, Triaged
Target Milestone: rc
Hardware: x86_64
OS: Unspecified
Last Closed: 2023-09-22 16:27:36 UTC

Description qing.wang 2023-06-15 03:50:50 UTC
Description of problem:
With iothreads enabled on the disks, hotplugging and unplugging virtio-blk disks
repeatedly sometimes crashes QEMU with:

qemu-kvm: ../hw/block/virtio-blk.c:1043: virtio_blk_set_status: Assertion `!s->dataplane_started' failed

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-494.el8.x86_64
qemu-kvm-6.2.0-35.module+el8.9.0+19024+8193e2ac.x86_64
seabios-bin-1.16.0-3.module+el8.8.0+16781+9f4724c2.noarch

How reproducible:
<5%

Steps to Reproduce:
1. Boot the VM:
/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
     -nodefaults \
     -device VGA,bus=pcie-pci-bridge-0,addr=0x1 \
     -m 8192 \
     -object memory-backend-ram,size=8192M,id=mem-machine_mem  \
     -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
     -cpu 'EPYC-Rome',+kvm_pv_unhalt \
     \
     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
     -object iothread,id=iothread0 \
     -object iothread,id=iothread1 \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel890-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
     -blockdev '{"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/stg0.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}' \
     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
     -device virtio-blk-pci,id=stg0,drive=drive_stg0,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0,iothread=iothread1 \
     -blockdev '{"node-name": "file_stg1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/stg1.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_stg1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg1"}' \
     -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
     -device virtio-blk-pci,id=stg1,drive=drive_stg1,bootindex=2,write-cache=on,bus=pcie-root-port-4,addr=0x0,iothread=iothread0 \
     -blockdev '{"node-name": "file_stg2", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/stg2.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_stg2", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg2"}' \
     -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
     -device virtio-blk-pci,id=stg2,drive=drive_stg2,bootindex=3,write-cache=on,bus=pcie-root-port-5,addr=0x0,iothread=iothread1 \
     -device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x1.0x6,bus=pcie.0,chassis=7 \
     -device virtio-net-pci,mac=9a:2d:25:2e:bd:05,id=iduknQjC,netdev=idkVAS0H,bus=pcie-root-port-6,addr=0x0  \
     -netdev tap,id=idkVAS0H,vhost=on  \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x2,chassis=8 \
     -device pcie-root-port,id=pcie_extra_root_port_1,addr=0x2.0x1,bus=pcie.0,chassis=9 \
     -device pcie-root-port,id=pcie_extra_root_port_2,addr=0x2.0x2,bus=pcie.0,chassis=10

2. Unplug the disks:
{"execute": "device_del", "arguments": {"id": "stg1"}, "id": "JQwvBYjx"}
{"execute": "device_del", "arguments": {"id": "stg2"}, "id": "W9phaXjN"}

3. Plug the disks back in:
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "bootindex": "2", "write-cache": "on", "bus": "pcie-root-port-4", "addr": "0x0"}, "id": "9rzTDYUj"}
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg2", "drive": "drive_stg2", "bootindex": "3", "write-cache": "on", "bus": "pcie-root-port-5", "addr": "0x0"}, "id": "tomEiZ7o"}

4. Loop steps 2-3 3000 times without delay (see the reproducer sketch below).
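
A minimal reproducer for steps 2-4 might look like the following Python sketch. It assumes the VM above is additionally started with a QMP socket, e.g. "-qmp unix:/tmp/qmp.sock,server=on,wait=off", which is not part of the original command line; the socket path and progress interval are placeholders. Errors from a device_add racing a not-yet-finished unplug are returned rather than raised, to match the "without delay" behaviour of the test:

#!/usr/bin/env python3
"""Hotplug/unplug loop over QMP (steps 2-4 above). Sketch only."""
import json
import socket

SOCK_PATH = "/tmp/qmp.sock"  # assumption: -qmp unix:/tmp/qmp.sock,server=on,wait=off

def read_msg(f):
    """Read one newline-delimited JSON message from QEMU."""
    line = f.readline()
    if not line:
        raise ConnectionError("QMP connection closed (QEMU crashed?)")
    return json.loads(line)

def command(sock, f, name, args=None):
    """Send a QMP command; return its reply, skipping asynchronous events."""
    msg = {"execute": name}
    if args is not None:
        msg["arguments"] = args
    sock.sendall(json.dumps(msg).encode() + b"\n")
    while True:
        reply = read_msg(f)
        if "return" in reply or "error" in reply:
            # Errors (e.g. duplicate ID when the unplug has not finished yet)
            # are expected occasionally and deliberately ignored by the caller.
            return reply

def main():
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(SOCK_PATH)
    f = sock.makefile("r")
    read_msg(f)                           # consume the QMP greeting banner
    command(sock, f, "qmp_capabilities")  # leave capabilities negotiation mode

    for i in range(3000):
        # Step 2: request unplug. device_del returns immediately; the actual
        # removal completes asynchronously (DEVICE_DELETED event), so the
        # device_add below can race it -- exactly what this test exercises.
        for dev in ("stg1", "stg2"):
            command(sock, f, "device_del", {"id": dev})
        # Step 3: replug without any delay.
        for dev, bus, bootindex in (("stg1", "pcie-root-port-4", "2"),
                                    ("stg2", "pcie-root-port-5", "3")):
            command(sock, f, "device_add", {
                "driver": "virtio-blk-pci", "id": dev,
                "drive": "drive_" + dev, "bootindex": bootindex,
                "write-cache": "on", "bus": bus, "addr": "0x0",
            })
        if i % 100 == 0:
            print("iteration", i)

if __name__ == "__main__":
    main()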

Actual results:
QEMU sometimes crashes with the assertion failure shown above.

Expected results:
No crash.

Additional info:
This does not appear to be a regression; the same issue can also be found on
qemu-kvm-6.2.0-20.module+el8.8.0+16743+e56fc0d8.x86_64.

Comment 2 qing.wang 2023-06-15 04:00:05 UTC
BT:
#0  0x00007fde88adeacf raise (libc.so.6)
#1  0x00007fde88ab1ea5 abort (libc.so.6)
#2  0x00007fde88ab1d79 __assert_fail_base.cold.0 (libc.so.6)
#3  0x00007fde88ad7426 __assert_fail (libc.so.6)
#4  0x000055bd7a1175c8 virtio_blk_set_status (qemu-kvm)
#5  0x000055bd7a1474e4 virtio_set_status (qemu-kvm)
#6  0x000055bd7a05b243 virtio_pci_common_write (qemu-kvm)
#7  0x000055bd7a0f6777 memory_region_write_accessor (qemu-kvm)
#8  0x000055bd7a0f320e access_with_adjusted_size (qemu-kvm)
#9  0x000055bd7a0f62a3 memory_region_dispatch_write (qemu-kvm)
#10 0x000055bd7a0e7f2e flatview_write_continue (qemu-kvm)
#11 0x000055bd7a0e8093 flatview_write (qemu-kvm)
#12 0x000055bd7a0ebc6f address_space_write (qemu-kvm)
#13 0x000055bd7a1a28b9 kvm_cpu_exec (qemu-kvm)
#14 0x000055bd7a1a36e5 kvm_vcpu_thread_fn (qemu-kvm)
#15 0x000055bd7a2dfdd4 qemu_thread_start (qemu-kvm)
#16 0x00007fde88e5d1ca start_thread (libpthread.so.0)
#17 0x00007fde88ac9e73 __clone (libc.so.6)

coredump file:
http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/section2/images_backup/qbugs/2215192/2023-06-14/core.qemu-kvm.0.5f7588420b954f1782fcbcedcfac907b.910272.1686786489000000.lz4

Comment 4 Kevin Wolf 2023-06-26 15:53:47 UTC
Stefan, another virtio-blk one, can you have a look?

Comment 5 Stefan Hajnoczi 2023-07-12 21:04:39 UTC
I tried reproducing this on qemu-kvm-8.0.0-4.el9 but couldn't trigger the assertion failure on my host.

Next I'll look at the coredump.

Comment 8 Stefan Hajnoczi 2023-07-13 13:28:06 UTC
(In reply to qing.wang from comment #2)
> coredump file:
> http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/section2/images_backup/qbugs/2215192/2023-06-14/core.qemu-kvm.0.5f7588420b954f1782fcbcedcfac907b.910272.1686786489000000.lz4

The permissions on this coredump file prevent me from downloading it. Please make the file readable. Thanks!

Comment 9 qing.wang 2023-07-17 08:01:21 UTC
This issue is not hit with the latest version:

Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-500.el8.x86_64
qemu-kvm-6.2.0-36.module+el8.9.0+19222+f46ac890.x86_64
seabios-bin-1.16.0-3.module+el8.9.0+18724+20190c23.noarch
edk2-ovmf-20220126gitbb1bba3d77-5.el8.noarch
virtio-win-prewhql-0.1-239.iso


python ConfigTest.py --testcase=multi_disk_wild_hotplug.without_delay --platform=x86_64 --guestname=RHEL.8.9.0 --driveformat=virtio_blk  --imageformat=qcow2 --machines=q35 --firmware=default_bios --netdst=virbr0 --iothread_scheme=roundrobin --nr_iothreads=2 --customsparams="vm_mem_limit = 8G" --nrepeat=20

Comment 10 Stefan Hajnoczi 2023-07-17 14:55:21 UTC
I think this bug still exists upstream and downstream but is difficult to trigger.

It occurs when a QMP command dispatch BH is scheduled just before a vcpu writes to the VIRTIO Device Status Register. Then another vcpu must write to the same VIRTIO Device Status Register in order to reach this assertion failure. I have written about the scenario here: https://lore.kernel.org/qemu-devel/20230713194226.GA335220@fedora/

The best solution is not clear to me yet, so I started the qemu-devel discussion (see link above) to reach consensus on how to deal with this situation.

Do you want to keep this bug closed because it's too hard to reproduce/verify? (I will still pursue a fix upstream.)

Comment 11 qing.wang 2023-07-19 07:45:11 UTC
I cannot reproduce this issue on the latest version; that is why I marked it as fixed in the current release.

Tested over 50 times on the latest version:

Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-500.el8.x86_64
qemu-kvm-6.2.0-36.module+el8.9.0+19222+f46ac890.x86_64



It can still be reproduced on the previous version or earlier
(reproduction ratio about 5%):

qemu-kvm-6.2.0-35.module+el8.9.0+19166+e262ca96.x86_64


I would like to keep it open since you mentioned it still exists upstream.
(By the way, I am not sure whether the "QMP command dispatch" issue you mentioned is related to
Bug 2214985 - [qemu-kvm] no response with QMP command device_add when repeatedly hotplug/unplug virtio disks [RHEL-8].
That bug has a different outcome, but both use the same test, and the most frequent failure is no response to the QMP command; see the event-waiting sketch below.)
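
To help tell those two failure modes apart while looping, the reproducer sketch above can be extended to wait, with a timeout, for the DEVICE_DELETED event after each device_del before re-adding the device. This serializes the replug, so it aids triage rather than reproducing this particular race. A sketch reusing the imports and the sock/f objects from the earlier script; the 30-second timeout is an arbitrary assumption:

def wait_device_deleted(sock, f, dev_id, timeout=30.0):
    """Wait for QEMU's DEVICE_DELETED event for dev_id, or raise on timeout.

    device_del only requests the unplug; QEMU emits DEVICE_DELETED once the
    guest has actually released the device. A timeout here points at the
    "no response" hang of Bug 2214985, while a closed connection (EOF)
    points at a QEMU crash such as this one.
    """
    sock.settimeout(timeout)
    try:
        while True:
            line = f.readline()
            if not line:
                raise ConnectionError("QMP connection closed (QEMU crashed?)")
            msg = json.loads(line)
            if (msg.get("event") == "DEVICE_DELETED"
                    and msg.get("data", {}).get("device") == dev_id):
                return
    except socket.timeout:
        raise TimeoutError(f"no DEVICE_DELETED for {dev_id} in {timeout}s")
    finally:
        # Caveat: a timeout can leave the buffered reader in an inconsistent
        # state; good enough for triage, not for production code.
        sock.settimeout(None)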

Comment 14 Stefan Hajnoczi 2023-08-23 11:02:41 UTC
Sounds good to me. I have moved it to RHEL 9.

Comment 15 RHEL Program Management 2023-09-22 16:20:02 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.