Bug 2214985

Summary: [qemu-kvm] no response with QMP command device_add when repeatedly hotplug/unplug virtio disks [RHEL-8]
Product: Red Hat Enterprise Linux 8 Reporter: qing.wang <qinwang>
Component: qemu-kvm Assignee: Stefan Hajnoczi <stefanha>
qemu-kvm sub component: virtio-blk,scsi QA Contact: qing.wang <qinwang>
Status: CLOSED MIGRATED Docs Contact:
Severity: high    
Priority: high CC: aliang, chayang, coli, jinzhao, juzhang, kwolf, lijin, qizhu, virt-maint, xuwei, ymankad, zhenyzha
Version: 8.9 Keywords: MigratedToJIRA, Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-22 16:27:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description qing.wang 2023-06-14 11:17:25 UTC
Description of problem:
When virtio-blk disks are repeatedly hot-unplugged and hot-plugged, the QMP command device_add sometimes gets no response.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-494.el8.x86_64
qemu-kvm-6.2.0-35.module+el8.9.0+19024+8193e2ac.x86_64
seabios-bin-1.16.0-3.module+el8.8.0+16781+9f4724c2.noarch


How reproducible:
20%

Steps to Reproduce:
1. Boot the VM:
/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
     -nodefaults \
     -device VGA,bus=pcie-pci-bridge-0,addr=0x1 \
     -m 8192 \
     -object memory-backend-ram,size=8192M,id=mem-machine_mem  \
     -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
     -cpu 'EPYC-Rome',+kvm_pv_unhalt \
     \
     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel890-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
     -blockdev '{"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/stg0.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}' \
     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
     -device virtio-blk-pci,id=stg0,drive=drive_stg0,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
     -blockdev '{"node-name": "file_stg1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/stg1.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_stg1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg1"}' \
     -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
     -device virtio-blk-pci,id=stg1,drive=drive_stg1,bootindex=2,write-cache=on,bus=pcie-root-port-4,addr=0x0 \
     -blockdev '{"node-name": "file_stg2", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/stg2.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_stg2", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg2"}' \
     -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
     -device virtio-blk-pci,id=stg2,drive=drive_stg2,bootindex=3,write-cache=on,bus=pcie-root-port-5,addr=0x0 \
     -device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x1.0x6,bus=pcie.0,chassis=7 \
     -device virtio-net-pci,mac=9a:1c:50:28:d7:66,id=idSvt1Qo,netdev=idifg3k2,bus=pcie-root-port-6,addr=0x0  \
     -netdev tap,id=idifg3k2,vhost=on  \
     -vnc :1  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x2,chassis=8 \
     -device pcie-root-port,id=pcie_extra_root_port_1,addr=0x2.0x1,bus=pcie.0,chassis=9 \
     -device pcie-root-port,id=pcie_extra_root_port_2,addr=0x2.0x2,bus=pcie.0,chassis=10
2. Unplug the disks:
{"execute": "device_del", "arguments": {"id": "stg1"}, "id": "JQwvBYjx"}
{"execute": "device_del", "arguments": {"id": "stg2"}, "id": "W9phaXjN"}

3. Plug the disks back in:
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "bootindex": "2", "write-cache": "on", "bus": "pcie-root-port-4", "addr": "0x0"}, "id": "9rzTDYUj"}
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg2", "drive": "drive_stg2", "bootindex": "3", "write-cache": "on", "bus": "pcie-root-port-5", "addr": "0x0"}, "id": "tomEiZ7o"}

4. Repeat steps 2-3 3000 times with no delay between iterations.
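
The loop over steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch, not the test harness used for this report: it assumes the VM exposes a QMP UNIX socket (the command line in step 1 does not show one, so the socket path is hypothetical), and QmpClient, iteration_commands and run_loop are illustrative names. Any reply with a matching id, success or error, counts as a response; only a read timeout is treated as "no response".

```python
import json
import socket

# Disks cycled in steps 2-3: (device id, blockdev node, bootindex, root port).
DISKS = [
    ("stg1", "drive_stg1", "2", "pcie-root-port-4"),
    ("stg2", "drive_stg2", "3", "pcie-root-port-5"),
]


def iteration_commands():
    """Build the QMP commands for one unplug/plug iteration (steps 2 and 3)."""
    dels = [{"execute": "device_del", "arguments": {"id": dev}}
            for dev, _, _, _ in DISKS]
    adds = [{"execute": "device_add",
             "arguments": {"driver": "virtio-blk-pci", "id": dev,
                           "drive": drive, "bootindex": boot,
                           "write-cache": "on", "bus": bus, "addr": "0x0"}}
            for dev, drive, boot, bus in DISKS]
    return dels + adds


def run_loop(execute, iterations=3000):
    """Return (iteration, command) for the first command that gets no
    response, or None if every command was answered."""
    for i in range(iterations):
        for cmd in iteration_commands():
            if execute(cmd) is None:
                return i, cmd
    return None


class QmpClient:
    """Minimal QMP client; a timed-out or closed connection yields None."""

    def __init__(self, path, timeout=60.0):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(timeout)
        self.sock.connect(path)
        self.reader = self.sock.makefile("r", encoding="utf-8")
        self.reader.readline()  # discard the QMP greeting
        self.next_id = 0
        self.execute({"execute": "qmp_capabilities"})

    def execute(self, cmd):
        self.next_id += 1
        msg = dict(cmd, id=str(self.next_id))
        self.sock.sendall((json.dumps(msg) + "\r\n").encode("utf-8"))
        try:
            while True:
                line = self.reader.readline()
                if not line:
                    return None  # connection closed
                reply = json.loads(line)
                # Skip asynchronous events such as DEVICE_DELETED.
                if reply.get("id") == msg["id"]:
                    return reply
        except socket.timeout:
            return None
```

With a running VM, `run_loop(QmpClient("/path/to/qmp.sock").execute)` would return the iteration number and the command that went unanswered, or None if all 3000 iterations completed. The 60-second timeout is an arbitrary choice for the sketch.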

Actual results:
Sometimes the device_add command never receives a response.


Expected results:
Every QMP command should get a response, even if that response is an error rather than a success, for example:
{"return": {}, "id": "tomEiZ7o"}
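
A test script can check this expectation by classifying each line read from the QMP socket. The sketch below relies only on the QMP convention that a success reply carries a "return" key and a failure reply carries an "error" key, both echoing the command's "id"; classify_reply is a hypothetical helper name.

```python
import json


def classify_reply(line, expected_id):
    """Return 'ok', 'error', or 'other' for one line of QMP output."""
    msg = json.loads(line)
    if msg.get("id") != expected_id:
        return "other"  # an event, or a reply to a different command
    if "return" in msg:
        return "ok"
    if "error" in msg:
        return "error"
    return "other"


# The expected reply quoted above counts as a successful response.
print(classify_reply('{"return": {}, "id": "tomEiZ7o"}', "tomEiZ7o"))
```

Both "ok" and "error" satisfy the expectation stated here; the bug is the case where neither ever arrives for a given id.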

Additional info:

This does not look like a regression; the same issue also reproduces on
qemu-kvm-6.2.0-20.module+el8.8.0+16743+e56fc0d8.x86_64
and 
Red Hat Enterprise Linux release 8.6 (Ootpa)
4.18.0-372.59.1.el8_6.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8.x86_64
seabios-bin-1.15.0-2.module+el8.6.0+14757+c25ee005.noarch

Comment 4 Kevin Wolf 2023-06-26 14:48:11 UTC
I think the problem there is that in thread 6, we come from address_space_write() which has a RCU_READ_LOCK_GUARD(), but then have a nested event loop, which could do anything including drain_call_rcu() (here called by qmp_device_add()). I don't know anything about the RCU implementation, so I might be wrong, but this looks suspiciously like the deadlock you're seeing.

Stefan, as this is related to virtio-blk locking, can you please have a look?
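
The pattern suspected in this comment can be illustrated with a toy model. The Python sketch below is not QEMU's RCU implementation, and ToyRcu, guest_write_path, nested_event_loop and device_add_handler are hypothetical names. It only shows why waiting for all readers to finish, from inside a reader's own nested event loop, cannot complete: the wait times out here, where real code without a timeout would block indefinitely.

```python
import threading


class ToyRcu:
    """Toy stand-in for an RCU domain: it only counts active readers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def read_lock(self):
        with self._cond:
            self._readers += 1

    def read_unlock(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def wait_for_readers(self, timeout):
        """Return True once no reader is active, False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._readers == 0, timeout)


def nested_event_loop(pending):
    """Run queued handlers, as a nested event loop might dispatch them."""
    results = []
    while pending:
        handler = pending.pop(0)
        results.append(handler())
    return results


def guest_write_path(rcu, pending):
    """Models a write path that holds the read lock across a nested loop."""
    rcu.read_lock()
    try:
        return nested_event_loop(pending)
    finally:
        rcu.read_unlock()


def device_add_handler(rcu, timeout=0.2):
    """Models a handler that must wait until all readers have finished."""
    return rcu.wait_for_readers(timeout)


rcu = ToyRcu()
# Outside any read-side section the wait completes immediately.
print(device_add_handler(rcu))
# Inside the reader's nested loop, the wait can only time out.
print(guest_write_path(rcu, [lambda: device_add_handler(rcu)]))
```

In this model the handler waits for a reader count that cannot drop until the handler itself returns, which matches the shape of the deadlock described above. Whether the real code path behaves this way is the open question raised in the comment.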

Comment 5 RHEL Program Management 2023-09-22 16:19:15 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.