Bug 1531393
Summary: | [Q35] "DEVICE_DELETED" event didn't return after delete the second passthrough vf device | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | jingzhao <jinzhao> | ||||
Component: | qemu-kvm-rhev | Assignee: | Peter Xu <peterx> | ||||
Status: | CLOSED ERRATA | QA Contact: | jingzhao <jinzhao> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.5 | CC: | alex.williamson, chayang, ehabkost, hhuang, jen, jinzhao, juzhang, knoel, lmiksik, marcel, michen, mrezanin, mtessun, peterx, rhodain, salmy, toneata, virt-maint, yafu, yfu | ||||
Target Milestone: | rc | Keywords: | Regression, ZStream | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | qemu-kvm-rhev-2.12.0-1.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1552224 1552626 1555213 (view as bug list) | Environment: | |||||
Last Closed: | 2018-11-01 11:04:08 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1552224, 1552626, 1555213, 1558351 | ||||||
Attachments: |
|
Didn't reproduce the issue without -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on The qemu command line: /usr/libexec/qemu-kvm \ -M q35 \ -cpu Haswell-noTSX \ -nodefaults -rtc base=utc \ -m 4G \ -smp 4,sockets=4,cores=1,threads=1 \ -enable-kvm \ -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \ -k en-us \ -nodefaults \ -global isa-debugcon.iobase=0x402 \ -boot menu=on \ -qmp tcp:0:6667,server,nowait \ -usb \ -device usb-tablet \ -vga qxl \ -device pcie-root-port,bus=pcie.0,id=root0,multifunction=on,chassis=1,addr=0xa.0 \ -drive file=/home/sriov/rhel75-seabios-bk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \ -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bus=root0 \ -device pcie-root-port,bus=pcie.0,id=root1,chassis=11,addr=0xa.1 \ -device virtio-net-pci,netdev=tap10,mac=00:52:68:26:31:03,bus=root1,id=net0 -netdev tap,id=tap10 \ -device pcie-root-port,bus=pcie.0,id=root2,chassis=12,addr=0xa.2 \ -device virtio-net-pci,netdev=tap11,mac=00:52:68:26:31:00,bus=root2,id=net1 -netdev tap,id=tap11 \ -device pcie-root-port,bus=pcie.0,id=root6,chassis=15,addr=0xa.5 \ -device vfio-pci,host=04:02.0,id=vf1,bus=root6,x-req=off \ -device pcie-root-port,bus=pcie.0,id=root7,chassis=4 \ -device vfio-pci,host=04:02.1,id=vf2,bus=root7,x-req=off \ -monitor stdio \ -vnc :1 \ Marked it as regression since didn't reproduce it with qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64 Hi Peter, Following the logs, the Guest OS finishes the hot-unplug sequence and powers down the device. It seems something on QEMU's IOMMU side prevents the hot-unplug to finish. Can you please have a look? Thanks, Marcel (In reply to Marcel Apfelbaum from comment #8) > Hi Peter, > > Following the logs, the Guest OS finishes the hot-unplug sequence > and powers down the device. It seems something on QEMU's IOMMU side > prevents the hot-unplug to finish. > > Can you please have a look? > Thanks, > Marcel Hi, Marcel, The event is sent in RCU thread: #0 qapi_event_send_device_deleted (has_device=true, device=0x55c2500b81f0 "vfio0", path=0x55c2513530d0 "/machine/peripheral/vfio0", errp=0x55c24f3f3d10 <error_abort>) at qapi-event.c:412 #1 0x000055c24e3ca86c in device_finalize (obj=0x55c25128b270) at /root/git/qemu/hw/core/qdev.c:1087 #2 0x000055c24e5ad6db in object_deinit (obj=0x55c25128b270, type=0x55c250094700) at /root/git/qemu/qom/object.c:462 #3 0x000055c24e5ad709 in object_deinit (obj=0x55c25128b270, type=0x55c2500a40c0) at /root/git/qemu/qom/object.c:466 #4 0x000055c24e5ad709 in object_deinit (obj=0x55c25128b270, type=0x55c250083030) at /root/git/qemu/qom/object.c:466 #5 0x000055c24e5ad74c in object_finalize (data=0x55c25128b270) at /root/git/qemu/qom/object.c:476 #6 0x000055c24e5ae6db in object_unref (obj=0x55c25128b270) at /root/git/qemu/qom/object.c:911 #7 0x000055c24e206988 in memory_region_unref (mr=0x55c25128b4e0) at /root/git/qemu/memory.c:1747 #8 0x000055c24e209d16 in do_address_space_destroy (as=0x55c25128b488) at /root/git/qemu/memory.c:2787 #9 0x000055c24e6da1c8 in call_rcu_thread (opaque=0x0) at /root/git/qemu/util/rcu.c:278 #10 0x00007f78faef4dd5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f78fac1e94d in clone () from /lib64/libc.so.6 Luckily I still have the debugging stuff on jinzhao's host due to the other bug, so I did a quick test. I think it's because some references of vfio-pci device is leaked somehow due to existance of VT-d, since this is the backtrace I got when the error happens (or say, when the event is missed): (gdb) bt #0 object_unref (obj=0x5559618ef0c0) at /root/git/qemu/qom/object.c:904 #1 0x000055595d710988 in memory_region_unref (mr=0x5559618ef330) at /root/git/qemu/memory.c:1747 #2 0x000055595d713d16 in do_address_space_destroy (as=0x5559618ef2d8) at /root/git/qemu/memory.c:2787 #3 0x000055595dbe41c8 in call_rcu_thread (opaque=0x0) at /root/git/qemu/util/rcu.c:278 #4 0x00007f85ee1b5dd5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007f85ededf94d in clone () from /lib64/libc.so.6 (gdb) p obj->ref $2 = 3 Basically here obj->ref should be 1 otherwise device_finalize() for vfio-pci device won't be called, and event won't be sent. I'll dig more tomorrow. For now, I'll take over the bug since it's for sure vt-d related. Thanks, Peter This is reproducable upstream too, and even without XL710 - I tested with X540, which also hit this. A funny thing is that upstream QEMU doesn't have this problem until commit f7b879e072 ("qdev: defer DEVICE_DEL event until instance_finalize()", 2017-10-18). However I don't think that's the problematic commit. That commit just revealed another issue after it fixed one. After I built a very customized QEMU (and with some script for parsing) I think these are the two refcounts that are missing (via MemoryRegion referencing, since memory_region_ref() will add ref onto its owner): Remaining refcount holders: { "0x5639af8f8000": { "count": 1, "devid": 1, "name": "0000:05:10.2 BAR 0 mmaps[0]", "mr": "0x5639af8f8000" }, "0x5639af8f86d0": { "count": 1, "devid": 1, "name": "0000:05:10.2 BAR 3 mmaps[0]", "mr": "0x5639af8f86d0" } } It means that we referenced these two regions one more time for the 2nd VF (and the 1st VF have paired unrefs, since we have no problem with it): "0000:05:10.2 BAR 0 mmaps[0]" "0000:05:10.2 BAR 3 mmaps[0]" They are the two BARs for the X540 VF device. Though, I am still investigating on how the VT-d device can affect them to leak that refcount. Peter Well... this funny refcount leakage seems to be a more general issue on memory API rather than VT-d itself. I posted my thoughts and solution upstream as RFC (since it touches memory core and I haven't tested too much): http://lists.gnu.org/archive/html/qemu-devel/2018-01/msg04703.html I'll wait for some feedback on the patches before I decide what to do next. Peter Please follow 0day process, check the link https://mojo.redhat.com/docs/DOC-1022552 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3443 |
Created attachment 1377297 [details] dmesg log Description of problem: qmp didn't get the "DEVICE_DELETED" event after delete the second passthrough vf device, although the device had been deleted successfully in guest Version-Release number of selected component (if applicable): kernel-3.10.0-825.el7.x86_64 qemu-kvm-rhev-2.10.0-14.el7.x86_64 passthrough vf device: XL710 How reproducible: 3/3 Steps to Reproduce: 1. Boot guest with qemu command line [1] 2. Deleted two vf device in qmp {"execute":"device_del","arguments":{"id":"vf2"}} {"return": {}} {"timestamp": {"seconds": 1515122236, "microseconds": 265862}, "event": "DEVICE_DELETED", "data": {"device": "vf2", "path": "/machine/peripheral/vf2"}} {"execute":"device_del","arguments":{"id":"vf1"}} {"return": {}} Actual results: devices can be deleted successfully [root@localhost ~]# lspci -vvv -t -[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller +-01.0 Red Hat, Inc. QXL paravirtual graphic card +-02.0-[01]-- +-0a.0-[02]----00.0 Red Hat, Inc. Virtio block device +-0a.1-[03]----00.0 Red Hat, Inc. Virtio network device +-0a.2-[04]----00.0 Red Hat, Inc. Virtio network device +-0a.5-[05]-- +-1d.0 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 +-1d.1 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 +-1d.2 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 +-1d.7 Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 +-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller +-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] \-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller qmp didn't get the "DEVICE_DELETED" event Expected results: qmp can get the DEVICE_DELETED event Additional info: Didn't reproduce the issue with "virtio-net-pci" devices guest kernel line doesn't contain "intel_iommu=on" [1] /usr/libexec/qemu-kvm \ -M q35,accel=kvm,kernel-irqchip=split \ -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \ -cpu Haswell-noTSX \ -nodefaults -rtc base=utc \ -m 4G \ -smp 4,sockets=4,cores=1,threads=1 \ -enable-kvm \ -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \ -k en-us \ -nodefaults \ -chardev file,path=/home/seabios.log,id=seabios -device isa-debugcon,chardev=seabios,iobase=0x402 \ -boot menu=on \ -qmp tcp:0:6667,server,nowait \ -usb \ -device usb-tablet \ -vga qxl \ -device pcie-root-port,bus=pcie.0,id=root0,multifunction=on,chassis=1,addr=0xa.0 \ -drive file=/home/sriov/rhel75-seabios-bk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \ -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bus=root0 \ -device pcie-root-port,bus=pcie.0,id=root1,chassis=11,addr=0xa.1 \ -device virtio-net-pci,netdev=tap10,mac=00:52:68:26:31:03,bus=root1,id=net0 -netdev tap,id=tap10 \ -device pcie-root-port,bus=pcie.0,id=root2,chassis=12,addr=0xa.2 \ -device virtio-net-pci,netdev=tap11,mac=00:52:68:26:31:00,bus=root2,id=net1 -netdev tap,id=tap11 \ -device pcie-root-port,bus=pcie.0,id=root6,chassis=15,addr=0xa.5 \ -device vfio-pci,host=04:02.0,id=vf1,bus=root6,x-req=off \ -device pcie-root-port,bus=pcie.0,id=root7,chassis=4 \ -device vfio-pci,host=04:02.1,id=vf2,bus=root7,x-req=off \ -monitor stdio \ -vnc :1 \ Host devices info: [root@dell-per730-29 ~]# lspci -v -s 04:02.0 04:02.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02) Subsystem: Intel Corporation Device 0000 Flags: bus master, fast devsel, latency 0, NUMA node 0 [virtual] Memory at 92400000 (64-bit, prefetchable) [size=64K] [virtual] Memory at 92910000 (64-bit, prefetchable) [size=16K] Capabilities: [70] MSI-X: Enable+ Count=5 Masked- Capabilities: [a0] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [1a0] Transaction Processing Hints Capabilities: [1d0] Access Control Services Kernel driver in use: vfio-pci Kernel modules: i40evf [root@dell-per730-29 ~]# lspci -v -s 04:00.0 04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02) Subsystem: Intel Corporation Ethernet Converged Network Adapter XL710-Q2 Flags: bus master, fast devsel, latency 0, IRQ 55, NUMA node 0 Memory at 91000000 (64-bit, prefetchable) [size=16M] Memory at 92808000 (64-bit, prefetchable) [size=32K] Expansion ROM at 92b00000 [disabled] [size=512K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [70] MSI-X: Enable+ Count=129 Masked- Capabilities: [a0] Express Endpoint, MSI 00 Capabilities: [e0] Vital Product Data Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number 88-90-15-ff-ff-fe-fd-3c Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [160] Single Root I/O Virtualization (SR-IOV) Capabilities: [1a0] Transaction Processing Hints Capabilities: [1b0] Access Control Services Capabilities: [1d0] #19 Kernel driver in use: i40e Kernel modules: i40e [root@dell-per730-29 ~]# ll /sys/bus/pci/devices/0000\:04\:02.0/driver lrwxrwxrwx. 1 root root 0 Jan 4 21:52 /sys/bus/pci/devices/0000:04:02.0/driver -> ../../../../bus/pci/drivers/vfio-pci