RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1531393 - [Q35] "DEVICE_DELETED" event didn't return after delete the second passthrough vf device
Summary: [Q35] "DEVICE_DELETED" event didn't return after delete the second passthroug...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: Peter Xu
QA Contact: jingzhao
URL:
Whiteboard:
Depends On:
Blocks: 1552224 1552626 1555213 1558351
TreeView+ depends on / blocked
 
Reported: 2018-01-05 05:03 UTC by jingzhao
Modified: 2018-11-01 11:04 UTC (History)
20 users (show)

Fixed In Version: qemu-kvm-rhev-2.12.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1552224 1552626 1555213 (view as bug list)
Environment:
Last Closed: 2018-11-01 11:04:08 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
dmesg log (50.79 KB, text/plain)
2018-01-05 05:03 UTC, jingzhao
no flags Details

Description jingzhao 2018-01-05 05:03:29 UTC
Created attachment 1377297 [details]
dmesg log

Description of problem:
 qmp didn't get the "DEVICE_DELETED" event after delete the second passthrough vf device, although the device had been deleted successfully in guest

Version-Release number of selected component (if applicable):
kernel-3.10.0-825.el7.x86_64
qemu-kvm-rhev-2.10.0-14.el7.x86_64
passthrough vf device: XL710

How reproducible:
3/3

Steps to Reproduce:
1. Boot guest with qemu command line [1]
2. Deleted two vf device in qmp
{"execute":"device_del","arguments":{"id":"vf2"}}
{"return": {}}
{"timestamp": {"seconds": 1515122236, "microseconds": 265862}, "event": "DEVICE_DELETED", "data": {"device": "vf2", "path": "/machine/peripheral/vf2"}}
{"execute":"device_del","arguments":{"id":"vf1"}}
{"return": {}}


Actual results:
devices can be deleted successfully
[root@localhost ~]# lspci -vvv -t
-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0  Red Hat, Inc. QXL paravirtual graphic card
           +-02.0-[01]--
           +-0a.0-[02]----00.0  Red Hat, Inc. Virtio block device
           +-0a.1-[03]----00.0  Red Hat, Inc. Virtio network device
           +-0a.2-[04]----00.0  Red Hat, Inc. Virtio network device
           +-0a.5-[05]--
           +-1d.0  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1
           +-1d.1  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2
           +-1d.2  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3
           +-1d.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1
           +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
           \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

qmp didn't get the "DEVICE_DELETED" event

Expected results:
qmp can get the DEVICE_DELETED event

Additional info:
Didn't reproduce the issue with "virtio-net-pci" devices
guest kernel line doesn't contain "intel_iommu=on"

[1]
/usr/libexec/qemu-kvm \
-M q35,accel=kvm,kernel-irqchip=split \
-device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \
-cpu Haswell-noTSX \
-nodefaults -rtc base=utc \
-m 4G \
-smp 4,sockets=4,cores=1,threads=1 \
-enable-kvm \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-nodefaults \
-chardev file,path=/home/seabios.log,id=seabios -device isa-debugcon,chardev=seabios,iobase=0x402 \
-boot menu=on \
-qmp tcp:0:6667,server,nowait \
-usb \
-device usb-tablet \
-vga qxl \
-device pcie-root-port,bus=pcie.0,id=root0,multifunction=on,chassis=1,addr=0xa.0 \
-drive file=/home/sriov/rhel75-seabios-bk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bus=root0 \
-device pcie-root-port,bus=pcie.0,id=root1,chassis=11,addr=0xa.1 \
-device virtio-net-pci,netdev=tap10,mac=00:52:68:26:31:03,bus=root1,id=net0 -netdev tap,id=tap10 \
-device pcie-root-port,bus=pcie.0,id=root2,chassis=12,addr=0xa.2 \
-device virtio-net-pci,netdev=tap11,mac=00:52:68:26:31:00,bus=root2,id=net1 -netdev tap,id=tap11 \
-device pcie-root-port,bus=pcie.0,id=root6,chassis=15,addr=0xa.5 \
-device vfio-pci,host=04:02.0,id=vf1,bus=root6,x-req=off \
-device pcie-root-port,bus=pcie.0,id=root7,chassis=4 \
-device vfio-pci,host=04:02.1,id=vf2,bus=root7,x-req=off \
-monitor stdio \
-vnc :1 \


Host devices info:
[root@dell-per730-29 ~]# lspci -v -s 04:02.0 
04:02.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
	Subsystem: Intel Corporation Device 0000
	Flags: bus master, fast devsel, latency 0, NUMA node 0
	[virtual] Memory at 92400000 (64-bit, prefetchable) [size=64K]
	[virtual] Memory at 92910000 (64-bit, prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [1a0] Transaction Processing Hints
	Capabilities: [1d0] Access Control Services
	Kernel driver in use: vfio-pci
	Kernel modules: i40evf

[root@dell-per730-29 ~]# lspci -v -s 04:00.0 
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
	Subsystem: Intel Corporation Ethernet Converged Network Adapter XL710-Q2
	Flags: bus master, fast devsel, latency 0, IRQ 55, NUMA node 0
	Memory at 91000000 (64-bit, prefetchable) [size=16M]
	Memory at 92808000 (64-bit, prefetchable) [size=32K]
	Expansion ROM at 92b00000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [e0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 88-90-15-ff-ff-fe-fd-3c
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1a0] Transaction Processing Hints
	Capabilities: [1b0] Access Control Services
	Capabilities: [1d0] #19
	Kernel driver in use: i40e
	Kernel modules: i40e

[root@dell-per730-29 ~]# ll /sys/bus/pci/devices/0000\:04\:02.0/driver
lrwxrwxrwx. 1 root root 0 Jan  4 21:52 /sys/bus/pci/devices/0000:04:02.0/driver -> ../../../../bus/pci/drivers/vfio-pci

Comment 2 jingzhao 2018-01-05 05:10:14 UTC
Didn't reproduce the issue without -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on 

The qemu command line:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu Haswell-noTSX \
-nodefaults -rtc base=utc \
-m 4G \
-smp 4,sockets=4,cores=1,threads=1 \
-enable-kvm \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-nodefaults \
-global isa-debugcon.iobase=0x402 \
-boot menu=on \
-qmp tcp:0:6667,server,nowait \
-usb \
-device usb-tablet \
-vga qxl \
-device pcie-root-port,bus=pcie.0,id=root0,multifunction=on,chassis=1,addr=0xa.0 \
-drive file=/home/sriov/rhel75-seabios-bk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bus=root0 \
-device pcie-root-port,bus=pcie.0,id=root1,chassis=11,addr=0xa.1 \
-device virtio-net-pci,netdev=tap10,mac=00:52:68:26:31:03,bus=root1,id=net0 -netdev tap,id=tap10 \
-device pcie-root-port,bus=pcie.0,id=root2,chassis=12,addr=0xa.2 \
-device virtio-net-pci,netdev=tap11,mac=00:52:68:26:31:00,bus=root2,id=net1 -netdev tap,id=tap11 \
-device pcie-root-port,bus=pcie.0,id=root6,chassis=15,addr=0xa.5 \
-device vfio-pci,host=04:02.0,id=vf1,bus=root6,x-req=off \
-device pcie-root-port,bus=pcie.0,id=root7,chassis=4 \
-device vfio-pci,host=04:02.1,id=vf2,bus=root7,x-req=off \
-monitor stdio \
-vnc :1 \

Comment 3 jingzhao 2018-01-05 05:48:09 UTC
Marked it as regression since didn't reproduce it with qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64

Comment 8 Marcel Apfelbaum 2018-01-16 10:37:12 UTC
Hi Peter,

Following the logs, the Guest OS finishes the hot-unplug sequence
and powers down the device. It seems something on QEMU's IOMMU side
prevents the hot-unplug to finish.

Can you please have a look?
Thanks,
Marcel

Comment 9 Peter Xu 2018-01-16 14:03:33 UTC
(In reply to Marcel Apfelbaum from comment #8)
> Hi Peter,
> 
> Following the logs, the Guest OS finishes the hot-unplug sequence
> and powers down the device. It seems something on QEMU's IOMMU side
> prevents the hot-unplug to finish.
> 
> Can you please have a look?
> Thanks,
> Marcel

Hi, Marcel,

The event is sent in RCU thread:

#0  qapi_event_send_device_deleted (has_device=true, device=0x55c2500b81f0 "vfio0", path=0x55c2513530d0 "/machine/peripheral/vfio0", 
    errp=0x55c24f3f3d10 <error_abort>) at qapi-event.c:412
#1  0x000055c24e3ca86c in device_finalize (obj=0x55c25128b270) at /root/git/qemu/hw/core/qdev.c:1087
#2  0x000055c24e5ad6db in object_deinit (obj=0x55c25128b270, type=0x55c250094700) at /root/git/qemu/qom/object.c:462
#3  0x000055c24e5ad709 in object_deinit (obj=0x55c25128b270, type=0x55c2500a40c0) at /root/git/qemu/qom/object.c:466
#4  0x000055c24e5ad709 in object_deinit (obj=0x55c25128b270, type=0x55c250083030) at /root/git/qemu/qom/object.c:466
#5  0x000055c24e5ad74c in object_finalize (data=0x55c25128b270) at /root/git/qemu/qom/object.c:476
#6  0x000055c24e5ae6db in object_unref (obj=0x55c25128b270) at /root/git/qemu/qom/object.c:911
#7  0x000055c24e206988 in memory_region_unref (mr=0x55c25128b4e0) at /root/git/qemu/memory.c:1747
#8  0x000055c24e209d16 in do_address_space_destroy (as=0x55c25128b488) at /root/git/qemu/memory.c:2787
#9  0x000055c24e6da1c8 in call_rcu_thread (opaque=0x0) at /root/git/qemu/util/rcu.c:278
#10 0x00007f78faef4dd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f78fac1e94d in clone () from /lib64/libc.so.6

Luckily I still have the debugging stuff on jinzhao's host due to the other bug, so I did a quick test.  I think it's because some references of vfio-pci device is leaked somehow due to existance of VT-d, since this is the backtrace I got when the error happens (or say, when the event is missed):

(gdb) bt
#0  object_unref (obj=0x5559618ef0c0) at /root/git/qemu/qom/object.c:904
#1  0x000055595d710988 in memory_region_unref (mr=0x5559618ef330) at /root/git/qemu/memory.c:1747
#2  0x000055595d713d16 in do_address_space_destroy (as=0x5559618ef2d8) at /root/git/qemu/memory.c:2787
#3  0x000055595dbe41c8 in call_rcu_thread (opaque=0x0) at /root/git/qemu/util/rcu.c:278
#4  0x00007f85ee1b5dd5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f85ededf94d in clone () from /lib64/libc.so.6
(gdb) p obj->ref
$2 = 3

Basically here obj->ref should be 1 otherwise device_finalize() for vfio-pci device won't be called, and event won't be sent.

I'll dig more tomorrow.  For now, I'll take over the bug since it's for sure vt-d related.

Thanks,
Peter

Comment 10 Peter Xu 2018-01-18 08:58:57 UTC
This is reproducable upstream too, and even without XL710 - I tested
with X540, which also hit this.

A funny thing is that upstream QEMU doesn't have this problem until
commit f7b879e072 ("qdev: defer DEVICE_DEL event until
instance_finalize()", 2017-10-18).  However I don't think that's the
problematic commit.  That commit just revealed another issue after it
fixed one.

After I built a very customized QEMU (and with some script for parsing) I think
these are the two refcounts that are missing (via MemoryRegion
referencing, since memory_region_ref() will add ref onto its owner):

Remaining refcount holders:            
{                                      
    "0x5639af8f8000": {                
        "count": 1,                    
        "devid": 1,                    
        "name": "0000:05:10.2 BAR 0 mmaps[0]",                                 
        "mr": "0x5639af8f8000"         
    },                                 
    "0x5639af8f86d0": {                
        "count": 1,                    
        "devid": 1,                    
        "name": "0000:05:10.2 BAR 3 mmaps[0]",                                 
        "mr": "0x5639af8f86d0"         
    }                                  
}                                      

It means that we referenced these two regions one more time for the
2nd VF (and the 1st VF have paired unrefs, since we have no problem with it):

  "0000:05:10.2 BAR 0 mmaps[0]"
  "0000:05:10.2 BAR 3 mmaps[0]"
  
They are the two BARs for the X540 VF device.  Though, I am still
investigating on how the VT-d device can affect them to leak that refcount.

Peter

Comment 11 Peter Xu 2018-01-19 08:49:55 UTC
Well... this funny refcount leakage seems to be a more general issue on memory API rather than VT-d itself.  I posted my thoughts and solution upstream as RFC (since it touches memory core and I haven't tested too much):

  http://lists.gnu.org/archive/html/qemu-devel/2018-01/msg04703.html

I'll wait for some feedback on the patches before I decide what to do next.

Peter

Comment 18 Oneata Mircea Teodor 2018-03-07 13:20:38 UTC
Please follow 0day process, check the link
https://mojo.redhat.com/docs/DOC-1022552

Comment 27 errata-xmlrpc 2018-11-01 11:04:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443


Note You need to log in before you can comment on or make changes to this bug.