Bug 1887895
| Summary: | vf network card cannot be hot-unplugged from vm | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | mhou <mhou> | ||||
| Component: | qemu-kvm | Assignee: | Marcelo Tosatti <mtosatti> | ||||
| qemu-kvm sub component: | PCI | QA Contact: | mhou <mhou> | ||||
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | ailan, alex.williamson, chayang, jinzhao, juzhang, kzhang, lcapitulino, mhou, mprivozn, mst, mtosatti, ngu, pezhang, virt-maint, yanghliu, yuma | ||||
| Version: | 8.6 | Keywords: | Reopened, Triaged | ||||
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | ||||
| Target Release: | 8.6 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 2020995 2051788 2069614 (view as bug list) | Environment: | |||||
| Last Closed: | 2021-11-22 07:29:52 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 2020995, 2051788, 2069614 | ||||||
| Attachments: | |||||||
Description
mhou
2020-10-13 14:40:15 UTC
Hi mhou,
I have two questions that need your confirmation.
(1)
> 8.detach the vf to g2 and check the vf in g2 mv.
> #virsh detach-interface g2 hostdev --mac 52:54:00:c6:7e:4c
According to the hardware info and domain XML you provided, I cannot find any information related to the MAC address "52:54:00:c6:7e:4c".
Can you explain where this MAC address comes from?
(2)
> <interface type='hostdev' managed='yes'>
> <source>
> <address type='pci' domain='0x0000' bus='0x5e' slot='0x01' function='0x2'/>
> </source>
> <mac address='00:de:ad:21:22:02'/>
> </interface>
> Additional info:
> 1. I also use "virsh detach-interface g2 hostdev --mac 00:de:ad:21:22:02". But this does not seem to be detached successfully
Can you confirm whether "virsh detach-interface g2 hostdev --mac 00:de:ad:21:22:02" alone can hot-unplug the VF from the vm?
(The MAC address specified in the "virsh detach-interface" command needs to be the same as the MAC address, inside the vm, of the VF we want to hot-unplug.)
Thanks a lot in advance.
Hi yanghliu
Sorry for the inconvenience. In step 8, I actually used the "virsh detach-device" command. I have modified the description of step 8 in the first comment; please check it. If needed, I can also provide a test log from the rt kernel on RHEL 8.3.
I also used "virsh detach-interface" to reproduce this phenomenon. Please check the details below.
When I use a network card driven by mlx5, the VF cannot be detached with virsh detach-interface.
Version-Release number of selected component (if applicable):
kernel version:4.18.0-193.19.1.rt13.70.el8_2.x86_64
virsh version:6.0.0
qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-img-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-kvm-block-ssh-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-kvm-block-iscsi-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
libvirt-daemon-driver-qemu-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
ipxe-roms-qemu-20181214-5.git133f4c47.el8.noarch
qemu-kvm-core-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-kvm-common-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-kvm-block-curl-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-kvm-block-gluster-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
qemu-kvm-block-rbd-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
libvirt-daemon-driver-storage-gluster-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-mpath-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-interface-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-secret-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-libs-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-disk-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-iscsi-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-logical-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-rbd-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-nodedev-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-qemu-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-network-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-kvm-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-client-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-core-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-iscsi-direct-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-storage-scsi-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-driver-nwfilter-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-bash-completion-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
libvirt-daemon-config-network-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
How reproducible: 100%
Steps to Reproduce:
1.Install the latest rt-kernel on version 8.2
2.Install the virt:8.2/common module
3.set kernel tuning as:
#grubby --args="default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
#echo isolated_cores=2-79 >> /etc/tuned/realtime-virtual-host-variables.conf
#echo isolate_managed_irq=Y >> /etc/tuned/realtime-virtual-host-variables.conf
#tuned-adm profile realtime-virtual-host
#echo "vm.nr_hugepages = 20" >> /etc/sysctl.conf
#reboot
4.define a vm xml and start the vm.
#virsh define ./g2.xml
#virsh start g2
5.Configure 2 VFs on the mlx5 network card
# ethtool -i ens3f1
driver: mlx5_core
version: 5.0-0
firmware-version: 12.28.1002 (MT_2150110033)
expansion-rom-version:
bus-info: 0000:5e:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
# echo 2 > /sys/class/net/ens3f1/device/sriov_numvfs
6.Confirm pci address of vf1
# ethtool -i ens3f1v1
driver: mlx5_core
version: 5.0-0
firmware-version: 12.28.1002 (MT_2150110033)
expansion-rom-version:
bus-info: 0000:5e:01.3
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
7.attach the vf to g2 and check the vf mac address in g2 vm.
#virsh attach-interface g2 hostdev 0000:5e:01.3 --managed
the dmesg in vm g2 as below:
[Oct 9 04:52] pcieport 0000:00:02.0: Slot(0): Attention button pressed
[ +0.003402] pcieport 0000:00:02.0: Slot(0) Powering on due to button press
[ +0.002131] pcieport 0000:00:02.0: Slot(0): Card present
[ +0.001294] pcieport 0000:00:02.0: Slot(0): Link Up
[17527.831364] pci 0000:01:00.0: [15b3:1014] type 00 class 0x020000
[17527.833533] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit pref]
[17527.836184] pci 0000:01:00.0: enabling Extended Tags
[17527.838697] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:02.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
[17527.842920] pci 0000:01:00.0: BAR 0: assigned [mem 0xfea00000-0xfeafffff 64bit pref]
[17527.844081] pcieport 0000:00:02.0: PCI bridge to [bus 01]
[17527.844766] pcieport 0000:00:02.0: bridge window [io 0x1000-0x1fff]
[17527.847026] pcieport 0000:00:02.0: bridge window [mem 0xfde00000-0xfdffffff]
[17527.848781] pcieport 0000:00:02.0: bridge window [mem 0xfea00000-0xfebfffff 64bit pref]
[ +0.132920] pci 0000:01:00.0: [15b3:1014] type 00 class 0x020000
[ +0.002169] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit pref]
[ +0.002651] pci 0000:01:00.0: enabling Extended Tags
[ +0.002513] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:02.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
[ +0.004223] pci 0000:01:00.0: BAR 0: assigned [mem 0xfea00000-0xfeafffff 64bit pref]
[17527.858055] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
[17527.877544] mlx5_core 0000:01:00.0: firmware version: 12.28.1002
[ +0.001161] pcieport 0000:00:02.0: PCI bridge to [bus 01]
[ +0.000685] pcieport 0000:00:02.0: bridge window [io 0x1000-0x1fff]
[ +0.002260] pcieport 0000:00:02.0: bridge window [mem 0xfde00000-0xfdffffff]
[ +0.001755] pcieport 0000:00:02.0: bridge window [mem 0xfea00000-0xfebfffff 64bit pref]
[ +0.009274] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
[ +0.019489] mlx5_core 0000:01:00.0: firmware version: 12.28.1002
[17528.352775] mlx5_core 0000:01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[ +0.475231] mlx5_core 0000:01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[17528.560804] mlx5_core 0000:01:00.0 enp1s0: renamed from eth0
[ +0.208029] mlx5_core 0000:01:00.0 enp1s0: renamed from eth0
[17528.594274] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[ +0.033470] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[17548.743041] mlx5_core 0000:01:00.0 enp1s0: Failed to get min RX wqes on Channel[4] RQN[0xc00b45] wq cur_sz(0) min_rx_wqes(128)
[Oct 9 04:53] mlx5_core 0000:01:00.0 enp1s0: Failed to get min RX wqes on Channel[4] RQN[0xc00b45] wq cur_sz(0) min_rx_wqes(128)
[17548.753343] mlx5_core 0000:01:00.0 enp1s0: Link up
check the message log in host:
Oct 9 04:52:58 dell-per740-04 NetworkManager[1994]: <info> [1602233578.2710] device (ens3f1v1): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')
Oct 9 04:52:58 dell-per740-04 NetworkManager[1994]: <info> [1602233578.2857] dhcp4 (ens3f1v1): canceled DHCP transaction
Oct 9 04:52:58 dell-per740-04 NetworkManager[1994]: <info> [1602233578.2857] dhcp4 (ens3f1v1): state changed bound -> done
Oct 9 04:52:58 dell-per740-04 dbus-daemon[1945]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.7' (uid=0 pid=1994 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Oct 9 04:52:58 dell-per740-04 systemd[1]: Starting Network Manager Script Dispatcher Service...
Oct 9 04:52:58 dell-per740-04 dbus-daemon[1945]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 9 04:52:58 dell-per740-04 systemd[1]: Started Network Manager Script Dispatcher Service.
Oct 9 04:52:59 dell-per740-04 kernel: vfio-pci 0000:5e:01.3: enabling device (0000 -> 0002)
#virsh console g2
#ip a
4: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:c6:7e:4c brd ff:ff:ff:ff:ff:ff
inet 192.168.1.153/24 brd 192.168.1.255 scope global dynamic noprefixroute enp1s0
valid_lft 3572sec preferred_lft 3572sec
inet6 2001::5054:ff:fec6:7e4c/64 scope global dynamic noprefixroute
valid_lft 86400sec preferred_lft 14400sec
inet6 fe80::5054:ff:fec6:7e4c/64 scope link noprefixroute
valid_lft forever preferred_lft forever
check the xml file
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:c6:7e:4c'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x5e' slot='0x01' function='0x3'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
8.detach the vf from g2 and check the vf in the g2 vm.
#virsh detach-interface g2 hostdev --mac 52:54:00:c6:7e:4c
check that the interface still exists in the vm. The guest dmesg shows the messages below; there is no related information in the host message log or host dmesg:
[17925.075086] pcieport 0000:00:02.0: Slot(0): Attention button pressed
[17925.076852] pcieport 0000:00:02.0: Slot(0): Powering off due to button press
[Oct 9 04:59] pcieport 0000:00:02.0: Slot(0): Attention button pressed
[ +0.001766] pcieport 0000:00:02.0: Slot(0): Powering off due to button press
log in to the vm and check the network devices
#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:de:ad:00:00:23 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.152/24 brd 192.168.122.255 scope global dynamic noprefixroute enp3s0
valid_lft 2123sec preferred_lft 2123sec
inet6 fe80::8727:b053:b59c:9797/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:c6:7e:4c brd ff:ff:ff:ff:ff:ff
inet 192.168.1.153/24 brd 192.168.1.255 scope global dynamic noprefixroute enp1s0
valid_lft 3234sec preferred_lft 3234sec
inet6 2001::5054:ff:fec6:7e4c/64 scope global dynamic noprefixroute
valid_lft 86369sec preferred_lft 14369sec
inet6 fe80::5054:ff:fec6:7e4c/64 scope link noprefixroute
valid_lft forever preferred_lft forever
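The check in step 8 (is the VF's MAC address still listed in the guest after the detach request?) can be sketched in Python. This is only an illustration, not part of the original report: the helper names `guest_macs` and `vf_still_present` are hypothetical, and the sample text is copied from the guest `ip a` listing above, trimmed to the link lines.

```python
import re

# Sample copied from the guest `ip a` output above (address lines omitted).
SAMPLE_IP_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:de:ad:00:00:23 brd ff:ff:ff:ff:ff:ff
4: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:c6:7e:4c brd ff:ff:ff:ff:ff:ff
"""

# Matches the MAC address that follows "link/ether" on each interface.
MAC_RE = re.compile(r"link/ether\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)


def guest_macs(ip_output):
    """Return the set of lower-cased MAC addresses found on link/ether lines."""
    return {mac.lower() for mac in MAC_RE.findall(ip_output)}


def vf_still_present(ip_output, vf_mac):
    """Return True if vf_mac is still listed in the guest's ip output."""
    return vf_mac.lower() in guest_macs(ip_output)
```

Calling `vf_still_present(SAMPLE_IP_OUTPUT, "52:54:00:c6:7e:4c")` on the listing captured after the detach request reports that the VF is still visible in the guest, which is the failure described in this bug.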
Thanks mhou for the explanation.
> I try to test in different rt kernel version and different virt module.
> It also can't be detach successfully
> the other kernel version:4.18.0-240.rt7.54.el8.x86_64
> virt module: virt:8.3/common
I have done a quick test and reproduced this bug in the following test env:
host:
qemu-kvm-5.1.0-12.module+el8.3.0+8338+cbcb1a4b.x86_64
4.18.0-240.rt7.54.el8.x86_64
vm:
4.18.0-240.rt7.54.el8.x86_64
When I hot-unplug the VF from the vm, the corresponding qmp I got is as follows:
> {"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-375"}
< {"return": {}, "id": "libvirt-375"}
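The empty reply to device_del only acknowledges the request; QEMU reports completion of the unplug separately with a DEVICE_DELETED event. A minimal Python sketch for scanning a captured QMP transcript for that event follows. The function name `device_deleted_seen` and the convention of `>` / `<` direction markers at the start of each line are assumptions taken from the transcript style used in this report, not a libvirt or QEMU API.

```python
import json


def device_deleted_seen(qmp_lines, device_id):
    """Return True if a DEVICE_DELETED event for device_id appears in qmp_lines.

    Each element of qmp_lines is one line of a QMP transcript, optionally
    prefixed with '>' (sent) or '<' (received) as in the excerpt above.
    """
    for line in qmp_lines:
        # Drop the direction marker and surrounding whitespace.
        text = line.strip().lstrip("<>").strip()
        if not text:
            continue
        try:
            msg = json.loads(text)
        except json.JSONDecodeError:
            # Not a JSON message (for example a log prefix); skip it.
            continue
        if not isinstance(msg, dict):
            continue
        data = msg.get("data")
        if (
            msg.get("event") == "DEVICE_DELETED"
            and isinstance(data, dict)
            and data.get("device") == device_id
        ):
            return True
    return False
```

Fed the two lines above (the device_del command and its empty return), the function returns False for "hostdev0", since no completion event is present in the transcript.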
And I *cannot* observe recv_event as shown below:
"{"timestamp": {"seconds": 1602737325, "microseconds": 262191}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}"
Since you haven't received the DEVICE_DELETED event, something in the guest is blocking the detach. This is definitely not a libvirt issue. I'm reassigning to qemu for further inspection. Meanwhile, it would be nice if you could provide libvirt debug logs where the communication with QEMU will be visible.
https://wiki.libvirt.org/page/DebugLogs
Is this unique to Mellanox or can it be reproduced with an Intel VF? The guest dmesg suggests the hotplug event was received and the slot powered off (or initiated an attempt to power off), yet the netdev remains in the guest. Hot-remove is cooperative: the guest driver must release the device. What happens if the Mellanox VF driver is blacklisted in the guest; does hot-remove then work?
(In reply to Alex Williamson from comment #6)
> Is this unique to Mellanox or can it be reproduced with an Intel VF?
Hi Alex,
I can use the XXV710 VF to reproduce this bug in the following test env:
host:
qemu-kvm-5.1.0-12.module+el8.3.0+8338+cbcb1a4b.x86_64
4.18.0-240.rt7.54.el8.x86_64
vm:
4.18.0-240.rt7.54.el8.x86_64
The test steps are as follows:
(1) boot a vm with a XXV710 VF
Domain xml:
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:21:22:02'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x82' slot='0x02' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
Qemu cmd line:
-device vfio-pci,host=0000:82:02.0,id=hostdev0,bus=pci.1,addr=0x0
(2) check the VF status in the vm:
# lspci
...
01:00.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
# ifconfig
...
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.200.59 netmask 255.255.255.0 broadcast 192.168.200.255
inet6 fe80::5612:1715:5da5:d4b7 prefixlen 64 scopeid 0x20<link>
inet6 2001::385d:4d2a:ac6f:c331 prefixlen 64 scopeid 0x0<global>
ether 00:de:ad:21:22:02 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
(3) try to hot-unplug the VF from the vm:
# virsh detach-device-alias $domain hostdev0
Device detach request sent successfully
(4) Check whether the VF is successfully hot-unplugged from the vm
The VF is not hot-unplugged from the vm
Additional info
1.The vm dmesg when hot-unplugging the XXV710 VF from the vm:
[ 78.980965] pcieport 0000:00:02.0: Slot(0): Attention button pressed
[ 78.984473] pcieport 0000:00:02.0: Slot(0): Powering off due to button press
2.The qmp about hot-unplug XXV710 VF:
> {"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-373"}
< {"return": {}, "id": "libvirt-373"}
Resetting ITR = '---' instead of 8.4.0 since at this point the bug is still under investigation
(In reply to yanghliu from comment #7)
> (In reply to Alex Williamson from comment #6)
> > Is this unique to Mellanox or can it be reproduced with an Intel VF?
> Hi Alex,
>
>
> I can use the XXV710 VF to reproduce this bug in the following test env:
Thanks for the quick test. Can we also test whether this works with non-hostdev devices, i.e. emulated NICs? Does this work with a non-rt kernel guest? It still seems that the guest is not releasing the device, which points to a guest kernel/driver issue or perhaps a QEMU PCI issue.
This issue still exists in the latest RHEL8.5-RT testing.
Versions:
4.18.0-319.rt7.100.el8.x86_64
qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09.x86_64
tuned-2.15.0-3.el8.noarch
libvirt-7.4.0-1.module+el8.5.0+11218+83343022.x86_64
openvswitch2.15-2.15.0-24.el8fdp.x86_64
dpdk-20.11-3.el8.x86_64
Additional info:
The PF cannot be hot-unplugged. The vhost-user NIC also cannot be hot-unplugged. After unplug, the vhost-user NIC still exists in the VM.
# /bin/virsh detach-device rhel8.5 /tmp/rhel8.5_nic1.xml
# /bin/virsh detach-device rhel8.5 /tmp/rhel8.5_nic2.xml
# lspci | grep Eth
Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
Can you try the fix suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1346715, that is, set kernel.sched_rt_runtime_us to -1 in order to start the VM on a non-RT kernel?
started: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=38652691
Hopefully this succeeds; then you can install and test.
Bulk update: Move RHEL8 bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.
Minxi Hou,
Can't reproduce the problem. After disabling the restraind system daemon (so the test does not start automatically when the host is rebooted):
1) modprobe vfio-pci
2) echo 2 > /sys/devices/pci0000:85/0000:85:00.0/0000:86:00.0/sriov_numvfs
3) virsh attach-device g2 ./iface.xml
Device attached successfully
[root@dell-per740-09 ~]# cat iface.xml
<interface type='hostdev' managed='yes'>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x3'/>
</source>
<mac address='00:de:ad:2a:01:02'/>
</interface>
4) On guest:
[root@localhost ~]# lspci | grep Mella
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
5) virsh detach-device g2 ./iface.xml
Device detached successfully
6) On guest:
[root@localhost ~]# lspci | grep Mella
[root@localhost ~]#
(interface is gone).
---
One difference in the steps above is that they use the
/sys/devices/pci0000:85/0000:85:00.0/0000:86:00.0/sriov_numvfs
path rather than
/sys/class/net/ens3f1/device/sriov_numvfs
Can you please try to come up with a script that causes the problem? (Manual, as in the steps above and your initial comment, as this is easier to debug.)
Maybe it has been fixed:
4.18.0-193.19.1.rt13.70.el8_2.x86_64 (kernel of comment #1)
4.18.0-348.6.rt7.133.el8.x86_64 (current kernel on the machine)
Hello Marcelo
I wrote a shell script to reproduce the bug. Please check test.sh in the attachments. The script tries to create the maximum number of VFs that the PF supports and attaches the VFs to the VM.
I used an mlx5_core card for testing on the test machine that I mentioned in the email thread. 15 VFs were created and attached by test.sh, but fewer than 15 VFs can be found in the guest.
On the guest
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 00:de:ad:2a:00:02 brd ff:ff:ff:ff:ff:ff
3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:de:ad:00:01:01 brd ff:ff:ff:ff:ff:ff
4: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:de:ad:00:01:02 brd ff:ff:ff:ff:ff:ff
5: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:de:ad:00:01:03 brd ff:ff:ff:ff:ff:ff
6: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:de:ad:00:01:04 brd ff:ff:ff:ff:ff:ff
7: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:de:ad:00:01:05 brd ff:ff:ff:ff:ff:ff
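One way to pin down which VFs are missing is to compare the hostdev MAC addresses in the domain XML with the MAC addresses the guest reports. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes the XML text passed in is well-formed, so any non-XML line that virsh prints before `<domain>` (such as the setlocale warning) must be stripped first.

```python
import re
import xml.etree.ElementTree as ET

MAC_RE = re.compile(r"link/ether\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)


def hostdev_macs(domain_xml):
    """Return MAC addresses of <interface type='hostdev'> elements, in document order."""
    root = ET.fromstring(domain_xml)
    macs = []
    for iface in root.iter("interface"):
        if iface.get("type") != "hostdev":
            continue
        mac = iface.find("mac")
        if mac is not None and mac.get("address"):
            macs.append(mac.get("address").lower())
    return macs


def guest_macs(ip_link_output):
    """Return the set of MAC addresses on link/ether lines of `ip link show` output."""
    return {mac.lower() for mac in MAC_RE.findall(ip_link_output)}


def missing_in_guest(domain_xml, ip_link_output):
    """Return hostdev MACs defined in the domain XML but absent from the guest."""
    seen = guest_macs(ip_link_output)
    return [mac for mac in hostdev_macs(domain_xml) if mac not in seen]
```

For the data in this comment, the guest lists 00:de:ad:00:01:01 through 00:de:ad:00:01:05, while the domain XML also defines 00:de:ad:00:01:06 and later entries, so those would be returned as missing.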
# virsh dumpxml g2
setlocale: No such file or directory
<domain type='kvm' id='3'>
<name>g2</name>
<uuid>d0a3fd60-46da-11ec-9721-78ac440bdf84</uuid>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB'/>
</hugepages>
<locked/>
</memoryBacking>
<vcpu placement='static'>5</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='12'/>
<vcpupin vcpu='1' cpuset='14'/>
<vcpupin vcpu='2' cpuset='16'/>
<vcpupin vcpu='3' cpuset='18'/>
<vcpupin vcpu='4' cpuset='20'/>
<emulatorpin cpuset='0'/>
<emulatorsched scheduler='fifo' priority='1'/>
<vcpusched vcpus='2' scheduler='fifo' priority='1'/>
<vcpusched vcpus='3' scheduler='fifo' priority='1'/>
<vcpusched vcpus='4' scheduler='fifo' priority='1'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0'/>
<memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-q35-rhel8.5.0'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<pmu state='off'/>
<vmport state='off'/>
<ioapic driver='qemu'/>
</features>
<cpu mode='host-passthrough' check='none' migratable='on'>
<feature policy='require' name='tsc-deadline'/>
<numa>
<cell id='0' cpus='0-4' memory='4194304' unit='KiB' memAccess='shared'/>
</numa>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' io='threads' iommu='on' ats='on'/>
<source file='/var/lib/libvirt/images/g2.qcow2' index='1'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</disk>
<controller type='usb' index='0' model='none'>
<alias name='usb'/>
</controller>
<controller type='pci' index='0' model='pcie-root'>
<alias name='pcie.0'/>
</controller>
<controller type='pci' index='1' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='1' port='0x10'/>
<alias name='pci.1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
</controller>
<controller type='pci' index='2' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='2' port='0x11'/>
<alias name='pci.2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
</controller>
<controller type='pci' index='3' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='3' port='0x12'/>
<alias name='pci.3'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
</controller>
<controller type='pci' index='4' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='4' port='0x13'/>
<alias name='pci.4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
</controller>
<controller type='pci' index='5' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='5' port='0x14'/>
<alias name='pci.5'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
</controller>
<controller type='pci' index='6' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='6' port='0x15'/>
<alias name='pci.6'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
</controller>
<controller type='pci' index='7' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='7' port='0x16'/>
<alias name='pci.7'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
</controller>
<controller type='pci' index='8' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='8' port='0x17'/>
<alias name='pci.8'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
</controller>
<controller type='pci' index='9' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='9' port='0x18'/>
<alias name='pci.9'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
</controller>
<controller type='pci' index='10' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='10' port='0x19'/>
<alias name='pci.10'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
</controller>
<controller type='pci' index='11' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='11' port='0x20'/>
<alias name='pci.11'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x4'/>
</controller>
<controller type='pci' index='12' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='12' port='0x21'/>
<alias name='pci.12'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x5'/>
</controller>
<controller type='pci' index='13' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='13' port='0x22'/>
<alias name='pci.13'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x6'/>
</controller>
<controller type='pci' index='14' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='14' port='0x23'/>
<alias name='pci.14'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/>
</controller>
<controller type='pci' index='15' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='15' port='0x24'/>
<alias name='pci.15'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/>
</controller>
<controller type='pci' index='16' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='16' port='0x25'/>
<alias name='pci.16'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x2'/>
</controller>
<controller type='pci' index='17' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='17' port='0x26'/>
<alias name='pci.17'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x3'/>
</controller>
<controller type='pci' index='18' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='18' port='0x27'/>
<alias name='pci.18'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x4'/>
</controller>
<controller type='pci' index='19' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='19' port='0x28'/>
<alias name='pci.19'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x5'/>
</controller>
<controller type='pci' index='20' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='20' port='0x29'/>
<alias name='pci.20'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x6'/>
</controller>
<controller type='pci' index='21' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='21' port='0x30'/>
<alias name='pci.21'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x7'/>
</controller>
<controller type='pci' index='22' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='22' port='0x31'/>
<alias name='pci.22'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
</controller>
<controller type='pci' index='23' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='23' port='0x32'/>
<alias name='pci.23'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
</controller>
<controller type='pci' index='24' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='24' port='0x33'/>
<alias name='pci.24'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x3'/>
</controller>
<controller type='pci' index='25' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='25' port='0x34'/>
<alias name='pci.25'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x4'/>
</controller>
<controller type='pci' index='26' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='26' port='0x35'/>
<alias name='pci.26'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x5'/>
</controller>
<controller type='pci' index='27' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='27' port='0x36'/>
<alias name='pci.27'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x6'/>
</controller>
<controller type='pci' index='28' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='28' port='0x37'/>
<alias name='pci.28'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
</controller>
<controller type='sata' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
</controller>
<interface type='bridge'>
<mac address='00:de:ad:2a:00:02'/>
<source bridge='virbr0'/>
<target dev='vnet3'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:01'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x2'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:02'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x3'/>
</source>
<alias name='hostdev1'/>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:03'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x4'/>
</source>
<alias name='hostdev2'/>
<address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:04'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x5'/>
</source>
<alias name='hostdev3'/>
<address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:05'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x6'/>
</source>
<alias name='hostdev4'/>
<address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:06'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x7'/>
</source>
<alias name='hostdev5'/>
<address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:07'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x02' function='0x0'/>
</source>
<alias name='hostdev6'/>
<address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:08'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x02' function='0x1'/>
</source>
<alias name='hostdev7'/>
<address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:09'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x4'/>
</source>
<alias name='hostdev8'/>
<address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:0a'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x5'/>
</source>
<alias name='hostdev9'/>
<address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:0b'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x6'/>
</source>
<alias name='hostdev10'/>
<address type='pci' domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:0c'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x00' function='0x7'/>
</source>
<alias name='hostdev11'/>
<address type='pci' domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:0d'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x0'/>
</source>
<alias name='hostdev12'/>
<address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:0e'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x1'/>
</source>
<alias name='hostdev13'/>
<address type='pci' domain='0x0000' bus='0x11' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:0f'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x2'/>
</source>
<alias name='hostdev14'/>
<address type='pci' domain='0x0000' bus='0x12' slot='0x00' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='00:de:ad:00:01:10'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x01' function='0x3'/>
</source>
<alias name='hostdev15'/>
<address type='pci' domain='0x0000' bus='0x13' slot='0x00' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/0'/>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<input type='mouse' bus='ps2'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='ps2'>
<alias name='input1'/>
</input>
<audio id='1' type='none'/>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</memballoon>
<iommu model='intel'>
<driver intremap='on' caching_mode='on' iotlb='on'/>
</iommu>
</devices>
<seclabel type='dynamic' model='dac' relabel='yes'>
<label>+107:+107</label>
<imagelabel>+107:+107</imagelabel>
</seclabel>
</domain>
There are two errors in the test script:

1) "no lib for mellanox"
2) "test.sh: line 72: link_up_ifs_with_same_bus: command not found"

Perhaps they are related?

Either way, the same happens with the non-rt kernel and tuned's realtime-virtual-host profile, and with the non-rt kernel and tuned's balanced profile. See the attached files:

test-4.18.0-348.4.el8-balanced-tuned-profile.log
test-4.18.0-348.4.el8.log
test-4.18.0-348.6.rt7.133.el8.log

So I don't see this (failure to hotplug more than 4 VFIO devices) as specific to -RT kernels or the tuned profile. Please open a separate BZ for the other problem if you think it's appropriate.

Again, regarding the hot-unplug bug, the problem has probably been fixed since your initial report.