Description of problem:
For an Intel XL710 NIC (i40e driver), create several VFs and bind them to vfio-pci. Boot a guest with several VFs, using multifunction=on. When one function is hot-unplugged, the guest dmesg shows:
i40evf 0000:00:04.1: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK

Version-Release number of selected component (if applicable):
qemu: qemu-kvm-rhev-2.6.0-22.el7.x86_64
kernel: kernel-3.10.0-495.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Generate several VFs and bind them to vfio-pci:
# echo 4 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs
(this creates 4 VFs: 0000:04:02.0, 0000:04:02.1, 0000:04:02.2, 0000:04:02.3)
# modprobe vfio-pci
# lspci -n -s 0000:04:02.0
04:02.0 0200: 8086:154c (rev 02)
# echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id
# echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/remove_id
2. Boot the guest with these VFs and multifunction=on:
...
-device vfio-pci,host=04:02.0,id=vf-02.0,multifunction=on,addr=0xa.0 \
-device vfio-pci,host=04:02.1,id=vf-02.1,addr=0xa.1 \
-device vfio-pci,host=04:02.2,id=vf-02.2,addr=0xa.2 \
-device vfio-pci,host=04:02.3,id=vf-02.3,addr=0xa.3 \
...
3. Hot-unplug one function, for example vf-02.1:
{"execute":"device_del","arguments":{"id":"vf-02.1"}}
4. Check the guest dmesg:
i40evf 0000:00:0a.1: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK
i40evf 0000:00:0a.2: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK
i40evf 0000:00:0a.3: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK

Actual results:
Sometimes the error does not appear in dmesg. Other times, a random subset of the four functions hits it (e.g. 0000:00:0a.1, 0000:00:0a.2, 0000:00:0a.3, or all of them).

Expected results:
No error output.

Additional info:
Hostname: dell-per730-29.lab.eng.pek2.redhat.com
I have tested with an Intel 82599ES NIC (ixgbe driver) and could not hit this error.
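The sysfs writes in step 1 can also be scripted. The Python sketch below writes the same files as the shell commands; the helper names `create_vfs` and `bind_id_to_vfio` are hypothetical, and the `sysfs` parameter exists only so the logic can be tried against a scratch directory. It assumes root privileges, that `modprobe vfio-pci` has already been run, and that the VFs are not claimed by another host driver (such as i40evf), since adding an ID through `new_id` only makes vfio-pci probe devices that are still unbound.

```python
from pathlib import Path

# Assumption: the real sysfs mount point; overridable for dry runs.
SYSFS = Path("/sys")


def create_vfs(pf_bdf: str, num_vfs: int, sysfs: Path = SYSFS) -> None:
    """Create num_vfs VFs on the PF at pf_bdf, e.g. "0000:04:00.0"."""
    numvfs = sysfs / "bus/pci/devices" / pf_bdf / "sriov_numvfs"
    # The kernel refuses to change a non-zero VF count directly (EBUSY),
    # so reset it to 0 before setting the new count.
    numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))


def bind_id_to_vfio(vendor: str, device: str, sysfs: Path = SYSFS) -> None:
    """Replay the new_id/remove_id writes from step 1 for one vendor:device ID."""
    drv = sysfs / "bus/pci/drivers/vfio-pci"
    pci_id = f"{vendor} {device}"
    # Adding a dynamic ID makes vfio-pci probe unbound devices with that ID.
    (drv / "new_id").write_text(pci_id)
    # Drop the dynamic ID again, as the report does after binding.
    (drv / "remove_id").write_text(pci_id)
```

For the hardware in this report, that would be `create_vfs("0000:04:00.0", 4)` followed by `bind_id_to_vfio("8086", "154c")`.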
The interface for the PF is p6p1:
# ethtool -i p6p1
driver: i40e
version: 1.5.10-k
firmware-version: 5.02 0x80002400 17.5.9
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Apart from this error message in dmesg, there are no other abnormalities; the VFs are unplugged successfully.

CLI:
/usr/libexec/qemu-kvm \
    -name 'rhel7.3' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga qxl \
    -chardev socket,id=qmp_monitor,path=/var/tmp/qmpmonitor,server,nowait \
    -mon chardev=qmp_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idkP1Yip \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -device virtio-scsi-pci,id=scsi,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/rhel73-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=scsi.0 \
    -m 4096 \
    -smp 4,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm \
    -qmp tcp:0:4445,server,nowait \
    -monitor stdio \
    -monitor unix:/home/socket,server,nowait \
    -device vfio-pci,host=04:02.0,id=vf-02.0,multifunction=on,addr=0xa.0 \
    -device vfio-pci,host=04:02.1,id=vf-02.1,addr=0xa.1 \
    -device vfio-pci,host=04:02.2,id=vf-02.2,addr=0xa.2 \
    -device vfio-pci,host=04:02.3,id=vf-02.3,addr=0xa.3
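Besides the multifunction layout used in the CLI above, a natural comparison is to give each VF function 0 of its own guest slot. The sketch below generates the `-device` arguments for either layout so the two runs differ only in placement. `vf_device_args` is a hypothetical helper; the separate-slot mode assumes guest slots 0xa onward are free (in this CLI, slot 0x05 is taken by the USB controller).

```python
from typing import List


def vf_device_args(host_bdfs: List[str], slot: int = 0xA,
                   multifunction: bool = True) -> List[str]:
    """Return QEMU -device arguments assigning each host VF via vfio-pci.

    multifunction=True: all VFs share one guest slot as functions 0..n-1,
    with multifunction=on set on function 0 (the layout in this report).
    multifunction=False: each VF is function 0 of its own slot, starting at `slot`.
    """
    if multifunction and len(host_bdfs) > 8:
        raise ValueError("a PCI slot has at most 8 functions")
    args: List[str] = []
    for i, host in enumerate(host_bdfs):
        dev_id = "vf-" + host.split(":")[-1]  # "04:02.1" -> "vf-02.1"
        if multifunction:
            addr = f"0x{slot:x}.{i}"          # same slot, function i
        else:
            addr = f"0x{slot + i:x}.0"        # next slot, function 0
        opts = [f"vfio-pci,host={host}", f"id={dev_id}"]
        if multifunction and i == 0:
            opts.append("multifunction=on")
        opts.append(f"addr={addr}")
        args += ["-device", ",".join(opts)]
    return args
```

With the four VFs from this report, the multifunction mode reproduces the four `-device` lines of the CLI exactly, while `multifunction=False` places them at 0xa.0 through 0xd.0.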
When any one of the assigned VFs is unplugged, all the functions in that slot are removed (ie, a.{0-3}), correct? Does this ever occur when multifunction is not used for the VFs, assigning each as function 0 in separate slots to the VM? This looks like a driver issue in the guest and other than this message seems to have no ill effects. i40e testing is new for RHEL7.3, so we have no reason to suspect this as a regression.
(In reply to Alex Williamson from comment #3)
> When any one of the assigned VFs is unplugged, all the functions in that
> slot are removed (ie, a.{0-3}), correct? Does this ever occur when
> multifunction is not used for the VFs, assigning each as function 0 in
> separate slots to the VM?

Yes, when any one of the assigned VFs is unplugged, all functions in the same slot are removed, as expected. Without multifunction, there is no error output.

> This looks like a driver issue in the guest and other than this message
> seems to have no ill effects. i40e testing is new for RHEL7.3, so we have
> no reason to suspect this as a regression.
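Because the set of affected functions varies from run to run, a small filter over the guest log makes the multifunction and separate-slot runs easier to compare. This is a minimal sketch; `failing_functions` is a hypothetical helper, and its regex is built only from the message text quoted in this report, so it may need adjusting for other driver versions.

```python
import re
from typing import Iterable, List

# Built from the message quoted in this report, e.g.
# "i40evf 0000:00:0a.1: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK"
ERR_RE = re.compile(
    r"i40evf (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to send opcode (?P<op>\d+) to PF, err (?P<err>\w+)"
)


def failing_functions(lines: Iterable[str]) -> List[str]:
    """Return guest BDFs that logged the error, in first-seen order, without duplicates."""
    seen: List[str] = []
    for line in lines:
        m = ERR_RE.search(line)  # search() tolerates dmesg timestamps before the driver name
        if m and m.group("bdf") not in seen:
            seen.append(m.group("bdf"))
    return seen
```

Feeding it the lines of the guest's `dmesg` output after each unplug gives the list of functions that hit the error in that run.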
When I try this with libvirt, I get:

error: operation failed: cannot hot unplug multifunction PCI device: 0000:03:02.1

Therefore libvirt does not support hot-unplug of multifunction PCI devices, and the only way to induce this problem is via raw QEMU interfaces, which are generally not supported. Deferring to 7.4.
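For reference, the raw QEMU path that triggers the message is a QMP `device_del`. The sketch below assumes the QMP socket from the CLI in this report (`/var/tmp/qmpmonitor`); `run_qmp`, `read_reply`, and `device_del` are hypothetical helper names. QMP sends a greeting first and only accepts other commands after `qmp_capabilities`, and asynchronous events (such as `DEVICE_DELETED`) can be interleaved with command replies, so the reader skips them.

```python
import json
import socket
from typing import Any, BinaryIO, Dict, List, Optional, Sequence, Tuple

Command = Tuple[str, Optional[Dict[str, Any]]]


def qmp_command(execute: str, arguments: Optional[Dict[str, Any]] = None) -> bytes:
    """Encode one QMP command as a newline-terminated JSON object."""
    msg: Dict[str, Any] = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode()


def read_reply(f: BinaryIO) -> Dict[str, Any]:
    """Return the next command reply, skipping asynchronous events."""
    while True:
        line = f.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "event" in msg:
            continue  # e.g. DEVICE_DELETED once the guest has released the device
        return msg  # {"return": ...} on success, {"error": ...} on failure


def run_qmp(sock: socket.socket, commands: Sequence[Command]) -> List[Dict[str, Any]]:
    """Read the greeting, negotiate capabilities, then run commands in order."""
    with sock.makefile("rb") as f:
        greeting = json.loads(f.readline())
        if "QMP" not in greeting:
            raise ValueError(f"unexpected QMP greeting: {greeting!r}")
        # Leave capabilities-negotiation mode; other commands are rejected until then.
        sock.sendall(qmp_command("qmp_capabilities"))
        read_reply(f)
        replies = []
        for execute, arguments in commands:
            sock.sendall(qmp_command(execute, arguments))
            replies.append(read_reply(f))
        return replies


def device_del(sock_path: str, dev_id: str) -> Dict[str, Any]:
    """Request hot-unplug of dev_id over a QMP unix socket (step 3 of the report)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        return run_qmp(s, [("device_del", {"id": dev_id})])[0]
```

For example, `device_del("/var/tmp/qmpmonitor", "vf-02.1")`. A `{"return": {}}` reply only means the request was accepted; the removal itself completes asynchronously, when the guest releases the device and QEMU emits `DEVICE_DELETED`.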
Comment 5 still applies to 7.4: libvirt does not support hot-unplug of multifunction devices, so there is no supported way to reach this issue, and no apparent harm from it other than undesirable kernel logs.
This is only reproducible by interacting with QEMU directly; it is not reproducible with libvirt. Closing due to lack of bandwidth to investigate further. The error suggests an issue in the i40e driver, not the virtualization stack.