Bug 1370028 - Guest dmesg prompt "err I40E_ERR_QUEUE_EMPTY" when hot unplug one function from multifunction vfs ( i40evf )
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Alex Williamson
QA Contact: Pei Zhang
URL:
Whiteboard: vfio
Depends On:
Blocks:
 
Reported: 2016-08-25 06:08 UTC by Yanan Fu
Modified: 2018-12-07 21:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-07 21:00:47 UTC
Target Upstream Version:



Description Yanan Fu 2016-08-25 06:08:16 UTC
Description of problem:
On an Intel XL710 NIC (i40e driver), create several VFs and bind them to vfio-pci.
Boot a guest with these VFs assigned and with multifunction=on.
When one function is hot-unplugged, the guest dmesg shows:
i40evf 0000:00:04.1: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK

Version-Release number of selected component (if applicable):
qemu: qemu-kvm-rhev-2.6.0-22.el7.x86_64
kernel: kernel-3.10.0-495.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Generate several VFs and bind them to vfio-pci:
  #echo 4 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs
  (this yields 4 VFs: 0000:04:02.0, 0000:04:02.1, 0000:04:02.2, 0000:04:02.3)
  #modprobe vfio-pci
  #lspci -n -s 0000:04:02.0
    04:02.0 0200: 8086:154c (rev 02)
  #echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id
  #echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/remove_id
2. Boot the guest with these VFs and multifunction=on:
    ...
    -device vfio-pci,host=04:02.0,id=vf-02.0,multifunction=on,addr=0xa.0 \
    -device vfio-pci,host=04:02.1,id=vf-02.1,addr=0xa.1 \
    -device vfio-pci,host=04:02.2,id=vf-02.2,addr=0xa.2 \
    -device vfio-pci,host=04:02.3,id=vf-02.3,addr=0xa.3 \
    ...
3. Hot-unplug one function, for example vf-02.1:
   {"execute":"device_del","arguments":{"id":"vf-02.1"}}
4. Check the guest dmesg:
   i40evf 0000:00:0a.1: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK
   i40evf 0000:00:0a.2: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK
   i40evf 0000:00:0a.3: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK
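
Step 3 can also be scripted. The following is a minimal Python sketch, not part of the original report, that sends device_del over a QMP UNIX socket such as the /var/tmp/qmpmonitor chardev used in the command line in comment 1. The helper names (build_device_del, read_reply, qmp_device_del) are hypothetical. It assumes the monitor is in control mode, that each QMP message arrives on its own line, and that the first non-event reply belongs to the command just sent.

```python
import json
import socket


def build_device_del(dev_id):
    """Return the QMP device_del command for a device id as one JSON line."""
    return json.dumps({"execute": "device_del", "arguments": {"id": dev_id}}) + "\n"


def read_reply(rfile):
    """Read QMP messages until a command reply arrives, skipping async events."""
    while True:
        line = rfile.readline()
        if not line:
            raise ConnectionError("QMP socket closed before a reply arrived")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def qmp_device_del(sock_path, dev_id):
    """Connect to a QMP UNIX socket, negotiate capabilities, send device_del."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        rfile = sock.makefile("r")
        wfile = sock.makefile("w")
        json.loads(rfile.readline())  # greeting banner sent by QEMU on connect
        wfile.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        wfile.flush()
        read_reply(rfile)             # reply to qmp_capabilities
        wfile.write(build_device_del(dev_id))
        wfile.flush()
        return read_reply(rfile)      # reply to device_del
```

For the setup in this report the call would be qmp_device_del("/var/tmp/qmpmonitor", "vf-02.1"). A successful reply only means QEMU accepted the request; the guest completes the unplug asynchronously.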

Actual results:
The error does not appear on every attempt. When it does, it is reported for a random subset of the four functions (e.g. 0000:00:0a.1, 0000:00:0a.2, 0000:00:0a.3, or all of them).
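
Because the affected functions vary between runs, it can help to tally them from the guest log. Below is a minimal Python sketch, assuming the message format shown in step 4. The function name affected_functions and the regular expression are illustrative and not taken from the driver source.

```python
import re

# Matches lines such as:
# i40evf 0000:00:0a.1: Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY,aq_err OK
PATTERN = re.compile(
    r"i40evf (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to send opcode (?P<opcode>\d+) to PF, "
    r"err (?P<err>\w+),\s*aq_err (?P<aq_err>\w+)"
)


def affected_functions(dmesg_text, err="I40E_ERR_QUEUE_EMPTY"):
    """Return the sorted guest PCI addresses that logged the given error."""
    hits = set()
    for line in dmesg_text.splitlines():
        match = PATTERN.search(line)
        if match and match.group("err") == err:
            hits.add(match.group("bdf"))
    return sorted(hits)
```

Passing the guest dmesg text captured after each unplug gives the list of functions that reported the error in that run, which makes it easier to compare attempts.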

Expected results:
No error output.

Additional info:
Hostname: dell-per730-29.lab.eng.pek2.redhat.com
I have tested with an Intel 82599ES NIC (ixgbe driver) and cannot hit this error.

The PF interface is p6p1:
# ethtool -i p6p1
driver: i40e
version: 1.5.10-k
firmware-version: 5.02 0x80002400 17.5.9
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Comment 1 Yanan Fu 2016-08-25 06:28:42 UTC
Apart from this error message in dmesg, there are no other abnormalities. The VFs can be unplugged successfully.

CLI:
/usr/libexec/qemu-kvm \
    -name 'rhel7.3'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga qxl \
    -chardev socket,id=qmp_monitor,path=/var/tmp/qmpmonitor,server,nowait \
    -mon chardev=qmp_monitor,mode=control  \
    -device pvpanic,ioport=0x505,id=idkP1Yip  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -device virtio-scsi-pci,id=scsi,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/rhel73-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=scsi.0 \
    -m 4096  \
    -smp 4,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0 \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm \
    -qmp tcp:0:4445,server,nowait  \
    -monitor stdio \
    -monitor unix:/home/socket,server,nowait \
    -device vfio-pci,host=04:02.0,id=vf-02.0,multifunction=on,addr=0xa.0 \
    -device vfio-pci,host=04:02.1,id=vf-02.1,addr=0xa.1 \
    -device vfio-pci,host=04:02.2,id=vf-02.2,addr=0xa.2 \
    -device vfio-pci,host=04:02.3,id=vf-02.3,addr=0xa.3
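
The four vfio-pci options above follow a fixed pattern: every VF shares one guest slot, and only function 0 carries multifunction=on. Below is a minimal Python sketch that generates them. The helper name vfio_multifunction_args is hypothetical, and the id scheme vf-<slot>.<fn> is simply copied from this command line.

```python
def vfio_multifunction_args(host_bus_slot, count, guest_slot):
    """Build -device arguments placing `count` host VFs in one guest slot.

    host_bus_slot: host bus:slot prefix of the VFs, e.g. "04:02"
    count:         number of functions to assign (a PCI slot holds at most 8)
    guest_slot:    guest slot number as an integer, e.g. 0xa
    """
    if not 1 <= count <= 8:
        raise ValueError("a PCI slot holds between 1 and 8 functions")
    host_slot = host_bus_slot.split(":")[-1]
    args = []
    for fn in range(count):
        opts = ["vfio-pci", f"host={host_bus_slot}.{fn}", f"id=vf-{host_slot}.{fn}"]
        if fn == 0:
            # Only function 0 is marked as the head of a multifunction device.
            opts.append("multifunction=on")
        opts.append(f"addr={guest_slot:#x}.{fn}")
        args += ["-device", ",".join(opts)]
    return args
```

Calling vfio_multifunction_args("04:02", 4, 0xa) produces the same four options as in the command line, as a list suitable for appending to a subprocess argument vector.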

Comment 3 Alex Williamson 2016-08-30 16:31:26 UTC
When any one of the assigned VFs is unplugged, all the functions in that slot are removed (ie, a.{0-3}), correct?  Does this ever occur when multifunction is not used for the VFs, assigning each as function 0 in separate slots to the VM?

This looks like a driver issue in the guest and other than this message seems to have no ill effects.  i40e testing is new for RHEL7.3, so we have no reason to suspect this as a regression.

Comment 4 Yanan Fu 2016-08-31 07:27:44 UTC
(In reply to Alex Williamson from comment #3)
> When any one of the assigned VFs is unplugged, all the functions in that
> slot are removed (ie, a.{0-3}), correct?  Does this ever occur when
> multifunction is not used for the VFs, assigning each as function 0 in
> separate slots to the VM?

Yes, when any one of the assigned VFs is unplugged, all functions in the same slot are removed normally; this is OK.

And without multifunction, there is no error output.
 
> This looks like a driver issue in the guest and other than this message
> seems to have no ill effects.  i40e testing is new for RHEL7.3, so we have
> no reason to suspect this as a regression.

Comment 5 Alex Williamson 2016-09-22 19:06:29 UTC
When I try this with libvirt, I get:

error: operation failed: cannot hot unplug multifunction PCI device: 0000:03:02.1

Therefore libvirt does not support hot-unplug of multifunction PCI devices and the only way to induce this problem is via raw QEMU interfaces, which are generally not supported.  Deferring to 7.4.

Comment 6 Alex Williamson 2017-10-12 19:37:18 UTC
Comment 5 still applies to 7.4: libvirt does not support hot-unplug of multifunction devices, so there is no supported way to reach this issue, and no apparent harm in seeing it other than undesirable kernel logs.

Comment 7 Alex Williamson 2018-12-07 21:00:47 UTC
This is only reproducible by interacting with QEMU directly; it is not reproducible with libvirt. Closing due to lack of bandwidth to investigate further. The error suggests an issue in the i40e driver, not the virtualization stack.

