Bug 1409957

Summary: hot-plugged VF can't be used again after resetting the SR-IOV NIC's kernel module
Product: Red Hat Enterprise Linux 7 Reporter: Han Han <hhan>
Component: libvirt Assignee: Laine Stump <laine>
Status: CLOSED NOTABUG QA Contact: Jingjing Shao <jishao>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3 CC: chhu, cww, dyuan, hhan, jdenemar, jishao, jiyan, lizhengui, lizhu, rbalakri, rcernin, xuzhang, yalzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-15 21:09:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1420851    
Attachments:
Description Flags
libvirtd log none

Description Han Han 2017-01-04 02:13:24 UTC
Created attachment 1237047
libvirtd log

Description of problem:
As subject

Version-Release number of selected component (if applicable):
kernel-3.10.0-514.6.1.el7.x86_64
libvirt-2.0.0-10.el7_3.3.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Reload the SR-IOV kernel module and create the VFs
# modprobe -r ixgbe
# modprobe ixgbe max_vfs=2

2. Prepare a VM and hot-plug the VF
# DOM=V
# virsh start $DOM && sleep 20
Domain V started

# virsh attach-device V hostdev.xml
Device attached successfully

3. Reload the kernel module again and try to detach/attach the VF
# modprobe -r ixgbe
# modprobe ixgbe max_vfs=2
# virsh detach-device $DOM hostdev.xml
error: Failed to detach device from hostdev.xml
error: operation failed: no device matching mac address 02:24:6b:89:bc:e0 found

# virsh attach-device $DOM hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: Not detaching active device 0000:86:10.1

Actual results:
As in step 3.

Expected results:
The VF can be attached after the kernel module is reloaded.


Additional info:
After restarting libvirtd, the VF can be attached in step 3.

Comment 1 Han Han 2017-01-04 02:15:24 UTC
The hostdev.xml file:
<interface type='hostdev' managed='yes'>
<mac address='02:24:6b:89:bc:e0'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
</source>
</interface>

Comment 2 Jingjing Shao 2017-01-04 07:04:56 UTC
Additional info:

After step 3 in the Description:

4. Check the dumpxml of domain V
# virsh dumpxml V | grep hostdev -A6

5. Check the nodedev-dumpxml of the VF
# virsh nodedev-dumpxml pci_0000_86_10_1
<device>
  <name>pci_0000_86_10_1</name>
  <path>/sys/devices/pci0000:80/0000:80:01.0/0000:86:10.1</path>
  <parent>pci_0000_80_01_0</parent>
  <driver>
    <name>ixgbevf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>134</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ed'>82599 Ethernet Controller Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x86' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='83'>
      <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
    </iommuGroup>
    <numa node='1'/>
    <pci-express>
      <link validity='cap' port='0' width='0'/>
      <link validity='sta' width='0'/>
    </pci-express>
  </capability>
</device>
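
For reference, the decimal <bus>, <slot> and <function> values in the output above are the same address that hostdev.xml gives in hexadecimal. The sketch below shows the conversion; the helper names pci_address and nodedev_name are hypothetical and exist only for this illustration, and the naming pattern is inferred from the names that appear in this report.

```python
def pci_address(domain: int, bus: int, slot: int, function: int) -> str:
    """Format a PCI address as it appears in the error messages above."""
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def nodedev_name(domain: int, bus: int, slot: int, function: int) -> str:
    """Build the node device name by replacing ':' and '.' with '_'.

    Assumption: this mirrors the pci_0000_86_10_1 pattern seen in this report.
    """
    addr = pci_address(domain, bus, slot, function)
    return "pci_" + addr.replace(":", "_").replace(".", "_")


# Decimal values taken from the nodedev-dumpxml output above.
addr = pci_address(0, 134, 16, 1)
name = nodedev_name(0, 134, 16, 1)
print(addr)
print(name)
```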


6. Shut down the domain
# virsh destroy V
Domain V destroyed

7. Detach the VF from the host; this fails with an error

# virsh nodedev-detach pci_0000_86_10_1 
error: Failed to detach device pci_0000_86_10_1
error: Requested operation is not valid: PCI device 0000:86:10.1 is in use by driver QEMU, domain V

8. Restart libvirtd
# service libvirtd restart
Redirecting to /bin/systemctl restart  libvirtd.service



After restarting libvirtd, everything works correctly:

9. # virsh nodedev-detach pci_0000_86_10_1
Device pci_0000_86_10_1 detached

# virsh nodedev-reattach pci_0000_86_10_1
Device pci_0000_86_10_1 re-attached


10. # virsh start V
Domain V started

# virsh attach-device V interface.xml
Device attached successfully

Comment 3 Jiri Denemark 2017-01-04 08:15:43 UTC
Is this a regression introduced by libvirt-2.0.0-10.el7_3.3.x86_64 or can you reproduce even with libvirt-2.0.0-10.el7_3.2.x86_64 or libvirt-2.0.0-10.el7.x86_64?

Comment 4 Han Han 2017-01-04 10:02:36 UTC
Hi Jiri,
The bug can be reproduced on libvirt-2.0.0-10.el7_3.2.x86_64 and libvirt-2.0.0-10.el7.x86_64 by following comment 0 and comment 2.
It is not a regression in RHEL 7.3.

Comment 6 Jaroslav Suchanek 2017-01-06 13:40:43 UTC
Dup (or just different result) of bug 1402951?

Comment 7 Han Han 2017-01-10 02:36:41 UTC
The bug can also be reproduced on libvirt-2.5.0-1.el7.x86_64 by following comment 0 and comment 2.

Comment 10 Laine Stump 2017-12-04 18:11:39 UTC
*** Bug 1402951 has been marked as a duplicate of this bug. ***

Comment 12 lizhengui 2020-01-10 02:39:00 UTC
Has this bug been fixed or is it not considered a bug?

Comment 13 Han Han 2020-01-10 03:28:14 UTC
(In reply to lizhengui from comment #12)
> Has this bug been fixed or is it not considered a bug?

Sorry, I have not been tracking this bug. Could you please share the version on which you reproduced it?

Comment 14 lizhengui 2020-01-10 03:34:13 UTC
I reproduced it on libvirt 3.2.0. I think the latest libvirt also has the bug.

Comment 15 Han Han 2020-01-10 03:53:01 UTC
(In reply to lizhengui from comment #14)
> My bug reproducing version libvirt 3.2.0. I think the lastest libvirt also
> has the bug.

And what are your qemu and kernel versions?

Comment 16 Han Han 2020-01-10 03:54:15 UTC
Lili, could you please test this on the latest RHEL7 and RHEL8?

Comment 17 lizhengui 2020-01-10 06:01:02 UTC
(In reply to Han Han from comment #15)
> (In reply to lizhengui from comment #14)
> > My bug reproducing version libvirt 3.2.0. I think the lastest libvirt also
> > has the bug.
> 
> And what's your qemu and kernel version?

QEMU emulator version 2.8.1.1

linux-lAtuOc:~ # uname -r
3.10.0-862.14.1.6_48.x86_64

Comment 18 Han Han 2020-01-10 06:26:49 UTC
Assigning to jiyan. Please help test this on the latest RHEL7 and RHEL8.

Comment 19 lizhengui 2020-01-11 02:38:01 UTC
(In reply to Han Han from comment #18)
> Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8


I think this is a libvirt bug. If the link goes down on the passed-through device, the kernel notifies qemu to delete the device. After receiving the device-deleted event from qemu, libvirt calls virHostdevReAttachPCIDevices. In virHostdevGetPCIHostDeviceList, because the device has not come back up yet, virPCIDeviceNew fails and returns early: the device's config file cannot be found. Therefore the device is never removed from mgr->activePCIHostdevs. The next time the virtual machine starts, it fails because the device is still in mgr->activePCIHostdevs, even though the device has come back up by then.
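
The sequence described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not libvirt's implementation: ToyHostdevManager, its methods, and the sysfs set are hypothetical names, and the real code paths are the C functions named in the comment. Only the "Not detaching active device" error text is taken from this report.

```python
class ToyHostdevManager:
    """Toy model of the active-device bookkeeping (hypothetical, not libvirt)."""

    def __init__(self):
        self.sysfs = set()   # PCI addresses currently visible to the host
        self.active = {}     # PCI address -> domain that is using the device

    def attach(self, addr, domain):
        # A device still recorded as active is refused.
        if addr in self.active:
            raise RuntimeError(f"Not detaching active device {addr}")
        self.active[addr] = domain

    def reattach(self, addr):
        # The device object is built first. If the device is missing this
        # raises, and the removal from the active list is never reached.
        if addr not in self.sysfs:
            raise FileNotFoundError(f"Device {addr} not found")
        del self.active[addr]


def demo():
    addr = "0000:b8:00.0"
    mgr = ToyHostdevManager()
    mgr.sysfs.add(addr)
    mgr.attach(addr, "V")

    mgr.sysfs.discard(addr)          # device disappears from the host
    try:
        mgr.reattach(addr)           # cleanup after the delete event
        cleanup_failed = False
    except FileNotFoundError:
        cleanup_failed = True

    mgr.sysfs.add(addr)              # device comes back
    try:
        mgr.attach(addr, "V")
        second_attach_error = None
    except RuntimeError as exc:
        second_attach_error = str(exc)

    return cleanup_failed, addr in mgr.active, second_attach_error


cleanup_failed, still_active, second_attach_error = demo()
print(cleanup_failed, still_active, second_attach_error)
```

In this model the stale entry survives until the manager object is recreated, which matches the observation in the Description that restarting libvirtd makes the VF usable again.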

Comment 20 lizhengui 2020-01-11 02:48:26 UTC
(In reply to lizhengui from comment #19)
> (In reply to Han Han from comment #18)
> > Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8
> 
> 
> I think this is a libvirt bug. If the link goes down on the passed-through
> device, the kernel notifies qemu to delete the device. After receiving the
> device-deleted event from qemu, libvirt calls
> virHostdevReAttachPCIDevices. In virHostdevGetPCIHostDeviceList, because
> the device has not come back up yet, virPCIDeviceNew fails and returns
> early: the device's config file cannot be found. Therefore the device is
> never removed from mgr->activePCIHostdevs. The next time the virtual
> machine starts, it fails because the device is still in
> mgr->activePCIHostdevs, even though the device has come back up by then.

(In reply to Han Han from comment #18)
> Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8

Adding the libvirt log:
2020-01-08T11:48:08.798015+08:00|info|libvirtd[54674]|[59786]|qemuProcessEventHandler[5025]|: vm=0x7eff28191d90, event=2
2020-01-08T11:48:08.798106+08:00|err|libvirtd[54674]|[59786]|virPCIDeviceNew[2001]|: Device 0000:b8:00.0 not found: could not access /sys/bus/pci/devices/0000:b8:00.0/config: No such file or directory
2020-01-08T11:48:08.798193+08:00|err|libvirtd[54674]|[59786]|virHostdevReAttachPCIDevices[981]|: Failed to allocate PCI device list: Device 0000:b8:00.0 not found: could not access /sys/bus/pci/devices/0000:b8:00.0/config: No such file or directory
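
To see at a glance which functions reported errors, the log lines above can be split on their '|' separators. The sketch below assumes the field layout is timestamp, level, process, thread, function[source line], message; that layout is inferred from these three lines only, and parse_line is a hypothetical helper.

```python
import re

# The three log lines from this comment, copied verbatim.
LOG_LINES = [
    "2020-01-08T11:48:08.798015+08:00|info|libvirtd[54674]|[59786]|"
    "qemuProcessEventHandler[5025]|: vm=0x7eff28191d90, event=2",
    "2020-01-08T11:48:08.798106+08:00|err|libvirtd[54674]|[59786]|"
    "virPCIDeviceNew[2001]|: Device 0000:b8:00.0 not found: could not access "
    "/sys/bus/pci/devices/0000:b8:00.0/config: No such file or directory",
    "2020-01-08T11:48:08.798193+08:00|err|libvirtd[54674]|[59786]|"
    "virHostdevReAttachPCIDevices[981]|: Failed to allocate PCI device list: "
    "Device 0000:b8:00.0 not found: could not access "
    "/sys/bus/pci/devices/0000:b8:00.0/config: No such file or directory",
]


def parse_line(line):
    """Split one log line into named fields (assumed layout, see above)."""
    timestamp, level, process, thread, location, message = line.split("|", 5)
    match = re.fullmatch(r"(\w+)\[(\d+)\]", location)
    return {
        "timestamp": timestamp,
        "level": level,
        "function": match.group(1),
        "source_line": int(match.group(2)),
        "message": message.lstrip(": "),
    }


parsed = [parse_line(line) for line in LOG_LINES]
error_functions = [p["function"] for p in parsed if p["level"] == "err"]
print(error_functions)
```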