Bug 1409957
Summary: | hot-plugged VF can't be used again after reset SRIOV-NIC's kernel mod | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Han Han <hhan> | ||||
Component: | libvirt | Assignee: | Laine Stump <laine> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Jingjing Shao <jishao> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 7.3 | CC: | chhu, cww, dyuan, hhan, jdenemar, jishao, jiyan, lizhengui, lizhu, rbalakri, rcernin, xuzhang, yalzhang | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2018-02-15 21:09:58 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1420851 | ||||||
Attachments: |
The hostdev.xml file:

<interface type='hostdev' managed='yes'>
  <mac address='02:24:6b:89:bc:e0'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
  </source>
</interface>

Additional info:
After step 3 in the Description:

4. Check the dumpxml of domain V
# virsh dumpxml V | grep hostdev -A6

5. Check the nodedev-dumpxml of the VF
# virsh nodedev-dumpxml pci_0000_86_10_1
<device>
  <name>pci_0000_86_10_1</name>
  <path>/sys/devices/pci0000:80/0000:80:01.0/0000:86:10.1</path>
  <parent>pci_0000_80_01_0</parent>
  <driver>
    <name>ixgbevf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>134</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ed'>82599 Ethernet Controller Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x86' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='83'>
      <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
    </iommuGroup>
    <numa node='1'/>
    <pci-express>
      <link validity='cap' port='0' width='0'/>
      <link validity='sta' width='0'/>
    </pci-express>
  </capability>
</device>

6. Shut down the domain
# virsh destroy V
Domain V destroyed

7. Detach the VF from the host, which fails with an error
# virsh nodedev-detach pci_0000_86_10_1
error: Failed to detach device pci_0000_86_10_1
error: Requested operation is not valid: PCI device 0000:86:10.1 is in use by driver QEMU, domain V

8. Restart libvirtd
# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service

After restarting libvirtd, everything works again.

9.
# virsh nodedev-detach pci_0000_86_10_1
Device pci_0000_86_10_1 detached

# virsh nodedev-reattach pci_0000_86_10_1
Device pci_0000_86_10_1 re-attached

10.
# virsh start V
Domain V started

# virsh attach-device V interface.xml
Device attached successfully

Is this a regression introduced by libvirt-2.0.0-10.el7_3.3.x86_64 or can you reproduce even with libvirt-2.0.0-10.el7_3.2.x86_64 or libvirt-2.0.0-10.el7.x86_64?

Hi Jiri,
The bug can be reproduced on libvirt-2.0.0-10.el7_3.2.x86_64 and libvirt-2.0.0-10.el7.x86_64 as in comment 0 and comment 2. It is not a regression in RHEL 7.3.

Dup (or just different result) of bug 1402951?

The bug can be reproduced on libvirt-2.5.0-1.el7.x86_64 as in comment 0 and comment 2.

*** Bug 1402951 has been marked as a duplicate of this bug. ***

Has this bug been fixed or is it not considered a bug?

(In reply to lizhengui from comment #12)
> Has this bug been fixed or is it not considered a bug?

Sorry, I am no longer tracking this bug. Could you please share the version on which you reproduced it?

My reproducing version is libvirt 3.2.0. I think the latest libvirt also has the bug.

(In reply to lizhengui from comment #14)
> My reproducing version is libvirt 3.2.0. I think the latest libvirt also
> has the bug.

And what are your qemu and kernel versions?

Lili, could you please test this on the latest RHEL7 and RHEL8?

(In reply to Han Han from comment #15)
> (In reply to lizhengui from comment #14)
> > My reproducing version is libvirt 3.2.0. I think the latest libvirt also
> > has the bug.
>
> And what are your qemu and kernel versions?

QEMU emulator version 2.8.1.1
linux-lAtuOc:~ # uname -r
3.10.0-862.14.1.6_48.x86_64

Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8.

(In reply to Han Han from comment #18)
> Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8.

I think this is a libvirt bug. If the link goes down on the passed-through device, the kernel notifies qemu to delete the device. After receiving the delete event from qemu, libvirt calls virHostdevReAttachPCIDevices. In virHostdevGetPCIHostDeviceList, because the device has not been linked up again yet, virPCIDeviceNew fails: the device's config file cannot be found. As a result, the device is never removed from mgr->activePCIHostdevs. The next time the virtual machine starts, it fails because the device is still in mgr->activePCIHostdevs, even though by then the device has been linked up again.

(In reply to lizhengui from comment #19)
> (In reply to Han Han from comment #18)
> > Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8.
>
> I think this is a libvirt bug. If the link goes down on the passed-through
> device, the kernel notifies qemu to delete the device. After receiving the
> delete event from qemu, libvirt calls virHostdevReAttachPCIDevices. In
> virHostdevGetPCIHostDeviceList, because the device has not been linked up
> again yet, virPCIDeviceNew fails: the device's config file cannot be found.
> As a result, the device is never removed from mgr->activePCIHostdevs. The
> next time the virtual machine starts, it fails because the device is still
> in mgr->activePCIHostdevs, even though by then the device has been linked
> up again.

(In reply to Han Han from comment #18)
> Assign to jiyan. Please help to test that on latest RHEL7 and RHEL8.

Adding libvirt log info:

2020-01-08T11:48:08.798015+08:00|info|libvirtd[54674]|[59786]|qemuProcessEventHandler[5025]|: vm=0x7eff28191d90, event=2
2020-01-08T11:48:08.798106+08:00|err|libvirtd[54674]|[59786]|virPCIDeviceNew[2001]|: Device 0000:b8:00.0 not found: could not access /sys/bus/pci/devices/0000:b8:00.0/config: No such file or directory
2020-01-08T11:48:08.798193+08:00|err|libvirtd[54674]|[59786]|virHostdevReAttachPCIDevices[981]|: Failed to allocate PCI device list: Device 0000:b8:00.0 not found: could not access /sys/bus/pci/devices/0000:b8:00.0/config: No such file or directory
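The log above shows virPCIDeviceNew failing because /sys/bus/pci/devices/0000:b8:00.0/config does not exist at the time of the delete event. A minimal sketch for checking that condition by hand follows. The function name pci_config_present and the SYSFS_PCI_ROOT override are hypothetical, introduced only for illustration; they are not part of libvirt or virsh.

```shell
# Hypothetical helper (not a libvirt tool): report whether the sysfs config
# file for a PCI address exists. This is the file that virPCIDeviceNew
# could not access in the log above.
# SYSFS_PCI_ROOT is an assumed override so the check can also be pointed at
# a scratch directory; by default the real sysfs path is used.
pci_config_present() {
    pci_addr="$1"    # e.g. 0000:86:10.1
    pci_root="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"
    if [ -e "$pci_root/$pci_addr/config" ]; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}
```

For example, running `pci_config_present 0000:86:10.1` between `modprobe -r ixgbe` and `modprobe ixgbe max_vfs=2` would be expected to report that the VF's config file is gone, which matches the window described in the analysis above.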
Created attachment 1237047 [details]
libvirtd log

Description of problem:
As subject

Version-Release number of selected component (if applicable):
kernel-3.10.0-514.6.1.el7.x86_64
libvirt-2.0.0-10.el7_3.3.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Reload the SR-IOV kernel module and create the VFs
# modprobe -r ixgbe
# modprobe ixgbe max_vfs=2

2. Prepare a VM and hot-plug the VF
# DOM=V
# virsh start $DOM && sleep 20
Domain V started

# virsh attach-device V hostdev.xml
Device attached successfully

3. Reload the kernel module again and try to detach/attach the VF
# modprobe -r ixgbe
# modprobe ixgbe max_vfs=2
# virsh detach-device $DOM hostdev.xml
error: Failed to detach device from hostdev.xml
error: operation failed: no device matching mac address 02:24:6b:89:bc:e0 found

# virsh attach-device $DOM hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: Not detaching active device 0000:86:10.1

Actual results:
As in step 3

Expected results:
The VF can be attached after reloading the kernel module.

Additional info:
After restarting libvirtd, the VF can be attached in step 3.
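The recovery sequence reported in steps 8 to 10 of the follow-up comment (restart libvirtd, detach and re-attach the node device, start the domain, attach the interface) can be sketched as a small script. This is only a transcription of the reporter's observed workaround, not a recommended fix. The function names run and recover_vf and the DRY_RUN variable are hypothetical; the sketch assumes the domain is shut off, as it was after step 6.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of
# executed, so the sequence can be reviewed before touching the host.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper mirroring steps 8-10 from the report.
# $1: node device name (e.g. pci_0000_86_10_1)
# $2: domain name (assumed to be shut off)
# $3: interface XML file to attach
recover_vf() {
    nodedev="$1"
    dom="$2"
    xml="$3"
    run systemctl restart libvirtd.service &&
    run virsh nodedev-detach "$nodedev" &&
    run virsh nodedev-reattach "$nodedev" &&
    run virsh start "$dom" &&
    run virsh attach-device "$dom" "$xml"
}
```

Running `DRY_RUN=1 recover_vf pci_0000_86_10_1 V hostdev.xml` prints the five commands in order without executing them. Without DRY_RUN the script needs root and stops at the first command that fails.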