Description
When the first VF in the VF pool is not available for assignment, libvirt should automatically choose the next available VF.

Version:
libvirt-1.1.1-5.el7.x86_64
qemu-kvm-1.5.3-3.el7.x86_64
kernel-3.10.0-23.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a hostdev network like the following:
# virsh net-dumpxml hostnet
<network>
  <name>hostnet</name>
  <uuid>c1fb4ead-21b8-4d69-8ad9-669c55b3dfc7</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </forward>
</network>

2. Prepare a hostdev XML for a VF; this VF is the first VF in the pool.
# cat vf-hostdev-1.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x03' slot='0x10' function='0x1'/>
  </source>
</hostdev>

3. Hot-plug the VF into the running guest.
# virsh attach-device a vf-hostdev-1.xml
Device attached successfully

4. Prepare an interface XML like the following:
# cat vf-vfpool.xml
<interface type='network'>
  <source network='hostnet'/>
</interface>

5. Hot-plug a VF from the pool into the guest.
# virsh attach-device a vf-vfpool.xml
error: Failed to attach device from vf-vfpool.xml
error: Requested operation is not valid: PCI device 0000:03:10.1 is in use by domain a

# virsh attach-device a vf-vfpool.xml
error: Failed to attach device from vf-vfpool.xml
error: Requested operation is not valid: PCI device 0000:03:10.1 is in use by domain a

Actual results:
In step 5, libvirt does not hot-plug any of the other available VFs into the guest; every attempt selects 0000:03:10.1 again.

Expected results:
In step 5, when the first VF in the pool is not available for hot-plug, libvirt should automatically choose another available VF from the pool.

Additional info:
There's an easy workaround... just don't manually attach devices you assigned to a hostdev network.
The difficulty here is that allocating a VF to a domain happens in a *very* different place from the code that actually attempts to use that VF. We allocate it from the network pool in networkAllocateActualDevice(), which lives in the network driver, and the network driver currently has no visibility into the hostdevmanager (the component that tracks which PCI devices are in use by domains). Then, at some later time, the qemu command-line generator (which *does* know about the hostdevmanager) actually attempts to detach the device from the host driver and set the MAC address. Beyond even that, we pass the information to qemu, and it is qemu that actually attempts to assign the device.

In order to skip a device that is already in use by a domain that referenced it directly rather than via the network device pool, we will need to *at least* make the network driver aware of the hostdevmanager, and possibly move all setup of the device into the network driver (which I actually think is a *good* idea). The same should be done for macvtap device initialization, including creating the macvtap device; eventually this would allow us to put networkAllocateActualDevice() behind a public API so that it can (hopefully) be called from an unprivileged libvirtd.

The problem with pulling network device setup into the network driver is that it would make the network driver compulsory (it is currently optional), so that may be a reason *not* to do it this way.
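To make the split bookkeeping concrete, here is a small Python model. It is not libvirt code, and every class and function name in it is hypothetical. The pool only counts the connections it handed out itself, while a separate hostdev-manager stand-in is the only place that knows about a device attached directly with <hostdev>. The model assumes that a failed attach returns the device to the pool, which is consistent with step 5 failing on 0000:03:10.1 both times.

```python
"""Hypothetical model of the split bookkeeping; not libvirt's actual code."""


class HostdevManager:
    """Stand-in for the hostdev manager: knows which domain owns each PCI device."""

    def __init__(self):
        self.owner = {}  # PCI address -> domain name

    def assign(self, addr, domain):
        if addr in self.owner:
            raise RuntimeError(f"PCI device {addr} is in use by domain {self.owner[addr]}")
        self.owner[addr] = domain


class NetworkPool:
    """Stand-in for a hostdev network; it never consults the HostdevManager."""

    def __init__(self, addresses):
        self.connections = dict.fromkeys(addresses, 0)  # keeps the <forward> order

    def allocate(self):
        # Only the pool's own connection counts are checked here.
        for addr, count in self.connections.items():
            if count == 0:
                self.connections[addr] += 1
                return addr
        raise RuntimeError("no free device in pool")

    def release(self, addr):
        self.connections[addr] -= 1


def attach_from_pool(pool, mgr, domain):
    addr = pool.allocate()          # the network driver picks a device
    try:
        mgr.assign(addr, domain)    # later, the hostdev code tries to use it
    except RuntimeError:
        pool.release(addr)          # assumed: a failed attach gives the device back
        raise
    return addr


if __name__ == "__main__":
    mgr = HostdevManager()
    pool = NetworkPool(["0000:03:10.1", "0000:03:10.0"])
    mgr.assign("0000:03:10.1", "a")  # step 3: a direct <hostdev> attach bypasses the pool
    for _ in range(2):               # step 5, attempted twice
        try:
            attach_from_pool(pool, mgr, "a")
        except RuntimeError as err:
            print(err)
```

Both iterations fail on 0000:03:10.1: the pool's loop has no way to learn that the device is taken, so it never gets as far as 0000:03:10.0.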
Adding one more scenario: the SR-IOV VF pool cannot skip an interface that is already in use unless the interface was assigned from the same network.

1. Start two networks, passthrough1 and passthrough2, with the same address list:
...
<forward mode='hostdev' managed='yes'>
  <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
  <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
  <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x2'/>
</forward>
...

2. Start a guest, then attach interfaces from the pools:
# virsh attach-interface rhel7.2 network passthrough1
Interface attached successfully

# virsh attach-interface rhel7.2 network passthrough2
error: Failed to attach interface
error: internal error: Not reattaching active device 0000:86:10.0

# virsh attach-interface rhel7.2 network passthrough1
Interface attached successfully
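The two-network case can be modeled the same way. In this hypothetical Python sketch (the names are invented for illustration, not taken from libvirt), each network keeps its own per-address connection counts, and a shared `active` set stands in for the host-wide state that only the hostdev layer checks. passthrough2 cannot see that passthrough1 already handed out 0000:86:10.0, so it picks that device and fails, while a second request through passthrough1 moves on to 0000:86:10.1.

```python
"""Hypothetical model: per-network connection counts over a shared VF list."""

POOL = ["0000:86:10.0", "0000:86:10.1", "0000:86:10.2"]


def attach(network, networks, active):
    counts = networks[network]  # this network's own bookkeeping only
    addr = next((a for a in POOL if counts[a] == 0), None)
    if addr is None:
        raise RuntimeError(f"no free device in network {network}")
    if addr in active:  # the host-wide check happens only at the hostdev stage
        raise RuntimeError(f"Not reattaching active device {addr}")
    counts[addr] += 1
    active.add(addr)
    return addr


if __name__ == "__main__":
    networks = {"passthrough1": dict.fromkeys(POOL, 0),
                "passthrough2": dict.fromkeys(POOL, 0)}
    active = set()  # devices currently assigned to any domain
    for net in ("passthrough1", "passthrough2", "passthrough1"):
        try:
            print(net, "->", attach(net, networks, active))
        except RuntimeError as err:
            print(net, "->", err)
```

The three calls mirror the three attach-interface commands above: success, failure on 0000:86:10.0, success.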
The issue also appears with a macvtap network.

1. # virsh list
 Id    Name                           State
----------------------------------------------------
 7     r7.1                           running

# virsh dumpxml r7.1 | grep interface -A9

# cat mac-vf.xml
<interface type='direct'>
  <mac address='7a:3d:d6:18:19:76'/>
  <source dev='enp3s16f5' mode='passthrough'/>
</interface>

# virsh attach-device r7.1 mac-vf.xml
Device attached successfully

# virsh dumpxml r7.1 | grep interface -A9
<interface type='direct'>
  <mac address='7a:3d:d6:18:19:76'/>
  <source dev='enp3s16f5' mode='passthrough'/>
  <target dev='macvtap0'/>
  <model type='rtl8139'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

2. # virsh net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     no            yes
 hostdev              active     no            yes
 macvtap-jing         active     no            yes

# virsh net-dumpxml macvtap-jing
<network>
  <name>macvtap-jing</name>
  <uuid>4604073a-e084-4a37-a504-28b0ebe666cf</uuid>
  <forward dev='enp3s16f5' mode='passthrough'>
    <interface dev='enp3s16f5'/>
    <interface dev='enp3s16f3'/>
  </forward>
</network>

# cat macvtap-vf.xml
<interface type='network'>
  <source network='macvtap-jing'/>
</interface>

# virsh attach-device r7.1 macvtap-vf.xml
error: Failed to attach device from macvtap-vf.xml
error: error creating macvtap interface macvtap1@enp3s16f5 (52:54:00:68:25:05): Invalid argument
The result has changed on libvirt-3.2.0-4.el7.x86_64: attaching from the pool now succeeds, but the pool hands out p1p1_0, which the directly attached interface (macvtap0) is already using.

1. Prepare a running guest.
# virsh list
 Id    Name                           State
----------------------------------------------------
 2     vm1                            running

2. Prepare an XML file as below.
# cat mac_vf.xml
<interface type='direct'>
  <mac address='7a:3d:d6:18:19:76'/>
  <source dev='p1p1_0' mode='passthrough'/>
</interface>

3. # virsh attach-device vm1 mac_vf.xml
Device attached successfully

# virsh dumpxml vm1 | grep interface -A9
<interface type='direct'>
  <mac address='7a:3d:d6:18:19:76'/>
  <source dev='p1p1_0' mode='passthrough'/>
  <target dev='macvtap0'/>
  <model type='rtl8139'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
<serial type='pt

Repeat the step:
# virsh attach-device vm1 mac_vf.xml
error: Failed to attach device from mac_vf.xml
error: error creating macvtap interface macvtap1@p1p1_0 (7a:3d:d6:18:19:76): Invalid argument

4. Prepare a network pool as below.
# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 passthrough-vf       active     no            yes

# virsh net-dumpxml passthrough-vf
<network>
  <name>passthrough-vf</name>
  <uuid>4e53b598-2305-439d-84f3-2b3564443b9a</uuid>
  <forward dev='p1p1_0' mode='passthrough'>
    <interface dev='p1p1_0'/>
    <interface dev='p1p1_1'/>
    <interface dev='p1p2_0'/>
    <interface dev='p1p2_1'/>
  </forward>
</network>

5. Prepare the XML as below.
# cat vf_pool.xml
<interface type='network'>
  <source network='passthrough-vf'/>
</interface>

6. # virsh attach-device vm1 vf_pool.xml
Device attached successfully

# virsh dumpxml vm1 | grep interface -A8
<interface type='direct'>
  <mac address='7a:3d:d6:18:19:76'/>
  <source dev='p1p1_0' mode='passthrough'/>
  <target dev='macvtap0'/>
  <model type='rtl8139'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
<interface type='direct'>
  <mac address='52:54:00:ce:a9:93'/>
  <source network='passthrough-vf' dev='p1p1_0' mode='passthrough'/>
  <target dev='macvtap1'/>
  <model type='rtl8139'/>
  <alias name='net1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</interface>

# virsh net-dumpxml passthrough-vf
<network connections='1'>
  <name>passthrough-vf</name>
  <uuid>4e53b598-2305-439d-84f3-2b3564443b9a</uuid>
  <forward dev='p1p1_0' mode='passthrough'>
    <interface dev='p1p1_0' connections='1'/>
    <interface dev='p1p1_1'/>
    <interface dev='p1p2_0'/>
    <interface dev='p1p2_1'/>
  </forward>
</network>
In order for this to work properly:
1) the nodedev driver needs to keep track of which devices are currently in use, and by whom (including those used for macvtap).
2) the network driver needs to allocate a device from the nodedev driver before passing it back to the qemu driver.

So, the sequence of events would be this:
1) The qemu driver calls networkAllocateActualDevice().
2) The network driver scans through the list of devices in the network pool, calling nodedevAllocateDevice() (or whatever the function ends up being called) on each device until one succeeds.
   [optional: the nodedev driver binds the device to vfio-pci and chowns the IOMMU group node in /dev/vfio/nn *if* the network driver requests this. Alternatively, we would have to send some sort of cookie back so that qemu could later verify it actually owns the device and request the bind to vfio-pci, which seems a bit cumbersome.]
3) The network driver returns this device to qemu.
   [optional: if the device wasn't bound to vfio-pci above, qemu needs to request that directly here. This could be a security issue if we don't provide a way for qemu to prove to nodedev that it actually owns the device.]
4) qemu uses the device.
5) qemu calls networkReleaseActualDevice().
6) The network driver calls nodedevReleaseDevice().
7) The network driver marks the device available in the pool.

Or something like that. All of this shows that, in order to implement this functionality, the nodedev driver needs to provide an externally visible API to the hostdevmgr.
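The proposed sequence can be sketched as a Python model (again, not libvirt code; all names are hypothetical). `NodeDevTracker.allocate()`/`release()` play the roles of the not-yet-existing nodedevAllocateDevice()/nodedevReleaseDevice() from the list above, and `Network.allocate_actual_device()`/`release_actual_device()` stand in for networkAllocateActualDevice()/networkReleaseActualDevice(). The optional vfio-pci binding and ownership-proof steps are left out. The key assumption is that direct <hostdev> or <interface type='direct'> attaches also claim their device through the same tracker, so the pool simply skips a device that is already taken.

```python
"""Hypothetical model of the proposed nodedev-mediated allocation (not libvirt code)."""


class NodeDevTracker:
    """Single owner of 'which device is in use, and by whom'."""

    def __init__(self):
        self._owner = {}  # device (PCI address or netdev name) -> domain

    def allocate(self, dev, domain):
        """Claim dev for domain; return False if it is already held."""
        if dev in self._owner:
            return False
        self._owner[dev] = domain
        return True

    def release(self, dev, domain):
        if self._owner.get(dev) != domain:
            raise ValueError(f"{dev} is not held by {domain}")
        del self._owner[dev]


class Network:
    def __init__(self, name, devices, nodedev):
        self.name = name
        self.devices = list(devices)
        self.nodedev = nodedev
        self.connections = dict.fromkeys(self.devices, 0)

    def allocate_actual_device(self, domain):
        # Step 2: try each pool member until the tracker grants one.
        for dev in self.devices:
            if self.nodedev.allocate(dev, domain):
                self.connections[dev] += 1
                return dev  # step 3: handed back to the qemu driver
        raise RuntimeError(f"no available device in network {self.name}")

    def release_actual_device(self, dev, domain):
        self.nodedev.release(dev, domain)  # step 6
        self.connections[dev] -= 1         # step 7: available in the pool again


if __name__ == "__main__":
    nodedev = NodeDevTracker()
    nodedev.allocate("0000:03:10.1", "a")  # a direct <hostdev> attach also claims via nodedev
    hostnet = Network("hostnet", ["0000:03:10.1", "0000:03:10.0"], nodedev)
    dev = hostnet.allocate_actual_device("a")
    print(dev)  # 0000:03:10.0
    hostnet.release_actual_device(dev, "a")
```

Because every network consults the same tracker, two networks that list the same VFs (the passthrough1/passthrough2 case) would no longer hand out the same device twice.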
This bug was closed deferred as a result of bug triage. Please reopen if you disagree and provide justification why this bug should get enough priority. Most important would be information about impact on customer or layered product. Please indicate requested target release.