Bug 1012820

Summary: libvirt should automatically choose an available VF for assignment if the first VFs are unavailable

Product: Red Hat Enterprise Linux 8
Component: libvirt
Version: ---
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DEFERRED
Severity: medium
Priority: medium
Target Milestone: pre-dev-freeze
Target Release: 8.0
Reporter: Xuesong Zhang <xuzhang>
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: jiyan <jiyan>
CC: dyuan, jdenemar, jtomko, knoel, laine, mzhan, yafu, yalzhang
Keywords: Triaged
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1012829 (view as bug list)
Environment:
Last Closed: 2020-02-11 12:11:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1012829

Description Xuesong Zhang 2013-09-27 08:26:54 UTC

Description:
When the first VF in the VF pool is not available for assignment, libvirt should automatically choose the next available VF for assignment.

Version:
libvirt-1.1.1-5.el7.x86_64
qemu-kvm-1.5.3-3.el7.x86_64
kernel-3.10.0-23.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a hostdev network like the following:
# virsh net-dumpxml hostnet
<network>
  <name>hostnet</name>
  <uuid>c1fb4ead-21b8-4d69-8ad9-669c55b3dfc7</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </forward>
</network>

2. Prepare a hostdev XML for a VF; this is the first VF in the pool.
# cat vf-hostdev-1.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x03' slot='0x10' function='0x1'/>
  </source>
</hostdev>

3. Hot-plug the VF into the running guest.
# virsh attach-device a vf-hostdev-1.xml 
Device attached successfully

4. Prepare an interface XML like the following.
# cat vf-vfpool.xml 
<interface type='network'>
 <source network='hostnet'/>
</interface>

5. Hot-plug a VF from the pool into the guest.
# virsh attach-device a vf-vfpool.xml 
error: Failed to attach device from vf-vfpool.xml
error: Requested operation is not valid: PCI device 0000:03:10.1 is in use by domain a

# virsh attach-device a vf-vfpool.xml 
error: Failed to attach device from vf-vfpool.xml
error: Requested operation is not valid: PCI device 0000:03:10.1 is in use by domain a


Actual results:
In step 5, libvirt cannot hot-plug another available VF into the guest.


Expected results:
In step 5, when the first VF of the pool is not available for hot-plug, libvirt should automatically choose another available VF from the pool.


Additional info:

Comment 2 Jiri Denemark 2013-09-27 14:38:26 UTC
There's an easy workaround... just don't manually attach devices you assigned to a hostdev network.
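If a device from the pool must be attached manually, one way to honor this workaround is to check which pool entries libvirt reports as in use first. The sketch below is a hypothetical helper, not part of libvirt: it parses `virsh net-dumpxml` output, assuming libvirt annotates in-use `<address>` entries with a `connections` attribute the way it demonstrably does for `<interface>` entries (see comment 14).

```python
import xml.etree.ElementTree as ET

def free_pool_addresses(net_xml):
    """Return PCI addresses in a hostdev pool that report no connections.

    net_xml is the output of `virsh net-dumpxml <network>`; entries that a
    domain currently holds carry a connections='N' attribute.
    """
    root = ET.fromstring(net_xml)
    free = []
    for addr in root.findall("./forward/address"):
        if addr.get("connections") is None:
            # Convert the 0x-prefixed fields to the 0000:03:10.1 form.
            free.append("{}:{}:{}.{}".format(
                addr.get("domain")[2:], addr.get("bus")[2:],
                addr.get("slot")[2:], addr.get("function")[2:]))
    return free
```

For example, feeding it `subprocess.check_output(["virsh", "net-dumpxml", "hostnet"])` would list the addresses that are still safe to attach by hand.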

Comment 7 Laine Stump 2015-03-16 14:33:19 UTC
The difficulty here is that allocation of a VF to a domain is done in a *very* separate place from the code that actually attempts to use that VF - we allocate it from the network pool in networkAllocateActualDevice() (in the network driver, which currently has no visibility into the hostdevmanager, which tracks which PCI devices are in use by domains), then at some later time (in the qemu commandline generator, which *does* know about hostdevmanager) we actually attempt to detach the device from the host driver and set the mac address. And even beyond that, we pass the information to qemu, and it is qemu that attempts to actually assign the device.

In order to be able to bypass a device if it is already in use by some domain that referenced it directly rather than via the network device pool, we will need to *at least* make the network driver aware of the hostdevmanager, and possibly move all setup of the device into the network driver (which I actually think is a *good* idea). The same thing should be done for macvtap device initialization, including creating the macvtap device; eventually this will allow us to put networkAllocateActualDevice() behind a public API so that it can (hopefully) be called from an unprivileged libvirtd. The problem with pulling the network device setup into the network driver is that it will make the network driver compulsory (whereas it is now optional), so that may be a reason to *not* do it this way.

Comment 10 yalzhang@redhat.com 2016-03-04 10:15:16 UTC
Adding one more scenario: an SR-IOV VF pool cannot skip an interface that is already in use, unless the interface was assigned from the same network.

1. Start two networks, passthrough1 and passthrough2, with the same address list as below:
...  
<forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
    <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x2'/>
  </forward>
...
2. Start a guest, then attach interfaces from the pools:
# virsh attach-interface rhel7.2 network passthrough1
Interface attached successfully

# virsh attach-interface rhel7.2 network passthrough2
error: Failed to attach interface
error: internal error: Not reattaching active device 0000:86:10.0

# virsh attach-interface rhel7.2 network passthrough1
Interface attached successfully
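The transcript above follows from each network keeping its own connection counters: passthrough2 never consults passthrough1's allocations, so it picks the first entry in its own list, and only the host-level "device already active" check catches the overlap; instead of moving on to the next entry, the attach fails. A toy Python model (hypothetical, not libvirt code) of two pools sharing a device list reproduces the behavior:

```python
class HostdevPool:
    """Toy model: each network pool tracks its own connection counts,
    while a shared host-level set records devices actually in use."""

    def __init__(self, name, devices, host_in_use):
        self.name = name
        self.devices = list(devices)
        self.connections = {d: 0 for d in devices}
        self.host_in_use = host_in_use  # shared across pools

    def allocate(self):
        # Pick the first entry with no connections *in this pool*, then
        # fail (rather than skip) if the host says it is already in use.
        for dev in self.devices:
            if self.connections[dev] == 0:
                if dev in self.host_in_use:
                    raise RuntimeError(
                        "Not reattaching active device " + dev)
                self.connections[dev] += 1
                self.host_in_use.add(dev)
                return dev
        raise RuntimeError("no free device in pool " + self.name)
```

With two pools built over the same three VFs, the first allocation from pool 1 succeeds, the first from pool 2 fails on the very same device, and a second allocation from pool 1 succeeds again, exactly matching the virsh output above.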

Comment 12 Jingjing Shao 2016-06-30 07:21:39 UTC
The issue also appears with a macvtap network.
1. # virsh list
 Id    Name                           State
----------------------------------------------------
 7     r7.1                           running


# virsh dumpxml r7.1 | grep interface -A9
(no output: the guest has no interface yet)

# cat mac-vf.xml
<interface type='direct'>
  <mac address='7a:3d:d6:18:19:76'/>
  <source dev='enp3s16f5' mode='passthrough'/>
</interface>

# virsh attach-device  r7.1  mac-vf.xml
Device attached successfully

# virsh dumpxml r7.1 | grep interface -A9
    <interface type='direct'>
      <mac address='7a:3d:d6:18:19:76'/>
      <source dev='enp3s16f5' mode='passthrough'/>
      <target dev='macvtap0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>


2. # virsh net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     no            yes
 hostdev              active     no            yes
 macvtap-jing         active     no            yes

# virsh net-dumpxml macvtap-jing
<network>
  <name>macvtap-jing</name>
  <uuid>4604073a-e084-4a37-a504-28b0ebe666cf</uuid>
  <forward dev='enp3s16f5' mode='passthrough'>
    <interface dev='enp3s16f5'/>
    <interface dev='enp3s16f3'/>
  </forward>
</network>


# cat macvtap-vf.xml
<interface type='network'>
  <source network='macvtap-jing'/>
</interface>


# virsh attach-device  r7.1  macvtap-vf.xml
error: Failed to attach device from macvtap-vf.xml
error: error creating macvtap interface macvtap1@enp3s16f5 (52:54:00:68:25:05): Invalid argument

Comment 14 Jingjing Shao 2017-05-09 11:59:00 UTC
The result has changed.

On libvirt-3.2.0-4.el7.x86_64

1.Prepare a running guest
#  virsh list
 Id    Name                           State
----------------------------------------------------
 2     vm1                            running


2. Prepare an interface XML as below:

# cat mac_vf.xml
<interface type='direct'>
  <mac address='7a:3d6:18:19:76'/>
  <source dev='p1p1_0' mode='passthrough'/>
</interface>


3. # virsh attach-device vm1 mac_vf.xml
Device attached successfully


# virsh dumpxml vm1 | grep interface -A9
    <interface type='direct'>
      <mac address='7a:3d6:18:19:76'/>
      <source dev='p1p1_0' mode='passthrough'/>
      <target dev='macvtap0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
    <serial type='pt


Repeat the step:
# virsh attach-device vm1 mac_vf.xml
error: Failed to attach device from mac_vf.xml
error: error creating macvtap interface macvtap1@p1p1_0 (7a:3d6:18:19:76): Invalid argument


4. Prepare a network pool as below:

# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 passthrough-vf       active     no            yes


# virsh net-dumpxml passthrough-vf
<network>
  <name>passthrough-vf</name>
  <uuid>4e53b598-2305-439d-84f3-2b3564443b9a</uuid>
  <forward dev='p1p1_0' mode='passthrough'>
    <interface dev='p1p1_0'/>
    <interface dev='p1p1_1'/>
    <interface dev='p1p2_0'/>
    <interface dev='p1p2_1'/>
  </forward>
</network>

5. Prepare the interface XML as below:
# cat vf_pool.xml
<interface type='network'>
  <source network='passthrough-vf'/>
</interface>


6. # virsh attach-device vm1 vf_pool.xml
Device attached successfully

# virsh dumpxml vm1 | grep interface -A8
    <interface type='direct'>
      <mac address='7a:3d6:18:19:76'/>
      <source dev='p1p1_0' mode='passthrough'/>
      <target dev='macvtap0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
    <interface type='direct'>
      <mac address='52:54:00:ce:a9:93'/>
      <source network='passthrough-vf' dev='p1p1_0' mode='passthrough'/>
      <target dev='macvtap1'/>
      <model type='rtl8139'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>

# virsh net-dumpxml passthrough-vf  
<network connections='1'>
  <name>passthrough-vf</name>
  <uuid>4e53b598-2305-439d-84f3-2b3564443b9a</uuid>
  <forward dev='p1p1_0' mode='passthrough'>
    <interface dev='p1p1_0' connections='1'/>
    <interface dev='p1p1_1'/>
    <interface dev='p1p2_0'/>
    <interface dev='p1p2_1'/>
  </forward>
</network>

Comment 15 Laine Stump 2018-04-23 14:11:15 UTC
In order for this to work properly:

1) the nodedev driver needs to keep track of which devices are currently in use by whom (including those used for macvtap).

2) the network driver needs to allocate a device from the nodedev driver prior to passing it back to the qemu driver.

So, the sequence of events would be this:

1) qemu driver calls networkAllocateActualDevice()

2) the network driver scans through the list of devices in the network pool, calling nodedevAllocateDevice() (or whatever the function ends up being called) on each device until one succeeds.

[optional: the nodedev driver binds the device to vfio-pci and chowns the iommu group node in /dev/vfio/nn *if* the network driver requests this (alternately we would have to send some sort of cookie back so that qemu could later verify it actually owns the device and can request the bind to vfio-pci, which seems a bit cumbersome)]

3) network returns this device to qemu

[optional: if device wasn't bound to vfio-pci above, then qemu needs to request it directly here. This could be a security issue if we don't provide a way for qemu to prove to nodedev that it actually does own the device]

4) qemu uses the device

5) qemu calls networkReleaseActualDevice

6) network calls nodedevReleaseDevice

7) network marks device available in pool.

Or something like that.

This all points out that in order to implement this functionality, the nodedev driver needs to provide an externally visible API to the hostdevmgr.
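The sequence above can be sketched as a toy model. Names like nodedevAllocateDevice refer to the proposed (not yet existing) APIs from this comment, and Python is used purely for illustration:

```python
class NodedevManager:
    """Toy stand-in for the proposed nodedev-driver allocation API."""

    def __init__(self):
        self.in_use = {}  # device -> domain holding it

    def allocate_device(self, dev, domain):
        # Proposed nodedevAllocateDevice(): refuse if any domain holds
        # the device, whether it came from a pool or a direct <hostdev>.
        if dev in self.in_use:
            return False
        self.in_use[dev] = domain
        return True

    def release_device(self, dev):
        # Proposed nodedevReleaseDevice().
        self.in_use.pop(dev, None)


def network_allocate_actual_device(pool, nodedev, domain):
    """Proposed networkAllocateActualDevice(): scan the pool, skipping
    entries the nodedev driver reports as busy, until one allocates."""
    for dev in pool:
        if nodedev.allocate_device(dev, domain):
            return dev
    raise RuntimeError("no available device in pool")
```

In this model, a VF attached directly to a domain is already registered with the nodedev manager, so a later pool allocation simply skips it and returns the next free VF, which is exactly the behavior this bug requests.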

Comment 17 Jaroslav Suchanek 2020-02-11 12:11:00 UTC
This bug was closed as DEFERRED as a result of bug triage.

Please reopen if you disagree, and provide justification for why this bug should
get enough priority. Most important would be information about the impact on a
customer or layered product. Please indicate the requested target release.