Bug 1273058 - [HostDev] - Failing to run VM with host device - pci 82576 VF (0x10ca) attached to VM because the host device considered as unavailable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2.5
Assignee: Dan Kenigsberg
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-19 13:39 UTC by Michael Burman
Modified: 2016-02-18 11:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-18 11:00:34 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments (Terms of Use)
hostdevListByCaps_output_5_FREE_VFs (39.03 KB, text/plain)
2015-12-01 13:09 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 50175 0 master MERGED engine: Failing to run VM with network host device Never
oVirt gerrit 50483 0 ovirt-engine-3.6 MERGED engine: Failing to run VM with network host device 2015-12-22 11:24:44 UTC
oVirt gerrit 50950 0 refs/tags/ovirt-engine-3.6.2 ABANDONED engine: Failing to run VM with network host device 2015-12-23 07:31:12 UTC
oVirt gerrit 50951 0 ovirt-engine-3.6.2 MERGED engine: Failing to run VM with network host device 2015-12-23 09:12:39 UTC

Description Michael Burman 2015-10-19 13:39:14 UTC
Description of problem:
[HostDev] - Running a VM with a host device (pci 82576 VF, 0x10ca) attached fails because the host device is considered unavailable.

When I try to run a VM with a host device attached to it via the [Host Devices] sub-tab (in my case a VF, pci 82576 (0x10ca)), it fails with this error:

Operation Canceled:

Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:
The host puma22.scl.lab.tlv.redhat.com did not satisfy internal filter HostDevice because some of the required host devices are unavailable..

But the VF is free and available for use.

engine.log 

CanDoAction of action 'RunVm' failed for user admin@internal. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName puma22.scl.lab.tlv.redhat.com,$filterName HostDevice,VAR__DETAIL__HOST_DEVICE_UNAVAILABLE,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
2015-10-19 16:13:06,468 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (ajp-/127.0.0.1:8702-5) [] Lock freed to object 'EngineLock:{exclusiveLocks='[27e2db73-edc0-4985-9718-539f9308be88=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'

Version-Release number of selected component (if applicable):
3.6.0.1-0.1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Use a host with SR-IOV capable NICs and at least 1 VF enabled.
2. Add the device (pci 82576 VF) to a VM and try to run it.


Actual results:
Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:
The host puma22.scl.lab.tlv.redhat.com did not satisfy internal filter HostDevice because some of the required host devices are unavailable..

Expected results:
The VM should run with the host device (pci VF) if it is free and available.

Comment 1 Yaniv Kaul 2015-10-20 10:32:08 UTC
Michael - what's the real use case here? Why would you attach the device and not use it via SR-IOV?

Comment 2 Michael Burman 2015-10-21 05:28:56 UTC
I'm not sure about the real use case here, but I was asked to test this as part of the SR-IOV feature, although it belongs to the HostDev feature.
For me it's important to test that in such a scenario one VF is considered non-free and can't be used.

Let's forward this needinfo to Alona or maybe to someone from Virt DEV.

Comment 3 Martin Betak 2015-10-21 16:20:25 UTC
@Michael: when using direct passthrough (not SR-IOV) device is considered unavailable when it has a 'net' device attached on top of it (this is done to prevent you from accidentally cutting connection to your host by direct passthrough)

When I try to spawn new VF's on my setup via the "Setup Host Networks" UI indeed I see network connections created on top of it. So either you have to manually remove them or use SR-IOV for VF passthrough.

From my initial look this looks like valid behaviour, but Alona can probably provide more details.

Comment 4 Alona Kaplan 2015-11-16 12:14:31 UTC
(In reply to Martin Betak from comment #3)
> @Michael: when using direct passthrough (not SR-IOV) device is considered
> unavailable when it has a 'net' device attached on top of it (this is done
> to prevent you from accidentally cutting connection to your host by direct
> passthrough)
> 
> When I try to spawn new VF's on my setup via the "Setup Host Networks" UI
> indeed I see network connections created on top of it. So either you have to
> manually remove them or use SR-IOV for VF passthrough.
> 
> From my initial look this looks like valid behaviour, but Alona can probably
> provide more details.

As I understood from Michael, the VF that was directly attached was 'free': it had no networks attached to it.

Michael, please confirm the VF is free (doesn't have a network/label/VLAN device attached to it).

Comment 5 Michael Burman 2015-11-16 12:17:37 UTC
Hi Alona,

Yes, I can confirm that the VF was free.

Comment 6 Martin Betak 2015-12-01 10:23:47 UTC
I believe it is very important to define "free" correctly. Usually we can see 'net' interfaces on top of each VF, which are 'DOWN', have no vlan, etc. attached and are currently detected by NetworkDeviceHelper logic as network used.

@Michael could you please post the output of
> vdsClient -s 0 hostdevListByCaps
from the host?

Comment 7 Michael Burman 2015-12-01 13:09:45 UTC
Created attachment 1100920
hostdevListByCaps_output_5_FREE_VFs

Comment 8 Michael Burman 2015-12-01 13:13:08 UTC
Hi Martin, yes, sure I can.
Attaching the output in a file.
Please note that I have 5 free VFs on the host;
you will see that they are free in this output.

Comment 9 Martin Betak 2015-12-01 15:12:17 UTC
@Michael: yes it is true that the VFs themselves are free, but notice the net_enp5s 'net' devices that are attached on each individual VF. This is what is causing the engine (NetworkDeviceHelper) to report the devices as "network used". If we would like to make a finer distinction whether the particular 'net' interface is actually UP and thus really 'in use', that would be a backend network RFE.

Comment 10 Dan Kenigsberg 2015-12-09 09:19:26 UTC
IIRC we agreed that a network PCI device (be it PF or VF) should be considered 'free' unless it has an ovirt network defined on top of it, or it is attached via a tap device to a VM.
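
The rule stated in this comment can be sketched as a small predicate. This is only an illustration in Python, not the engine's code: the `NetDevice` type and its fields `ovirt_networks` and `tap_attached_to_vm` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetDevice:
    """Hypothetical model of a network PCI device (PF or VF)."""
    name: str
    # Assumed field: oVirt networks defined on top of the device.
    ovirt_networks: List[str] = field(default_factory=list)
    # Assumed field: True if a VM uses the device through a tap device.
    tap_attached_to_vm: bool = False


def is_free(dev: NetDevice) -> bool:
    """Free unless an oVirt network sits on top or a VM uses it via tap."""
    return not dev.ovirt_networks and not dev.tap_attached_to_vm
```

Under this rule, the mere presence of a 'net' interface on a VF does not make it unavailable; only an oVirt network or a tap attachment does.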

Comment 11 Alona Kaplan 2015-12-09 14:01:00 UTC
(In reply to Martin Betak from comment #9)
> @Michael: yes it is true that the VFs themselves are free, but notice the
> net_enp5s 'net' devices that are attached on each individual VF. This is
> what is causing the engine (NetworkDeviceHelper) to report the devices as
> "network used". If we would like to make a finer distinction whether the
> particular 'net' interface is actually UP and thus really 'in use', that
> would be a backend network RFE.

Hi Martin,

I think there is some confusion here. Each free VF has a net device. The NetworkDeviceHelper shouldn't mark the VF as non-free in this case.

I think the confusion is with the case where the NIC that this net device represents has a network attached to it (this can be checked via the Setup Networks dialog).
In that case, the VF should be marked as non-free.

The bug is that 'NetworkDeviceHelper.isNetworkDeviceFree(..)' is called with the networkDevice while it should be called with the pci device.

I will send a patch to fix it.
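
A toy sketch of the distinction described above, written in Python for brevity. It is not the engine's implementation: `HostDevice`, `child_net_device`, the device names, and the `nics_with_networks` set are hypothetical stand-ins, and the real `NetworkDeviceHelper` may behave differently when handed the wrong device.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class HostDevice:
    """Hypothetical host device record."""
    name: str
    capability: str        # "pci" or "net"
    parent: Optional[str]  # name of the parent device, if any


def child_net_device(devices: Dict[str, HostDevice],
                     pci: HostDevice) -> Optional[HostDevice]:
    """Return the 'net' device that sits on top of the given PCI device."""
    for dev in devices.values():
        if dev.capability == "net" and dev.parent == pci.name:
            return dev
    return None


def is_network_device_free(devices: Dict[str, HostDevice],
                           nics_with_networks: Set[str],
                           pci: HostDevice) -> bool:
    """Expects the PCI device; it is free if its NIC carries no network."""
    if pci.capability != "pci":
        raise ValueError("expected a PCI device, got %r" % pci.capability)
    net = child_net_device(devices, pci)
    return net is None or net.name not in nics_with_networks


# Hypothetical VF with a 'net' device on top of it and no network attached.
vf = HostDevice("pci_vf0", "pci", None)
net = HostDevice("net_vf0", "net", "pci_vf0")
devices = {d.name: d for d in (vf, net)}

print(is_network_device_free(devices, set(), vf))         # checked via the PCI device
print(is_network_device_free(devices, {"net_vf0"}, vf))   # NIC now carries a network
```

In this sketch, passing `net` instead of `vf` raises a `ValueError`, which makes the expected argument explicit at the call site.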

Comment 12 Sandro Bonazzola 2015-12-23 15:07:42 UTC
This bug has target milestone 3.6.2 and is on modified without a target release.
This may be perfectly correct, but please check if the patch fixing this bug is included in ovirt-engine-3.6.2. If it's included, please set target-release to 3.6.2 and move to ON_QA. Thanks.

Comment 13 Michael Burman 2016-01-17 09:27:33 UTC
Verified on - 3.6.2.5-0.1.el6

Comment 14 Michael Burman 2016-01-17 09:28:03 UTC
Verified on - 3.6.2.5-0.1.el6

