Bug 1274316 - [SR-IOV] - 'pci-passthrough' vNIC reported as unplugged in UI once running the VM, although the vNICs state is UP and plugged
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 3.6.0
Hardware: x86_64 Linux
Priority: medium, Severity: urgent
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assigned To: Alona Kaplan
QA Contact: Meni Yakove
Whiteboard: network
Depends On: 1261352
Blocks:
 
Reported: 2015-10-22 09:21 EDT by Ido Barkan
Modified: 2016-02-10 14:15 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1261352
Environment:
Last Closed: 2015-12-16 07:20:52 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
ylavi: Triaged+
ylavi: planning_ack+
danken: devel_ack+
rule-engine: testing_ack+




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 47618 | master | MERGED | net: SRIOV API change. | Never
oVirt gerrit | 47619 | master | ABANDONED | net: moved unit tests from HostdevCreationTests | Never
oVirt gerrit | 47633 | master | MERGED | engine: Running passthrough vnic- api change | Never
oVirt gerrit | 47650 | ovirt-3.6 | MERGED | net: SRIOV API change. | Never
oVirt gerrit | 47704 | ovirt-engine-3.6 | MERGED | engine: Running passthrough vnic- api change | Never

Description Ido Barkan 2015-10-22 09:21:35 EDT
+++ This bug was initially created as a clone of Bug #1261352 +++

Description of problem:
[SR-IOV] - 'pci-passthrough' vNIC reported as unplugged in the UI once the VM is running, although the vNIC's state is UP and plugged.

After starting a VM that has a plugged 'pci-passthrough' vNIC, the UI shows the vNIC as unplugged, while the guest OS still reports the vNIC as plugged.


vdsm log -->

Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1384, in _getRunningVmStats
    vm_sample.interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 42, in produce
    networks(vm, stats, first_sample, last_sample, interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 213, in networks
    first_indexes = _find_bulk_stats_reverse_map(first_sample, 'net')
  File "/usr/share/vdsm/virt/vmstats.py", line 340, in _find_bulk_stats_reverse_map
    name_to_idx[stats['%s.%d.name' % (group, idx)]] = idx
KeyError: 'net.0.name'
Thread-35436::INFO::2015-09-09 08:53:55,272::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:47532 stopped
Thread-35437::ERROR::2015-09-09 08:54:01,080::vm::1387::virt.vm::(_getRunningVmStats) vmId=`c7758e8d-610e-4f43-8504-ed6acf5e2513`::Error fetching vm stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1384, in _getRunningVmStats
    vm_sample.interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 42, in produce
    networks(vm, stats, first_sample, last_sample, interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 213, in networks
    first_indexes = _find_bulk_stats_reverse_map(first_sample, 'net')
  File "/usr/share/vdsm/virt/vmstats.py", line 340, in _find_bulk_stats_reverse_map
    name_to_idx[stats['%s.%d.name' % (group, idx)]] = idx
KeyError: 'net.0.name'
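
The KeyError above can be illustrated with a minimal sketch. It assumes that the bulk stats sample is a flat dict with a `net.count` entry and per-index `net.<idx>.name` entries, and that a passthrough vNIC may leave a gap in those names. The function names `reverse_map_strict` and `reverse_map_tolerant` are hypothetical; only the failing line is taken from the traceback, and this is not the actual vdsm fix.

```python
def reverse_map_strict(stats, group):
    """Mirror the failing line from the traceback.

    Raises KeyError when '<group>.<idx>.name' is absent from the sample.
    """
    name_to_idx = {}
    for idx in range(stats.get('%s.count' % group, 0)):
        name_to_idx[stats['%s.%d.name' % (group, idx)]] = idx
    return name_to_idx


def reverse_map_tolerant(stats, group):
    """Hypothetical defensive variant: skip indexes that have no name."""
    name_to_idx = {}
    for idx in range(stats.get('%s.count' % group, 0)):
        name = stats.get('%s.%d.name' % (group, idx))
        if name is not None:
            name_to_idx[name] = idx
    return name_to_idx


# Assumed sample: two interfaces counted, but index 0 carries no name.
sample = {'net.count': 2, 'net.1.name': 'vnet0', 'net.1.rx.bytes': 1024}

try:
    reverse_map_strict(sample, 'net')
except KeyError as err:
    print('strict lookup failed on key: %s' % err)

print(reverse_map_tolerant(sample, 'net'))
```

Skipping unnamed indexes only keeps stats collection alive; it does not address the duplicated device entries discussed later in this report.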


Version-Release number of selected component (if applicable):
3.6.0-0.13.master.el6

How reproducible:
Intermittent, roughly 50%-90% of attempts

Steps to Reproduce:
1. On an SR-IOV setup, enable 1 VF on the host
2. Create a network with a 'passthrough' profile and add a vNIC with this profile to the VM (pci-passthrough type)
3. Run VM

Actual results:
In the UI, the vNIC changes its state to unplugged once the VM starts.
In the guest OS, the vNIC is reported as plugged and gets an IP address from DHCP.

Expected results:
The vNIC shouldn't change its state to unplugged in the UI. It should stay plugged, and the UI should report it as plugged.

--- Additional comment from Michael Burman on 2015-09-10 11:08:39 EDT ---



--- Additional comment from Michael Burman on 2015-09-10 11:09:33 EDT ---



--- Additional comment from Michael Burman on 2015-09-10 11:11:12 EDT ---



--- Additional comment from Michael Burman on 2015-09-10 11:11:15 EDT ---



--- Additional comment from Michael Burman on 2015-09-10 11:11:45 EDT ---



--- Additional comment from Michael Burman on 2015-10-14 01:59:57 EDT ---

This bug is causing problems when testing the SR-IOV feature and may be blocking the feature.

- I can't get an IP for VLAN-tagged passthrough profiles.
- Sometimes the VM comes up and the vNIC is not reported at all in the guest OS.
- This bug should be fixed as soon as possible.

--- Additional comment from Yaniv Dary on 2015-10-14 09:29:01 EDT ---

Can we assign someone to look?

--- Additional comment from Alona Kaplan on 2015-10-19 08:48:27 EDT ---

I think the bug is caused by the output of vdsClient -s 0 list containing the same device twice:
devices = [{'device': 'pci_0000_05_10_1', 'specParams': {'macAddr': '00:00:00:00:00:22'}, 'type': 'hostdev', 'deviceId': '6940d5e7-9814-4ae0-94ef-f78e68229e76'},
{'nicModel': 'passthrough', 'macAddr': '00:00:00:00:00:22', 'linkActive': True, 'name': 'hostdev0', 'alias': 'hostdev0', 'address': {'slot': '0x10', 'bus': '0x05', 'domain': '0x0000', 'type': 'pci', 'function': '0x1'}, 'device': 'hostdev', 'type': 'interface'}

Both entries represent the same device. I'm not sure (I didn't debug it), but I think the engine tries to get the device info from the first entry, while the actual info about the device resides in the second entry.
These two entries should be merged into one entry that contains the deviceId as reported by the engine together with the device's actual data.
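
The merge proposed above could look roughly like the following sketch. It is an illustration only: matching the two entries by MAC address (`specParams.macAddr` on the hostdev entry, `macAddr` on the interface entry) is an assumption based on the output quoted in this comment, and `merge_passthrough_entries` is a hypothetical helper, not code from vdsm or the engine.

```python
def merge_passthrough_entries(devices):
    """Fold each 'hostdev' entry into the 'interface' entry with the same MAC.

    The merged entry keeps the interface's runtime data and gains the
    deviceId from the hostdev entry. Unmatched entries pass through as-is.
    """
    iface_macs = {d['macAddr'] for d in devices
                  if d.get('type') == 'interface' and 'macAddr' in d}

    # Collect the deviceId of every hostdev entry that has a matching interface.
    device_id_by_mac = {}
    for d in devices:
        mac = d.get('specParams', {}).get('macAddr')
        if d.get('type') == 'hostdev' and mac in iface_macs:
            device_id_by_mac[mac] = d.get('deviceId')

    merged = []
    for d in devices:
        mac = d.get('specParams', {}).get('macAddr')
        if d.get('type') == 'hostdev' and mac in device_id_by_mac:
            continue  # absorbed into the matching interface entry
        if d.get('type') == 'interface' and d.get('macAddr') in device_id_by_mac:
            d = dict(d, deviceId=device_id_by_mac[d['macAddr']])
        merged.append(d)
    return merged


# The two entries quoted in this comment.
devices = [
    {'device': 'pci_0000_05_10_1',
     'specParams': {'macAddr': '00:00:00:00:00:22'},
     'type': 'hostdev',
     'deviceId': '6940d5e7-9814-4ae0-94ef-f78e68229e76'},
    {'nicModel': 'passthrough', 'macAddr': '00:00:00:00:00:22',
     'linkActive': True, 'name': 'hostdev0', 'alias': 'hostdev0',
     'address': {'slot': '0x10', 'bus': '0x05', 'domain': '0x0000',
                 'type': 'pci', 'function': '0x1'},
     'device': 'hostdev', 'type': 'interface'},
]

for entry in merge_passthrough_entries(devices):
    print(entry['type'], entry['name'], entry['deviceId'])
```

Keeping the interface entry as the base is one possible choice; which of the two elements should survive the merge is exactly the open question in this thread.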

Ido/Martin P: this bug was discovered a long time ago and should already have been fixed by one of you.
Do you know why it still exists?

--- Additional comment from Martin Polednik on 2015-10-20 03:36:58 EDT ---

The code wasn't really touched, afaik. The fix requires some networking expertise, as I'm not aware of the implications of not using the interface device.

Note:
This happens when we parse the XML that libvirt has constructed (getUnderlying*). We use hostdev for the device creation, but then expect the "nic device" to be populated. "Merging" is the answer, but I'm not sure which element will eventually be the better choice.
Comment 1 Yaniv Lavi (Dary) 2015-10-29 08:51:05 EDT
In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.
Comment 2 Yaniv Lavi (Dary) 2015-11-01 04:27:39 EST
Should this be on MODIFIED?
Comment 3 Michael Burman 2015-11-29 04:37:09 EST
Verified on - 3.6.1-0.2.el6 and vdsm-4.17.11-0.el7ev.noarch
Comment 4 Sandro Bonazzola 2015-12-16 07:20:52 EST
According to verification status and target milestone this issue should be fixed in oVirt 3.6.1. Closing current release.
