Bug 1417217 - SR-IOV vNIC unplugged after migration completed
Summary: SR-IOV vNIC unplugged after migration completed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.1.0.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.1.1-1
Target Release: 4.1.1.6
Assignee: Martin Mucha
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On: 1406283
Blocks:
Reported: 2017-01-27 14:59 UTC by Meni Yakove
Modified: 2017-04-21 09:53 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: missing locking
Consequence: vNIC was not plugged back after migration
Fix: added locking
Result: vNIC is plugged after migration
Clone Of:
Environment:
Last Closed: 2017-04-21 09:53:02 UTC
oVirt Team: Network
rule-engine: ovirt-4.1+
ylavi: exception+


Attachments
engine, source and destination hosts logs (3.14 MB, application/zip)
2017-01-27 14:59 UTC, Meni Yakove


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1406283 None CLOSED [SR-IOV] - concurrent hotplug fails (due to concurrent getCaps failure) 2019-09-26 10:56:50 UTC
oVirt gerrit 73009 master MERGED core: Migration and host refreshes must not mingle 2017-03-16 09:29:26 UTC
oVirt gerrit 73590 master MERGED core: removed indescriptive and incorrect message 2017-03-16 09:29:35 UTC
oVirt gerrit 73612 master MERGED core: added states when nic can be activated or deactivated 2017-03-16 09:29:38 UTC
oVirt gerrit 74203 ovirt-engine-4.1 MERGED core: removed indescriptive and incorrect message 2017-03-19 10:41:02 UTC
oVirt gerrit 74204 ovirt-engine-4.1 MERGED core: added states when nic can be activated or deactivated 2017-03-19 10:41:09 UTC
oVirt gerrit 74205 ovirt-engine-4.1 MERGED core: Migration and host refreshes must not mingle 2017-03-19 10:40:08 UTC
oVirt gerrit 74298 ovirt-engine-4.1.1.z MERGED core: removed indescriptive and incorrect message 2017-03-20 11:15:47 UTC
oVirt gerrit 74299 ovirt-engine-4.1.1.z MERGED core: added states when nic can be activated or deactivated 2017-03-20 11:15:34 UTC
oVirt gerrit 74300 ovirt-engine-4.1.1.z MERGED core: Migration and host refreshes must not mingle 2017-03-20 11:15:40 UTC

Internal Links: 1406283

Description Meni Yakove 2017-01-27 14:59:50 UTC
Created attachment 1245157 [details]
engine, source and destination hosts logs

Description of problem:
After a successful migration of a VM with an SR-IOV vNIC, the vNIC is sometimes left unplugged.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0.2-0.2.el7.noarch
vdsm-4.19.2-2.el7ev.x86_64


Steps to Reproduce:
1. Create a VM with SR-IOV vNIC
2. Start the VM
3. Migrate the VM

Actual results:
vNIC is unplugged

Expected results:
vNIC is plugged

Additional info:
2017-01-27 16:35:34,855+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (DefaultQuartzScheduler3) [7b40e0be] VM '0c1494af-a9dd-4e91-9133-fa91d1936b2a' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='3ae6fc1b-7d90-48fd-959c-db7efd745d4b', vmId='0c1494af-a9dd-4e91-9133-fa91d1936b2a'}', device='hostdev', type='INTERFACE', bootOrder='0', specParams='[]', address='{slot=0x08, bus=0x00, domain=0x0000, type=pci, function=0x0}', managed='true', plugged='false', readOnly='false', deviceAlias='hostdev0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='pci_0000_05_10_2'}'

Comment 1 Dan Kenigsberg 2017-01-28 13:02:30 UTC
Could it be a dup of Bug 1406283 ?

Comment 2 Martin Mucha 2017-02-06 14:55:21 UTC
looking into logs:
1) Some of the problems are probably fixed by the 'duplicate' 1406283. I'd try to reproduce this bug after 1406283 is merged.

2) I found this in logs:

WARN  [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (ForkJoinPool-1-worker-6) [66e2d004] Validation of action 'ActivateDeactivateVmNic' failed for user admin@internal-authz. Reasons: VAR__ACTION__ACTIVATE,VAR__TYPE__INTERFACE,ACTIVATE_DEACTIVATE_NIC_VM_STATUS_ILLEGAL
2017-01-27 16:35:32,089+02 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (ForkJoinPool-1-worker-6) [66e2d004] Trying to release exclusive lock which does not exist, lock key: '0c1494af-a9dd-4e91-9133-fa91d1936b2aVM'


2017-01-27 16:35:32,117+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-6) [66e2d004] EVENT_ID: VM_MIGRATION_NOT_ALL_VM_NICS_WERE_PLUGGED_BACK(12,003), Correlation ID: vms_syncAction_220016f0-13a5-49fe, Job ID: 2d91ecdb-3c4d-431b-9af6-e593e9b59152, Call Stack: null, Custom Event ID: -1, Message: After migration of SR-IOV-migration-vm, following vm nics failed to be plugged back: C1_migration_sriov_vnic1.


This means that plugging the NIC failed, which is why you get the message that not all NICs were plugged back. The reason is an invalid VM status; the following condition must have been violated:
vmStatus == VMStatus.Up || vmStatus == VMStatus.Down || vmStatus == VMStatus.ImageLocked;


I have no idea which state the VM was in when this happened, so I can't tell whether that's OK or not.
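
For illustration, the status condition quoted above can be sketched as a small standalone check. This is only a sketch under assumptions, not the engine's code: the `VMStatus` enum below is a simplified local stand-in, the `MigratingFrom` and `MigratingTo` entries are included only as examples of states outside the allowed set (the logs do not say which state the VM was actually in), and `NicPlugValidation` / `canChangePlugState` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified local stand-in for the engine's VM status enum (assumption).
enum VMStatus { Up, Down, ImageLocked, MigratingFrom, MigratingTo }

// Hypothetical helper mirroring the condition quoted in the comment.
final class NicPlugValidation {
    // The three states named in the quoted condition.
    private static final Set<VMStatus> ALLOWED =
            EnumSet.of(VMStatus.Up, VMStatus.Down, VMStatus.ImageLocked);

    // True when a NIC may be activated or deactivated in the given VM state.
    static boolean canChangePlugState(VMStatus vmStatus) {
        return ALLOWED.contains(vmStatus);
    }

    public static void main(String[] args) {
        // Print the verdict for every state in the stand-in enum.
        for (VMStatus status : VMStatus.values()) {
            System.out.println(status + " -> " + canChangePlugState(status));
        }
    }
}
```

Under this sketch, any attempt to plug the NIC back while the VM is reported in a state outside the three allowed ones is rejected, which would match the ACTIVATE_DEACTIVATE_NIC_VM_STATUS_ILLEGAL reason in the log.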

3) there's also:
ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-9) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host host_mixed_2 did not satisfy internal filter Network because there are no free virtual functions which are suitable for virtual nic(s) C1_migration_sriov_vnic1. A virtual function is considered as suitable if the VF's configuration of its physical function contains the virtual nic's network/network label.]

which might mean:
a) you tried to migrate to a host without free VFs
b) you tried that before we fixed VF leakage
c) we did not fix all VF leakage.

--> I think we have to finish 1406283 first, and then try to reproduce this bug.
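
The suitability rule stated in the scheduling error under 3) can be sketched roughly as below. This is a minimal sketch under assumptions: `PfConfig`, `isSuitable` and `hostPassesNetworkFilter` are hypothetical names made up for illustration, and the model (a set of allowed networks, a set of allowed labels, and a free-VF counter per physical function) is a simplification, not the engine's actual data model.

```java
import java.util.List;
import java.util.Set;

final class VfFilterSketch {
    // Hypothetical model of one physical function's VF configuration.
    static final class PfConfig {
        final Set<String> networks; // networks allowed on this PF's VFs
        final Set<String> labels;   // network labels allowed on this PF's VFs
        final int freeVfs;          // number of VFs currently unused

        PfConfig(Set<String> networks, Set<String> labels, int freeVfs) {
            this.networks = networks;
            this.labels = labels;
            this.freeVfs = freeVfs;
        }
    }

    // A PF can serve the vNIC if it has a free VF and its configuration
    // contains the vNIC's network or the vNIC's network label.
    static boolean isSuitable(PfConfig pf, String vnicNetwork, String vnicLabel) {
        if (pf.freeVfs <= 0) {
            return false;
        }
        boolean networkAllowed = pf.networks.contains(vnicNetwork);
        boolean labelAllowed = vnicLabel != null && pf.labels.contains(vnicLabel);
        return networkAllowed || labelAllowed;
    }

    // The host passes the filter if at least one PF is suitable.
    static boolean hostPassesNetworkFilter(List<PfConfig> pfs, String vnicNetwork, String vnicLabel) {
        return pfs.stream().anyMatch(pf -> isSuitable(pf, vnicNetwork, vnicLabel));
    }

    public static void main(String[] args) {
        // Example: the only PF allows the network but has no free VF left.
        PfConfig exhausted = new PfConfig(Set.of("sriov_net"), Set.of(), 0);
        System.out.println(hostPassesNetworkFilter(List.of(exhausted), "sriov_net", null));
    }
}
```

In this model, both case a) and a VF leak as in b) or c) look the same to the filter: the free-VF count on the destination host is zero, so the host is rejected even though the network itself is allowed.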

Comment 3 Yaniv Kaul 2017-03-19 08:48:01 UTC
Missed 4.1.1, moving to 4.1.2.

Comment 4 Dan Kenigsberg 2017-03-20 11:09:26 UTC
much like its (hopefully) twin bug 1406283 this belongs to 4.1.1-1

Comment 5 Michael Burman 2017-03-26 07:20:42 UTC
Verified on - rhevm-4.1.1.6-0.1.el7.noarch and vdsm-4.19.10-1.el7ev.x86_64

