Bug 1417217

Summary: SR-IOV vNIC unplugged after migration completed
Product: [oVirt] ovirt-engine
Reporter: Meni Yakove <myakove>
Component: BLL.Network
Assignee: Martin Mucha <mmucha>
Status: CLOSED CURRENTRELEASE
QA Contact: Meni Yakove <myakove>
Severity: high
Docs Contact:
Priority: medium
Version: 4.1.0.2
CC: bugs, danken, mburman, ylavi
Target Milestone: ovirt-4.1.1-1
Keywords: Automation, TestOnly
Target Release: 4.1.1.6
Flags: rule-engine: ovirt-4.1+, ylavi: exception+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: missing locking
Consequence: vNIC was not plugged back after migration
Fix: added locking
Result: vNIC is plugged after migration
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-21 09:53:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1406283
Bug Blocks:
Attachments: engine, source and destination hosts logs (flags: none)

Description Meni Yakove 2017-01-27 14:59:50 UTC
Created attachment 1245157
engine, source and destination hosts logs

Description of problem:
After a successful migration of a VM with an SR-IOV vNIC, the vNIC sometimes ends up unplugged.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0.2-0.2.el7.noarch
vdsm-4.19.2-2.el7ev.x86_64


Steps to Reproduce:
1. Create a VM with SR-IOV vNIC
2. Start the VM
3. Migrate the VM

Actual results:
vNIC is unplugged

Expected results:
vNIC is plugged

Additional info:
2017-01-27 16:35:34,855+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (DefaultQuartzScheduler3) [7b40e0be] VM '0c1494af-a9dd-4e91-9133-fa91d1936b2a' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='3ae6fc1b-7d90-48fd-959c-db7efd745d4b', vmId='0c1494af-a9dd-4e91-9133-fa91d1936b2a'}', device='hostdev', type='INTERFACE', bootOrder='0', specParams='[]', address='{slot=0x08, bus=0x00, domain=0x0000, type=pci, function=0x0}', managed='true', plugged='false', readOnly='false', deviceAlias='hostdev0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='pci_0000_05_10_2'}'

Comment 1 Dan Kenigsberg 2017-01-28 13:02:30 UTC
Could it be a dup of Bug 1406283?

Comment 2 Martin Mucha 2017-02-06 14:55:21 UTC
Looking into the logs:
1) Part of the problems is probably fixed by 'duplicate' bug 1406283. I'd try to reproduce this bug after 1406283 is merged.

2) I found this in logs:

WARN  [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (ForkJoinPool-1-worker-6) [66e2d004] Validation of action 'ActivateDeactivateVmNic' failed for user admin@internal-authz. Reasons: VAR__ACTION__ACTIVATE,VAR__TYPE__INTERFACE,ACTIVATE_DEACTIVATE_NIC_VM_STATUS_ILLEGAL
2017-01-27 16:35:32,089+02 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (ForkJoinPool-1-worker-6) [66e2d004] Trying to release exclusive lock which does not exist, lock key: '0c1494af-a9dd-4e91-9133-fa91d1936b2aVM'


2017-01-27 16:35:32,117+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-6) [66e2d004] EVENT_ID: VM_MIGRATION_NOT_ALL_VM_NICS_WERE_PLUGGED_BACK(12,003), Correlation ID: vms_syncAction_220016f0-13a5-49fe, Job ID: 2d91ecdb-3c4d-431b-9af6-e593e9b59152, Call Stack: null, Custom Event ID: -1, Message: After migration of SR-IOV-migration-vm, following vm nics failed to be plugged back: C1_migration_sriov_vnic1.


Meaning that plugging the NIC failed, and that is why you get the message that not all NICs were plugged back. Looking at the reason, it failed because of an invalid VM status; the following condition must have been violated:
vmStatus == VMStatus.Up || vmStatus == VMStatus.Down || vmStatus == VMStatus.ImageLocked;

I have no idea which state the VM was in when this happened to you, so I don't know whether that's OK or not.
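
For illustration, a minimal sketch of that status check. The three VMStatus values come from the condition quoted above; everything else (class name, method name, the extra migrating states) is simplified and hypothetical, not the actual ovirt-engine ActivateDeactivateVmNicCommand code:

    // Hypothetical, simplified sketch of the status validation quoted above.
    public class VmNicPlugValidationSketch {
        enum VMStatus { Up, Down, ImageLocked, MigratingFrom, MigratingTo }

        static boolean vmStatusLegalForNicHotPlug(VMStatus vmStatus) {
            // The condition from the comment above: (un)plugging a vNIC is
            // only allowed in these three states.
            return vmStatus == VMStatus.Up
                || vmStatus == VMStatus.Down
                || vmStatus == VMStatus.ImageLocked;
        }

        public static void main(String[] args) {
            // A VM still reported as migrating fails the check, matching the
            // ACTIVATE_DEACTIVATE_NIC_VM_STATUS_ILLEGAL message in the log.
            System.out.println(vmStatusLegalForNicHotPlug(VMStatus.MigratingTo)); // false
            System.out.println(vmStatusLegalForNicHotPlug(VMStatus.Up));          // true
        }
    }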

3) there's also:
ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-9) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host host_mixed_2 did not satisfy internal filter Network because there are no free virtual functions which are suitable for virtual nic(s) C1_migration_sriov_vnic1. A virtual function is considered as suitable if the VF's configuration of its physical function contains the virtual nic's network/network label.]

which might mean:
a) you tried to migrate to a host without free VFs,
b) you tried that before we fixed the VF leakage, or
c) we did not fix all of the VF leakage.
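
For reference, a rough sketch of the suitability rule quoted in that filter message (a VF is suitable if the configuration of its physical function contains the vNIC's network or network label). All names here are illustrative, not the actual ovirt-engine scheduler code:

    import java.util.Set;

    // Illustrative sketch of the VF suitability rule from the Network filter
    // message above; names are hypothetical.
    public class VfSuitabilitySketch {
        static boolean vfSuitableForVnic(Set<String> pfNetworks, Set<String> pfLabels,
                                         String vnicNetwork, Set<String> vnicNetworkLabels) {
            // Suitable when the PF is configured with the vNIC's network itself,
            // or with one of the labels carried by the vNIC's network.
            if (pfNetworks.contains(vnicNetwork)) {
                return true;
            }
            for (String label : vnicNetworkLabels) {
                if (pfLabels.contains(label)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            // host_mixed_2 failed the filter: no free VF whose PF configuration
            // contained the network of C1_migration_sriov_vnic1.
            System.out.println(vfSuitableForVnic(Set.of("ovirtmgmt"), Set.of(),
                    "sriov_net_1", Set.of())); // false -> host filtered out
        }
    }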

--> I think we have to finish 1406283 first, and then try to reproduce this bug.
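
The Doc Text above attributes the eventual fix to added locking, and the InMemoryLockManager warning quoted earlier shows a release without a matching acquire. Below is a rough sketch of that acquire/release pattern, assuming a map-based lock manager and the '<vmId>VM' lock-key format seen in the log; the class and method names are hypothetical, not the real InMemoryLockManager API:

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical map-based exclusive lock, keyed "<vmId>VM" like the lock key
    // in the warning above; not ovirt-engine's actual InMemoryLockManager.
    public class VmLockSketch {
        private static final ConcurrentHashMap<String, Boolean> LOCKS = new ConcurrentHashMap<>();

        static boolean acquireExclusive(String lockKey) {
            // putIfAbsent returns null only when the key was free, i.e. we got the lock.
            return LOCKS.putIfAbsent(lockKey, Boolean.TRUE) == null;
        }

        static void release(String lockKey) {
            if (LOCKS.remove(lockKey) == null) {
                // The situation logged in this bug: releasing a lock that was
                // never acquired, or that another flow already released.
                System.err.println("Trying to release exclusive lock which does not exist, lock key: '"
                        + lockKey + "'");
            }
        }

        public static void main(String[] args) {
            String lockKey = "0c1494af-a9dd-4e91-9133-fa91d1936b2a" + "VM";
            if (acquireExclusive(lockKey)) {
                try {
                    // ... plug the vNIC back after migration ...
                } finally {
                    // Releasing only what was acquired avoids the warning; a flow
                    // that skips the acquire (the suspected missing locking) hits it.
                    release(lockKey);
                }
            }
        }
    }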

Comment 3 Yaniv Kaul 2017-03-19 08:48:01 UTC
Missed 4.1.1, moving to 4.1.2.

Comment 4 Dan Kenigsberg 2017-03-20 11:09:26 UTC
Much like its (hopefully) twin bug 1406283, this belongs to 4.1.1-1.

Comment 5 Michael Burman 2017-03-26 07:20:42 UTC
Verified on rhevm-4.1.1.6-0.1.el7.noarch and vdsm-4.19.10-1.el7ev.x86_64.