Created attachment 1312594
Logs

Description of problem:
Regression with vNIC hot unplug. Hot unplugging a vNIC fails on latest master, and it causes a regression in link down and link up as well.

Failed to UpdateVmInterfaceVDS, error = Device instance for device identified by alias net0 and type interface not found, code = 56

EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotUnplugNicVDS, error = NIC not found, code = 50 (Failed with error DEACTIVATE_NIC_FAILED and code 50)

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170811144920.gita423008.el7.centos
vdsm-4.20.2-60.git06231e5.el7.centos.x86_64

How reproducible:
Around 80-95%

Steps to Reproduce:
1. Start a VM with a vNIC
2. Hot unplug and link down the vNIC in the same action (the vNIC stays unplugged)
3. Try to link up the vNIC
4. Try to unplug the vNIC

Actual results:
3 - Failed to UpdateVmInterfaceVDS, error = Device instance for device identified by alias net0 and type interface not found, code = 56
4 - EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotUnplugNicVDS, error = NIC not found, code = 50 (Failed with error DEACTIVATE_NIC_FAILED and code 50)

Expected results:
Steps 3 and 4 should succeed.
Reproduction rate is 100% with the steps mentioned above. After step 2, the vNIC always returns to the plugged state after a few seconds, although we set it as unplugged. From that point on it is not possible to link the vNIC up or to unplug it.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
It seems the issue is that 'dumpXml' on the host returns an empty list. The problematic host is 'camel-vdsa.qa.lab.tlv.redhat.com'. @ahadas, please take a look.
The behavior of 'dumpxmls' with an empty list of VMs differs from that of full-list: it simply returns an empty response. The engine never calls this verb with an empty list of VMs, so that should be fine. I didn't manage to reproduce the bug, but the result of 'dumpxmls' doesn't seem related.
The bug reproduces easily with hot unplug alone. Start a VM with a vNIC and hot unplug it; after a few seconds the vNIC becomes plugged again on its own.
A deeper look into this reveals that VDSM can get out of sync with libvirt's domain xml. In this case, that causes VDSM to report a domain xml that still contains the 'unplugged' interface. The NIC appears to be set as unplugged manually by the engine, and then its settings are overridden with ones that indicate it is plugged. The problem is not specific to this flow, though. We need to figure out why VDSM doesn't report the up-to-date domain xml.
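To illustrate the kind of mismatch described above, here is a minimal sketch in Python that compares the interface aliases in two domain xml documents: one as reported by VDSM and one as libvirt currently holds it. This is not VDSM or engine code. The helper names (`interface_aliases`, `stale_interfaces`) and both sample documents, including the MAC address, are made up for illustration; only the alias `net0` comes from the errors in this report.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: what VDSM reports (still lists the unplugged NIC).
REPORTED_XML = """
<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='00:1a:4a:00:00:01'/>
      <alias name='net0'/>
    </interface>
  </devices>
</domain>
"""

# Hypothetical sample: what libvirt holds after the hot unplug.
LIVE_XML = """
<domain type='kvm'>
  <devices/>
</domain>
"""


def interface_aliases(domain_xml):
    """Return the alias names of all <interface> devices in a domain xml."""
    root = ET.fromstring(domain_xml)
    aliases = set()
    for iface in root.findall("./devices/interface"):
        alias = iface.find("alias")
        if alias is not None and alias.get("name"):
            aliases.add(alias.get("name"))
    return aliases


def stale_interfaces(reported_xml, live_xml):
    """Aliases present in the reported xml but absent from the live xml."""
    return interface_aliases(reported_xml) - interface_aliases(live_xml)


if __name__ == "__main__":
    print(sorted(stale_interfaces(REPORTED_XML, LIVE_XML)))
```

An interface that shows up only in the reported xml would explain why the engine sees the vNIC as plugged again while a later update or unplug fails because the device is not found.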
Verified on - 4.2.0-0.0.master.20170820180837.git59243e9.el7.centos and vdsm-4.20.2-90.git6511af5.el7.centos.x86_64
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.