Bug 1480949 - Regression with hotunplug vNIC
Summary: Regression with hotunplug vNIC
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-13 06:09 UTC by Michael Burman
Modified: 2017-12-20 10:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 10:44:46 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
Logs (840.31 KB, application/x-gzip)
2017-08-13 06:09 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 80609 0 master MERGED virt: Make sure all hotunplug calls update domain descriptor 2020-08-06 17:28:05 UTC

Description Michael Burman 2017-08-13 06:09:08 UTC
Created attachment 1312594 [details]
Logs

Description of problem:
Regression with hot unplug of a vNIC.
Hot unplug of a vNIC fails on the latest master, and it also causes a regression in linking the vNIC down and up.

Failed to UpdateVmInterfaceVDS, error = Device instance for device identified by alias net0 and type interface not found, code = 56

EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotUnplugNicVDS, error = NIC not found, code = 50 (Failed with error DEACTIVATE_NIC_FAILED and code 50)
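
For triaging engine logs, the two failures above can be pulled apart into verb, message, and code. The following is a minimal sketch; the pattern is an assumption derived only from the two messages quoted in this report, and `parse_vds_error` is a hypothetical helper, not part of the engine or VDSM.

```python
import re

# Assumed message shape, based only on the two errors quoted in this report:
#   "Failed to <Verb>, error = <message>, code = <number>"
VDS_ERROR = re.compile(
    r"Failed to (?P<verb>\w+), error = (?P<error>.+?), code = (?P<code>\d+)"
)


def parse_vds_error(line):
    """Return (verb, error, code) for the first VDS failure in line, or None."""
    match = VDS_ERROR.search(line)
    if match is None:
        return None
    return match.group("verb"), match.group("error"), int(match.group("code"))
```

Applied to the second message, this would yield the verb `HotUnplugNicVDS`, the message `NIC not found`, and the code 50.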

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170811144920.gita423008.el7.centos
vdsm-4.20.2-60.git06231e5.el7.centos.x86_64

How reproducible:
Around 80-95%

Steps to Reproduce:
1. Start VM with vNIC
2. Hot unplug and link down the vNIC in the same action (the vNIC stays unplugged)
3. Try to link up the vNIC
4. Try to unplug the vNIC

Actual results:
3 - Failed to UpdateVmInterfaceVDS, error = Device instance for device identified by alias net0 and type interface not found, code = 56

4 - EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotUnplugNicVDS, error = NIC not found, code = 50 (Failed with error DEACTIVATE_NIC_FAILED and code 50)

Expected results:
Linking up the vNIC (step 3) and unplugging it (step 4) should both succeed.

Comment 1 Michael Burman 2017-08-13 06:14:38 UTC
The reproduction rate is 100% with the steps mentioned above. After step 2, the vNIC always returns to the plugged state after a few seconds, although we set it as unplugged. From this point it is not possible to link the vNIC up or to unplug it.

Comment 2 Red Hat Bugzilla Rules Engine 2017-08-13 07:56:32 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Alona Kaplan 2017-08-14 09:24:42 UTC
It seems the issue is that 'dumpXml' on the host returns an empty list.
The problematic host is 'camel-vdsa.qa.lab.tlv.redhat.com'. @ahadas, please take a look.

Comment 4 Arik 2017-08-14 09:52:35 UTC
'dumpxmls' behaves differently when called with an empty list of VMs than the full-list call does: it simply returns an empty response. The engine never calls this verb with an empty list of VMs, so that should be fine.
I didn't manage to reproduce the bug, but the result of 'dumpxmls' doesn't seem related.

Comment 5 Michael Burman 2017-08-14 12:33:33 UTC
The bug reproduces easily with hot unplug alone.
Start a VM with a vNIC and hot unplug it; after a few seconds the vNIC becomes plugged again on its own.

Comment 6 Arik 2017-08-14 12:37:31 UTC
A deeper look into this reveals that VDSM can get out of sync with libvirt's domain xml. In this case, it causes VDSM to report a domain xml that still contains the 'unplugged' interface.
The engine appears to set the NIC as unplugged, and then its settings are overridden by ones indicating that it is plugged. But the problem is not specific to this flow. We need to figure out why VDSM doesn't report the up-to-date domain xml.
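
The out-of-sync behaviour described here can be illustrated with a small, self-contained sketch. It assumes a VM wrapper that caches the domain xml and must refresh that cache after every hot unplug. `FakeLibvirtDomain`, `Vm`, and the `refresh` flag are hypothetical names for illustration only; this is not VDSM's code and not the merged patch.

```python
import xml.etree.ElementTree as ET


class FakeLibvirtDomain:
    """Hypothetical stand-in for a libvirt domain; holds the live domain xml."""

    def __init__(self, xml):
        self._root = ET.fromstring(xml)

    def xml_desc(self):
        # Serialize the current (live) state of the domain.
        return ET.tostring(self._root, encoding="unicode")

    def detach_device(self, alias):
        # Remove the device whose <alias name="..."/> matches.
        devices = self._root.find("devices")
        for dev in list(devices):
            node = dev.find("alias")
            if node is not None and node.get("name") == alias:
                devices.remove(dev)


class Vm:
    """Hypothetical VM wrapper that reports devices from a cached domain xml."""

    def __init__(self, dom):
        self._dom = dom
        self._cached_xml = dom.xml_desc()

    def reported_aliases(self):
        root = ET.fromstring(self._cached_xml)
        return [alias.get("name") for alias in root.iter("alias")]

    def hotunplug(self, alias, refresh=True):
        self._dom.detach_device(alias)
        if refresh:
            # Without this step the cached xml still lists the removed device.
            self._cached_xml = self._dom.xml_desc()


DOMAIN_XML = """
<domain>
  <devices>
    <interface type="bridge"><alias name="net0"/></interface>
    <disk type="file"><alias name="virtio-disk0"/></disk>
  </devices>
</domain>
"""

stale = Vm(FakeLibvirtDomain(DOMAIN_XML))
stale.hotunplug("net0", refresh=False)  # cache is not updated

fresh = Vm(FakeLibvirtDomain(DOMAIN_XML))
fresh.hotunplug("net0")  # cache is updated after the unplug
```

In this sketch `stale` keeps reporting `net0` after the unplug, which mirrors the symptom of the vNIC reappearing as plugged, while `fresh` no longer reports it.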

Comment 7 Michael Burman 2017-08-21 05:45:43 UTC
Verified on - 4.2.0-0.0.master.20170820180837.git59243e9.el7.centos and vdsm-4.20.2-90.git6511af5.el7.centos.x86_64

Comment 8 Sandro Bonazzola 2017-12-20 10:44:46 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

