Bug 1480949 - Regression with hotunplug vNIC
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.0
Hardware: x86_64 Linux
Priority: medium Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Milan Zamazal
QA Contact: Meni Yakove
Keywords: Regression
Depends On:
Blocks:
Reported: 2017-08-13 02:09 EDT by Michael Burman
Modified: 2017-12-20 05:44 EST

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-20 05:44:46 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
Logs (840.31 KB, application/x-gzip)
2017-08-13 02:09 EDT, Michael Burman


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 80609 | master | MERGED | virt: Make sure all hotunplug calls update domain descriptor | 2017-08-15 02:19 EDT

Description Michael Burman 2017-08-13 02:09:08 EDT
Created attachment 1312594
Logs

Description of problem:
Regression with hotunplug vNIC.
Hot-unplugging a vNIC fails on the latest master, and this also causes link down and link up of the vNIC to fail.

Failed to UpdateVmInterfaceVDS, error = Device instance for device identified by alias net0 and type interface not found, code = 56

EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotUnplugNicVDS, error = NIC not found, code = 50 (Failed with error DEACTIVATE_NIC_FAILED and code 50)

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170811144920.gita423008.el7.centos
vdsm-4.20.2-60.git06231e5.el7.centos.x86_64

How reproducible:
Around 80-95%

Steps to Reproduce:
1. Start VM with vNIC
2. Hot-unplug and link down the vNIC in the same action (the vNIC stays unplugged)
3. Try to link up the vNIC
4. Try to unplug the vNIC

Actual results:
3 - Failed to UpdateVmInterfaceVDS, error = Device instance for device identified by alias net0 and type interface not found, code = 56

4 - EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotUnplugNicVDS, error = NIC not found, code = 50 (Failed with error DEACTIVATE_NIC_FAILED and code 50)

Expected results:
Linking up the vNIC (step 3) and unplugging it (step 4) should both succeed.
Comment 1 Michael Burman 2017-08-13 02:14:38 EDT
The reproduction rate is 100% with the steps above. After step 2, the vNIC always returns to the plugged state after a few seconds, although we set it as unplugged. From that point it is not possible to link the vNIC up or to unplug it.
Comment 2 Red Hat Bugzilla Rules Engine 2017-08-13 03:56:32 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 3 Alona Kaplan 2017-08-14 05:24:42 EDT
It seems the issue is that 'dumpXml' on the host returns an empty list.
The problematic host is - 'camel-vdsa.qa.lab.tlv.redhat.com'. @ahadas, please take a look.
Comment 4 Arik 2017-08-14 05:52:35 EDT
The behavior of 'dumpxmls' with an empty list of VMs differs from that of full-list: it simply returns an empty response. The engine never calls this verb with an empty list of VMs, so it should be fine.
I didn't manage to reproduce it, but the result of 'dumpxmls' doesn't seem related.
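To picture the difference described above, here is a minimal sketch. It is not VDSM code: the function names, the dict of domains, and the assumption that a full-list call with no IDs returns every VM are all illustrative.

```python
def dump_xmls(domains, vm_ids):
    """Hypothetical model: return XML only for the VM IDs that were asked for.

    An empty vm_ids list therefore yields an empty response.
    """
    return {vm_id: domains[vm_id] for vm_id in vm_ids if vm_id in domains}


def full_list(domains, vm_ids):
    """Hypothetical model (assumption): no IDs given means 'all VMs'."""
    if not vm_ids:
        return dict(domains)
    return {vm_id: domains[vm_id] for vm_id in vm_ids if vm_id in domains}


# Illustrative data only; these are not real domain XMLs.
domains = {"vm1": "<domain/>", "vm2": "<domain/>"}

print(dump_xmls(domains, []))       # empty response
print(dump_xmls(domains, ["vm1"]))  # only vm1
print(full_list(domains, []))       # every VM, under the stated assumption
```

Under this model an empty result from an empty request is expected and says nothing about whether the domain XML itself is stale.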
Comment 5 Michael Burman 2017-08-14 08:33:33 EDT
The bug reproduces easily with hot unplug alone.
Start a VM with a vNIC and hot-unplug it; after a few seconds the vNIC becomes plugged again on its own.
Comment 6 Arik 2017-08-14 08:37:31 EDT
So a deeper look into this reveals that VDSM can get out-of-sync with libvirt's domain xml. In this case, it causes VDSM to report a domain xml that contains the 'unplugged' interface.
The NIC appears to be set as unplugged manually by the engine, and then its settings are overridden with ones that indicate it is plugged. But the problem is not specific to this flow. We need to figure out why VDSM doesn't report the up-to-date domain XML.
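A minimal sketch of how a cached domain descriptor can go stale after a hot unplug. This is a toy model, not VDSM's implementation: `FakeDomain`, `Vm`, and the `refresh_descriptor` flag are hypothetical names, and the XML is a stripped-down stand-in for a real libvirt domain XML.

```python
import xml.etree.ElementTree as ET

# Illustrative domain XML with a single interface whose alias is net0.
DOMAIN_XML = """<domain type='kvm'>
  <name>vm1</name>
  <devices>
    <interface type='bridge'>
      <alias name='net0'/>
    </interface>
  </devices>
</domain>"""


class FakeDomain:
    """Hypothetical stand-in for the hypervisor side; owns the live XML."""

    def __init__(self, xml):
        self._root = ET.fromstring(xml)

    def xml_desc(self):
        return ET.tostring(self._root, encoding="unicode")

    def detach_interface(self, alias):
        devices = self._root.find("devices")
        for iface in devices.findall("interface"):
            node = iface.find("alias")
            if node is not None and node.get("name") == alias:
                devices.remove(iface)
                return
        raise LookupError("NIC not found")


class Vm:
    """Hypothetical VM wrapper that reports a cached copy of the domain XML."""

    def __init__(self, dom):
        self._dom = dom
        self._cached_xml = dom.xml_desc()

    def reported_aliases(self):
        root = ET.fromstring(self._cached_xml)
        return [a.get("name") for a in root.iterfind("devices/interface/alias")]

    def hotunplug_nic(self, alias, refresh_descriptor):
        self._dom.detach_interface(alias)
        if refresh_descriptor:
            # Re-read the live XML so later reports match the hypervisor.
            self._cached_xml = self._dom.xml_desc()


stale = Vm(FakeDomain(DOMAIN_XML))
stale.hotunplug_nic("net0", refresh_descriptor=False)
print(stale.reported_aliases())  # ['net0'] is still reported after the unplug

fresh = Vm(FakeDomain(DOMAIN_XML))
fresh.hotunplug_nic("net0", refresh_descriptor=True)
print(fresh.reported_aliases())  # []
```

In the stale case the reported XML still lists net0, so a consumer of that report would treat the NIC as plugged, while a second unplug attempt fails in the toy model with "NIC not found" because the device is already gone from the live domain.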
Comment 7 Michael Burman 2017-08-21 01:45:43 EDT
Verified on - 4.2.0-0.0.master.20170820180837.git59243e9.el7.centos and vdsm-4.20.2-90.git6511af5.el7.centos.x86_64
Comment 8 Sandro Bonazzola 2017-12-20 05:44:46 EST
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
