Bug 1134256

Summary: Vdsm returns successfully even though hotunplugNic does not complete in libvirt
Product: Red Hat Enterprise Virtualization Manager
Reporter: akotov
Component: vdsm
Assignee: Marcin Mirecki <mmirecki>
Status: CLOSED ERRATA
QA Contact: Michael Burman <mburman>
Severity: high
Priority: high
Docs Contact:
Version: 3.4.1-1
CC: alkaplan, bazulay, danken, iheim, jbuchta, knoel, lpeer, lsurette, mburman, myakove, rbalakri, Rhev-m-bugs, sbonazzo, ycui, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 19:24:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description akotov 2014-08-27 07:58:47 UTC
Description of problem:
RHEV fails to hot-plug a NIC and provides a misleading error message.

Version-Release number of selected component (if applicable):

rhevm-backend-3.4.1-0.31.el6ev.noarch
libvirt-0.10.2-29.el6_5.9.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.10.x86_64

How reproducible:

Always


Steps to Reproduce:
1. Create a VM with 1 disk and 1 NIC; do not install any OS
2. Start the VM and let it hang at "no bootable device" (or any similar state)
3. Unplug the NIC
4. Hot-plug the NIC again (a libvirt-level reproduction sketch follows below)
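
For reference, a minimal libvirt-level sketch of steps 3-4 (the VM name 'test-vm' and the simplified NIC XML are illustrative assumptions, not the exact vdsm code path; the MAC and bridge are taken from the logs below):

import libvirt

# Hypothetical VM name and simplified NIC XML; adjust to the actual setup.
NIC_XML = """
<interface type='bridge'>
  <mac address='00:00:00:01:00:02'/>
  <source bridge='ovirtmgmt'/>
  <model type='virtio'/>
</interface>
"""

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test-vm')

# Step 3: the unplug request returns 0 as soon as it is delivered to qemu,
# even though, with no guest OS running, the device is never actually removed.
dom.detachDevice(NIC_XML)

# Step 4: re-plugging the same NIC now fails, because qemu still holds the
# old device (cf. the 'Duplicate ID net0' line in the libvirtd log below).
try:
    dom.attachDevice(NIC_XML)
except libvirt.libvirtError as e:
    print('hotplug failed:', e)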

Actual results:

NIC plug-in fails.
--
GUI: Failed to activate VM Network Interface.
--
engine.log:
2014-08-27 09:54:52,393 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (ajp-/127.0.0.1:8702-4) [2a407950] Failed in HotPlugNicVDS method
2014-08-27 09:54:52,395 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (ajp-/127.0.0.1:8702-4) [2a407950] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=49, mMessage=An error occurred, but the cause is unknown]]
--

---
libvirtd log
2014-08-27 07:55:47.256+0000: 4171: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command 'device_add': Duplicate ID 'net0' for device
2014-08-27 07:55:47.289+0000: 4171: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet0" not in key map
2014-08-27 07:55:47.315+0000: 4171: error : virNetDevGetIndex:653 : Unable to get index for interface vnet0: No such device
---
---
vdsm
Thread-5605::DEBUG::2014-08-27 07:55:47,378::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 0 level: 2 message: An error occurred, but the cause is unknown
Thread-5605::ERROR::2014-08-27 07:55:47,379::vm::3301::vm.Vm::(hotplugNic) vmId=`812557ec-e3c4-4502-8118-c762d9efca26`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 3299, in hotplugNic
  File "/usr/share/vdsm/vm.py", line 891, in f
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
libvirtError: An error occurred, but the cause is unknown
---

Expected results:

The NIC is plugged in successfully; or, if for whatever reason that is not possible without a booted OS, the error is handled and a meaningful, helpful message is shown.

Additional info:

I think it is technically possible without a booted OS, and we should support it.
Example:

- Power off the same VM with the NIC unplugged.
- Power on the VM, still with no OS booted.
- Plug in the NIC. It works.

Comment 1 Dan Kenigsberg 2014-08-30 23:42:07 UTC
(In reply to akotov from comment #0)
> 
> I think it is technically possible without a booted OS, and we should support it.
> Example:
> 
> - Power off the same VM with the NIC unplugged.
> - Power on the VM, still with no OS booted.
> - Plug in the NIC. It works.

Hot(un)plug requires guest support and participation. How can you tell the plug-in worked? Most probably, qemu is still waiting (in vain) for guest cooperation. A subsequent attempt to unplug/plug the device would surface the error.
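
A sketch of how a caller could confirm that an unplug actually completed, by waiting for libvirt's device-removed event instead of trusting the detachDevice() return code (assumes libvirt >= 1.1.1; the VM name, device alias 'net0' and timeout are illustrative, not the vdsm implementation):

import threading
import libvirt

NIC_XML = """
<interface type='bridge'>
  <mac address='00:00:00:01:00:02'/>
  <source bridge='ovirtmgmt'/>
  <model type='virtio'/>
</interface>
"""

# The default event implementation must be registered before opening the
# connection, and an event loop must run for callbacks to fire.
libvirt.virEventRegisterDefaultImpl()
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test-vm')

def run_event_loop():
    while True:
        libvirt.virEventRunDefaultImpl()

loop = threading.Thread(target=run_event_loop)
loop.daemon = True
loop.start()

removed = threading.Event()

def on_device_removed(conn, dom, dev_alias, opaque):
    # Fired only once qemu (with guest cooperation) finished the unplug.
    if dev_alias == 'net0':
        removed.set()

conn.domainEventRegisterAny(
    dom, libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED,
    on_device_removed, None)

dom.detachDevice(NIC_XML)          # returns as soon as the request is sent
if not removed.wait(timeout=10):
    print('unplug not confirmed; guest OS may not support hot-unplug')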

I do think that Engine should suggest a possible reason for plug/unplug failure; something along the lines of:

  An error occurred, but the cause is unknown. (do you have a guest operating system that supports hot-plugging?)

Another idea (a bit more complex, and harmful to backward compatibility) is to limit hot(un)plug to VMs with a running guest OS.

Comment 2 Lior Vernia 2015-02-12 13:55:30 UTC
I don't mind extending the error message with a warning that the guest must have a compatible OS installed.

Comment 3 Barak 2015-05-04 10:06:51 UTC
Meni,

Can you please try to reproduce this issue (with a guest OS, and without ...)?

Comment 4 Meni Yakove 2015-05-04 10:56:22 UTC
mburman, please try to reproduce this issue.

Comment 5 Michael Burman 2015-05-04 11:20:55 UTC
When a VM is running without an OS installed, it is possible to add and hot-plug NICs to the VM without a problem, and it is possible to hot-unplug the NICs as well, but..

If you try to hot-plug back a NIC that has just been unplugged, the operation fails with the error:
Error while executing action Edit VM Interface properties: Failed to activate VM Network Interface.

- This is because of the attempt to reuse an already occupied PCI slot.


engine.log 

2015-05-04 14:06:35,908 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-11) [5a07daa1] Correlation ID: 5a07daa1, Call Stack: null, Custom Event ID: -1, Message: Failed to plug Network Interface nic1 (VirtIO) to VM hotplug_test_no_OS. (User: admin@internal)
2015-05-04 14:06:35,984 INFO  [org.ovirt.engine.core.bll.network.vm.UpdateVmInterfaceCommand] (ajp--127.0.0.1-8702-11) [5a07daa1] Command [id=4819fa4e-5db9-4406-8011-8c2f6e287acf]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.network.VmNic; snapshot: id=4125c295-3b26-4e29-a9f6-8e8b15ca7151.
2015-05-04 14:06:36,035 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-11) [5a07daa1] Correlation ID: 14705b01, Call Stack: null, Custom Event ID: -1, Message: Failed to update Interface nic1 (VirtIO) for VM hotplug_test_no_OS. (User: admin@internal)


vdsm.log 

Thread-153315::INFO::2015-05-04 14:06:39,021::vm::2297::vm.Vm::(hotplugNic) vmId=`1c5cf02a-b32c-4679-a4fe-7fba119086a7`::Hotplug NIC xml: <interface type="bridge">
        <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
        <mac address="00:00:00:01:00:02"/>
        <model type="virtio"/>
        <source bridge="ovirtmgmt"/>
        <link state="up"/>
        <bandwidth/>
</interface>

JsonRpc (StompReactor)::DEBUG::2015-05-04 14:06:39,090::stompReactor::94::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2015-05-04 14:06:39,093::__init__::482::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-153316::DEBUG::2015-05-04 14:06:39,094::stompReactor::158::yajsonrpc.StompServer::(send) Sending response
Thread-153315::ERROR::2015-05-04 14:06:39,120::vm::2302::vm.Vm::(hotplugNic) vmId=`1c5cf02a-b32c-4679-a4fe-7fba119086a7`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2300, in hotplugNic
    self._dom.attachDevice(nicXml)
  File "/usr/share/vdsm/virt/vm.py", line 617, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 500, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)
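
The collision arises because the re-plug request carries the PCI address recorded earlier (slot 0x03 in the XML above), while qemu still considers that slot occupied by the device that was never actually unplugged. A hypothetical mitigation sketch (it would sidestep only the address collision, not the stuck device, and is not the actual vdsm fix) is to drop the fixed <address> element so libvirt picks a free slot:

import xml.etree.ElementTree as ET

def strip_pci_address(nic_xml):
    """Drop the fixed <address> element from an <interface> XML snippet
    so libvirt assigns a free PCI slot on attach (hypothetical helper,
    not the actual vdsm fix)."""
    root = ET.fromstring(nic_xml)
    for addr in root.findall('address'):
        root.remove(addr)
    return ET.tostring(root, encoding='unicode')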


See also https://bugzilla.redhat.com/show_bug.cgi?id=1159729, an RFE I opened regarding this issue, and https://bugzilla.redhat.com/show_bug.cgi?id=1168288.

Comment 8 Dan Kenigsberg 2015-11-02 10:56:11 UTC
When hot-plugging fails, we should warn the end user that the failure may have happened due to a former failure to unplug. Maybe something like: "... the hot (un)plug may have failed due to a non-cooperating guest operating system".

Comment 10 Michael Burman 2015-12-29 13:21:04 UTC
Tested on 3.6.2-0.1.el6 and failed QA.

- Hot-unplugging a vNIC while the VM has no OS fails with the error:
 'Error while executing action Edit VM Interface properties: Failed to deactivate VM Network Interface.'
This scenario is covered, but things start to break when such an operation fails and the VM is then migrated to a second host.

- The next scenario is 100% reproducible and should be covered.

a. Run a VM with 1 vNIC, without an OS installed, and with Network in the boot order.

2015-12-28 08:44:43,813 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-79) [77a1e10f] VM '5a4f6180-cc65-48fe-94d4-f4e7e86a2406' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='6dc1bd95-c938-4beb-99f1-fb1883c84156', vmId='5a4f6180-cc65-48fe-94d4-f4e7e86a2406'}', device='spice', type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', usingScsiReservation='false'}'


b. Fail to hot-unplug the vNIC, with the error:
'Error while executing action Edit VM Interface properties: Failed to deactivate VM Network Interface.'
c. Migrate the VM to a second host; this succeeds.
d. Once migration finishes, the vNIC is reported as unplugged.
e. Hot-plug this vNIC back (succeeds) and migrate the VM back to the first host.
f. Try to migrate the VM again to the second host.
Result --> Migration fails and the VM is suddenly shut down.

 2015-12-28 08:50:11,986 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-26) [b09b28e] VM '5a4f6180-cc65-48fe-94d4-f4e7e86a2406(no_os_vm) is running in db and not running in VDS 'silver-vdsb.qa.lab.tlv.redhat.com'
2015-12-28 08:50:11,986 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-26) [b09b28e] add VM 'no_os_vm' to rerun treatment
2015-12-28 08:50:12,243 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-26) [b09b28e] Rerun VM '5a4f6180-cc65-48fe-94d4-f4e7e86a2406'. Called from VDS 'silver-vdsb.qa.lab.tlv.redhat.com'
2015-12-28 08:50:12,292 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-22) [b09b28e] Correlation ID: 2b48a5e1, Job ID: 67b12f07-4bdc-4ea4-8516-54d1e1f79bab, Call Stack: null, Custom Event ID: -1, Message: Migration failed  (VM: no_os_vm, Source: silver-vdsa.qa.lab.tlv.redhat.com).
2015-12-28 08:50:12,303 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (org.ovirt.thread.pool-7-thread-17) [1639ca7c] Running command: ProcessDownVmCommand internal: true.
2015-12-28 08:50:12,334 INFO  [org.ovirt.engine.core.bll.MigrateVmCommand] (org.ovirt.thread.pool-7-thread-22) [b09b28e] Lock freed to object 'EngineLock:{exclusiveLocks='[5a4f6180-cc65-48fe-94d4-f4e7e86a2406=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED$VmName no_os_vm>]', sharedLocks='null'}'

- First host:
Thread-33482::ERROR::2015-12-28 08:49:56,427::vm::758::virt.vm::(_startUnderlyingVm) vmId=`5a4f6180-cc65-48fe-94d4-f4e7e86a2406`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 721, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 2817, in _completeIncomingMigration
    self._domDependentInit()
  File "/usr/share/vdsm/virt/vm.py", line 1787, in _domDependentInit
    self._getUnderlyingVmDevicesInfo()
  File "/usr/share/vdsm/virt/vm.py", line 1717, in _getUnderlyingVmDevicesInfo
    self._getUnderlyingNetworkInterfaceInfo()
  File "/usr/share/vdsm/virt/vm.py", line 4459, in _getUnderlyingNetworkInterfaceInfo
    if nic.macAddr.lower() == mac.lower():
AttributeError: macAddr
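
The traceback shows vm.py assuming every recovered NIC device object carries a macAddr attribute; after the failed unplug and the migration, a device without one ends up in the list. A defensive lookup along these lines (a hypothetical helper, not the actual vdsm fix) would avoid the crash:

def find_nic_by_mac(nics, mac):
    """Return the first NIC whose macAddr matches, tolerating device
    objects that lack the attribute (hypothetical helper)."""
    for nic in nics:
        nic_mac = getattr(nic, 'macAddr', None)
        if nic_mac is not None and nic_mac.lower() == mac.lower():
            return nic
    return None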

- Second host:
periodic/5::ERROR::2015-12-28 08:50:06,937::executor::188::Executor::(_execute_task) Unhandled exception in <virt.periodic.NumaInfoMonitor object at 0x2b7cd10>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 186, in _execute_task
    callable()
  File "/usr/share/vdsm/virt/periodic.py", line 271, in __call__
    self._execute()
  File "/usr/share/vdsm/virt/periodic.py", line 311, in _execute
    self._vm.updateNumaInfo()
  File "/usr/share/vdsm/virt/vm.py", line 5056, in updateNumaInfo
    self._numaInfo = numaUtils.getVmNumaNodeRuntimeInfo(self)
  File "/usr/share/vdsm/numaUtils.py", line 106, in getVmNumaNodeRuntimeInfo
    _get_vcpu_positioning(vm))
  File "/usr/share/vdsm/numaUtils.py", line 129, in _get_vcpu_positioning
    return vm._dom.vcpus()[0]
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2751, in vcpus
    if ret == -1: raise libvirtError ('virDomainGetVcpus() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '5a4f6180-cc65-48fe-94d4-f4e7e86a2406' (no_os_vm)

Comment 11 Michael Burman 2015-12-29 13:25:13 UTC
(In reply to Michael Burman from comment #10)
> a. Run a VM with 1 vNIC, without an OS installed, and with Network in the boot order.

This error should appear below step d, not below step a:
> 2015-12-28 08:44:43,813 ERROR
> [org.ovirt.engine.core.vdsbroker.VmsMonitoring]
> (DefaultQuartzScheduler_Worker-79) [77a1e10f] VM
> '5a4f6180-cc65-48fe-94d4-f4e7e86a2406' managed non pluggable device was
> removed unexpectedly from libvirt:
> 'VmDevice:{id='VmDeviceId:{deviceId='6dc1bd95-c938-4beb-99f1-fb1883c84156',
> vmId='5a4f6180-cc65-48fe-94d4-f4e7e86a2406'}', device='spice',
> type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true',
> plugged='false', readOnly='false', deviceAlias='', customProperties='[]',
> snapshotId='null', logicalName='null', usingScsiReservation='false'}'

Comment 12 Michael Burman 2016-01-18 08:03:51 UTC
Verified on - 3.6.2.5-0.1.el6

Comment 14 errata-xmlrpc 2016-03-09 19:24:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html