DescriptionJulio Entrena Perez
2013-05-20 15:23:58 UTC
Description of problem:
"Internal Engine Error" is reported by destination host after an ongoing live migration times out ('migration_timeout' reached).
Webadmin portal presents user with unhelpful error message which doesn't provide any hint of what did go wrong.
Version-Release number of selected component (if applicable):
vdsm-4.10.2-1.13.el6ev
rhevm-3.1.0-53.el6ev
How reproducible:
Unknown, it happened once so far.
Steps to Reproduce:
1. Start a live migration which won't complete within the default 'migration_timeout' of 300 seconds.
2. Wait until those 5 minutes have elapsed.
Actual results:
Migration failed due to Error: Internal Engine Error (VM: vm_hostname, Source Host: host_hostname).
Expected results:
A more descriptive error message, e.g. "Migration failed to complete within default timeout of $migration_timeout seconds".
Additional info:
- Source host:
[...]
Thread-61770::INFO::2013-05-17 12:33:12,750::libvirtvm::413::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Migration Progress: 292 seconds elapsed, 50% of data processed, 50% of mem processed
Thread-61770::INFO::2013-05-17 12:33:26,731::libvirtvm::413::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Migration Progress: 306 seconds elapsed, 51% of data processed, 51% of mem processed
Thread-61769::DEBUG::2013-05-17 12:33:26,302::libvirtvm::351::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::setting migration downtime to 250
Thread-61770::WARNING::2013-05-17 12:33:38,616::libvirtvm::399::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Migration is stuck: Hasn't progressed in 307.269540071 seconds. Aborting.
Thread-61770::DEBUG::2013-05-17 12:33:42,844::libvirtvm::416::vm.Vm::(stop) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::stopping migration monitor thread
libvirtEventLoop::DEBUG::2013-05-17 12:33:51,554::libvirtvm::2808::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::event Resumed detail 1 opaque None
Thread-61766::DEBUG::2013-05-17 12:33:51,577::libvirtvm::357::vm.Vm::(cancel) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::canceling migration downtime thread
Thread-61766::DEBUG::2013-05-17 12:33:51,579::libvirtvm::416::vm.Vm::(stop) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::stopping migration monitor thread
Thread-61769::DEBUG::2013-05-17 12:33:51,579::libvirtvm::354::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::migration downtime thread exiting
Thread-61766::ERROR::2013-05-17 12:33:51,581::vm::195::vm.Vm::(_recover) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::operation aborted: migration job: canceled by client
Thread-61766::ERROR::2013-05-17 12:33:58,758::vm::283::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Failed to migrate
[...]
Thread-61836::DEBUG::2013-05-17 12:34:00,605::BindingXMLRPC::903::vds::(wrapper) client [2.3.4.5]::call vmGetMigrationStatus with ('d6446340-b00a-4068-8778-2227f89776fd',) {}
Thread-61836::DEBUG::2013-05-17 12:34:00,605::BindingXMLRPC::910::vds::(wrapper) return vmGetMigrationStatus with {'status': {'message': 'Migration canceled', 'code': {'status': {'message': 'Migration not in progress', 'code': 47}}}}
- Destination host:
Thread-50523::ERROR::2013-05-17 12:33:51,488::utils::416::vm.Vm::(collect) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x21d5d50>
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 412, in collect
statsFunction()
File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 287, in __call__
retValue = self._function(*args, **kwargs)
File "/usr/share/vdsm/libvirtvm.py", line 109, in _highWrite
self._vm._dom.blockInfo(vmDrive.path, 0)
File "/usr/share/vdsm/libvirtvm.py", line 525, in f
raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock
[...]
libvirtEventLoop::DEBUG::2013-05-17 12:33:51,509::libvirtvm::2808::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::event Stopped detail 5 opaque None
Thread-50523::DEBUG::2013-05-17 12:33:51,509::task::957::TaskManager.Task::(_decref) Task=`579740f0-17e8-4ac9-8605-a078bc168489`::ref 0 aborting False
libvirtEventLoop::INFO::2013-05-17 12:33:51,512::vm::713::vm.Vm::(_onQemuDeath) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::underlying process disconnected
libvirtEventLoop::INFO::2013-05-17 12:33:51,516::libvirtvm::2374::vm.Vm::(releaseVm) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Release VM resources
[...]
Thread-50546::DEBUG::2013-05-17 12:33:51,733::BindingXMLRPC::903::vds::(wrapper) client [1.2.3.4]::call vmDestroy with ('d6446340-b00a-4068-8778-2227f89776fd',) {}
[...]
Thread-50523::ERROR::2013-05-17 12:33:51,683::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 1 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: internal error No such domain <D6>Dc@<B0>
@h<87>x"'<F8><97>v<FD>
[...]
Thread-50546::INFO::2013-05-17 12:33:51,775::API::308::vds::(destroy) vmContainerLock acquired by vm d6446340-b00a-4068-8778-2227f89776fd
[...]
Thread-50546::DEBUG::2013-05-17 12:33:52,066::libvirtvm::2425::vm.Vm::(destroy) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::destroy Called
Thread-50523::ERROR::2013-05-17 12:33:51,792::utils::416::vm.Vm::(collect) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x21d5e30>
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 412, in collect
statsFunction()
File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 287, in __call__
retValue = self._function(*args, **kwargs)
File "/usr/share/vdsm/libvirtvm.py", line 137, in _sampleCpu
cpuStats = self._vm._dom.getCPUStats(True, 0)
File "/usr/share/vdsm/libvirtvm.py", line 517, in f
ret = attr(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: internal error No such domain <D6>Dc@<B0>
@h<87>x"'<F8><97>v<FD>
[...]
Thread-50523::ERROR::2013-05-17 12:33:52,135::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1820, in blockStats
if ret is None: raise libvirtError ('virDomainBlockStats() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
[...]
Thread-50523::ERROR::2013-05-17 12:33:52,239::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1826, in blockStatsFlags
if ret is None: raise libvirtError ('virDomainBlockStatsFlags() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
Thread-50523::ERROR::2013-05-17 12:33:52,286::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1867, in interfaceStats
if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
libvirtEventLoop::ERROR::2013-05-17 12:33:52,364::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 381, in XMLDesc
if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
libvirtEventLoop::DEBUG::2013-05-17 12:33:52,380::utils::353::vm.Vm::(stop) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Stop statistics collection
libvirtEventLoop::DEBUG::2013-05-17 12:33:52,394::vmChannels::208::vds::(unregister) Delete fileno 89 from listener.
Thread-50523::DEBUG::2013-05-17 12:33:52,394::utils::384::vm.Vm::(run) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Stats thread finished
[...]
libvirtEventLoop::ERROR::2013-05-17 12:33:52,420::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 755, in destroyFlags
if ret == -1: raise libvirtError ('virDomainDestroyFlags() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
[...]
Thread-50546::DEBUG::2013-05-17 12:33:52,625::libvirtvm::2419::vm.Vm::(deleteVm) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Total desktops after destroy of d6446340-b00a-4068-8778-2227f89776fd is 75
Thread-50546::DEBUG::2013-05-17 12:33:52,632::BindingXMLRPC::910::vds::(wrapper) return vmDestroy with {'status': {'message': 'Machine destroyed', 'code': 0}}
libvirtEventLoop::ERROR::2013-05-17 12:33:52,636::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 381, in XMLDesc
if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
libvirtEventLoop::DEBUG::2013-05-17 12:33:52,657::vm::1027::vm.Vm::(setDownStatus) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Changed state to Down: Admin shut down
libvirtEventLoop::DEBUG::2013-05-17 12:33:52,664::utils::353::vm.Vm::(stop) vmId=`d6446340-b00a-4068-8778-2227f89776fd`::Stop statistics collection
libvirtEventLoop::ERROR::2013-05-17 12:33:52,681::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 42 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 381, in XMLDesc
if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'd6446340-b00a-4068-8778-2227f89776fd'
- Manager:
2013-05-17 12:34:00,521 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-18) VM vm_hostname d6446340-b00a-4068-8778-2227f89776fd moved from MigratingFrom --> Up
2013-05-17 12:34:00,521 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-18) adding VM d6446340-b00a-4068-8778-2227f89776fd to re-run list
2013-05-17 12:34:00,522 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-18) Rerun vm d6446340-b00a-4068-8778-2227f89776fd. Called from vds src_hostname
2013-05-17 12:34:00,569 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) START, MigrateStatusVDSCommand(HostName = src_hostname
, HostId = 1a62f776-695e-11e2-a97a-fb8bf5530f36, vmId=d6446340-b00a-4068-8778-2227f89776fd), log id: 3b3e8edd
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (pool-3-thread-49) Failed in MigrateStatusVDS method, for vds: src_hostname
; host: 1.2.3.4
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command MigrateStatusVDS execution failed. Exception: ClassCastException: java.util.HashMap cannot be cast to java.lang.Integer
2013-05-17 12:34:00,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) FINISH, MigrateStatusVDSCommand, log id: 3b3e8edd
[...]
2013-05-17 12:34:00,781 WARN [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-3-thread-49) CanDoAction of action MigrateVm failed. Reasons:ACTION_TYPE_FAILED_VDS_VM_CLUSTER,VAR__ACTION__MIGRATE,VAR__TYPE__VM
Uncompressed sosreports of both hosts, the manager and the guest are also available at http://jentrena-xw8600.usersys.redhat.com/00846482/ .
errors on destination host points to some internal libvirt problem. please take a look...
Comment 8Martin Kletzander
2013-08-14 14:39:59 UTC
It is very hard to find the cause in these logs. What is the name of the machine that didn't migrate? Grepping all the errors and looking at the logs shown in this bz I can't find anything unusual on the bug. Can you point me in a right direction?
Comment 9Martin Kletzander
2013-08-15 14:05:18 UTC
As mentioned in comment #6, this is already tracked in bug 970645. After discussion with Michal and going through the logs I've found no other problem beyond the mentioned one, so I'm closing it as DUPLICATE.
*** This bug has been marked as a duplicate of bug 970645 ***