Bug 1264679 - Failed to live migrate VM between AMD hosts
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.0
Priority: unspecified
Severity: high
Assigned To: nobody
QA Contact: Ilanit Stein
Whiteboard: virt
Reported: 2015-09-20 11:07 EDT by Kevin Alon Goldblatt
Modified: 2015-12-04 10:51 EST
CC: 12 users
Doc Type: Bug Fix
Last Closed: 2015-10-26 09:47:07 EDT
Type: Bug


Attachments
engine, vdsm, server, journal logs (3.30 MB, application/x-gzip)
2015-09-20 11:12 EDT, Kevin Alon Goldblatt
engine, vdsm, server, journal logs (3.99 MB, application/x-gzip)
2015-09-20 11:50 EDT, Kevin Alon Goldblatt
added requested qemu log (1.88 KB, application/x-gzip)
2015-10-05 17:11 EDT, Kevin Alon Goldblatt

Description Kevin Alon Goldblatt 2015-09-20 11:07:31 EDT
Description of problem:
Live migration of a VM between AMD hosts fails. The same migration between Intel hosts works fine.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.12.master.el6.noarch
vdsm-4.17.5-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with 4 disks, block and NFS, preallocated and thin-provisioned
2. Start the VM and migrate it to another host >>>> Fails with "error = Virtual machine does not exist" (a scripted way to trigger the migration is sketched below)
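
For reference, a live migration like this can also be triggered from a script instead of the UI. A minimal sketch using the oVirt Python SDK v4 (the engine URL and credentials are placeholders, and the bug was hit on 3.6, so the SDK version actually used may differ; the VM and host names are taken from the logs below):
---------------------------------------------
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder engine URL and credentials; adjust for the environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vm32')[0]       # VM name taken from the logs below
vm_service = vms_service.vm_service(vm.id)

# Ask the engine to live-migrate the running VM to the destination host.
vm_service.migrate(host=types.Host(name='blond-vdsh'))

connection.close()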


Actual results:
Live VM migration fails between AMD hosts. The same migration between Intel hosts works fine.

Expected results:
Live migration should succeed.

Additional info:

Engine log.
------------------------------------------------------------------
2015-09-20 16:30:23,649 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-7-thread-22) [66d9bcc3] START, MigrateVDSCommand( MigrateVDSCommandParameters:{runAsync='true', hostId='fbc6b327-125b-4bf6-a7df-5e433bce48c4', vmId='b4e00751-0255-4738-b030-d8bf9e71013a', srcHost='10.35.110.15', dstVdsId='87d8ccac-da11-4f69-8e0b-eab3984776d0', dstHost='10.35.64.12:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='false', migrateCompressed='false', consoleAddress='null'}), log id: 1b387d7
2015-09-20 16:30:23,650 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-7-thread-22) [66d9bcc3] START, MigrateBrokerVDSCommand(HostName = blond_vdsf, MigrateVDSCommandParameters:{runAsync='true', hostId='fbc6b327-125b-4bf6-a7df-5e433bce48c4', vmId='b4e00751-0255-4738-b030-d8bf9e71013a', srcHost='10.35.110.15', dstVdsId='87d8ccac-da11-4f69-8e0b-eab3984776d0', dstHost='10.35.64.12:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='false', migrateCompressed='false', consoleAddress='null'}), log id: 50c655c9
2015-09-20 16:30:23,674 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-7-thread-22) [66d9bcc3] FINISH, MigrateBrokerVDSCommand, log id: 50c655c9
2015-09-20 16:30:23,838 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-7-thread-22) [66d9bcc3] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 1b387d7
2015-09-20 16:30:24,087 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-22) [66d9bcc3] Correlation ID: 66d9bcc3, Job ID: 651c99ea-b2d7-4779-b041-ad1c2f3a809e, Call Stack: null, Custom Event ID: -1, Message: Migration started (VM: vm32, Source: blond_vdsf, Destination: blond-vdsh, User: admin@internal).
2015-09-20 16:30:32,184 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (org.ovirt.thread.pool-7-thread-31) [] domain '95e96e83-0966-4708-8565-eaa32afaf155:block4' in problem. vds: 'blond-vdsh'
2015-09-20 16:30:39,667 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-79) [] RefreshVmList VM id 'b4e00751-0255-4738-b030-d8bf9e71013a' status = 'Paused' on VDS 'blond-vdsh' ignoring it in the refresh until migration is done
2015-09-20 16:30:47,932 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (org.ovirt.thread.pool-7-thread-32) [] Domain '95e96e83-0966-4708-8565-eaa32afaf155:block4' recovered from problem. vds: 'blond-vdsh'
2015-09-20 16:30:54,698 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-62) [] RefreshVmList VM id 'b4e00751-0255-4738-b030-d8bf9e71013a' status = 'Paused' on VDS 'blond-vdsh' ignoring it in the refresh until migration is done
2015-09-20 16:30:55,523 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-99) [] VM 'b4e00751-0255-4738-b030-d8bf9e71013a'(vm32) moved from 'MigratingFrom' --> 'Up'
2015-09-20 16:30:55,524 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-99) [] Adding VM 'b4e00751-0255-4738-b030-d8bf9e71013a' to re-run list
2015-09-20 16:30:56,570 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-99) [] Rerun VM 'b4e00751-0255-4738-b030-d8bf9e71013a'. Called from VDS 'blond_vdsf'
2015-09-20 16:30:56,591 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-7-thread-26) [] START, MigrateStatusVDSCommand(HostName = blond_vdsf, MigrateStatusVDSCommandParameters:{runAsync='true', hostId='fbc6b327-125b-4bf6-a7df-5e433bce48c4', vmId='b4e00751-0255-4738-b030-d8bf9e71013a'}), log id: 4c227241
2015-09-20 16:30:56,600 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-7-thread-26) [] FINISH, MigrateStatusVDSCommand, log id: 4c227241
2015-09-20 16:30:56,742 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-26) [] Correlation ID: 66d9bcc3, Job ID: 651c99ea-b2d7-4779-b041-ad1c2f3a809e, Call Stack: null, Custom Event ID: -1, Message: Migration failed  (VM: vm32, Source: blond_vdsf, Destination: blond-vdsh).
2015-09-20 16:30:56,791 INFO  [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-7-thread-26) [] Lock freed to object 'EngineLock:{exclusiveLocks='[b4e00751-0255-4738-b030-d8bf9e71013a=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED$VmName vm32>]', sharedLocks='null'}'
2015-09-20 16:31:01,861 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (org.ovirt.thread.pool-7-thread-35) [] Domain '95e96e83-0966-4708-8565-eaa32afaf155:block4' recovered from problem. vds: 'blond_vdsf'
2015-09-20 16:31:01,861 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (org.ovirt.thread.pool-7-thread-35) [] Domain '95e96e83-0966-4708-8565-eaa32afaf155:block4' has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.
2015-09-20 16:31:09,725 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-39) [] RefreshVmList VM id 'b4e00751-0255-4738-b030-d8bf9e71013a' status = 'Paused' on VDS 'blond-vdsh' ignoring it in the refresh until migration is done
2015-09-20 16:31:21,056 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] START, DestroyVDSCommand(HostName = blond-vdsh, DestroyVmVDSCommandParameters:{runAsync='true', hostId='87d8ccac-da11-4f69-8e0b-eab3984776d0', vmId='b4e00751-0255-4738-b030-d8bf9e71013a', force='false', secondsToWait='0', gracefully='false', reason=''}), log id: 61f05f48
2015-09-20 16:31:21,101 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] Failed in 'DestroyVDS' method
2015-09-20 16:31:21,274 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM blond-vdsh command failed: Virtual machine does not exist
2015-09-20 16:31:21,275 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=1, message=Virtual machine does not exist]]'
2015-09-20 16:31:21,275 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] HostName = blond-vdsh
2015-09-20 16:31:21,276 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] Command 'DestroyVDSCommand(HostName = blond-vdsh, DestroyVmVDSCommandParameters:{runAsync='true', hostId='87d8ccac-da11-4f69-8e0b-eab3984776d0', vmId='b4e00751-0255-4738-b030-d8bf9e71013a', force='false', secondsToWait='0', gracefully='false', reason=''})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist, code = 1
2015-09-20 16:31:21,276 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] FINISH, DestroyVDSCommand, log id: 61f05f48
2015-09-20 16:31:21,279 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-4) [] RefreshVmList VM id 'b4e00751-0255-4738-b030-d8bf9e71013a' status = 'Down' on VDS 'blond-vdsh' ignoring it in the refresh until migration is done


VDSM source host log
-----------------------------------------
Thread-66917::ERROR::2015-09-20 16:30:43,159::migration::208::virt.vm::(_recover) vmId=`b4e00751-0255-4738-b030-d8bf9e71013a`::[-32605] No response for JSON-RPC VM.migrationCreate request.

VDSM destination host log
------------------------------------------
Thread-251318::ERROR::2015-09-20 16:31:31,616::vm::769::virt.vm::(_startUnderlyingVm) vmId=`b4e00751-0255-4738-b030-d8bf9e71013a`::Failed to start a migration destination vm
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 739, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 2729, in _completeIncomingMigration
    self._incomingMigrationFinished.isSet(), usedTimeout)
  File "/usr/share/vdsm/virt/vm.py", line 2782, in _attachLibvirtDomainAfterMigration
    raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid 'b4e00751-0255-4738-b030-d8bf9e71013a'
Thread-251327::INFO::2015-09-20 16:31:31,617::logUtils::51::dispatcher::(wrapper) Run and protect: inappropriateDevices, Return response: None
Thread-251318::INFO::2015-09-20 16:31:31,750::vm::1302::virt.vm::(setDownStatus) vmId=`b4e00751-0255-4738-b030-d8bf9e71013a`::Changed state to Down: VM failed to migrate (code=8)
Thread-251327::DEBUG::2015-09-20 16:31:31,751::task::1191::Storage.TaskManager.Task::(prepare) Task=`f6a83f46-5254-4d8c-9812-5c906ac37183`::finished: None
Thread-251318::DEBUG::2015-09-20 16:31:31,752::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"b4e00751-0255-4738-b030-d8bf9e71013a": {"status": "Down", "timeOffset": "0", "exitReason": 8, "exitMessage": "VM failed to migrate", "exitCode": 1}, "notify_time": 6452638470}, "jsonrpc": "2.0", "method": "|virt|VM_status|b4e00751-0255-4738-b030-d8bf9e71013a"}
Thread-251327::DEBUG::2015-09-20 16:31:31,752::task::595::Storage.TaskManager.Task::(_updateState) Task=`f6a83f46-5254-4d8c-9812-5c906ac37183`::moving from state preparing -> state finished
Thread-251318::DEBUG::2015-09-20 16:31:31,754::stompreactor::304::yajsonrpc.StompServer::(send) Sending response
Thread-251327::DEBUG::2015-09-20 16:31:31,754::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-251327::DEBUG::2015-09-20 16:31:31,756::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-251327::DEBUG::2015-09-20 16:31:31,758::task::993::Storage.TaskManager.Task::(_decref) Task=`f6a83f46-5254-4d8c-9812-5c906ac37183`::ref 0 aborting False
Thread-251327::DEBUG::2015-09-20 16:31:31,760::vm::3750::virt.vm::(deleteVm) vmId=`b4e00751-0255-4738-b030-d8bf9e71013a`::Total desktops after destroy of b4e00751-0255-4738-b030-d8bf9e71013a is 0
periodic/5::DEBUG::2015-09-20 16:31:31,762::executor::178::Executor::(_run) Worker was discarded
periodic/6::DEBUG::2015-09-20 16:31:31,761::executor::178::Executor::(_run) Worker was discarded
periodic/16::DEBUG::2015-09-20 16:31:31,766::executor::178::Executor::(_run) Worker was discarded
periodic/12::DEBUG::2015-09-20 16:31:31,763::executor::178::Executor::(_run) Worker was discarded
periodic/9::DEBUG::2015-09-20 16:31:31,763::executor::178::Executor::(_run) Worker was discarded
periodic/17::DEBUG::2015-09-20 16:31:31,766::executor::178::Executor::(_run) Worker was discarded
periodic/25::DEBUG::2015-09-20 16:31:31,768::executor::178::Executor::(_run) Worker was discarded
periodic/22::DEBUG::2015-09-20 16:31:31,770::executor::178::Executor::(_run) Worker was discarded
periodic/19::DEBUG::2015-09-20 16:31:31,768::executor::178::Executor::(_run) Worker was discarded
periodic/10::DEBUG::2015-09-20 16:31:31,763::executor::178::Executor::(_run) Worker was discarded
periodic/11::DEBUG::2015-09-20 16:31:31,763::executor::178::Executor::(_run) Worker was discarded
periodic/24::DEBUG::2015-09-20 16:31:31,769::executor::178::Executor::(_run) Worker was discarded
periodic/26::DEBUG::2015-09-20 16:31:31,769::executor::178::Executor::(_run) Worker was discarded
periodic/21::DEBUG::2015-09-20 16:31:31,770::executor::178::Executor::(_run) Worker was discarded
periodic/13::DEBUG::2015-09-20 16:31:31,764::executor::178::Executor::(_run) Worker was discarded
periodic/20::DEBUG::2015-09-20 16:31:31,770::executor::178::Executor::(_run) Worker was discarded
periodic/18::DEBUG::2015-09-20 16:31:31,766::executor::178::Executor::(_run) Worker was discarded
periodic/7::DEBUG::2015-09-20 16:31:31,762::executor::178::Executor::(_run) Worker was discarded
periodic/29::DEBUG::2015-09-20 16:31:31,770::executor::178::Executor::(_run) Worker was discarded
periodic/8::DEBUG::2015-09-20 16:31:31,763::executor::178::Executor::(_run) Worker was discarded
periodic/4::DEBUG::2015-09-20 16:31:31,761::executor::178::Executor::(_run) Worker was discarded
periodic/23::DEBUG::2015-09-20 16:31:31,768::executor::178::Executor::(_run) Worker was discarded
periodic/27::DEBUG::2015-09-20 16:31:31,769::executor::178::Executor::(_run) Worker was discarded
periodic/14::DEBUG::2015-09-20 16:31:31,765::executor::178::Executor::(_run) Worker was discarded
periodic/15::DEBUG::2015-09-20 16:31:31,766::executor::178::Executor::(_run) Worker was discarded
periodic/28::DEBUG::2015-09-20 16:31:31,769::executor::178::Executor::(_run) Worker was discarded
Thread-251327::DEBUG::2015-09-20 16:31:31,761::__init__::534::jsonrpc.JsonRpcServer::(_serveRequest) Return 'VM.destroy' in bridge with True
periodic/30::DEBUG::2015-09-20 16:31:31,776::executor::178::Executor::(_run) Worker was discarded
Thread-251347::DEBUG::2015-09-20 16:31:31,789::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.destroy' in bridge with {u'vmID': u'b4e00751-0255-4738-b030-d8bf9e71013a'}
Thread-251327::DEBUG::2015-09-20 16:31:31,805::stompreactor::304::yajsonrpc.StompServer::(send) Sending response
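
The destination-side traceback above amounts to libvirt having no domain with the VM's UUID when VDSM tries to attach to it after the migration. A standalone sketch of that lookup, using the libvirt Python bindings (illustrative only, not VDSM's actual code; only the UUID is taken from the logs), runs into the same error:
---------------------------------------------
import libvirt

# UUID of the migrated VM, taken from the logs above.
VM_UUID = 'b4e00751-0255-4738-b030-d8bf9e71013a'

conn = libvirt.open('qemu:///system')
try:
    # VDSM expects the incoming domain to already exist on the destination
    # host; if qemu/libvirt never created it, this lookup raises libvirtError
    # with "Domain not found: no domain with matching uuid ...".
    dom = conn.lookupByUUIDString(VM_UUID)
    print('Domain found:', dom.name())
except libvirt.libvirtError as e:
    # VDSM wraps this message into MigrationError (see the traceback above).
    print('Lookup failed:', e.get_error_message())
finally:
    conn.close()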
Comment 1 Kevin Alon Goldblatt 2015-09-20 11:12 EDT
Created attachment 1075314 [details]
engine, vdsm, server, journal logs
Comment 2 Kevin Alon Goldblatt 2015-09-20 11:50 EDT
Created attachment 1075318 [details]
engine, vdsm, server, journal logs

Added new logs. Please ignore the previous logs.
Comment 3 Allon Mureinik 2015-09-21 06:49:22 EDT
I can't see anything storage-related here.

Moving to whiteboard=virt (which owns the migration flow) to investigate.
Comment 4 Vinzenz Feenstra [evilissimo] 2015-09-21 08:45:02 EDT
Please attach destination and source logs for qemu and libvirt. Thanks
Comment 5 Kevin Alon Goldblatt 2015-10-05 17:11 EDT
Created attachment 1080079 [details]
added requested qemu log

Added the requested qemu log. The qemu log only exists for the source host.
Comment 6 Kevin Alon Goldblatt 2015-10-05 17:12:03 EDT
(In reply to Vinzenz Feenstra [evilissimo] from comment #4)
> Please attach destination and source logs for qemu and libvirt. Thanks

Added the requested qemu log. The qemu log only exists for the source host.
Comment 7 Vinzenz Feenstra [evilissimo] 2015-10-06 02:46:22 EDT
Hi, are there also some libvirt logs available from the time frame in question?
Since there are no destination qemu logs, it might have failed somewhere in libvirt already.
Comment 8 Kevin Alon Goldblatt 2015-10-06 05:10:06 EDT
(In reply to Vinzenz Feenstra [evilissimo] from comment #7)
> Hi, are there also some libvirt logs available from the time frame in question?
> Since there are no destination qemu logs, it might have failed somewhere in libvirt already.

Hi,

No libvirt logs are available on the hosts.
Comment 9 Vinzenz Feenstra [evilissimo] 2015-10-08 06:05:03 EDT
Ok in that case, please reproduce this and ensure that you have the libvirt logs and qemu logs from that time frame.

There's no information in the provided logs which would explain the failure.

From what you have described the issue may be found in the destination host libvirt logs, or the source host libvirt logs.

VDSM logs don't seem to show a reason.


Thanks a lot in advance for trying again to provide the logs.
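
For reference on where those logs normally live: the per-VM qemu logs are usually under /var/log/libvirt/qemu/<vm-name>.log on each host, and libvirtd only writes its own log file if file logging is configured in /etc/libvirt/libvirtd.conf; otherwise its messages go to the system journal (journalctl -u libvirtd). A small sketch to check what is actually present on a host (default paths assumed; adjust for the local setup):
---------------------------------------------
import glob
import os

# Common default locations; these paths are assumptions, not guarantees.
candidates = glob.glob('/var/log/libvirt/qemu/*.log') + ['/var/log/libvirt/libvirtd.log']

for path in candidates:
    if os.path.exists(path):
        print('found   %s (%d bytes)' % (path, os.path.getsize(path)))
    else:
        print('missing %s' % path)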
Comment 10 Michal Skrivanek 2015-10-23 09:57:08 EDT
De-targeting due to lack of logs.
Comment 11 Kevin Alon Goldblatt 2015-10-25 07:05:21 EDT
(In reply to Vinzenz Feenstra [evilissimo] from comment #9)
> Ok in that case, please reproduce this and ensure that you have the libvirt
> logs and qemu logs from that time frame.
> 
> There's no information in the provided logs which would explain the failure.
> 
> From what you have described the issue may be found in the destination host
> libvirt logs, or the source host libvirt logs.
> 
> VDSM logs don't seem to show a reason.
> 
> 
> Thanks a lot in advance for trying again to provide the logs.

I ran the same scenario several times again with new code and the problem no longer comes up. I assume the code has changed.

I used the following upgraded code:
---------------------------------------------
rhevm-3.6.0-0.18.el6.noarch
vdsm-4.17.8-1.el7ev.noarch


I ran the following scenario:
---------------------------------------------
Steps to Reproduce:
1. Create a VM with 4 disks, block and NFS, preallocated and thin-provisioned
2. Start the VM and migrate it to another host >>>> All disks are successfully migrated.
Comment 12 Michal Skrivanek 2015-10-26 09:47:07 EDT
The code didn't really change. Please keep observing it, and if it happens again, please make sure you capture the relevant logs and reopen the bug.
