Bug 1204466

Summary: migration failed "Error creating the requested VM"
Product: Red Hat Enterprise Virtualization Manager
Reporter: Eldad Marciano <emarcian>
Component: vdsm
Assignee: Martin Polednik <mpoledni>
Status: CLOSED NOTABUG
QA Contact: Ilanit Stein <istein>
Severity: high
Priority: high
Version: 3.5.1
CC: bazulay, ecohen, gklein, lpeer, lsurette, michal.skrivanek, ofrenkel, rmcswain, yeylon
Keywords: Triaged
Target Milestone: ---
Target Release: 3.5.5
Hardware: x86_64
OS: Linux
Whiteboard: virt
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-07-21 13:22:38 UTC

Description Eldad Marciano 2015-03-22 12:14:00 UTC
Description of problem:

Migration failed with the following error:
'RuntimeError: migration destination error: Error creating the requested VM'

This bug was found while measuring migration response time for a VM whose allocated memory is actively being dirtied.
The failure happens sporadically.

Thread-134879::ERROR::2015-03-22 09:36:00,305::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 264, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 365, in getStats
    stats = v.getStats().copy()
  File "/usr/share/vdsm/virt/vm.py", line 2824, in getStats
    stats.update(self._getRunningVmStats())
  File "/usr/share/vdsm/virt/vm.py", line 2870, in _getRunningVmStats
    stats['migrationProgress'] = self.migrateStatus()['progress']
KeyError: 'progress'
Thread-134879::DEBUG::2015-03-22 09:36:00,306::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
Dummy-67::DEBUG::2015-03-22 09:36:00,846::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/6b588963-6490-4093-a778-699094c0b07b/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=2457600 (cwd None)
Dummy-67::DEBUG::2015-03-22 09:36:01,061::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2457600 bytes (2.5 MB) copied, 0.0198509 s, 124 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2015-03-22 09:36:01,757::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /bin/dd if=/rhev/data-center/mnt/cmplnt-nas01.scale.openstack.engineering.redhat.com:_rhev__real_data_storage__mount/a31f4e48-5d9a-4554-b7cb-2456e2bf2918/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-21::DEBUG::2015-03-22 09:36:01,773::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n408 bytes (408 B) copied, 0.000315672 s, 1.3 MB/s\n'; <rc> = 0
Thread-20::DEBUG::2015-03-22 09:36:01,790::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /bin/dd if=/rhev/data-center/mnt/ntap-oslab-01-phx1.scale.openstack.engineering.redhat.com:_rhev__real/2aa91150-274a-4bb5-a264-d50a713fb10d/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-20::DEBUG::2015-03-22 09:36:01,804::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n551 bytes (551 B) copied, 0.000303708 s, 1.8 MB/s\n'; <rc> = 0
Thread-134813::ERROR::2015-03-22 09:36:02,243::migration::260::vm.Vm::(run) vmId=`859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 246, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 285, in _startUnderlyingMigration
    response['status']['message'])
RuntimeError: migration destination error: Error creating the requested VM

Version-Release number of selected component (if applicable):
RHEL 6.6
vdsm-4.16.12-2.el6ev.x86_64
libvirt-0.10.2-46.el6_6.3.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64

How reproducible:
90%

Steps to Reproduce:
Make sure the source and destination hosts run on identical hardware.

1. Create a large VM (50 GB RAM, 4 CPUs).
2. Run a stress tool inside the guest to keep roughly 15% of its memory actively dirtied (see the sketch after this list).
3. Migrate the VM.
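
For illustration only, a minimal memory-dirtying workload sketch. This is hypothetical, not the exact tool used in this report; the ~7.5 GB target (about 15% of a 50 GB guest), the page size, and the touch interval are assumptions.

# memstress.py - allocate a slice of guest memory and keep dirtying it,
# which is what slows live-migration convergence. Sketch only.
import time

PAGE = 4096
TARGET_BYTES = int(7.5 * 1024 ** 3)  # roughly 15% of a 50 GB guest (assumption)

buf = bytearray(TARGET_BYTES)  # allocate once, stays resident

stamp = 0
while True:
    # Touch every page so the kernel keeps marking them dirty.
    for off in range(0, TARGET_BYTES, PAGE):
        buf[off] = stamp & 0xFF
    stamp += 1
    time.sleep(1)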


Actual results:
Migration fails sporadically with the destination error above.

Expected results:
Migrations complete reliably and pass consistently.

Additional info:

Comment 1 Eldad Marciano 2015-03-22 12:20:55 UTC
Looks like it is related to the following domain issue; on the destination host vdsm complains:

Thread-143359::ERROR::2015-03-22 12:17:43,116::vm::2325::vm.Vm::(_startUnderlyingVm) vmId=`859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb`::Failed to start a migration destination vm
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2298, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 4107, in _completeIncomingMigration
    self._incomingMigrationFinished.isSet(), usedTimeout)
  File "/usr/share/vdsm/virt/vm.py", line 4160, in _attachLibvirtDomainAfterMigration
    raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid '859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb'
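
For reference, a minimal diagnostic sketch (my own illustration, not part of the report) of the lookup that fails here, using the libvirt Python bindings; the connection URI and the way the result is printed are assumptions.

# Reproduce the lookup that _attachLibvirtDomainAfterMigration performs,
# to check whether the incoming domain ever appeared on the destination host.
# Sketch only; assumes the libvirt Python bindings are installed.
import libvirt

VM_UUID = '859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb'

conn = libvirt.open('qemu:///system')
try:
    dom = conn.lookupByUUIDString(VM_UUID)
    print('domain found:', dom.name(), 'state:', dom.state())
except libvirt.libvirtError as e:
    # Same condition vdsm hits: the destination qemu process was never
    # created (or was already torn down), so libvirt has no matching domain.
    print('domain not found:', e.get_error_message())
finally:
    conn.close()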

Comment 5 Robert McSwain 2015-04-28 13:43:23 UTC
The customer has found that these migration failures continue with "RHEV Hypervisor - 6.6 - 20150128.0.el6ev" and vdsm-4.16.8.1-6.el6ev.

He has also installed a RHEL 6.6 host with vdsm-4.16.8.1-8.el6ev, which seems to let them migrate without issues. Based on what we're seeing here, a new RHEV-H release would be ideal.

Comment 7 Michal Skrivanek 2015-07-21 10:21:44 UTC
Note: the KeyError from the original description was fixed by f98010f311f6c223881320240217838b45c84f48; it is not related to this problem.
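
For context, the KeyError came from _getRunningVmStats reading self.migrateStatus()['progress'] while the status dict did not yet contain a 'progress' key. Below is a minimal sketch of the kind of defensive handling involved; it is an illustration only, not the actual change in the commit above, and the default value of 0 is an assumption.

# Sketch only: guard against a migration status dict that lacks a
# 'progress' key (the race hit in the traceback in the description).
def migration_progress(migrate_status):
    """Return the reported migration progress, or 0 if none yet."""
    # dict.get avoids the KeyError raised by migrate_status['progress'].
    return migrate_status.get('progress', 0)

# Example: status reported before any progress has been published.
print(migration_progress({'status': 'Migration in progress'}))  # -> 0
print(migration_progress({'progress': 42}))                     # -> 42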

Comment 8 Michal Skrivanek 2015-07-21 13:21:18 UTC
Without logs, and based on the provided information, it looks like the destination simply timed out while creating the VM. That timeout typically happens due to slow path preparation, which is not surprising in a loaded environment. Destination logs would be needed to be 100% sure. All other errors reported in the vdsm log are harmless.