Description of problem: migration failed due to the following error 'RuntimeError: migration destination error: Error creating the requested VM' this bug was found when trying understand what is the migrations response time in time we have active and dirty allocated memory. this bug happens sporadically. Thread-134879::ERROR::2015-03-22 09:36:00,305::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest res = method(**params) File "/usr/share/vdsm/rpc/Bridge.py", line 264, in _dynamicMethod result = fn(*methodArgs) File "/usr/share/vdsm/API.py", line 365, in getStats stats = v.getStats().copy() File "/usr/share/vdsm/virt/vm.py", line 2824, in getStats stats.update(self._getRunningVmStats()) File "/usr/share/vdsm/virt/vm.py", line 2870, in _getRunningVmStats stats['migrationProgress'] = self.migrateStatus()['progress'] KeyError: 'progress' Thread-134879::DEBUG::2015-03-22 09:36:00,306::stompReactor::163::yajsonrpc.StompServer::(send) Sending response Dummy-67::DEBUG::2015-03-22 09:36:00,846::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/6b588963-6490-4093-a778-699094c0b07b/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=2457600 (cwd None) Dummy-67::DEBUG::2015-03-22 09:36:01,061::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2457600 bytes (2.5 MB) copied, 0.0198509 s, 124 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2015-03-22 09:36:01,757::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /bin/dd if=/rhev/data-center/mnt/cmplnt-nas01.scale.openstack.engineering.redhat.com:_rhev__real_data_storage__mount/a31f4e48-5d9a-4554-b7cb-2456e2bf2918/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None) Thread-21::DEBUG::2015-03-22 09:36:01,773::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n408 bytes (408 B) copied, 0.000315672 s, 1.3 MB/s\n'; <rc> = 0 Thread-20::DEBUG::2015-03-22 09:36:01,790::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /bin/dd if=/rhev/data-center/mnt/ntap-oslab-01-phx1.scale.openstack.engineering.redhat.com:_rhev__real/2aa91150-274a-4bb5-a264-d50a713fb10d/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None) Thread-20::DEBUG::2015-03-22 09:36:01,804::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n551 bytes (551 B) copied, 0.000303708 s, 1.8 MB/s\n'; <rc> = 0 Thread-134813::ERROR::2015-03-22 09:36:02,243::migration::260::vm.Vm::(run) vmId=`859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb`::Failed to migrate Traceback (most recent call last): File "/usr/share/vdsm/virt/migration.py", line 246, in run self._startUnderlyingMigration(time.time()) File "/usr/share/vdsm/virt/migration.py", line 285, in _startUnderlyingMigration response['status']['message']) RuntimeError: migration destination error: Error creating the requested VM Version-Release number of selected component (if applicable): RHEL 6.6 vdsm-4.16.12-2.el6ev.x86_64 libvirt-0.10.2-46.el6_6.3.x86_64 qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64 How reproducible: 90% Steps to Reproduce: make sure running on same hardware. 1. create large vm (50GB RAM, 4 CPU's). 2. run a stress tool in order to simulate memory activity 15% (mem usage). 3. migrate the vm. Actual results: migration failed sporadically. Expected results: stable migration results, with pass results. Additional info:
looks like it's related to the following domain issue, at the destination host vdsm complain about this one: Thread-143359::ERROR::2015-03-22 12:17:43,116::vm::2325::vm.Vm::(_startUnderlyingVm) vmId=`859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb`::Failed to start a migration destination vm Traceback (most recent call last): File "/usr/share/vdsm/virt/vm.py", line 2298, in _startUnderlyingVm self._completeIncomingMigration() File "/usr/share/vdsm/virt/vm.py", line 4107, in _completeIncomingMigration self._incomingMigrationFinished.isSet(), usedTimeout) File "/usr/share/vdsm/virt/vm.py", line 4160, in _attachLibvirtDomainAfterMigration raise MigrationError(e.get_error_message()) MigrationError: Domain not found: no domain with matching uuid '859b7ddc-9f7e-47ae-9a5d-cd8aa75ffbeb'
Customer has found that with "RHEV Hypervisor - 6.6 - 20150128.0.el6ev" and vdsm-4.16.8.1-6.el6ev that these migration failures continue. He has also installed a host with RHEL 6.6 containing vdsm-4.16.8.1-8.el6ev, and this seems to allow them to migrate without issues. If we can get a new RHEV-H release, this would be ideal from what we're seeing here.
note: keyerror from original description fixed by f98010f311f6c223881320240217838b45c84f48, not related to the problem
without logs and based on the provided information it looks like the destination simply timed out to create VM. The timeout typically happens due to slow path prepare...which is not suprising in loaded environment. Destination logs would be needed to be 100% sure. All other reported errors in vdsm log are harmless