+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1409246 +++

======================================================================

Description of problem:

VM migration fails with:

- src host:

Thread-12509491::INFO::2016-12-22 09:49:56,306::migration::407::virt.vm::(_startUnderlyingMigration) vmId=`36710f55-4f25-4c80-912e-d7c9dfc87b99`::Creation of destination VM took: 0 seconds
Thread-12509491::ERROR::2016-12-22 09:49:56,307::migration::252::virt.vm::(_recover) vmId=`36710f55-4f25-4c80-912e-d7c9dfc87b99`::migration destination error: Fatal error during migration

- dest host:

jsonrpc.Executor/0::DEBUG::2016-12-22 09:49:56,304::API::601::vds::(migrationCreate) Migration create - Failed
jsonrpc.Executor/0::DEBUG::2016-12-22 09:49:56,305::API::607::vds::(migrationCreate) Returning backwards compatible migration error code

There are no clues in the VM's qemu log, the 'messages' file or the journald logs. VDSM should report the reason for the failure to create the VM.

Version-Release number of selected component (if applicable):
- RHEV 4.0.2
- RHEL 7.2 hosts:
  - vdsm-4.18.11-1.el7
  - libvirt-1.2.17-13.el7_2.5
  - qemu-kvm-rhev-2.3.0-31.el7_2.21

How reproducible:
Not reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
A meaningful error.

Additional info:
A side-effect of this is that when putting a host into maintenance mode, if some migrations fail in this manner and they are all to the same host, that host can get placed into an 'ERROR' state.

(Originally by Gordon Watson)
The problem is on the engine side, where the incoming/outgoing limits are not sent.

(Originally by michal.skrivanek)
Verified with:
Engine: 4.0.7-0.1.el7ev

Step: migrate a VM and check that the incoming/outgoing limits are set.

vdsm [outgoingLimit = 2, incomingLimit = 2]:

jsonrpc.Executor/4::DEBUG::2017-02-07 15:06:39,323::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.migrate' in bridge with {u'params': {u'incomingLimit': 2, u'src': u'alma05.qa.lab.tlv.redhat.com', u'dstqemu': u'10.35.70.4', u'autoConverge': u'false', u'tunneled': u'false', u'enableGuestEvents': False, u'dst': u'cyan-vdsg.qa.lab.tlv.redhat.com:54321', u'vmId': u'7ad815d0-4aa7-4797-8587-21d72f9c6094', u'abortOnError': u'true', u'outgoingLimit': 2, u'compressed': u'false', u'maxBandwidth': 500, u'method': u'online'}, u'vmID': u'7ad815d0-4aa7-4797-8587-21d72f9c6094'}

Engine [maxIncomingMigrations=2, maxOutgoingMigrations=2]:

2017-02-07 15:06:40,319 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-26) [72ec5728] START, MigrateVDSCommand( MigrateVDSCommandParameters:{runAsync='true', hostId='60da1a9e-b3c6-4588-b282-9cf2bcd27399', vmId='7ad815d0-4aa7-4797-8587-21d72f9c6094', srcHost='alma05.qa.lab.tlv.redhat.com', dstVdsId='ff0f2c59-93f0-4ae4-b8af-8f7b84790704', dstHost='cyan-vdsg.qa.lab.tlv.redhat.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='false', migrateCompressed='false', consoleAddress='null', maxBandwidth='500', enableGuestEvents='false', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='null'}), log id: 72121b2e
Hi Martin. Please confirm whether this bug requires doc text, and if yes, please set the requires doc text flag accordingly. Thanks.
Yes, it does, as a bug fix note.
Martin, I need some clarification in order to edit the doc text. Can you please explain what you mean by "migration limits" and what the connection is with the retry attempts? Are the limits the number of times to retry the migration?
@Emma, in this particular context "migration limits" describes the "Maximum number of incoming concurrent migrations" and the "Maximum number of outgoing concurrent migrations" per host. If the capacity is reached on either end (e.g. the *source* has reached its maximum outgoing, or the *destination* its maximum incoming), VDSM will attempt a retry when capacity becomes available.

This is new behavior in 4.0 and is triggered only by supplying the parameters from the engine (they are sent as part of each "migrate" operation to VDSM). These parameters were missing, so the older behavior (fail on full capacity) was in effect.

HTH
Martin
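To illustrate the mechanism Martin describes, here is a minimal sketch (not the actual VDSM implementation; the class and method names are invented for this example) of how a per-host concurrency cap can either fail immediately when full, or wait and retry when a limit is known:

```python
import threading

class MigrationSlots:
    """Illustrative per-host cap on concurrent migrations.

    Hypothetical sketch only: VDSM's real code differs, but the idea is
    the same -- with a known limit a caller can wait for a free slot
    instead of failing outright when the host is at capacity.
    """

    def __init__(self, limit):
        # BoundedSemaphore with `limit` slots models the configured
        # "maximum concurrent migrations" for one direction on one host.
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Old behavior: non-blocking attempt; False means the host is
        # at capacity and the migration fails immediately.
        return self._sem.acquire(blocking=False)

    def acquire_with_retry(self, timeout):
        # New behavior (when the engine sends the limit): wait up to
        # `timeout` seconds for a slot to free up before giving up.
        return self._sem.acquire(timeout=timeout)

    def release(self):
        # Called when a migration finishes, freeing a slot.
        self._sem.release()
```

With a limit of 2 (matching the maxOutgoingMigrations=2 seen in the logs above), a third simultaneous migration fails under the non-blocking path but succeeds via the retry path once one of the first two releases its slot.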
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0544.html
I don't think that's the right errata linked in comment #26. That one points to vdsm for 4.0.7, but the gerrit patches attached to this bug are to the engine.

Regards, GFW.
(In reply to Gordon Watson from comment #27)
> I don't think that's the right errata linked in comment #26. That one points
> to vdsm for 4.0.7, but the gerrit patches attached to this bug are to the
> engine.
>
> Regards, GFW.

Right, it should be the engine one: https://rhn.redhat.com/errata/RHBA-2017-0542.html