Bug 1413939 - [z-stream clone - 4.0.7] VM migration failing with "Returning backwards compatible migration error code"
Summary: [z-stream clone - 4.0.7] VM migration failing with "Returning backwards compa...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.0.2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.7
Assignee: Martin Betak
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On: 1409246
Blocks:
 
Reported: 2017-01-17 11:47 UTC by rhev-integ
Modified: 2020-06-11 13:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Manager did not send migration limitations to the VDSM during virtual machine migration operations, to define the maximum number of concurrent incoming and outgoing operations. As a result, if one of these limits was reached by the VDSM, it did not attempt to retry the operation that failed. Now, the Manager sends the migration limitations to the VDSM as part of each migration operation, and consequently, if one of the limits is reached, the VDSM will retry when the required capacity becomes available.
Clone Of: 1409246
Environment:
Last Closed: 2017-03-16 15:36:03 UTC
oVirt Team: Virt
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0544 0 normal SHIPPED_LIVE vdsm 4.0.7 bug fix and enhancement update 2017-03-16 19:25:18 UTC
oVirt gerrit 69621 0 None None None 2017-01-17 11:49:26 UTC
oVirt gerrit 69629 0 None None None 2017-01-17 11:49:26 UTC

Description rhev-integ 2017-01-17 11:47:41 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1409246 +++
======================================================================

Description of problem:

VM migration fails with:

  - src host:

Thread-12509491::INFO::2016-12-22 09:49:56,306::migration::407::virt.vm::(_startUnderlyingMigration) vmId=`36710f55-4f25-4c80-912e-d7c9dfc87b99`::Creation of destination VM took: 0 seconds
Thread-12509491::ERROR::2016-12-22 09:49:56,307::migration::252::virt.vm::(_recover) vmId=`36710f55-4f25-4c80-912e-d7c9dfc87b99`::migration destination error: Fatal error during migration


  - dest host:

jsonrpc.Executor/0::DEBUG::2016-12-22 09:49:56,304::API::601::vds::(migrationCreate) Migration create - Failed
jsonrpc.Executor/0::DEBUG::2016-12-22 09:49:56,305::API::607::vds::(migrationCreate) Returning backwards compatible migration error code


There are no clues in the VM's qemu log, the 'messages' file or the journald logs. VDSM should report the reason for the failure to create the VM.



Version-Release number of selected component (if applicable):

- RHEV 4.0.2
- RHEL 7.2 hosts:
  - vdsm-4.18.11-1.el7
  - libvirt-1.2.17-13.el7_2.5
  - qemu-kvm-rhev-2.3.0-31.el7_2.21


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

A meaningful error.


Additional info:

A side-effect of this is that when putting a host into maintenance mode, if some migrations fail in this manner and they are all to the same host, that host can get placed into an 'ERROR' state.

(Originally by Gordon Watson)

Comment 16 rhev-integ 2017-01-17 11:49:05 UTC
The problem is on the engine side, where the incoming/outgoing limits are not sent.

(Originally by michal.skrivanek)

Comment 20 Israel Pinto 2017-02-07 13:15:37 UTC
Verified with:
Engine:  4.0.7-0.1.el7ev

Steps:
Migrate a VM and check that the incoming/outgoing limits are set.

vdsm [outgoingLimit = 2, incomingLimit = 2]

jsonrpc.Executor/4::DEBUG::2017-02-07 15:06:39,323::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.migrate' in bridge with {u'params': {u'incomingLimit': 2, u'src': u'alma05.qa.lab.tlv.redhat.com', u'dstqemu': u'10.35.70.4', u'autoConverge': u'false', u'tunneled': u'false', u'enableGuestEvents': False, u'dst': u'cyan-vdsg.qa.lab.tlv.redhat.com:54321', u'vmId': u'7ad815d0-4aa7-4797-8587-21d72f9c6094', u'abortOnError': u'true', u'outgoingLimit': 2, u'compressed': u'false', u'maxBandwidth': 500, u'method': u'online'}, u'vmID': u'7ad815d0-4aa7-4797-8587-21d72f9c6094'}

Engine [maxIncomingMigrations=2 maxOutgoingMigrations=2]
2017-02-07 15:06:40,319 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-26) [72ec5728] START, MigrateVDSCommand( MigrateVDSCommandParameters:{runAsync='true', hostId='60da1a9e-b3c6-4588-b282-9cf2bcd27399', vmId='7ad815d0-4aa7-4797-8587-21d72f9c6094', srcHost='alma05.qa.lab.tlv.redhat.com', dstVdsId='ff0f2c59-93f0-4ae4-b8af-8f7b84790704', dstHost='cyan-vdsg.qa.lab.tlv.redhat.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='false', migrateCompressed='false', consoleAddress='null', maxBandwidth='500', enableGuestEvents='false', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='null'}), log id: 72121b2e
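
The check above can also be scripted. The following is a minimal sketch, not part of VDSM or the engine: it assumes the 'VM.migrate' payload has been copied out of the VDSM log as a Python-style dict string (shortened here to the relevant keys), and the helper name migration_limits is hypothetical.

```python
import ast

# Payload copied from the 'VM.migrate' log line above, shortened to the
# relevant keys. The u'' prefixes are kept exactly as VDSM logged them.
LOGGED_REQUEST = (
    "{u'params': {u'incomingLimit': 2, u'outgoingLimit': 2, "
    "u'method': u'online'}, "
    "u'vmID': u'7ad815d0-4aa7-4797-8587-21d72f9c6094'}"
)


def migration_limits(logged_request):
    """Return (incomingLimit, outgoingLimit); None marks a limit not sent.

    Hypothetical helper for log inspection only.
    """
    # literal_eval safely parses the dict literal without executing code.
    request = ast.literal_eval(logged_request)
    params = request.get('params', {})
    return params.get('incomingLimit'), params.get('outgoingLimit')


incoming, outgoing = migration_limits(LOGGED_REQUEST)
print(incoming, outgoing)
```

A value of None for either limit would suggest that the engine did not send it, which is the condition this bug describes.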

Comment 21 Emma Heftman 2017-02-27 08:43:48 UTC
Hi Martin. Please confirm whether this bug requires doc text, and if yes, please set the requires doc text flag accordingly. Thanks.

Comment 22 Michal Skrivanek 2017-02-27 09:10:43 UTC
Yes, it does, as a bug fix note.

Comment 23 Emma Heftman 2017-02-27 09:47:30 UTC
Martin, I need some clarification in order to edit the doc text. Can you please explain what you mean by "migration limits" and what the connection is with the retry attempts? Are the limits the number of times to retry the migration?

Comment 24 Martin Betak 2017-02-27 09:56:36 UTC
@Emma, in this particular context "migration limits" is used to describe the "Maximum number of incoming concurrent migrations" and the "Maximum number of outgoing concurrent migrations" per host. If on one of the ends the capacity is reached (e.g. *source* has reached max outgoing or the *destination* max incoming) the VDSM will attempt a retry when the capacity becomes available.

This behavior is new in 4.0 and is triggered only when the engine supplies these parameters (they are sent as part of each "migrate" operation to VDSM). The engine was not sending them, so the older behavior (fail on full capacity) was in effect.

HTH

Martin
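
The difference between the two behaviors described in comment 24 can be illustrated with a small model. This is only a sketch under stated assumptions, not VDSM's actual implementation: the MigrationSlots class and its method names are hypothetical, and a semaphore stands in for the per-host limit on concurrent migrations.

```python
import threading


class MigrationSlots(object):
    """Hypothetical model of a per-host cap on concurrent migrations."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Older behavior: fail at once when the host is at full capacity.
        return self._slots.acquire(blocking=False)

    def acquire_when_free(self, timeout):
        # Behavior with limits supplied: wait until capacity is available.
        return self._slots.acquire(timeout=timeout)

    def release(self):
        self._slots.release()


outgoing = MigrationSlots(limit=2)  # corresponds to outgoingLimit = 2
outgoing.try_acquire()              # first migration takes a slot
outgoing.try_acquire()              # second migration takes the last slot

# A third migration is refused under the fail-fast behavior.
third_failfast = outgoing.try_acquire()

# Under the waiting behavior it proceeds once a running migration finishes
# and releases its slot (simulated here by a timer).
threading.Timer(0.1, outgoing.release).start()
third_waiting = outgoing.acquire_when_free(timeout=5)
print(third_failfast, third_waiting)
```

In this model the same limit applies in both cases; only the reaction to a full host changes, which matches the explanation that the retry is triggered by the engine supplying the parameters.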

Comment 26 errata-xmlrpc 2017-03-16 15:36:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0544.html

Comment 27 Gordon Watson 2017-03-16 18:08:09 UTC
I don't think that's the right errata linked in comment #26. That one points to vdsm for 4.0.7, but the gerrit patches attached to this bug are to the engine.

Regards, GFW.

Comment 28 Michal Skrivanek 2017-03-16 18:20:57 UTC
(In reply to Gordon Watson from comment #27)
> I don't think that's the right errata linked in comment #26. That one points
> to vdsm for 4.0.7, but the gerrit patches attached to this bug are to the
> engine.
> 
> Regards, GFW.

Right, it should be the engine one: https://rhn.redhat.com/errata/RHBA-2017-0542.html

