Bug 1409246 - VM migration failing with "Returning backwards compatible migration error code"
Summary: VM migration failing with "Returning backwards compatible migration error code"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.0.2
Hardware: Unspecified
OS: Linux
unspecified
high
Target Milestone: ovirt-4.1.0-rc
: ---
Assignee: Martin Betak
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks: 1413939
TreeView+ depends on / blocked
 
Reported: 2016-12-30 17:01 UTC by Gordon Watson
Modified: 2020-06-11 13:10 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Manager didn't send migration limits to VDSM during VM migration operations. The destination VDSM understood this lack of migration limits as migration from an old source that didn't support the retry logic, failing the migration altogether instead of having the source attempt a proper retry. This could ultimately result in the host entering an error state; especially problematic, for example, when attempting to put a host to maintenance. Now, the Manager sends migrations limits as part of the VM migration operation, so the migration retry logic between the source and destination VDSM is properly triggered when necessary.
Clone Of:
: 1413939 (view as bug list)
Environment:
Last Closed: 2017-04-25 00:41:28 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:0998 0 normal SHIPPED_LIVE VDSM bug fix and enhancement update 4.1 GA 2017-04-18 20:11:39 UTC
oVirt gerrit 69621 0 None MERGED core: Pass migration limits in MigrateBrokerVDSCommand 2021-01-18 21:29:22 UTC
oVirt gerrit 69629 0 None MERGED core: Pass migration limits in MigrateBrokerVDSCommand 2021-01-18 21:30:02 UTC

Description Gordon Watson 2016-12-30 17:01:37 UTC
Description of problem:

VM migration fails with;

  - src host;

Thread-12509491::INFO::2016-12-22 09:49:56,306::migration::407::virt.vm::(_startUnderlyingMigration) vmId=`36710f55-4f25-4c80-912e-d7c9dfc87b99`::Creation of destination VM took: 0 seconds
Thread-12509491::ERROR::2016-12-22 09:49:56,307::migration::252::virt.vm::(_recover) vmId=`36710f55-4f25-4c80-912e-d7c9dfc87b99`::migration destination error: Fatal error during migration


  - dest host;

jsonrpc.Executor/0::DEBUG::2016-12-22 09:49:56,304::API::601::vds::(migrationCreate) Migration create - Failed
jsonrpc.Executor/0::DEBUG::2016-12-22 09:49:56,305::API::607::vds::(migrationCreate) Returning backwards compatible migration error code


There are no clues in the VM's qemu log, the 'messages' file or the journald logs. VDSM should report the reason for the failure to create the VM.



Version-Release number of selected component (if applicable):

- RHEV 4.0.2
- RHEL 7.2 hosts;
  - vdsm-4.18.11-1.el7
  - libvirt-1.2.17-13.el7_2.5
  - qemu-kvm-rhev-2.3.0-31.el7_2.21


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

A meaningful error.


Additional info:

A side-effect of this is that when putting a host into maintenance mode, if some migrations fail in this manner and they are all to the same host, that host can get placed into an 'ERROR' state.

Comment 15 Michal Skrivanek 2017-01-03 10:09:46 UTC
the problem is on the engine side where the incoming/outgoing limits are not sent

Comment 19 Israel Pinto 2017-02-07 13:11:38 UTC
Verify with:
Engine: Version 4.1.0.3-0.1.el7

Step:
migrate vm and check the incoming/outgoing limit is set.

Results:
Engine log: [maxOutgoingMigrations = 2, maxIncomingMigrations = 2]
2017-02-07 14:41:47,859+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-50) [3add7741-1207-4618-b2f7-d8e51cad6c6d] START, MigrateBrokerVDSCommand(HostName = host_mixed_1, MigrateVDSCommandParameters:{runAsync='true', hostId='b879d50c-67e0-479a-9300-a7b7b2f1339c', vmId='1b1217db-ff0a-4e98-b408-a4f71e8af776', srcHost='seal08.qa.lab.tlv.redhat.com', dstVdsId='30ef250f-3a7d-4ed5-a8ca-75834da14cdb', dstHost='seal09.qa.lab.tlv.redhat.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='5000', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500]}}, {limit=-1, action={name=abort, params=[]}}]]'}), log id: 451aeef4

vdms log: [incomingLimit = 2 , outgoingLimit = 2]
2017-02-07 14:41:42,272 INFO  (periodic/2) [dispatcher] Run and protect: repoStats, Return response: {'06a1f8f4-81aa-49b0-b831-88284b222737': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000222759', 'lastCheck': '5.5', 'valid': True}, '62430d5b-8c57-4147-92dc-1ea907963ff9': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000242301', 'lastCheck': '0.9', 'valid': True}, 'fa65b176-cb84-41fd-8bf6-dae05dcc2595': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000257589', 'lastCheck': '0.9', 'valid': True}, 'bbb7e15c-7884-4d52-8bc4-d2de58e21a78': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000227469', 'lastCheck': '0.9', 'valid': True}} (logUtils:52)
2017-02-07 14:41:46,857 INFO  (jsonrpc/7) [vdsm.api] START migrate args=(<virt.vm.Vm object at 0x371fad0>, {'incomingLimit': 2, 'src': 'seal08.qa.lab.tlv.redhat.com', 'dstqemu': '10.35.117.90', 'autoConverge': 'true', 'tunneled': 'false', 'enableGuestEvents': True, 'dst': 'seal09.qa.lab.tlv.redhat.com:54321', 'convergenceSchedule': {'init': [{'params': ['100'], 'name': 'setDowntime'}], 'stalling': [{'action': {'params': ['150'], 'name': 'setDowntime'}, 'limit': 1}, {'action': {'params': ['200'], 'name': 'setDowntime'}, 'limit': 2}, {'action': {'params': ['300'], 'name': 'setDowntime'}, 'limit': 3}, {'action': {'params': ['400'], 'name': 'setDowntime'}, 'limit': 4}, {'action': {'params': ['500'], 'name': 'setDowntime'}, 'limit': 6}, {'action': {'params': [], 'name': 'abort'}, 'limit': -1}]}, 'vmId': '1b1217db-ff0a-4e98-b408-a4f71e8af776', 'abortOnError': 'true', 'outgoingLimit': 2, 'compressed': 'false', 'maxBandwidth': 5000, 'method': 'online', 'mode': 'remote'}) kwargs={} (api:37)


Note You need to log in before you can comment on or make changes to this bug.