Bug 1047629 - VMs migration fail though migration is possible.
Summary: VMs migration fail though migration is possible.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.1
Assignee: Gilad Chaplik
QA Contact: Lukas Svaty
URL:
Whiteboard: sla
Depends On:
Blocks: rhev3.4snap1 rhev3.5beta 1156165
Reported: 2014-01-01 08:38 UTC by Ilanit Stein
Modified: 2018-12-05 16:52 UTC (History)
CC List: 18 users

Fixed In Version: av4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-09 15:08:01 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
engine log (concurrent migration performed ~15:31) (243.86 KB, application/x-gzip)
2014-01-01 09:40 UTC, Ilanit Stein
source host vdsm log (Please pay attention date on source host is 2 hours behind rhevm time.) (705.54 KB, application/x-xz)
2014-01-01 09:52 UTC, Ilanit Stein
source host libvirt log (658.88 KB, application/x-xz)
2014-01-01 10:01 UTC, Ilanit Stein
engine log of concurrent 30 VM migration on 3 hosts (239.78 KB, text/x-log)
2014-04-30 14:50 UTC, Lukas Svaty


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2014:0506 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 18:55:38 UTC
oVirt gerrit 24651 0 None None None Never
oVirt gerrit 25461 0 None None None Never

Description Ilanit Stein 2014-01-01 08:38:21 UTC
Description of problem:

A setup with 5 hosts (2 RHEL, 3 RHEV-H) runs 20 VMs, all with a RHEL guest OS.
Concurrent migration of all VMs ends with 18 VM migrations successful and 2 failed.
Manual migration of these 2 VMs succeeded.

 
Version-Release number of selected component (if applicable):
is29

Comment 1 Ilanit Stein 2014-01-01 09:40:56 UTC
Created attachment 844065 [details]
engine log (concurrent migration performed ~15:31)

Comment 2 Ilanit Stein 2014-01-01 09:52:38 UTC
Created attachment 844067 [details]
source host vdsm log (Please pay attention date on source host is 2 hours behind rhevm time.)

Comment 3 Ilanit Stein 2014-01-01 10:01:48 UTC
Created attachment 844078 [details]
source host libvirt log

Please note that the clock on the source host is 2 hours behind the rhevm time.

Comment 4 Michal Skrivanek 2014-01-07 11:42:10 UTC
I only see 12 of them in the logs, all finished

Comment 5 Ilanit Stein 2014-01-07 13:36:19 UTC
There are 20 VMs:
mig1...7
mig2-1...10
mig1-1...3

18 VMs succeeded (I found the 18 messages of migration completed in engine.log).
2 VMs failed migration: mig2-1 and mig2-7.
They were running on host lilach-vdsb, which is the "source host" whose logs are attached.
Here are their failures, in engine.log:

VM mig2-1 migration failure:
===========================
2013-12-31 15:31:12,496 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host cyan-vdse.qa.lab.tlv.redhat.com (3461a144-63da-44f5-836a-645f49006909) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,496 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host silver-vdsc.qa.lab.tlv.redhat.com (5f9f30ff-9bfd-449d-97eb-6e8d6e0d7a02) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,496 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host lilach-vdsa.tlv.redhat.com (904ff1e4-bc57-4075-a5a7-659b4c79da61) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,496 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host lilach-vdsc.tlv.redhat.com (6cda5eea-62ac-4f7d-94ae-e29369ae2e8c) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,498 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [baa30a4] Command org.ovirt.engine.core.bll.MigrateVmCommand throw Vdc Bll exception. With error message VdcBLLException: RESOURCE_MANAGER_VDS_NOT_FOUND (Failed with error RESOURCE_MANAGER_VDS_NOT_FOUND and code 5004)
2013-12-31 15:31:12,513 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [baa30a4] Transaction rolled-back for command: org.ovirt.engine.core.bll.MigrateVmCommand.
2013-12-31 15:31:12,549 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-42) [baa30a4] Correlation ID: baa30a4, Job ID: 51f97157-a0d2-4725-9aca-247b4a24a7e8, Call Stack: null, Custom Event ID: -1, Message: Migration failed  (VM: mig2-1, Source: lilach-vdsb.tlv.redhat.com, Destination: <UNKNOWN>).

VM mig2-7 migration failure:
===========================

2013-12-31 15:32:02,530 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host cyan-vdse.qa.lab.tlv.redhat.com (3461a144-63da-44f5-836a-645f49006909) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host silver-vdsc.qa.lab.tlv.redhat.com (5f9f30ff-9bfd-449d-97eb-6e8d6e0d7a02) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host lilach-vdsa.tlv.redhat.com (904ff1e4-bc57-4075-a5a7-659b4c79da61) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host lilach-vdsc.tlv.redhat.com (6cda5eea-62ac-4f7d-94ae-e29369ae2e8c) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [18115901] Command org.ovirt.engine.core.bll.MigrateVmCommand throw Vdc Bll exception. With error message VdcBLLException: RESOURCE_MANAGER_VDS_NOT_FOUND (Failed with error RESOURCE_MANAGER_VDS_NOT_FOUND and code 5004)
2013-12-31 15:32:02,533 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [18115901] Transaction rolled-back for command: org.ovirt.engine.core.bll.MigrateVmCommand.
2013-12-31 15:32:02,537 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-42) [18115901] Correlation ID: 18115901, Job ID: 400c7767-5710-4bc5-9469-04ace3ddb6b4, Call Stack: null, Custom Event ID: -1, Message: Migration failed  (VM: mig2-7, Source: lilach-vdsb.tlv.redhat.com, Destination: <UNKNOWN>).

Manual migration of VM mig2-7 (engine.log):
==========================================
2013-12-31 15:40:56,355 INFO  [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (pool-4-thread-42) [26b6b128] Running command: MigrateVmToServerCommand internal: false. Entities affected :  ID: 1b16a838-c43f-4896-8aef-ca319cd041b8 Type: VM
2013-12-31 15:40:56,400 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-42) [26b6b128] START, MigrateVDSCommand(HostName = lilach-vdsb.tlv.redhat.com, HostId = 5ada85a2-ed80-4fb0-abaf-5b329ca5f3be, vmId=1b16a838-c43f-4896-8aef-ca319cd041b8, srcHost=10.35.5.48, dstVdsId=6cda5eea-62ac-4f7d-94ae-e29369ae2e8c, dstHost=10.35.4.120:54321, migrationMethod=ONLINE, tunnelMigration=false), log id: 13b3b776
2013-12-31 15:40:56,400 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-42) [26b6b128] VdsBroker::migrate::Entered (vm_guid=1b16a838-c43f-4896-8aef-ca319cd041b8, srcHost=10.35.5.48, dstHost=10.35.4.120:54321,  method=online
2013-12-31 15:40:56,412 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-42) [26b6b128] START, MigrateBrokerVDSCommand(HostName = lilach-vdsb.tlv.redhat.com, HostId = 5ada85a2-ed80-4fb0-abaf-5b329ca5f3be, vmId=1b16a838-c43f-4896-8aef-ca319cd041b8, srcHost=10.35.5.48, dstVdsId=6cda5eea-62ac-4f7d-94ae-e29369ae2e8c, dstHost=10.35.4.120:54321, migrationMethod=ONLINE, tunnelMigration=false), log id: 7d1bdb99
2013-12-31 15:40:56,458 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-42) [26b6b128] FINISH, MigrateBrokerVDSCommand, log id: 7d1bdb99
2013-12-31 15:40:56,550 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-42) [26b6b128] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 13b3b776
2013-12-31 15:40:56,559 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-42) [26b6b128] Correlation ID: 26b6b128, Job ID: 2924ec16-e888-4b90-a268-7fb988da713b, Call Stack: null, Custom Event ID: -1, Message: Migration started (VM: mig2-7, Source: lilach-vdsb.tlv.redhat.com, Destination: lilach-vdsc.tlv.redhat.com, User: admin@internal).
2013-12-31 15:40:59,234 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-24) RefreshVmList vm id 1b16a838-c43f-4896-8aef-ca319cd041b8 is migrating to vds lilach-vdsc.tlv.redhat.com ignoring it in the refresh until migration is done
2013-12-31 15:41:02,258 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-99) RefreshVmList vm id 1b16a838-c43f-4896-8aef-ca319cd041b8 is migrating to vds lilach-vdsc.tlv.redhat.com ignoring it in the refresh until migration is done
2013-12-31 15:41:05,319 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVdsCommand] (DefaultQuartzScheduler_Worker-56) START, FullListVdsCommand(HostName = lilach-vdsc.tlv.redhat.com, HostId = 6cda5eea-62ac-4f7d-94ae-e29369ae2e8c, vds=Host[lilach-vdsc.tlv.redhat.com], vmIds=[1b16a838-c43f-4896-8aef-ca319cd041b8]), log id: 3ab1d8a4
2013-12-31 15:41:05,329 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVdsCommand] (DefaultQuartzScheduler_Worker-56) FINISH, FullListVdsCommand, return: [Ljava.util.HashMap;@72061516, log id: 3ab1d8a4
2013-12-31 15:41:05,385 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-56) Correlation ID: 26b6b128, Job ID: 2924ec16-e888-4b90-a268-7fb988da713b, Call Stack: null, Custom Event ID: -1, Message: Migration completed (VM: mig2-7, Source: lilach-vdsb.tlv.redhat.com, Destination: lilach-vdsc.tlv.redhat.com, Duration: 8 sec).
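
The excerpts above can be summarized mechanically. The following is a minimal sketch, assuming the engine.log line layout shown in this comment; the regular expressions and the function name summarize are illustrative and not part of any oVirt tool. It groups the "filtered out" lines by correlation ID and reports, for each failed migration, which candidate hosts were rejected and by which filter.

```python
import re
from collections import defaultdict

# Patterns derived only from the engine.log excerpts quoted in this comment.
FILTERED = re.compile(
    r"\[(?P<corr>[0-9a-f]+)\] Candidate host (?P<host>\S+) \([0-9a-f-]+\) "
    r"was filtered out by \S+ filter (?P<filter>\w+)"
)
FAILED = re.compile(
    r"Correlation ID: (?P<corr>[0-9a-f]+),.*"
    r"Message: Migration failed\s+\(VM: (?P<vm>[^,]+),"
)


def summarize(lines):
    """Return {vm_name: [(host, filter_name), ...]} for failed migrations."""
    filtered = defaultdict(list)  # correlation id -> rejected (host, filter)
    failed = {}                   # correlation id -> VM name
    for line in lines:
        m = FILTERED.search(line)
        if m:
            filtered[m["corr"]].append((m["host"], m["filter"]))
            continue
        m = FAILED.search(line)
        if m:
            failed[m["corr"]] = m["vm"]
    return {vm: filtered[corr] for corr, vm in failed.items()}


# Two lines copied from the mig2-1 excerpt above.
SAMPLE = [
    "2013-12-31 15:31:12,496 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] "
    "(pool-4-thread-42) [baa30a4] Candidate host lilach-vdsc.tlv.redhat.com "
    "(6cda5eea-62ac-4f7d-94ae-e29369ae2e8c) was filtered out by "
    "VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)",
    "2013-12-31 15:31:12,549 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(pool-4-thread-42) [baa30a4] Correlation ID: baa30a4, Job ID: 51f97157-a0d2-4725-9aca-247b4a24a7e8, "
    "Call Stack: null, Custom Event ID: -1, Message: Migration failed  "
    "(VM: mig2-1, Source: lilach-vdsb.tlv.redhat.com, Destination: <UNKNOWN>).",
]

if __name__ == "__main__":
    for vm, rejected in summarize(SAMPLE).items():
        print(vm, rejected)
```

To run it against a full log, pass an open file object instead of SAMPLE, for example summarize(open("engine.log")).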

Comment 6 Michal Skrivanek 2014-01-07 13:43:52 UTC
ah, ok, I was talking about the vdsm log, but from the above this seems to fail in engine scheduling.
That is expected if you try mass migration. Once things settle down you can then trigger it again and it may succeed.

Maybe consider several tries (with a delay) for this action similarly to when we move to maintenance?
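
The retry idea suggested here could look roughly like the sketch below. It is only an illustration of "several tries with a delay", not the fix that was eventually merged for this bug. The names NoHostAvailable and migrate_with_retry, and the migrate callable, are hypothetical and do not correspond to engine classes.

```python
import time


class NoHostAvailable(Exception):
    """Hypothetical stand-in for a RESOURCE_MANAGER_VDS_NOT_FOUND failure."""


def migrate_with_retry(migrate, vm, attempts=3, delay_s=10.0, sleep=time.sleep):
    """Call migrate(vm); on NoHostAvailable wait delay_s and try again.

    Gives up and re-raises after `attempts` unsuccessful tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            return migrate(vm)
        except NoHostAvailable:
            if attempt == attempts:
                raise           # out of retries: surface the failure
            sleep(delay_s)      # let in-flight migrations finish first
```

The sleep parameter is injectable so that the delay can be replaced in tests; a caller would normally leave it at its default.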

Comment 7 Doron Fediuck 2014-01-07 16:55:43 UTC
We'll look into it to see what can be done in cases of multiple actions / loaded system.

Comment 8 Gilad Chaplik 2014-02-19 12:00:47 UTC
Good cache Ilanit :-). hopefully the fix will cover all other sort of strange scenarios.

Comment 9 Gilad Chaplik 2014-02-19 12:02:03 UTC
(In reply to Gilad Chaplik from comment #8)
> Good cache Ilanit :-). hopefully the fix will cover all other sort of
> strange scenarios.

meant catch of course :) dealing with too much cache lately ;)

Comment 11 Ilanit Stein 2014-04-03 05:57:06 UTC
Gilad, 

How should this bug be verified? 
Exactly same as in description, using 5 hosts?

Thanks,
Ilanit.

Comment 12 Gilad Chaplik 2014-04-03 14:44:59 UTC
(In reply to Ilanit Stein from comment #11)
> Gilad, 
> 
> How should this bug be verified? 
> Exactly same as in description, using 5 hosts?

hi Ilanit, 

yes, please create a migration storm while all resources are consumed.

> 
> Thanks,
> Ilanit.

Comment 14 Lukas Svaty 2014-04-30 14:49:11 UTC
On the setup suggested by Ilanit, everything seems to work as expected.

I created another stress scenario, on a setup that failed:
setup build av8
3 hosts with 8GB memory, running RHEL 6.5 (Intel)
30 VMs with 512MB memory, running (no operating system)

Concurrent migration gives a RESOURCE_MANAGER_VDS_NOT_FOUND exception.

Gilad: could this be because no GA is installed on the VMs?
If so, I think we should assume 512MB as the worst case when no GA is installed.

attaching engine.log of migration
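
For orientation, here is rough arithmetic for this scenario. It is a back-of-the-envelope sketch only: it assumes each in-flight migration needs its guest memory on the destination while still occupying the source, and it ignores host reserve, hypervisor overhead and overcommit settings, so it does not model the engine's actual Memory filter.

```python
HOSTS = 3
HOST_MEM_MB = 8 * 1024   # 8GB per host, from the setup above
VMS = 30
VM_MEM_MB = 512          # defined memory per VM, from the setup above

total_capacity_mb = HOSTS * HOST_MEM_MB   # memory across the cluster
steady_demand_mb = VMS * VM_MEM_MB        # all VMs running, none migrating

# Assumption: while migrating, a VM counts against both source and destination.
peak_demand_mb = 2 * steady_demand_mb     # all 30 VMs migrating at once

if __name__ == "__main__":
    print("capacity:", total_capacity_mb, "MB")
    print("steady demand:", steady_demand_mb, "MB")
    print("peak demand (all migrating):", peak_demand_mb, "MB")
    print("peak exceeds capacity:", peak_demand_mb > total_capacity_mb)
```

Under that assumption the steady-state load (15360 MB) fits in the cluster (24576 MB), but a fully concurrent migration of all 30 VMs (30720 MB) would not.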

Comment 15 Lukas Svaty 2014-04-30 14:50:54 UTC
Created attachment 891205 [details]
engine log of concurrent 30 VM migration on 3 hosts

Comment 16 Gilad Chaplik 2014-04-30 17:46:16 UTC
Arik?

Comment 19 errata-xmlrpc 2014-06-09 15:08:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

