Description of problem:
A setup with 5 hosts (2 RHEL, 3 RHEV-H) runs 20 VMs with a RHEL OS. Concurrent migration of all VMs ends with 18 VMs migrated successfully and 2 failed. Manual migration of these 2 VMs succeeded.

Version-Release number of selected component (if applicable):
is29
Created attachment 844065 [details] engine log (concurrent migration performed ~15:31)
Created attachment 844067 [details] source host vdsm log (Please note that the clock on the source host is 2 hours behind RHEV-M time.)
Created attachment 844078 [details] source host libvirt log
Please note that the clock on the source host is 2 hours behind RHEV-M time.
I only see 12 of them in the logs, all finished
There are 20 VMs: mig1...7, mig2-1...10, mig1-1...3.
18 VMs succeeded (I found the 18 "Migration completed" messages in engine.log).
2 VMs failed migration: mig2-1 and mig2-7. They were running on host lilach-vdsb, which is the "source host" whose logs are attached.
Here are their failures, in engine.log:

VM mig2-1 migration failure:
===========================
2013-12-31 15:31:12,496 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host cyan-vdse.qa.lab.tlv.redhat.com (3461a144-63da-44f5-836a-645f49006909) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,496 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host silver-vdsc.qa.lab.tlv.redhat.com (5f9f30ff-9bfd-449d-97eb-6e8d6e0d7a02) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,496 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host lilach-vdsa.tlv.redhat.com (904ff1e4-bc57-4075-a5a7-659b4c79da61) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,496 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [baa30a4] Candidate host lilach-vdsc.tlv.redhat.com (6cda5eea-62ac-4f7d-94ae-e29369ae2e8c) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: baa30a4)
2013-12-31 15:31:12,498 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [baa30a4] Command org.ovirt.engine.core.bll.MigrateVmCommand throw Vdc Bll exception. With error message VdcBLLException: RESOURCE_MANAGER_VDS_NOT_FOUND (Failed with error RESOURCE_MANAGER_VDS_NOT_FOUND and code 5004)
2013-12-31 15:31:12,513 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [baa30a4] Transaction rolled-back for command: org.ovirt.engine.core.bll.MigrateVmCommand.
2013-12-31 15:31:12,549 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-42) [baa30a4] Correlation ID: baa30a4, Job ID: 51f97157-a0d2-4725-9aca-247b4a24a7e8, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: mig2-1, Source: lilach-vdsb.tlv.redhat.com, Destination: <UNKNOWN>).

VM mig2-7 migration failure:
===========================
2013-12-31 15:32:02,530 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host cyan-vdse.qa.lab.tlv.redhat.com (3461a144-63da-44f5-836a-645f49006909) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host silver-vdsc.qa.lab.tlv.redhat.com (5f9f30ff-9bfd-449d-97eb-6e8d6e0d7a02) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host lilach-vdsa.tlv.redhat.com (904ff1e4-bc57-4075-a5a7-659b4c79da61) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (pool-4-thread-42) [18115901] Candidate host lilach-vdsc.tlv.redhat.com (6cda5eea-62ac-4f7d-94ae-e29369ae2e8c) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 18115901)
2013-12-31 15:32:02,530 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [18115901] Command org.ovirt.engine.core.bll.MigrateVmCommand throw Vdc Bll exception. With error message VdcBLLException: RESOURCE_MANAGER_VDS_NOT_FOUND (Failed with error RESOURCE_MANAGER_VDS_NOT_FOUND and code 5004)
2013-12-31 15:32:02,533 ERROR [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-4-thread-42) [18115901] Transaction rolled-back for command: org.ovirt.engine.core.bll.MigrateVmCommand.
2013-12-31 15:32:02,537 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-42) [18115901] Correlation ID: 18115901, Job ID: 400c7767-5710-4bc5-9469-04ace3ddb6b4, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: mig2-7, Source: lilach-vdsb.tlv.redhat.com, Destination: <UNKNOWN>).

Manual migration of VM mig2-7 (engine.log):
==========================================
2013-12-31 15:40:56,355 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (pool-4-thread-42) [26b6b128] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: 1b16a838-c43f-4896-8aef-ca319cd041b8 Type: VM
2013-12-31 15:40:56,400 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-42) [26b6b128] START, MigrateVDSCommand(HostName = lilach-vdsb.tlv.redhat.com, HostId = 5ada85a2-ed80-4fb0-abaf-5b329ca5f3be, vmId=1b16a838-c43f-4896-8aef-ca319cd041b8, srcHost=10.35.5.48, dstVdsId=6cda5eea-62ac-4f7d-94ae-e29369ae2e8c, dstHost=10.35.4.120:54321, migrationMethod=ONLINE, tunnelMigration=false), log id: 13b3b776
2013-12-31 15:40:56,400 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-42) [26b6b128] VdsBroker::migrate::Entered (vm_guid=1b16a838-c43f-4896-8aef-ca319cd041b8, srcHost=10.35.5.48, dstHost=10.35.4.120:54321, method=online
2013-12-31 15:40:56,412 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-42) [26b6b128] START, MigrateBrokerVDSCommand(HostName = lilach-vdsb.tlv.redhat.com, HostId = 5ada85a2-ed80-4fb0-abaf-5b329ca5f3be, vmId=1b16a838-c43f-4896-8aef-ca319cd041b8, srcHost=10.35.5.48, dstVdsId=6cda5eea-62ac-4f7d-94ae-e29369ae2e8c, dstHost=10.35.4.120:54321, migrationMethod=ONLINE, tunnelMigration=false), log id: 7d1bdb99
2013-12-31 15:40:56,458 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-42) [26b6b128] FINISH, MigrateBrokerVDSCommand, log id: 7d1bdb99
2013-12-31 15:40:56,550 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-42) [26b6b128] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 13b3b776
2013-12-31 15:40:56,559 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-42) [26b6b128] Correlation ID: 26b6b128, Job ID: 2924ec16-e888-4b90-a268-7fb988da713b, Call Stack: null, Custom Event ID: -1, Message: Migration started (VM: mig2-7, Source: lilach-vdsb.tlv.redhat.com, Destination: lilach-vdsc.tlv.redhat.com, User: admin@internal).
2013-12-31 15:40:59,234 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-24) RefreshVmList vm id 1b16a838-c43f-4896-8aef-ca319cd041b8 is migrating to vds lilach-vdsc.tlv.redhat.com ignoring it in the refresh until migration is done
2013-12-31 15:41:02,258 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-99) RefreshVmList vm id 1b16a838-c43f-4896-8aef-ca319cd041b8 is migrating to vds lilach-vdsc.tlv.redhat.com ignoring it in the refresh until migration is done
2013-12-31 15:41:05,319 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVdsCommand] (DefaultQuartzScheduler_Worker-56) START, FullListVdsCommand(HostName = lilach-vdsc.tlv.redhat.com, HostId = 6cda5eea-62ac-4f7d-94ae-e29369ae2e8c, vds=Host[lilach-vdsc.tlv.redhat.com], vmIds=[1b16a838-c43f-4896-8aef-ca319cd041b8]), log id: 3ab1d8a4
2013-12-31 15:41:05,329 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVdsCommand] (DefaultQuartzScheduler_Worker-56) FINISH, FullListVdsCommand, return: [Ljava.util.HashMap;@72061516, log id: 3ab1d8a4
2013-12-31 15:41:05,385 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-56) Correlation ID: 26b6b128, Job ID: 2924ec16-e888-4b90-a268-7fb988da713b, Call Stack: null, Custom Event ID: -1, Message: Migration completed (VM: mig2-7, Source: lilach-vdsb.tlv.redhat.com, Destination: lilach-vdsc.tlv.redhat.com, Duration: 8 sec).
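For anyone repeating this count, the audit messages quoted above ("Migration started", "Migration completed", "Migration failed") can be tallied per VM with a short script. This is only a minimal sketch: the function name `summarize_migrations` and the regular expression are illustrative assumptions based on the message format seen in this engine.log excerpt, not an engine tool.

```python
import re
from collections import defaultdict

# Assumed pattern, derived only from the audit messages quoted in this report,
# e.g. "Message: Migration failed (VM: mig2-1, Source: ..., Destination: <UNKNOWN>)."
MIGRATION_RE = re.compile(
    r"Message: Migration (started|completed|failed) \(VM: ([^,]+),"
)


def summarize_migrations(lines):
    """Group VM names by migration outcome, in order of appearance.

    `lines` is any iterable of strings, for example an open engine.log file.
    """
    outcomes = defaultdict(list)
    for line in lines:
        # findall copes with several log entries pasted onto one line
        for outcome, vm in MIGRATION_RE.findall(line):
            outcomes[outcome].append(vm)
    return dict(outcomes)
```

Called as `summarize_migrations(open("engine.log"))`, the lengths of the "completed" and "failed" lists give the same kind of count as the 18/2 split reported here, provided the log uses the message format shown above.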
ah, ok, I was talking about the vdsm log, but from the above this seems to fail in engine scheduling. That is expected if you try mass migration. Once things settle down you can trigger it again and it may succeed. Maybe consider several tries (with a delay) for this action, similar to what we do when moving a host to maintenance?
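The "several tries (with a delay)" idea could look roughly like the following. This is a hedged sketch, not the engine's implementation: `NoHostAvailable` is a hypothetical stand-in for the RESOURCE_MANAGER_VDS_NOT_FOUND failure, and `migrate`, `attempts` and `delay_seconds` are illustrative names and values, not engine settings.

```python
import time


class NoHostAvailable(Exception):
    """Hypothetical stand-in for RESOURCE_MANAGER_VDS_NOT_FOUND."""


def migrate_with_retries(migrate, vm, attempts=3, delay_seconds=10.0,
                         sleep=time.sleep):
    """Call migrate(vm), retrying when no destination host passes scheduling.

    `migrate` is any callable that returns the destination host or raises
    NoHostAvailable. The last failure is re-raised once `attempts` is used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return migrate(vm)
        except NoHostAvailable:
            if attempt == attempts:
                raise
            # Give the in-flight migrations time to finish and release
            # the memory they hold on the candidate hosts.
            sleep(delay_seconds)
```

The `sleep` parameter exists only so the delay can be replaced in tests. A fixed delay is the simplest choice; whether a fixed or growing delay fits a loaded system better is left open here.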
We'll look into it to see what can be done in cases of multiple actions / loaded system.
Good cache Ilanit :-). hopefully the fix will cover all other sort of strange scenarios.
(In reply to Gilad Chaplik from comment #8)
> Good cache Ilanit :-). hopefully the fix will cover all other sort of
> strange scenarios.

meant catch of course :) dealing with too much cache lately ;)
Gilad, How should this bug be verified? Exactly same as in description, using 5 hosts? Thanks, Ilanit.
(In reply to Ilanit Stein from comment #11)
> Gilad,
>
> How should this bug be verified?
> Exactly same as in description, using 5 hosts?

hi Ilanit, yes, please create a migration storm while all resources are consumed.

>
> Thanks,
> Ilanit.
On the setup suggested by Ilanit, everything seems to go as expected.

I created another stress scenario, with a setup that failed:
setup: build av8
3 hosts, 8GB memory, running RHEL 6.5 - intel
30 VMs, 512MB memory, running (no operating system)
Concurrent migration gives a RESOURCE_MANAGER_VDS_NOT_FOUND exception.

Gilad: could this be because no GA is installed on the VMs? If yes, I think we should count with 512MB as the worst scenario if no GA is installed.

Attaching engine.log of the migration.
Created attachment 891205 [details] engine log of concurrent 30 VM migration on 3 hosts
Arik?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0506.html