Bug 1098763

Summary: Mass migration of 1500 VM's caused VDSErrorException: Failed to GetAllVmStatsVDS
Product: Red Hat Enterprise Virtualization Manager Reporter: Yuri Obshansky <yobshans>
Component: ovirt-engineAssignee: Francesco Romani <fromani>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.4.0CC: acathrow, fromani, gklein, iheim, lpeer, ofrenkel, Rhev-m-bugs, yeylon, yobshans
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-05-21 12:12:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Log files
none
libvirtd.log.gz
none
vdsm.log none

Description Yuri Obshansky 2014-05-18 08:00:25 UTC
Description of problem:
Many VDSErrorException occurs during mass VMs migration which are running on fake hosts. 

Version-Release number of selected component (if applicable):
RHEVM 3.4.0-0.16.rc.el6ev
OS Version: RHEL - 6Server - 6.5.0.1.el6
Kernel Version: 2.6.32 - 431.5.1.el6.x86_64
KVM Version: 0.12.1.2 - 2.415.el6_5.6
LIBVIRT Version: libvirt-0.10.2-29.el6_5.5
VDSM Version: vdsm-4.14.7-0.2.rc.el6ev

How reproducible:
Environment: 2 Data Center (Real and Fake),
2 NFS Storage Domains (Real and Fake),
4 real hosts (24 CPU, 64 G),
200 real VMs,
200 fake hosts,
1500 fake VMs.
Perform JMeter script with 100 threads for migrate fake VMS
VDSErrorException occurs during mass VMs migration.

Steps to Reproduce:
1.
2.
3.

Actual results:
1.
2014-05-14 07:48:26,966 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [390924d3] Failed in MigrateStatusVDS method
2014-05-14 07:48:26,967 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [390924d3] Command org.ovirt.engine.core.vdsbroker.vdsbro
ker.MigrateStatusVDSCommand return value
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=12, mMessage=Fatal error during migration]]
2014-05-14 07:48:26,967 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [390924d3] HostName = fake_host_69
2014-05-14 07:48:26,967 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [390924d3] Command MigrateStatusVDSCommand(HostName = fake_host_69, HostId = c2d01b2e-e1bd-4233-af26-379a186a7e71, vmId=098bdb18-cc21-4e7a-9838-df512a5609a6) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration, code = 12
2014-05-14 07:48:26,967 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [390924d3] FINISH, MigrateStatusVDSCommand, log id: 7811c0df
2014-05-14 07:48:26,972 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-11) [390924d3] Correlation ID: 7bbf5060, Job ID: 874a688b-9f0e-409f-bb1c-c0794745596e, Call Stack: null, Custom Event ID: -1, Message: Migration failed due to Error: Fatal error during migration. Trying to migrate to another Host (VM: fake_490, Source: fake_host_69, Destination: fake_host_105).

2.
2014-05-14 08:02:05,865 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-28) [34065ecb] START, DestroyVDSCommand(HostName = fake_host_32, HostId = 78b0e5a2-08df-43f5-8004-67285ce4a2c2, vmId=209701d0-6e97-4cf6-ab92-ba4dfe410686, force=false, secondsToWait=0, gracefully=false), log id: 316fcb34
2014-05-14 08:02:05,866 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-4-thread-13) Executing a command: java.util.concurrent.FutureTask , but note that there are 16 tasks in the queue.
2014-05-14 08:02:05,873 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-94) [1d49a2f0] Failed to refresh VDS , vds = 03d530a7-fb23-4131-a85d-54346028ce4b : fake_host_3, error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Unexpected exception, code = 16, continuing.: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Unexpected exception, code = 16
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createDefaultConcreteException(VdsBrokerCommand.java:61) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.createException(BrokerCommandBase.java:199) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:186) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand.executeVdsBrokerCommand(GetAllVmStatsVDSCommand.java:21) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:96) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:413) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.fetchRunningVms(VdsUpdateRunTimeInfo.java:954) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVmStats(VdsUpdateRunTimeInfo.java:910) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:507) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:344) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) [:1.7.0_55]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_55]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_55]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:] 

Expected results:
no exception

Additional info:

Comment 1 Omer Frenkel 2014-05-19 07:37:43 UTC
please attach full engine.log
please attach vdsm.log and libvirtd.log for source and destination hosts for one of the vms that failed migration
please also attach vdsm.log of host fake_host_3 where GetAllVmStats failed

Comment 2 Yuri Obshansky 2014-05-20 14:04:32 UTC
Created attachment 897615 [details]
Log files

I reproduced that issue again and got:
2014-05-20 12:53:40,178 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-51) [2ce2411c] Failed in GetAllVmStatsVDS method
2014-05-20 12:53:40,179 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-51) [2ce2411c] Command GetAllVmStatsVDSCommand(HostName = fake_host_192, HostId = 8e59d93d-f3fe-4cf5-8231-156d98af0e1a, vds=Host[fake_host_192]) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Unexpected exception, code = 16
2014-05-20 12:53:40,179 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-51) [2ce2411c] Failed vds listing,  vds = 8e59d93d-f3fe-4cf5-8231-156d98af0e1a : fake_host_192, error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Unexpected exception, code = 16
2014-05-20 12:53:40,199 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-51) [2ce2411c] Failed to refresh VDS , vds = 8e59d93d-f3fe-4cf5-8231-156d98af0e1a : fake_host_192, error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Unexpected exception, code = 16, continuing.: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Unexpected exception, code = 16
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createDefaultConcreteException(VdsBrokerCommand.java:61) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.createException(BrokerCommandBase.java:199) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:186) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand.executeVdsBrokerCommand(GetAllVmStatsVDSCommand.java:21) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:96) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:413) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.fetchRunningVms(VdsUpdateRunTimeInfo.java:954) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVmStats(VdsUpdateRunTimeInfo.java:910) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:507) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:344) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) [:1.7.0_55]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_55]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_55]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]

Please, find attached engine.log and vdsm.log and libvirtd.log (from host fake_host_192 which appears in exception)

Comment 3 Yuri Obshansky 2014-05-20 14:05:50 UTC
Created attachment 897617 [details]
libvirtd.log.gz

Comment 4 Yuri Obshansky 2014-05-20 14:06:24 UTC
Created attachment 897618 [details]
vdsm.log

Comment 5 Francesco Romani 2014-05-21 09:57:20 UTC
Looks like you've been hit by
https://bugzilla.redhat.com/show_bug.cgi?id=1098148

which will be included in version 3.4.1

Still looking into the logs to see if there is something more

Comment 6 Francesco Romani 2014-05-21 12:12:49 UTC
Di not find anything else alarming. Feel free to reopen and to point me to anything I may have missed.