844344 – [engine-core] null pointer exception due to unhandled migration failure (Failed to create ExecutionContext for MigrateVmCommand: java.lang.NullPointerException)

Bug 844344 - [engine-core] null pointer exception due to unhandled migration failure (Failed to create ExecutionContext for MigrateVmCommand: java.lang.NullPointerException)

Summary: [engine-core] null pointer exception due to unhandled migration failure (Fail...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.1.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Eli Mesika
QA Contact:	Barak Dagan
Docs Contact:
URL:
Whiteboard:	virt
Depends On:
Blocks:	818289 870745
TreeView+	depends on / blocked

Reported:	2012-07-30 11:19 UTC by Haim
Modified:	2013-05-09 15:14 UTC (History)
CC List:	12 users (show)
Fixed In Version:	si18
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-05-09 15:14:58 UTC
oVirt Team:	---
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Haim 2012-07-30 11:19:50 UTC

Description of problem:

host is not connected to pool but is running vm, auto-recvoery tries to connect host to pool, engine then send connectStorageServer and connectStoragePool, when  that fails, engine start SetVdsNonOperational threatment, and tries to migrate vm to different host, when that fails due to libvirt, we are failing with NPE.

2012-07-30 17:20:00,676 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [6c01ff] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected :  ID: b8423a1f-8889-46
9a-b2fa-39ab78ac3a57 Type: StoragePool
2012-07-30 17:20:00,678 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] START, ConnectStorageServerVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, storagePoolId = b8423a1f-8889
-469a-b2fa-39ab78ac3a57, storageType = ISCSI, connectionList = [{ id: ce7b6af6-5325-4d1f-9341-4d1b2ca59445, connection: 10.35.160.7 };]), log id: 373c86d9
2012-07-30 17:20:00,802 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] FINISH, ConnectStorageServerVDSCommand, return: {ce7b6af6-5325-4d1f-9341-4d1b2ca59445=0}, log id: 373c86d9
2012-07-30 17:20:00,802 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [6c01ff] Host nott-vds1.qa.lab.tlv.redhat.com storage connection was succeeded 
2012-07-30 17:20:00,814 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] START, ConnectStoragePoolVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, storagePoolId = b8423a1f-8889-469
a-b2fa-39ab78ac3a57, vds_spm_id = 1, masterDomainId = 4f38a996-dbeb-4981-885b-742b46a4714f, masterVersion = 1), log id: 36c9428f
2012-07-30 17:20:00,848 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] FINISH, ConnectStoragePoolVDSCommand, log id: 36c9428f
2012-07-30 17:20:00,851 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-23) [5730d662] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 78e0bf24-d979-11e1-9e8e-001a4a16970e Type: V
DS
2012-07-30 17:20:00,851 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-23) [5730d662] START, SetVdsStatusVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, status=NonOperational, nonOperationalReason=STORAGE_D
OMAIN_UNREACHABLE), log id: 282351f2
2012-07-30 17:20:00,857 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-23) [5730d662] FINISH, SetVdsStatusVDSCommand, log id: 282351f2
2012-07-30 17:20:00,860 ERROR [org.ovirt.engine.core.bll.MaintananceVdsCommand] (pool-4-thread-46) [5730d662] Failed to create ExecutionContext for MigrateVmCommand: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.MaintananceVdsCommand.createMigrateVmContext(MaintananceVdsCommand.java:117) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MaintananceVdsCommand.MigrateAllVms(MaintananceVdsCommand.java:96) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MaintananceVdsCommand.MigrateAllVms(MaintananceVdsCommand.java:82) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.SetNonOperationalVdsCommand$1.run(SetNonOperationalVdsCommand.java:44) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_05-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_05-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_05-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_05-icedtea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_05-icedtea]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_05-icedtea]

2012-07-30 17:20:00,950 INFO  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (pool-4-thread-46) [32bdb5a2] Running command: InternalMigrateVmCommand internal: true. Entities affected :  ID: 0fdbfebb-8583-43df-8b11-78b7cd3e2372 Type: VM
2012-07-30 17:20:00,991 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-46) [32bdb5a2] START, MigrateVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, vmId=0fdbfebb-8583-43df-8b11-78b7cd3e2372, srcHost=nott-vds1.qa.lab.tlv.redhat.com, dstVdsId=16cbcf58-d97a-11e1-bba6-001a4a16970e, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 6e55d10a
2012-07-30 17:20:00,994 INFO  [org.ovirt.engine.core.bll.MultipleActionsRunner] (pool-4-thread-49) [5730d662] MultipleActionsRunner of type MigrateVmToServer invoked with no actions
2012-07-30 17:20:01,008 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-23) [4da4173f] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: 78e0bf24-d979-11e1-9e8e-001a4a16970e Type: VDS
2012-07-30 17:20:01,028 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-23) [60a60469] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 78e0bf24-d979-11e1-9e8e-001a4a16970e Type: VDS
2012-07-30 17:20:01,035 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-23) [60a60469] Host 78e0bf24-d979-11e1-9e8e-001a4a16970e : nott-vds1.qa.lab.tlv.redhat.com is already in NonOperational status. SetNonOperationalVds command is skipped.
2012-07-30 17:20:01,037 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-46) [32bdb5a2] VdsBroker::migrate::Entered (vm_guid=0fdbfebb-8583-43df-8b11-78b7cd3e2372, srcHost=nott-vds1.qa.lab.tlv.redhat.com, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321,  method=online
2012-07-30 17:20:01,037 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-46) [32bdb5a2] START, MigrateBrokerVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, vmId=0fdbfebb-8583-43df-8b11-78b7cd3e2372, srcHost=nott-vds1.qa.lab.tlv.redhat.com, dstVdsId=16cbcf58-d97a-11e1-bba6-001a4a16970e, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 4e6f3345
2012-07-30 17:20:01,100 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-46) [32bdb5a2] FINISH, MigrateBrokerVDSCommand, log id: 4e6f3345
2012-07-30 17:20:01,109 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-46) [32bdb5a2] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 6e55d10a
2012-07-30 17:20:05,138 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-25) VM vm 0fdbfebb-8583-43df-8b11-78b7cd3e2372 moved from MigratingFrom --> Up
2012-07-30 17:20:05,138 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-25) adding VM 0fdbfebb-8583-43df-8b11-78b7cd3e2372 to re-run list
2012-07-30 17:20:05,177 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-25) Rerun vm 0fdbfebb-8583-43df-8b11-78b7cd3e2372. Called from vds nott-vds1.qa.lab.tlv.redhat.com
2012-07-30 17:20:05,182 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-49) START, MigrateStatusVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, vmId=0fdbfebb-8583-43df-8b11-78b7cd3e2372), log id: 7e8a3ca7
2012-07-30 17:20:05,209 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Failed in MigrateStatusVDS method
2012-07-30 17:20:05,209 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Error code migrateErr and error message VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration
2012-07-30 17:20:05,209 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         12
mMessage                      Fatal error during migration

Comment 1 Eli Mesika 2012-08-01 21:17:32 UTC

Please add exact steps of how to reproduce and additional info (how many hosts in cluster, is that host also the SPM etc.)

Comment 3 Michal Skrivanek 2012-08-09 14:35:52 UTC

After talking to Eli...

seems to me quite unlikely to happen.
Can you confirm this is a consequence of blocking/failing the vdsm connectStoragePool?
If so this won't likely happen at customer site unless things are broken/misconfigured and hence the priority is low and I would suggest rhevm_future.

Comment 8 Eli Mesika 2012-08-27 13:49:07 UTC

reolved as a result of fixing BZ 818289

http://gerrit.ovirt.org/#/c/7494/

Comment 9 Ayal Baron 2012-08-28 12:46:16 UTC

(In reply to comment #3)
> After talking to Eli...
> 
> seems to me quite unlikely to happen.
> Can you confirm this is a consequence of blocking/failing the vdsm
> connectStoragePool?
> If so this won't likely happen at customer site unless things are
> broken/misconfigured and hence the priority is low and I would suggest
> rhevm_future.

Actually it's quite easy to reproduce this scenario:
https://bugzilla.redhat.com/show_bug.cgi?id=732479

Failure to connect to pool means that host should not be up.
When sending a connectStoragePool command when already connected to *this* pool the command will succeed, not fail.
It will only fail if:
1. there was a problem when trying to connect
2. vdsm is already connected to a different pool.

The patch above does not deal with the underlying issue (host should be moved to up only if it successfully connected to pool and getVdsStats reported that everything is ok).

Note You need to log in before you can comment on or make changes to this bug.