Description of problem: host is not connected to pool but is running vm, auto-recvoery tries to connect host to pool, engine then send connectStorageServer and connectStoragePool, when that fails, engine start SetVdsNonOperational threatment, and tries to migrate vm to different host, when that fails due to libvirt, we are failing with NPE. 2012-07-30 17:20:00,676 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [6c01ff] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID: b8423a1f-8889-46 9a-b2fa-39ab78ac3a57 Type: StoragePool 2012-07-30 17:20:00,678 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] START, ConnectStorageServerVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, storagePoolId = b8423a1f-8889 -469a-b2fa-39ab78ac3a57, storageType = ISCSI, connectionList = [{ id: ce7b6af6-5325-4d1f-9341-4d1b2ca59445, connection: 10.35.160.7 };]), log id: 373c86d9 2012-07-30 17:20:00,802 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] FINISH, ConnectStorageServerVDSCommand, return: {ce7b6af6-5325-4d1f-9341-4d1b2ca59445=0}, log id: 373c86d9 2012-07-30 17:20:00,802 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [6c01ff] Host nott-vds1.qa.lab.tlv.redhat.com storage connection was succeeded 2012-07-30 17:20:00,814 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] START, ConnectStoragePoolVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, storagePoolId = b8423a1f-8889-469 a-b2fa-39ab78ac3a57, vds_spm_id = 1, masterDomainId = 4f38a996-dbeb-4981-885b-742b46a4714f, masterVersion = 1), log id: 36c9428f 2012-07-30 17:20:00,848 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-23) [6c01ff] FINISH, ConnectStoragePoolVDSCommand, log id: 36c9428f 2012-07-30 17:20:00,851 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-23) [5730d662] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 78e0bf24-d979-11e1-9e8e-001a4a16970e Type: V DS 2012-07-30 17:20:00,851 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-23) [5730d662] START, SetVdsStatusVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, status=NonOperational, nonOperationalReason=STORAGE_D OMAIN_UNREACHABLE), log id: 282351f2 2012-07-30 17:20:00,857 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-23) [5730d662] FINISH, SetVdsStatusVDSCommand, log id: 282351f2 2012-07-30 17:20:00,860 ERROR [org.ovirt.engine.core.bll.MaintananceVdsCommand] (pool-4-thread-46) [5730d662] Failed to create ExecutionContext for MigrateVmCommand: java.lang.NullPointerException at org.ovirt.engine.core.bll.MaintananceVdsCommand.createMigrateVmContext(MaintananceVdsCommand.java:117) [engine-bll.jar:] at org.ovirt.engine.core.bll.MaintananceVdsCommand.MigrateAllVms(MaintananceVdsCommand.java:96) [engine-bll.jar:] at org.ovirt.engine.core.bll.MaintananceVdsCommand.MigrateAllVms(MaintananceVdsCommand.java:82) [engine-bll.jar:] at org.ovirt.engine.core.bll.SetNonOperationalVdsCommand$1.run(SetNonOperationalVdsCommand.java:44) [engine-bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_05-icedtea] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_05-icedtea] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_05-icedtea] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_05-icedtea] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_05-icedtea] at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_05-icedtea] 2012-07-30 17:20:00,950 INFO [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (pool-4-thread-46) [32bdb5a2] Running command: InternalMigrateVmCommand internal: true. Entities affected : ID: 0fdbfebb-8583-43df-8b11-78b7cd3e2372 Type: VM 2012-07-30 17:20:00,991 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-46) [32bdb5a2] START, MigrateVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, vmId=0fdbfebb-8583-43df-8b11-78b7cd3e2372, srcHost=nott-vds1.qa.lab.tlv.redhat.com, dstVdsId=16cbcf58-d97a-11e1-bba6-001a4a16970e, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 6e55d10a 2012-07-30 17:20:00,994 INFO [org.ovirt.engine.core.bll.MultipleActionsRunner] (pool-4-thread-49) [5730d662] MultipleActionsRunner of type MigrateVmToServer invoked with no actions 2012-07-30 17:20:01,008 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-23) [4da4173f] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: 78e0bf24-d979-11e1-9e8e-001a4a16970e Type: VDS 2012-07-30 17:20:01,028 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-23) [60a60469] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 78e0bf24-d979-11e1-9e8e-001a4a16970e Type: VDS 2012-07-30 17:20:01,035 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-23) [60a60469] Host 78e0bf24-d979-11e1-9e8e-001a4a16970e : nott-vds1.qa.lab.tlv.redhat.com is already in NonOperational status. SetNonOperationalVds command is skipped. 2012-07-30 17:20:01,037 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-46) [32bdb5a2] VdsBroker::migrate::Entered (vm_guid=0fdbfebb-8583-43df-8b11-78b7cd3e2372, srcHost=nott-vds1.qa.lab.tlv.redhat.com, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, method=online 2012-07-30 17:20:01,037 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-46) [32bdb5a2] START, MigrateBrokerVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, vmId=0fdbfebb-8583-43df-8b11-78b7cd3e2372, srcHost=nott-vds1.qa.lab.tlv.redhat.com, dstVdsId=16cbcf58-d97a-11e1-bba6-001a4a16970e, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 4e6f3345 2012-07-30 17:20:01,100 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-46) [32bdb5a2] FINISH, MigrateBrokerVDSCommand, log id: 4e6f3345 2012-07-30 17:20:01,109 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-46) [32bdb5a2] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 6e55d10a 2012-07-30 17:20:05,138 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-25) VM vm 0fdbfebb-8583-43df-8b11-78b7cd3e2372 moved from MigratingFrom --> Up 2012-07-30 17:20:05,138 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-25) adding VM 0fdbfebb-8583-43df-8b11-78b7cd3e2372 to re-run list 2012-07-30 17:20:05,177 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-25) Rerun vm 0fdbfebb-8583-43df-8b11-78b7cd3e2372. Called from vds nott-vds1.qa.lab.tlv.redhat.com 2012-07-30 17:20:05,182 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-49) START, MigrateStatusVDSCommand(vdsId = 78e0bf24-d979-11e1-9e8e-001a4a16970e, vmId=0fdbfebb-8583-43df-8b11-78b7cd3e2372), log id: 7e8a3ca7 2012-07-30 17:20:05,209 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Failed in MigrateStatusVDS method 2012-07-30 17:20:05,209 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Error code migrateErr and error message VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration 2012-07-30 17:20:05,209 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 12 mMessage Fatal error during migration
Please add exact steps of how to reproduce and additional info (how many hosts in cluster, is that host also the SPM etc.)
After talking to Eli... seems to me quite unlikely to happen. Can you confirm this is a consequence of blocking/failing the vdsm connectStoragePool? If so this won't likely happen at customer site unless things are broken/misconfigured and hence the priority is low and I would suggest rhevm_future.
reolved as a result of fixing BZ 818289 http://gerrit.ovirt.org/#/c/7494/
(In reply to comment #3) > After talking to Eli... > > seems to me quite unlikely to happen. > Can you confirm this is a consequence of blocking/failing the vdsm > connectStoragePool? > If so this won't likely happen at customer site unless things are > broken/misconfigured and hence the priority is low and I would suggest > rhevm_future. Actually it's quite easy to reproduce this scenario: https://bugzilla.redhat.com/show_bug.cgi?id=732479 Failure to connect to pool means that host should not be up. When sending a connectStoragePool command when already connected to *this* pool the command will succeed, not fail. It will only fail if: 1. there was a problem when trying to connect 2. vdsm is already connected to a different pool. The patch above does not deal with the underlying issue (host should be moved to up only if it successfully connected to pool and getVdsStats reported that everything is ok).