Hide Forgot
Description of problem: high-level explanation: flow: migrate master flow setup: 2 domains, 2 hosts, iscsi action: put master domain into maintenance result: when operation is done, ConnectHostToStoragePoolCommand returns with exception on wrong master domain: - engine send the wrong version on connectStoragePool, 1 instead of 2. low level explanation correlated with engine logs: - deactivate storage domain command is sent (by user), command is sent to vdsm (SPM), but later on aborted by CanDoAction: 2012-02-12 03:08:11,363 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-5-thread-3) Running command: DeactivateStorageDomainCommand internal: false. Entities affected : ID: 05a78fd0-1f41-43cc-b6cd-490e18a c6715 Type: Storage 2012-02-12 03:08:11,415 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-3) START, DeactivateStorageDomainVDSCommand(storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 05a78fd0-1f41-43cc-b6cd-490e18ac6715, masterDomainId = 7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, masterVersion = 2), log id: 56bb488b 2012-02-12 03:08:12,801 WARN [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (http--0.0.0.0-8080-1) CanDoAction of action DeactivateStorageDomain failed. Reasons:VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__DEACTIVATE,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL 2012-02-12 03:08:12,824 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-5-thread-10) Running command: DeactivateStorageDomainCommand internal: false. Entities affected : ID: 05a78fd0-1f41-43cc-b6cd-490e18ac6715 Type: Storage - disconnectStoragePool is sent to HSM (nott-vds2) : 2012-02-12 03:08:12,874 INFO [org.ovirt.engine.core.bll.storage.DisconnectStoragePoolAsyncOperation] (pool-5-thread-5) Disconnect storage pool treatment vds: nott-vds2,pool XtreamIO 2012-02-12 03:08:12,877 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-5) START, DisconnectStoragePoolVDSCommand(vdsId = 89a9d20e-508b-11e1-897a-ab0c759be3e9, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, vds_spm_id = 1, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 0), log id: 7523ab2c 2012-02-12 03:08:16,744 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-5) FINISH, DisconnectStoragePoolVDSCommand, log id: 7523ab2c - followed by deactivate: 2012-02-12 03:08:16,745 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-10) START, DeactivateStorageDomainVDSCommand(storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 05a78fd0-1f41-43cc-b6cd-490e18ac6715, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 1), log id: 1ad5d4c9 2012-02-12 03:08:43,685 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-3) FINISH, DeactivateStorageDomainVDSCommand, log id: 56bb488b 2012-02-12 03:08:43,704 INFO [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-5-thread-5) After deactivate treatment vds: nott-vds2,pool XtreamIO 2012-02-12 03:08:43,704 INFO [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-5-thread-8) After deactivate treatment vds: nott-vds3,pool XtreamIO - then, refreshStoragePool is sent over both hosts, but due to disconnectStoragePool sent to one of the hosts, it fails: 2012-02-12 03:08:43,707 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-5-thread-5) START, RefreshStoragePoolVDSCommand(vdsId = 89a9d20e-508b-11e1-897a-ab0c759be3e9, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, masterStorageDomainId=7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, msterVersion=2), log id: 753425c4 2012-02-12 03:08:43,707 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-5-thread-8) START, RefreshStoragePoolVDSCommand(vdsId = 95ec7922-508b-11e1-aa8e-179b2aa6c4ed, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, masterStorageDomainId=7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, msterVersion=2), log id: 7e8a364 2012-02-12 03:08:44,362 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Failed in RefreshStoragePoolVDS method 2012-02-12 03:08:44,362 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Error code StoragePoolUnknown and error message VDSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',) 2012-02-12 03:08:44,389 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Command org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 309 mMessage Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',) 2012-02-12 03:08:44,389 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Vds: nott-vds2 2012-02-12 03:08:44,393 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-5-thread-5) Command RefreshStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',) 2012-02-12 03:08:44,393 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-5-thread-5) FINISH, RefreshStoragePoolVDSCommand, log id: 753425c4 2012-02-12 03:08:44,394 ERROR [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-5-thread-5) Failed to refresh storagePool. Host nott-vds2 to storage pool XtreamIO. Exception: {3}: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',) at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.handleVdsResult(VDSBrokerFrontendImpl.java:44) [engine-bll.jar:] at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:29) [engine-bll.jar:] at org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation.Execute(AfterDeactivateSingleAsyncOperation.java:26) [engine-bll.jar:] at org.ovirt.engine.core.utils.SyncronizeNumberOfAsyncOperations$AsyncOpThread.run(SyncronizeNumberOfAsyncOperations.java:30) [engine-utils.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:57) [engine-utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [:1.6.0_22] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [:1.6.0_22] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [:1.6.0_22] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [:1.6.0_22] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [:1.6.0_22] at java.lang.Thread.run(Thread.java:679) [:1.6.0_22] - after some more disconnect\reconnect to pool attempts, the following command is sent: 2012-02-12 03:10:02,709 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-5-thread-10) START, ConnectStoragePoolVDSCommand(vdsId = 89a9d20e-508b-11e1-897a-ab0c759be3e9, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, vds_spm_id = 1, masterDomainId = 7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, masterVersion = 1), log id: 31abf83c problem: engine use new master domain version, but old master version, which is 1 instead of 2. since I had 2 storage domains in pool, and i deactivated one of them, reconstruct is not called (no reconstruct where only one active domain in pool).
This bug raises the following questions: 1) how come deactivateStorageDomain command was sent to vdsm by engine, but failed after with CanDoAction - should't CanDoAction test come first? 2) how come engine sends disconnectStoragePool to certain host, and right after send refreshStoragePool which requires connection to pool ? we know it would fail. 3) going back to bullet 3, if there is a rough thread, that managed to change domain status (internal one) just before User performed the action from GUI, shouldn't second action be blocked by locking mechanism ?
Created attachment 561224 [details] rhevm log
Should be fixed, because of all flows were rewritten, and were fixed similar flows. Should work from 3.2 ovirt version
3.2 beta built, moving to ON_QA status to allow testing