Bug 789711 - [ovirt] [engine-core] wrong master domain after migrate master flow
Summary: [ovirt] [engine-core] wrong master domain after migrate master flow
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 3.2
Assignee: lpeer
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-02-12 13:48 UTC by Haim
Modified: 2016-02-10 16:33 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-15 06:46:18 UTC
oVirt Team: Storage


Attachments (Terms of Use)
rhevm log (43.84 KB, application/x-gzip)
2012-02-12 13:57 UTC, Haim
no flags Details

Description Haim 2012-02-12 13:48:39 UTC
Description of problem:

high-level explanation: 

flow: migrate master flow 
setup: 2 domains, 2 hosts, iscsi 
action: put master domain into maintenance 
result: when operation is done, ConnectHostToStoragePoolCommand returns with exception on wrong master domain: 
- engine send the wrong version on connectStoragePool, 1 instead of 2. 

low level explanation correlated with engine logs: 

- deactivate storage domain command is sent (by user), command is sent to vdsm (SPM), but later on aborted by CanDoAction:

2012-02-12 03:08:11,363 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-5-thread-3) Running command: DeactivateStorageDomainCommand internal: false. Entities affected :  ID: 05a78fd0-1f41-43cc-b6cd-490e18a
c6715 Type: Storage

2012-02-12 03:08:11,415 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-3) START, DeactivateStorageDomainVDSCommand(storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 05a78fd0-1f41-43cc-b6cd-490e18ac6715, masterDomainId = 7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, masterVersion = 2), log id: 56bb488b

2012-02-12 03:08:12,801 WARN  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (http--0.0.0.0-8080-1) CanDoAction of action DeactivateStorageDomain failed. Reasons:VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__DEACTIVATE,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL

2012-02-12 03:08:12,824 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-5-thread-10) Running command: DeactivateStorageDomainCommand internal: false. Entities affected :  ID: 05a78fd0-1f41-43cc-b6cd-490e18ac6715 Type: Storage

- disconnectStoragePool is sent to HSM (nott-vds2) :

2012-02-12 03:08:12,874 INFO  [org.ovirt.engine.core.bll.storage.DisconnectStoragePoolAsyncOperation] (pool-5-thread-5) Disconnect storage pool treatment vds: nott-vds2,pool XtreamIO
2012-02-12 03:08:12,877 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-5) START, DisconnectStoragePoolVDSCommand(vdsId = 89a9d20e-508b-11e1-897a-ab0c759be3e9, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, vds_spm_id = 1, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 0), log id: 7523ab2c
2012-02-12 03:08:16,744 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-5) FINISH, DisconnectStoragePoolVDSCommand, log id: 7523ab2c

- followed by deactivate: 

2012-02-12 03:08:16,745 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-10) START, DeactivateStorageDomainVDSCommand(storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 05a78fd0-1f41-43cc-b6cd-490e18ac6715, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 1), log id: 1ad5d4c9
2012-02-12 03:08:43,685 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-3) FINISH, DeactivateStorageDomainVDSCommand, log id: 56bb488b

2012-02-12 03:08:43,704 INFO  [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-5-thread-5) After deactivate treatment vds: nott-vds2,pool XtreamIO
2012-02-12 03:08:43,704 INFO  [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-5-thread-8) After deactivate treatment vds: nott-vds3,pool XtreamIO

- then, refreshStoragePool is sent over both hosts, but due to disconnectStoragePool sent to one of the hosts, it fails:

2012-02-12 03:08:43,707 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-5-thread-5) START, RefreshStoragePoolVDSCommand(vdsId = 89a9d20e-508b-11e1-897a-ab0c759be3e9, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, masterStorageDomainId=7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, msterVersion=2), log id: 753425c4
2012-02-12 03:08:43,707 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-5-thread-8) START, RefreshStoragePoolVDSCommand(vdsId = 95ec7922-508b-11e1-aa8e-179b2aa6c4ed, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, masterStorageDomainId=7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, msterVersion=2), log id: 7e8a364
2012-02-12 03:08:44,362 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Failed in RefreshStoragePoolVDS method
2012-02-12 03:08:44,362 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Error code StoragePoolUnknown and error message VDSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',)
2012-02-12 03:08:44,389 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Command org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         309
mMessage                      Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',)

2012-02-12 03:08:44,389 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-5) Vds: nott-vds2
2012-02-12 03:08:44,393 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-5-thread-5) Command RefreshStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',)
2012-02-12 03:08:44,393 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-5-thread-5) FINISH, RefreshStoragePoolVDSCommand, log id: 753425c4
2012-02-12 03:08:44,394 ERROR [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-5-thread-5) Failed to refresh storagePool. Host nott-vds2 to storage pool XtreamIO. Exception: {3}: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('2319b7ff-09f6-488c-83c4-aa78139db3b4',)
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.handleVdsResult(VDSBrokerFrontendImpl.java:44) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:29) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation.Execute(AfterDeactivateSingleAsyncOperation.java:26) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.SyncronizeNumberOfAsyncOperations$AsyncOpThread.run(SyncronizeNumberOfAsyncOperations.java:30) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:57) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [:1.6.0_22]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [:1.6.0_22]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [:1.6.0_22]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [:1.6.0_22]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [:1.6.0_22]
        at java.lang.Thread.run(Thread.java:679) [:1.6.0_22]

- after some more disconnect\reconnect to pool attempts, the following command is sent: 


2012-02-12 03:10:02,709 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-5-thread-10) START, ConnectStoragePoolVDSCommand(vdsId = 89a9d20e-508b-11e1-897a-ab0c759be3e9, storagePoolId = 2319b7ff-09f6-488c-83c4-aa78139db3b4, vds_spm_id = 1, masterDomainId = 7aa228a1-f4aa-4e55-aca1-a3d802ee09ad, masterVersion = 1), log id: 31abf83c


problem: engine use new master domain version, but old master version, which is 1 instead of 2. 

since I had 2 storage domains in pool, and i deactivated one of them, reconstruct is not called (no reconstruct where only one active domain in pool).

Comment 1 Haim 2012-02-12 13:53:45 UTC
This bug raises the following questions: 

1) how come deactivateStorageDomain command was sent to vdsm by engine, but failed after with CanDoAction - should't CanDoAction test come first? 

2) how come engine sends disconnectStoragePool to certain host, and right after send refreshStoragePool which requires connection to pool ? we know it would fail. 

3) going back to bullet 3, if there is a rough thread, that managed to change domain status (internal one) just before User performed the action from GUI, shouldn't second action be blocked by locking mechanism ?

Comment 2 Haim 2012-02-12 13:57:18 UTC
Created attachment 561224 [details]
rhevm log

Comment 4 mkublin 2012-10-04 13:26:18 UTC
Should be fixed, because of all flows were rewritten, and were fixed similar flows.
Should work from 3.2 ovirt version

Comment 5 Itamar Heim 2013-01-16 16:08:26 UTC
3.2 beta built, moving to ON_QA status to allow testing


Note You need to log in before you can comment on or make changes to this bug.