Created attachment 737220
logs

Description of problem:

The engine does not prevent DeactivateStorageDomain while there are running tasks on the domain. The deactivate then fails in vdsm with a timeout exception while acquiring the lock.

I deactivated the master domain while there were running tasks and a second domain was present. This starts a reconstruct and causes a version mismatch between engine and vdsm on the master domain:

2013-04-18 11:48:32,604 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-34) Master domain version is not in sync between DB and VDSM. Domain orion-01 marked as master, but the version in DB: 3 and in VDSM: 1

Version-Release number of selected component (if applicable):
sf13.1

How reproducible:
100%

Steps to Reproduce:
1. create a disk
2. remove the disk
3. put the domain in maintenance

Actual results:
The engine sends deactivateStorage to vdsm.

Expected results:
If the domain has running tasks, we should stop the maintenance with a CanDoAction.

Additional info:
logs
*** Bug 956046 has been marked as a duplicate of this bug. ***
Tested on sf15 - moving back to devel.

I was able to put a non-master domain into maintenance while it had running tasks on it, and the operation failed in vdsm on timeout:

de8aa2e3-0ebc-4fe6-a407-8b48eb1fc350 :
	 verb = mergeSnapshots
	 id = de8aa2e3-0ebc-4fe6-a407-8b48eb1fc350
5e3f14cc-bf0a-45e4-87ff-885304e2e902 :
	 verb = mergeSnapshots
	 id = 5e3f14cc-bf0a-45e4-87ff-885304e2e902
64f92caa-e01c-4469-ba8a-a1db31ecf9e3 :
	 verb = mergeSnapshots
	 id = 64f92caa-e01c-4469-ba8a-a1db31ecf9e3

2013-05-05 16:50:56,250 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (ajp-/127.0.0.1:8702-7) [669b6173] START, MergeSnapshotsVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = 3.2, storageDomainId = 81ef11d0-4c0c-47b4-8953-d61a6af442d8, imageGroupId = 3dc6cae9-de4b-47c6-a6f3-35dd7cb522bd, imageId = e8ef5de3-3f26-4aa6-94c6-4c99757ba73c, imageId2 = 11ee296f-2398-403c-aebe-fdaed2462168, vmId = 52a8272d-e945-4cab-b13d-36b86140cd66, postZero = false), log id: 52f3594e

2013-05-05 16:50:57,467 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-4-thread-48) [360e2ca6] START, DeactivateStorageDomainVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 81ef11d0-4c0c-47b4-8953-d61a6af442d8, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 129), log id: 23f28274

[root@gold-vdsd ~]# vdsClient -s 0 getStorageDomainInfo 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	uuid = 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	vguuid = Eq3hKP-LB4o-CSHb-U91Z-JKKN-4eQV-Pecf1l
	lver = -1
	state = OK
	version = 3
	role = Regular
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = Dafna-32-02

2013-05-05 15:03:56,625 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-46) [4bfa9da8] Command DeactivateStorageDomainVDS execution failed. Exception: IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Resource timeout: ()

2013-05-05 15:03:56,625 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-4-thread-46) [4bfa9da8] FINISH, DeactivateStorageDomainVDSCommand, log id: 2a54a312

2013-05-05 15:03:56,625 ERROR [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-4-thread-46) [4bfa9da8] Command org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Resource timeout: ()

2013-05-05 15:03:56,631 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-4-thread-46) [4bfa9da8] Command [id=055a70a4-fe82-4f0d-bea7-4865ff56ead5]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot [id=storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, storageId = 38755249-4bb3-4841-bf5b-05f4a521514d, status=Active].

2013-05-05 15:03:56,634 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-4-thread-46) [4bfa9da8] Lock freed to object EngineLock [exclusiveLocks= key: 38755249-4bb3-4841-bf5b-05f4a521514d value: STORAGE , sharedLocks= key: 7fd33b43-a9f4-4eb7-a885-e9583a929ceb value: POOL ]
Failed to reproduce:
- Created a VM with a 5 GB (preallocated) disk.
- Created 5 snapshots.
- Pressed "delete" on the 3rd snapshot.
- Tried moving the master storage domain to maintenance.

Got the following canDoAction:

Cannot deactivate Master Data Domain while there are running tasks on its Data Center.
-Please wait until tasks will finish and try again.

Please advise.
My bad - reproduced on non master domain.
For non-master domains, this is no longer an infra issue: Kublin fixed the filling of the async_task_entities table. After his fix, if you try to deactivate a non-master data storage domain, canDoAction still does not handle it (for a non-master storage domain we check for related tasks only if it is an export storage domain), even though you do see entries in the async_task_entities table.

IMHO, we should decide whether this is the correct behavior or not. Moving back to the storage team.
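To make the gap concrete, here is a minimal Java sketch contrasting the check as described in this comment (tasks are considered only for export domains) with a check that applies to every domain type. The enum, class, and method names are hypothetical simplifications for illustration and are not the engine's actual classes.

```java
import java.util.List;

// Hypothetical, simplified model; these are NOT the oVirt engine's real types.
enum DomainType { DATA, EXPORT }

final class DeactivateValidationSketch {

    // Behaviour as described above: running tasks block deactivation
    // only when the domain is an export domain.
    static boolean canDeactivateExportOnly(DomainType type, List<String> runningTaskIds) {
        if (type == DomainType.EXPORT) {
            return runningTaskIds.isEmpty();
        }
        return true; // a non-master data domain is never blocked by its tasks
    }

    // Alternative policy: running tasks block deactivation of any domain type.
    static boolean canDeactivateAnyDomain(DomainType type, List<String> runningTaskIds) {
        return runningTaskIds.isEmpty();
    }
}
```

With one running task on a data domain, the first method allows the deactivation (and the request goes on to vdsm), while the second refuses it up front.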
Why is this a regression?
Because you could not do that in the past: we would get a CanDoAction if the domain had any running task related to it. It's been a while since I tested it, but I'm pretty sure that in 3.1 you were unable to do that.
Some technical insight on what should be done:

1. There is an infrastructure to report the relevant storage domain ID.
All storage-related commands should do something like this:
        getReturnValue().getInternalTaskIdList().add(
                createTask(taskCreationInfo,
                        getParameters().getParentCommand(),
                        VdcObjectType.Storage,
                        sourceDomainId,
                        getParameters().getStorageDomainId()));

2. DeactivateStorageDomain only checks against tasks on Export domains - should be done on any data domain as well (in canDoAction()).
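The two points above can be sketched together in plain Java: each task is recorded with every storage domain it touches, and the deactivation check asks whether any recorded task references the domain. The class below is a hypothetical in-memory stand-in for that association (compare the async_task_entities table mentioned earlier in this bug); none of its names are taken from the engine code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory stand-in for the task-to-entity association;
// this is NOT the engine's real API.
final class TaskEntityRegistrySketch {
    private final Map<UUID, Set<UUID>> domainsByTask = new HashMap<>();

    // Point 1: record a task together with every storage domain it touches
    // (for example both the source and the destination domain).
    void register(UUID taskId, UUID... storageDomainIds) {
        Set<UUID> domains = domainsByTask.computeIfAbsent(taskId, id -> new HashSet<>());
        Collections.addAll(domains, storageDomainIds);
    }

    // Forget a task once it has finished.
    void complete(UUID taskId) {
        domainsByTask.remove(taskId);
    }

    // Point 2: the check a canDoAction()-style validation could run
    // for any domain, not only for export domains.
    boolean hasRunningTasks(UUID storageDomainId) {
        for (Set<UUID> domains : domainsByTask.values()) {
            if (domains.contains(storageDomainId)) {
                return true;
            }
        }
        return false;
    }
}
```

A deactivate request for a domain would be refused while hasRunningTasks(domainId) returns true, and allowed again once complete(taskId) has been called for every task that referenced it.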
(In reply to comment #12)
> Some technical insight on what should be done:
>
> 1. There is an infrastructure to report the relevant storage domain ID.
> All storage-related commands should do something like this:
>         getReturnValue().getInternalTaskIdList().add(
>                 createTask(taskCreationInfo,
>                         getParameters().getParentCommand(),
>                         VdcObjectType.Storage,
>                         sourceDomainId,
>                         getParameters().getStorageDomainId()));
>
> 2. DeactivateStorageDomain only checks against tasks on Export domains -
> should be done on any data domain as well (in canDoAction()).

Did it ever check against tasks in other domains? (i.e. did the behaviour here change)
(In reply to comment #13)
> (In reply to comment #12)
> > Some technical insight on what should be done:
> >
> > 1. There is an infrastructure to report the relevant storage domain ID.
> > All storage-related commands should do something like this:
> >         getReturnValue().getInternalTaskIdList().add(
> >                 createTask(taskCreationInfo,
> >                         getParameters().getParentCommand(),
> >                         VdcObjectType.Storage,
> >                         sourceDomainId,
> >                         getParameters().getStorageDomainId()));
> >
> > 2. DeactivateStorageDomain only checks against tasks on Export domains -
> > should be done on any data domain as well (in canDoAction()).
>
> Did it ever check against tasks in other domains? (i.e. did the behaviour
> here change)

Yes, in 3.1 - This was changed in response to bug 753591, probably incorrectly.
Verified using is5, following the flow from comment #7 (with a non-master domain):

Error while executing action: Cannot deactivate Data Domain while there are running tasks on this data domain.
-Please wait until tasks will finish and try again.
Closing - RHEV 3.3 Released