Bug 953455 - engine: we can deactivate storage domain while there are running tasks related to that storage (including master domain)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.3.0
Assignee: Tal Nisan
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Duplicates: 956046
Depends On:
Blocks: 972698
Reported: 2013-04-18 08:54 UTC by Dafna Ron
Modified: 2016-02-10 19:51 UTC
CC List: 14 users

Fixed In Version: is2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 972698
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
logs (740.92 KB, application/x-gzip)
2013-04-18 08:54 UTC, Dafna Ron


Links
oVirt gerrit 13614
oVirt gerrit 14893

Description Dafna Ron 2013-04-18 08:54:40 UTC
Created attachment 737220
logs

Description of problem:

The engine does not prevent DeactivateStorageDomain while there are running tasks on the domain.
The deactivation then fails in VDSM with a timeout exception while acquiring the lock.

I deactivated the master domain while it had running tasks and there was a second domain in the pool.
The engine then starts a reconstruct, which leaves the master domain version out of sync between the engine and VDSM:

2013-04-18 11:48:32,604 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-34) Master domain version is not in sync between DB and VDSM. Domain orion-01 marked as master, but the version in DB: 3 and in VDSM: 1
 

Version-Release number of selected component (if applicable):

sf13.1

How reproducible:

100%

Steps to Reproduce:
1. create a disk
2. remove the disk
3. put the domain in maintenance
  
Actual results:

The engine sends the deactivate request (DeactivateStorageDomainVDSCommand) to VDSM.

Expected results:

If the domain has running tasks, the engine should block the maintenance operation with a CanDoAction failure.

Additional info: logs

Comment 4 mkublin 2013-04-25 12:32:36 UTC
*** Bug 956046 has been marked as a duplicate of this bug. ***

Comment 6 Dafna Ron 2013-05-05 13:55:14 UTC
tested on sf15 - moving back to devel


I was able to put a non-master domain that had running tasks on it into maintenance, and the deactivation then failed in VDSM on a timeout:

de8aa2e3-0ebc-4fe6-a407-8b48eb1fc350 :
        verb = mergeSnapshots
        id = de8aa2e3-0ebc-4fe6-a407-8b48eb1fc350
5e3f14cc-bf0a-45e4-87ff-885304e2e902 :
        verb = mergeSnapshots
        id = 5e3f14cc-bf0a-45e4-87ff-885304e2e902
64f92caa-e01c-4469-ba8a-a1db31ecf9e3 :
        verb = mergeSnapshots
        id = 64f92caa-e01c-4469-ba8a-a1db31ecf9e3

2013-05-05 16:50:56,250 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (ajp-/127.0.0.1:8702-7) [669b6173] START, MergeSnapshotsVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = 3.2, storageDomainId = 81ef11d0-4c0c-47b4-8953-d61a6af442d8, imageGroupId = 3dc6cae9-de4b-47c6-a6f3-35dd7cb522bd, imageId = e8ef5de3-3f26-4aa6-94c6-4c99757ba73c, imageId2 = 11ee296f-2398-403c-aebe-fdaed2462168, vmId = 52a8272d-e945-4cab-b13d-36b86140cd66, postZero = false), log id: 52f3594e

2013-05-05 16:50:57,467 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-4-thread-48) [360e2ca6] START, DeactivateStorageDomainVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 81ef11d0-4c0c-47b4-8953-d61a6af442d8, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 129), log id: 23f28274

[root@gold-vdsd ~]# vdsClient -s 0 getStorageDomainInfo 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	uuid = 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	vguuid = Eq3hKP-LB4o-CSHb-U91Z-JKKN-4eQV-Pecf1l
	lver = -1
	state = OK
	version = 3
	role = Regular
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = Dafna-32-02

2013-05-05 15:03:56,625 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-46) [4bfa9da8] Command DeactivateStorageDomainVDS execution failed. Exception: IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Resource timeout: ()
2013-05-05 15:03:56,625 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-4-thread-46) [4bfa9da8] FINISH, DeactivateStorageDomainVDSCommand, log id: 2a54a312
2013-05-05 15:03:56,625 ERROR [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-4-thread-46) [4bfa9da8] Command org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Resource timeout: ()
2013-05-05 15:03:56,631 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-4-thread-46) [4bfa9da8] Command [id=055a70a4-fe82-4f0d-bea7-4865ff56ead5]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot [id=storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, storageId = 38755249-4bb3-4841-bf5b-05f4a521514d, status=Active].
2013-05-05 15:03:56,634 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-4-thread-46) [4bfa9da8] Lock freed to object EngineLock [exclusiveLocks= key: 38755249-4bb3-4841-bf5b-05f4a521514d value: STORAGE
, sharedLocks= key: 7fd33b43-a9f4-4eb7-a885-e9583a929ceb value: POOL
]

Comment 7 Yair Zaslavsky 2013-05-06 06:40:03 UTC
Failed to reproduce -
Created VM with disk 5 GB (preallocated)
Created 5 snapshots.
Pressed "delete" on the 3rd snapshot -
Tried moving the master storage domain to maintenance.
Got the following canDoAction:

Cannot deactivate Master Data Domain while there are running tasks on its Data Center.
-Please wait until tasks will finish and try again.


Please advise

Comment 8 Yair Zaslavsky 2013-05-06 07:54:57 UTC
My bad - reproduced on non master domain.

Comment 9 Yair Zaslavsky 2013-05-06 10:33:48 UTC
For non-master domains, this is no longer an infra issue.

Kublin fixed how the async_task_entities table is populated.
With his fix, you do see entries in the async_task_entities table, but if you try to deactivate a non-master data storage domain, canDoAction does not handle it: for non-master storage domains, we check for related tasks only if the domain is an export storage domain.
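To make the gap concrete, here is a minimal Java sketch of the two canDoAction rules under discussion: the current one, which looks at running tasks only for export domains, and a widened one that also checks data domains. The class name, the DomainType enum and the in-memory task map are hypothetical stand-ins (the map plays the role of async_task_entities); this is not the ovirt-engine code. Master domains are left out because they go through the separate data-center-wide check quoted in Comment 7.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not ovirt-engine code: models only the task check in canDoAction.
class DeactivateDomainCheckSketch {

    // Non-master domain types relevant here; master domains use a separate pool-wide check.
    enum DomainType { DATA, EXPORT, ISO }

    // Stand-in for async_task_entities: running task count per storage domain ID.
    static final Map<UUID, Integer> runningTasksByDomain = new HashMap<>();

    static boolean hasRunningTasks(UUID domainId) {
        return runningTasksByDomain.getOrDefault(domainId, 0) > 0;
    }

    // Behaviour described above: only export domains are checked for tasks.
    static boolean canDeactivateExportOnly(UUID domainId, DomainType type) {
        if (type == DomainType.EXPORT) {
            return !hasRunningTasks(domainId);
        }
        return true; // data domains pass even with running tasks
    }

    // Widened rule: check data domains as well as export domains.
    static boolean canDeactivateExportAndData(UUID domainId, DomainType type) {
        if (type == DomainType.EXPORT || type == DomainType.DATA) {
            return !hasRunningTasks(domainId);
        }
        return true;
    }

    public static void main(String[] args) {
        UUID dataDomain = UUID.randomUUID();
        // e.g. the three mergeSnapshots tasks listed in Comment 6
        runningTasksByDomain.put(dataDomain, 3);

        System.out.println(canDeactivateExportOnly(dataDomain, DomainType.DATA));    // true
        System.out.println(canDeactivateExportAndData(dataDomain, DomainType.DATA)); // false
    }
}
```

The first rule reproduces what Comment 6 observed (a data domain with running tasks is let through to VDSM); the second rejects it before any VDSM call is made.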


IMHO, we should decide if this is the correct behavior or not.

Moving back to storage team.

Comment 10 Sean Cohen 2013-05-06 11:23:55 UTC
Why is this a regression?

Comment 11 Dafna Ron 2013-05-06 11:52:27 UTC
Because you could not do that in the past: we would get a CanDoAction failure if the domain had any running task related to it.
It's been a while since I tested it, but I'm pretty sure that in 3.1 you were unable to do that.

Comment 12 Allon Mureinik 2013-05-08 14:01:47 UTC
Some technical insight on what should be done:

1. There is an infrastructure to report the relevant storage domain ID. 
All storage-related commands should do something like this:
getReturnValue().getInternalTaskIdList().add(
                    createTask(taskCreationInfo,
                            getParameters().getParentCommand(),
                            VdcObjectType.Storage,
                            sourceDomainId,
                            getParameters().getStorageDomainId()));

2. DeactivateStorageDomain only checks against tasks on Export domains - should be done on any data domain as well (in canDoAction()).
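Point 1 matters because the check in point 2 can only find tasks that were registered against the domain. A minimal Java sketch of the idea, assuming a hypothetical in-memory registry in place of async_task_entities: each task is recorded with every storage domain ID it touches, the way the snippet above passes both sourceDomainId and getParameters().getStorageDomainId(), so a task that reads from one domain and writes to another blocks deactivation of both. TaskRegistrySketch, its methods, and the verb string are illustrative only, not the engine's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch, not ovirt-engine code: a task registry keyed by storage domain IDs.
class TaskRegistrySketch {

    // One running task plus every storage domain ID it is associated with.
    static final class AsyncTask {
        final UUID taskId;
        final String verb;
        final Set<UUID> storageDomainIds;

        AsyncTask(UUID taskId, String verb, Set<UUID> storageDomainIds) {
            this.taskId = taskId;
            this.verb = verb;
            this.storageDomainIds = storageDomainIds;
        }
    }

    private final List<AsyncTask> runningTasks = new ArrayList<>();

    // Registers a task against all domains it touches (e.g. source and target).
    UUID createTask(String verb, UUID... storageDomainIds) {
        UUID taskId = UUID.randomUUID();
        runningTasks.add(new AsyncTask(taskId, verb, new HashSet<>(Arrays.asList(storageDomainIds))));
        return taskId;
    }

    // Called when the task ends; its domain associations go away with it.
    void finishTask(UUID taskId) {
        runningTasks.removeIf(t -> t.taskId.equals(taskId));
    }

    // What a canDoAction() check would consult before deactivating a domain.
    boolean hasTasksOnDomain(UUID domainId) {
        return runningTasks.stream().anyMatch(t -> t.storageDomainIds.contains(domainId));
    }

    public static void main(String[] args) {
        TaskRegistrySketch registry = new TaskRegistrySketch();
        UUID source = UUID.randomUUID();
        UUID target = UUID.randomUUID();

        UUID task = registry.createTask("moveImage", source, target); // illustrative verb
        System.out.println(registry.hasTasksOnDomain(source)); // true
        System.out.println(registry.hasTasksOnDomain(target)); // true

        registry.finishTask(task);
        System.out.println(registry.hasTasksOnDomain(target)); // false
    }
}
```

With the association stored on the task, the widened check in point 2 reduces to a lookup such as hasTasksOnDomain(domainId), whichever command started the task.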

Comment 13 Ayal Baron 2013-05-09 12:41:35 UTC
(In reply to comment #12)
> Some technical insight on what should be done:
> 
> 1. There is an infrastructure to report the relevant storage domain ID. 
> All storage-related commands should do something like this:
> getReturnValue().getInternalTaskIdList().add(
>                     createTask(taskCreationInfo,
>                             getParameters().getParentCommand(),
>                             VdcObjectType.Storage,
>                             sourceDomainId,
>                             getParameters().getStorageDomainId()));
> 
> 2. DeactivateStorageDomain only checks against tasks on Export domains -
> should be done on any data domain as well (in canDoAction()).

Did it ever check against tasks in other domains? (i.e. did the behaviour here change)

Comment 14 Allon Mureinik 2013-05-12 09:43:15 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Some technical insight on what should be done:
> > 
> > 1. There is an infrastructure to report the relevant storage domain ID. 
> > All storage-related commands should do something like this:
> > getReturnValue().getInternalTaskIdList().add(
> >                     createTask(taskCreationInfo,
> >                             getParameters().getParentCommand(),
> >                             VdcObjectType.Storage,
> >                             sourceDomainId,
> >                             getParameters().getStorageDomainId()));
> > 
> > 2. DeactivateStorageDomain only checks against tasks on Export domains -
> > should be done on any data domain as well (in canDoAction()).
> 
> Did it ever check against tasks in other domains? (i.e. did the behaviour
> here change)

Yes, in 3.1 - This was changed in response to bug 753591, probably incorrectly.

Comment 19 Aharon Canan 2013-07-17 09:15:34 UTC
Verified on is5 using the flow from comment #7 (with a non-master domain).

Error while executing action: Cannot deactivate Data Domain while there are running tasks on this data domain.
-Please wait until tasks will finish and try again.

Comment 21 Itamar Heim 2014-01-21 22:30:59 UTC
Closing - RHEV 3.3 Released


