Bug 826894

Summary: ovirt-engine-backend [Multiple Domains]: backend is not blocking creation of template while one domain is in maintenance
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engineAssignee: Tal Nisan <tnisan>
Status: CLOSED WORKSFORME QA Contact: Haim <hateya>
Severity: medium Docs Contact:
Priority: high    
Version: 3.1.0CC: abaron, amureini, dyasny, hateya, iheim, lpeer, mkublin, Rhev-m-bugs, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-13 15:35:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2012-05-31 08:25:50 UTC
Description of problem:

we do not block make template at all in backend when one of the domains is in maintenance. 
if the disk on the domain is not the boot disk the webadmin blocks the action but if the disk is boot disk the command is sent to vdsm and will fail in vdsm with low level image copy failed. 

the user will get the following error in the webadmin: 

Error while executing action New VM Template: Internal Error (VM Snapshot mismatch).

Version-Release number of selected component (if applicable):

si4

How reproducible:

100%

Steps to Reproduce:
1. create 3 domains
2. create a vm with 2 disks (both disks located on a different domain which is not the master domain)
3. put domain where the boot disk is located on in mainetancne -> try to create template. 
  
Actual results:

action will not be blocked and will be sent to vdsm. 
the error will be for low level copy and user will see the following error: 

Error while executing action New VM Template: Internal Error (VM Snapshot mismatch).

Expected results:

1) we should block the user from execting the command in backend. 

Additional info: full logs attached

user gets: 

Error while executing action New VM Template: Internal Error (VM Snapshot mismatch).

backend sends the command to the vdsm which returnms with:
e105f5e0-54de-43eb-8e0c-06cb4d63adf1::ERROR::2012-05-28 10:45:41,895::task::853::TaskManager.Task::(_setError) Task=`e105f5e0-54de-43eb-8e0c-06cb4d63adf1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1624, in copyImage
    volType, volFormat, preallocate, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 874, in copy
    raise se.CopyImageError("src image=%s, dst image=%s: msg=%s" % (srcImgUUID, dstImgUUID, str(e)))
CopyImageError: low level Image copy failed: ('src image=7911574b-98d1-4c90-aaaf-a4982df259db, dst image=f2e785af-4c6b-425c-a691-f85fa1dd8c28: msg=Action was stopped: ()',)
e105f5e0-54de-43eb-8e0c-06cb4d63adf1::DEBUG::2012-05-28 10:45:41,896::task::872::TaskManager.Task::(_run) Task=`e105f5e0-54de-43eb-8e0c-06cb4d63adf1`::Task._run: e105f5e0-54de-43eb-8e0c-06cb4d63adf1 () {} failed - stopping task
e105f5e0-54de-43eb-8e0c-06cb4d63adf1::DEBUG::2012-05-28 10:45:41,897::task::1199::TaskManager.Task::(stop) Task=`e105f5e0-54de-43eb-8e0c-06cb4d63adf1`::stopping in state running (force False)
e105f5e0-54de-43eb-8e0c-06cb4d63adf1::DEBUG::2012-05-28 10:45:41,897::task::903::TaskManager.Task::(_runJobs) Task=`e105f5e0-54de-43eb-8e0c-06cb4d63adf1`::aborting: Task is aborted: u'e105f5e0-54de-43eb-8e0c-06cb4d63adf1' - code 411


Backend:

2012-05-28 10:48:18,738 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (http--0.0.0.0-8080-1) [7b6a7d07] IrsBroker::getImageInfo::Failed getting image info imageId = 1ebab7af-73fc-4e4c-a174-6338d076bc56 does not
 exist on domainName = orion , domainId = 31bb8e4b-ed31-4ec7-bb43-6d28a26955c3,  error code: StorageDomainDoesNotExist, message: Storage domain does not exist: ('31bb8e4b-ed31-4ec7-bb43-6d28a26955c3',)
2012-05-28 10:48:18,741 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http--0.0.0.0-8080-1) [7b6a7d07] Command org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand return value 
 
Class Name: org.ovirt.engine.core.vdsbroker.irsbroker.OneImageInfoReturnForXmlRpc
mInfo                         Null
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         358
mMessage                      Storage domain does not exist: ('31bb8e4b-ed31-4ec7-bb43-6d28a26955c3',)



2012-05-28 10:48:18,744 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (http--0.0.0.0-8080-1) [7b6a7d07] FINISH, GetImageInfoVDSCommand, log id: 3718ae19
2012-05-28 10:48:18,746 ERROR [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (http--0.0.0.0-8080-1) [7b6a7d07] Command org.ovirt.engine.core.bll.CreateImageTemplateCommand throw Vdc Bll exception. With error message VdcBLLExcepti
on:
2012-05-28 10:48:18,750 ERROR [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (http--0.0.0.0-8080-1) [7b6a7d07] Transaction rolled-back for command: org.ovirt.engine.core.bll.CreateImageTemplateCommand.
2012-05-28 10:48:18,751 ERROR [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (http--0.0.0.0-8080-1) [7b6a7d07] Transaction rolled-back for command: org.ovirt.engine.core.bll.CreateImageTemplateCommand.
2012-05-28 10:48:18,752 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (http--0.0.0.0-8080-1) [7b6a7d07] transaction rolled back
2012-05-28 10:48:18,753 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (http--0.0.0.0-8080-1) [7b6a7d07] Command org.ovirt.engine.core.bll.AddVmTemplateCommand throw Vdc Bll exception. With error message VdcBLLException: RESOURCE
_MANAGER_VM_SNAPSHOT_MISSMATCH
2012-05-28 10:48:18,756 INFO  [org.ovirt.engine.core.bll.AddVmTemplateCommand] (http--0.0.0.0-8080-1) [7b6a7d07] Command [id=b62c9f75-603f-4b4e-8fdf-be27fac870ce]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentiti
es.VmTemplate; snapshot: a7059bb1-5408-4972-be68-2eed827725bd.
2012-05-28 10:48:18,761 INFO  [org.ovirt.engine.core.bll.AddVmTemplateCommand] (http--0.0.0.0-8080-1) [7b6a7d07] Command [id=b62c9f75-603f-4b4e-8fdf-be27fac870ce]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.business
entities.VmDynamic; snapshot: EntityStatusSnapshot [id=2616a183-15b0-4fe4-9b64-368b4ba1d260, status=Down].
2012-05-28 10:48:18,781 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (Rollback-pool-10-thread-47) [7b6a7d07] AsyncTaskManager::CancelTask: Attempting to cancel task e105f5e0-54de-43eb-8e0c-06cb4d63adf1.
2012-05-28 10:48:18,784 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (Rollback-pool-10-thread-47) [7b6a7d07] SPMAsyncTask::StopTask: Attempting to stop task e105f5e0-54de-43eb-8e0c-06cb4d63adf1 (Parent Command AddVmTemplate, Parameters
 Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters).
2012-05-28 10:48:18,788 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (Rollback-pool-10-thread-47) [7b6a7d07] START, SPMStopTaskVDSCommand(storagePoolId = 38bd59fc-04a2-4f56-abb4-62fd34bb83f3, ignoreFailoverLimi
t = false, compatabilityVersion = null, taskId = e105f5e0-54de-43eb-8e0c-06cb4d63adf1), log id: 4f98a899
2012-05-28 10:48:18,792 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (Rollback-pool-10-thread-47) [7b6a7d07] START, HSMStopTaskVDSCommand(vdsId = 06c61084-a10b-11e1-9854-001a4a169741, taskId=e105f5e0-54de-43eb-
8e0c-06cb4d63adf1), log id: 1fbc00d1
2012-05-28 10:48:18,842 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (Rollback-pool-10-thread-47) [7b6a7d07] FINISH, HSMStopTaskVDSCommand, log id: 1fbc00d1
2012-05-28 10:48:18,843 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (Rollback-pool-10-thread-47) [7b6a7d07] FINISH, SPMStopTaskVDSCommand, log id: 4f98a899
2012-05-28 10:48:18,846 INFO  [org.ovirt.engine.core.bll.EntityMultiAsyncTasks] (Rollback-pool-10-thread-47) [7b6a7d07] EntityMultiAsyncTasks::StartPollingTask: Current Action Type for entity a7059bb1-5408-4972-be68-2eed827725bd is AddVm
Template (determined by task e105f5e0-54de-43eb-8e0c-06cb4d63adf1)
2012-05-28 10:48:26,235 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-11) [26400bad] AsyncTaskManager::PollAndUpdateAsyncTasks: 1 tasks, 1 tasks to poll now
2012-05-28 10:48:26,254 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-11) [26400bad] SPMAsyncTask::PollTask: Polling task e105f5e0-54de-43eb-8e0c-06cb4d63adf1 (Parent Command AddVmTemplate, Parameters Type org.ov
irt.engine.core.common.asynctasks.AsyncTaskParameters) returned status running.
2012-05-28 10:48:26,256 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-11) [26400bad] AsyncTaskManager::_timer_Elapsed: Finished polling Tasks, will poll again in 10 seconds.
2012-05-28 10:48:36,280 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-42) [40a7af09] SPMAsyncTask::PollTask: Polling task e105f5e0-54de-43eb-8e0c-06cb4d63adf1 (Parent Command AddVmTemplate, Parameters Type org.ov
irt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2012-05-28 10:48:36,309 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-42) [40a7af09] BaseAsyncTask::LogEndTaskFailure: Task e105f5e0-54de-43eb-8e0c-06cb4d63adf1 (Parent Command AddVmTemplate, Parameters Type org.
ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = e105f5e0-54de-43eb-8e0c-06cb4d63adf1,
-- Exception: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = e105f5e0-54de-43eb-8e0c-06cb4d63adf1
2012-05-28 10:48:36,313 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-42) [40a7af09] EntityAsyncTask::EndActionIfNecessary: All tasks of entity a7059bb1-5408-4972-be68-2eed827725bd has ended -> executing EndAc
tion
2012-05-28 10:48:36,314 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-42) [40a7af09] EntityAsyncTask::EndAction: Ending action for 1 tasks (entity ID: a7059bb1-5408-4972-be68-2eed827725bd): calling EndAction f
or action type AddVmTemplate.
2012-05-28 10:48:36,316 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (Rollback-pool-10-thread-47) [40a7af09] EntityAsyncTask::EndCommandAction [within thread]context: Attempting to EndAction AddVmTemplate
2012-05-28 10:48:36,351 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (Rollback-pool-10-thread-47) [572688ef] Ending command with failure: org.ovirt.engine.core.bll.AddVmTemplateCommand
2012-05-28 10:48:36,352 ERROR [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (Rollback-pool-10-thread-47) [6f8cd45b] Ending command with failure: org.ovirt.engine.core.bll.CreateImageTemplateCommand
2012-05-28 10:48:36,367 INFO  [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (Rollback-pool-10-thread-47) [7b6a7d07] Ending command successfully: org.ovirt.engine.core.bll.CreateImageTemplateCommand
2012-05-28 10:48:36,374 ERROR [org.ovirt.engine.core.bll.VmHandler] (Rollback-pool-10-thread-47) Trying to unlock vm new in status Down - not moving to down!
:

Comment 1 Dafna Ron 2012-05-31 08:27:34 UTC
Created attachment 587975 [details]
logs

Comment 2 mkublin 2012-06-12 07:23:07 UTC
It is engine and it is blocking I wrote that code, before the bug will be closed as non reproducible, the reason is a race - the domain was still at process of moving to Maintaince and that's why it passed check at engine part, don't think that should be solved

Comment 5 Dafna Ron 2012-06-13 13:28:28 UTC
not sure I understood you Kublin :) 

Did you mean that you added code that fixed this issue? 

 - if so than please add the commit and we will verify it on next build.

Did you mean that we wont fix because its a race? 

 - I do not agree since just because its a race does not mean that its not a bug and that we don't fix bugs that are caused by a race. 

did you mean that its not worth fixing because its a race?

 - If you think that its not worth fixing then please close as WONTFIX not as WORKSFORME since it is reproducible 100% in this build.

Comment 6 mkublin 2012-06-13 13:43:43 UTC
I mean the code which is checking a status of domains are already there , from the beginning. I wrote it.

From the logs : it was a race, I don't understand how it can be 100% reproducible, code was there always and it WORKSFORME at my env.
The only reason that u have not a code that is checking these that a strange way of choosing commits for build.

Comment 9 Tal Nisan 2012-06-13 15:34:55 UTC
Checked it numerous times, the backend disallows it and also the UI disallows it by greying the make template dialog and showing a comment saying "Disks' Storage Domains are not accessible."
Moving to WORKSFORME