Bug 843407

Summary: ovirt-engine-backend [TEXT]: wrong error message when trying to reconstruct master during remove/create volume actions
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Greg Padgett <gpadgett>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: low Docs Contact:
Priority: medium    
Version: 3.1.0 CC: abaron, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, snmishra, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: SI20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-04 20:00:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
log none

Description Dafna Ron 2012-07-26 08:58:26 UTC
Created attachment 600460
log

Description of problem:

we are getting the following error: 

2012-07-26 11:43:54,658 WARN  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (ajp-/0.0.0.0:8009-7) CanDoAction of action DeactivateStorageDomain failed. Reasons:VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__DEACTIVATE,ACTION_TYPE_FAILED_DETECTED_RUNNING_VMS


Version-Release number of selected component (if applicable):

si11

How reproducible:

70%

Steps to Reproduce:
1. Reconstruct the master domain while there are running tasks (such as creating or removing VMs).

Actual results:

If we block the reconstruct (which does not happen every time because of a race bug), then we get a wrong error message saying that there are running VMs.

Expected results:

We should rephrase the error message.

Additional info: backend log

Comment 1 Sharad Mishra 2012-07-30 21:47:44 UTC
Trying to understand what the issue is here. 

Currently the code flow is - 

        if (!getParameters().getIsInternal()
                && !getVmDAO()
                        .getAllRunningForStorageDomain(getStorageDomain().getId())
                        .isEmpty()) {
            addCanDoActionMessage(VdcBllMessages.ACTION_TYPE_FAILED_DETECTED_RUNNING_VMS);
            return false;
        }
        if (getStoragePool().getspm_vds_id() != null
                && getStorageDomain().getstorage_domain_type() != StorageDomainType.ISO
                && getAsyncTaskDao().getAsyncTaskIdsByEntity(getParameters().getStorageDomainId()).size() > 0) {
            addCanDoActionMessage(VdcBllMessages.ERROR_CANNOT_DEACTIVATE_DOMAIN_WITH_TASKS);
            return false;
        }

If we reverse it and check for tasks before running VMs, will it solve this issue without creating any new ones?
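
To make the proposed reordering concrete, here is a minimal, self-contained sketch. It is not the engine code: the class name, the method firstFailure, and its plain boolean/list parameters are hypothetical stand-ins for the DAO lookups in the snippet above. Only the two message keys are taken from that snippet.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified stand-in for the two canDoAction checks. */
class DeactivateCheckOrder {

    /**
     * Returns the first failure message key, or empty if neither check fails.
     * The parameters replace the DAO lookups of the real command (assumption).
     */
    public static Optional<String> firstFailure(boolean internal,
                                                boolean hasSpm,
                                                boolean isoDomain,
                                                List<String> asyncTaskIds,
                                                List<String> runningVmNames) {
        // Task check first: a domain with pending tasks reports the task message.
        if (hasSpm && !isoDomain && !asyncTaskIds.isEmpty()) {
            return Optional.of("ERROR_CANNOT_DEACTIVATE_DOMAIN_WITH_TASKS");
        }
        // Running-VM check second; skipped for internal calls, as in the original.
        if (!internal && !runningVmNames.isEmpty()) {
            return Optional.of("ACTION_TYPE_FAILED_DETECTED_RUNNING_VMS");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Both conditions hold here; with the reversed order the task message wins.
        System.out.println(firstFailure(false, true, false,
                List.of("task-1"), List.of("vm-1")));
    }
}
```

With this order, a domain that has both pending tasks and VMs reported as running would surface the task message. A domain with no tasks but with VMs reported as running would still get the running-VMs message.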

Comment 2 Dafna Ron 2012-09-12 09:29:48 UTC
(In reply to comment #1)
> Trying to understand what the issue is here. 
> 
> Currently the code flow is - 
> 
>         if (!getParameters().getIsInternal()
>                 && !getVmDAO()
>                         .getAllRunningForStorageDomain(getStorageDomain().getId())
>                         .isEmpty()) {
>             addCanDoActionMessage(VdcBllMessages.ACTION_TYPE_FAILED_DETECTED_RUNNING_VMS);
>             return false;
>         }
>         if (getStoragePool().getspm_vds_id() != null
>                 && getStorageDomain().getstorage_domain_type() != StorageDomainType.ISO
>                 && getAsyncTaskDao().getAsyncTaskIdsByEntity(getParameters().getStorageDomainId()).size() > 0) {
>             addCanDoActionMessage(VdcBllMessages.ERROR_CANNOT_DEACTIVATE_DOMAIN_WITH_TASKS);
>             return false;
>         }
> 
> If we reverse it and check for tasks before running VMs. Will it solve this
> issue without creating any new one?



It was two months ago, but I don't think there were any running VMs,
so the first CanDoAction check should not have fired.

Comment 3 Greg Padgett 2012-09-13 22:24:42 UTC
http://gerrit.ovirt.org/7997

(In reply to comment #2)
> it was two months ago, but I don't think that there were running vm's. 
> so the first CanDoAction should not have been fired.

The root cause is that some tasks cause the VMs to appear "running" even if they aren't, according to getAllRunningForStorageDomain. The stored procedure that backs it looks for VMs that aren't in the "Down" state, which includes not only those that are running but also those that are migrating, locked, etc.
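
As a rough illustration of why "not Down" is broader than "running", here is a small sketch. The VmState enum and both method names are hypothetical and reduced to the states mentioned above. It does not reproduce the engine's real status enum or the stored procedure.

```java
class NotDownFilter {

    /** Hypothetical, reduced set of VM states; names are assumptions. */
    enum VmState { DOWN, UP, MIGRATING, LOCKED }

    /** Mirrors the described query: anything that is not DOWN is counted. */
    public static boolean countedAsRunning(VmState state) {
        return state != VmState.DOWN;
    }

    /** A stricter reading, for comparison: only a VM that is actually up. */
    public static boolean actuallyRunning(VmState state) {
        return state == VmState.UP;
    }

    public static void main(String[] args) {
        // Print both predicates side by side for every state.
        for (VmState s : VmState.values()) {
            System.out.println(s + " countedAsRunning=" + countedAsRunning(s)
                    + " actuallyRunning=" + actuallyRunning(s));
        }
    }
}
```

In this sketch a locked or migrating VM satisfies countedAsRunning but not actuallyRunning, which is the gap that makes the "running VMs" message misleading.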

(In reply to comment #1)
> If we reverse it and check for tasks before running VMs. Will it solve this
> issue without creating any new one?

Depends... there may still be cases where there are no tasks, yet the VM is in a state that makes it appear running. Instead, I've generalized the message so it is clearer what is going on.

Comment 5 Allon Mureinik 2012-09-29 12:48:07 UTC
Merged I3ce9ce378ffeed980af508e6556f9d844a3e07bd

Comment 7 Dafna Ron 2012-10-15 16:24:49 UTC
verified on si20

Error while executing action: Cannot deactivate Master Data Domain while there are running tasks on its Data Center.
-Please wait until tasks will finish and try again.