Bug 1315960

Summary: Cannot move the master storage domain to maintenance
Product: [oVirt] ovirt-engine Reporter: Nelly Credi <ncredi>
Component: BLL.Storage Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.3.3 CC: ahino, amureini, bugs, ebenahar, laravot, ncredi, sbonazzo, tnisan
Target Milestone: ovirt-3.6.6 Keywords: Automation
Target Release: 3.6.6 Flags: tnisan: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-30 10:55:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1312741    
Bug Blocks:    
Attachments:
Description    Flags
engine logs    none
hosts logs     none

Description Nelly Credi 2016-03-09 06:59:09 UTC
Description of problem:
Sometimes the master domain cannot be moved to maintenance mode. The operation fails with this error:
Failed to deactivate storage domain nfs_0 (Data Center golden_env_mixed)

Version-Release number of selected component (if applicable):


How reproducible:
30%

Steps to Reproduce:
1. Add a master domain and wait for everything to be up.
2. Try to put it into maintenance.

Actual results:
Failed to deactivate storage domain nfs_0 (Data Center golden_env_mixed)

Expected results:
It should be possible to put the master domain into maintenance.

Additional info:
Looking at the logs, I saw that uncleared tasks are causing the failure:

SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_4', pool id '457833c9-adf9-4939-ae00-9dc198c50039' as there are uncleared tasks
.....
DeactivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-5) [253137b9] Aborting execution due to failure to stop SPM

When I checked the tasks table, I saw a lot of them (123), all with:
 status | action_type 
--------+-------------
      2 |        1010
    
vdsm_task_id = 00000000-0000-0000-0000-000000000000 


I have kept an environment in this state, and we see it happening in dev CI as well.
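
The tally above could be reproduced with a query along these lines. This is only a minimal sketch: the table name `tasks` and the three column names are taken from the description as written and may not match the real engine schema, and an in-memory SQLite database stands in for the engine DB so that the example runs on its own.

```python
import sqlite3

# All-zero task ID seen on every stuck row in the description.
EMPTY_VDSM_TASK_ID = "00000000-0000-0000-0000-000000000000"


def summarize_tasks(conn):
    """Count rows carrying the all-zero vdsm_task_id, grouped by (status, action_type)."""
    rows = conn.execute(
        "SELECT status, action_type, COUNT(*) FROM tasks "
        "WHERE vdsm_task_id = ? GROUP BY status, action_type",
        (EMPTY_VDSM_TASK_ID,),
    ).fetchall()
    return {(status, action_type): count for status, action_type, count in rows}


def demo_connection():
    """Build a stand-in database holding the 123 rows reported in this bug (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (status INTEGER, action_type INTEGER, vdsm_task_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?, ?)",
        [(2, 1010, EMPTY_VDSM_TASK_ID)] * 123,
    )
    return conn
```

Calling `summarize_tasks(demo_connection())` groups the stand-in rows the same way the table excerpt in the description does.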

Comment 1 Nelly Credi 2016-03-09 06:59:59 UTC
Created attachment 1134397 [details]
engine logs

Comment 2 Nelly Credi 2016-03-09 07:00:38 UTC
Created attachment 1134398 [details]
hosts logs

Comment 3 Nelly Credi 2016-03-09 09:23:34 UTC
*** Bug 1315959 has been marked as a duplicate of this bug. ***

Comment 4 Ala Hino 2016-03-10 09:53:14 UTC
Hi Nelly,

Please note that it is impossible to put a domain into maintenance while there are running tasks.
Action type 1010 and status 2 mean that there are 123 *running* Live Migrate Disks tasks.
You (the CI env) have to make sure there are no running tasks before trying to put the domain into maintenance.
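
A CI-side pre-check following this advice might look roughly like the sketch below. The two numeric codes come from this comment; the `Task` tuple and the function names are hypothetical, and treating every status other than 2 as "not running" is an assumption made only for illustration.

```python
from collections import namedtuple

# Hypothetical in-memory view of one row of the tasks table.
Task = namedtuple("Task", ["status", "action_type"])

STATUS_RUNNING = 2                # per this comment: status 2 means "running"
ACTION_LIVE_MIGRATE_DISKS = 1010  # per this comment: Live Migrate Disks


def running_tasks(tasks):
    """Return the tasks whose status marks them as still running."""
    return [t for t in tasks if t.status == STATUS_RUNNING]


def can_deactivate(tasks):
    """A domain may go to maintenance only when no task is running."""
    return not running_tasks(tasks)
```

With the 123 rows from the description, `can_deactivate` returns `False`, which matches the failure the reporter saw.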

Comment 5 Nelly Credi 2016-03-27 07:55:44 UTC
The tasks were stuck. There was no indication of any live migration anywhere except in the DB.

Comment 6 Allon Mureinik 2016-04-03 12:05:55 UTC
As bug 1312741 is ON_QA and there seems to be no other issue here, setting this one to ON_QA too for QA to verify.

Comment 7 Allon Mureinik 2016-04-03 12:07:25 UTC
(In reply to Allon Mureinik from comment #6)
> As bug 1312741 is ON_QA and there seems to be no other issue here, setting
> this one to ON_QA too for QA to verify.

Correction - moving to MODIFIED, as there is no 3.6.6 build yet. Once there is one, this bug should be moved to ON_QA and verified against it.

Comment 8 Elad 2016-05-02 12:56:22 UTC
Ala, are the verification steps here similar to those for bug 1312741? If so, since bug 1312741 is CLOSED CURRENTRELEASE and was verified, can we move this one to VERIFIED as well?

Comment 9 Ala Hino 2016-05-02 20:32:09 UTC
Elad, the BZs are different. I'd suggest verifying this one too.

I'd recommend first running a live migration and then trying to move the domain to maintenance to see what happens. Keep in mind that if there are running jobs, the domain cannot be moved to maintenance. So, as long as the migration is running, the user cannot move the domain to maintenance; however, once the operation completes, the user should be able to do so.
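
For automation, the flow described here could be sketched as a wait-then-deactivate helper. Both callables are hypothetical placeholders that a test environment would have to supply (for example, one that counts running tasks and one that requests maintenance); nothing below is an actual engine or SDK API.

```python
import time


def deactivate_when_idle(count_running_tasks, deactivate_domain,
                         timeout=600.0, interval=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Wait until no tasks are running, then deactivate the domain.

    count_running_tasks and deactivate_domain are caller-supplied,
    hypothetical callables. Returns True if deactivation was requested,
    False if tasks were still running when the timeout expired.
    """
    deadline = clock() + timeout
    while count_running_tasks() > 0:
        if clock() >= deadline:
            return False  # tasks never cleared, e.g. the stuck tasks in this bug
        sleep(interval)
    deactivate_domain()
    return True
```

A `False` return is the interesting case for this bug: it means the tasks never cleared, so the domain was never asked to go to maintenance.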

Comment 10 Elad 2016-05-03 09:03:35 UTC
During live migration, moving the master domain to maintenance is not allowed. Once the live migration tasks are completed, moving the domain to maintenance is allowed and works well.


Verified using:
rhevm-3.6.6-0.1.el6.noarch
vdsm-4.17.27-0.el7ev.noarch