Bug 1315960 - Cannot move the master storage domain to maintenance
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.6
Target Release: 3.6.6
Assigned To: Ala Hino
QA Contact: Elad
Keywords: Automation
Duplicates: 1315959
Depends On: 1312741
Blocks:
Reported: 2016-03-09 01:59 EST by Nelly Credi
Modified: 2016-05-30 06:55 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-30 06:55:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
tnisan: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
engine logs (830.11 KB, application/x-bzip), 2016-03-09 01:59 EST, Nelly Credi
hosts logs (2.75 MB, application/x-bzip), 2016-03-09 02:00 EST, Nelly Credi

Description Nelly Credi 2016-03-09 01:59:09 EST
Description of problem:
Intermittently, the master domain cannot be moved to maintenance mode; the operation fails with this error:
Failed to deactivate storage domain nfs_0 (Data Center golden_env_mixed)

Version-Release number of selected component (if applicable):


How reproducible:
30%

Steps to Reproduce:
1. Add a master domain and wait for everything to be up.
2. Try to put it into maintenance.

Actual results:
Failed to deactivate storage domain nfs_0 (Data Center golden_env_mixed)

Expected results:
The master domain should move to maintenance.

Additional info:
Looking at the logs, I saw that leftover tasks are causing the failure:

SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_4', pool id '457833c9-adf9-4939-ae00-9dc198c50039' as there are uncleared tasks
.....
DeactivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-5) [253137b9] Aborting execution due to failure to stop SPM

When I checked the tasks table, I found many of them (123) with:
 status | action_type
--------+-------------
      2 |        1010

vdsm_task_id = 00000000-0000-0000-0000-000000000000


I have kept an environment in this state, and we see this happening in dev CI as well.
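As a rough illustration of how the rows described above could be picked out of an exported copy of the tasks table, the Python sketch below filters rows on the three column values quoted in this report (status 2, action_type 1010, all-zero vdsm_task_id). The row format (a list of dicts keyed by column name) and the helper name find_suspect_tasks are hypothetical; this does not query the engine database and is not part of oVirt.

```python
import uuid
from typing import Iterable, List, Mapping

# Column values quoted in this report for the leftover rows.
STATUS = 2
ACTION_TYPE = 1010
# The all-zero UUID seen in vdsm_task_id: "00000000-0000-0000-0000-000000000000".
NULL_TASK_ID = str(uuid.UUID(int=0))


def find_suspect_tasks(rows: Iterable[Mapping[str, object]]) -> List[Mapping[str, object]]:
    """Return rows whose status, action_type and vdsm_task_id match the report.

    `rows` is a hypothetical export of the tasks table, one dict per row,
    keyed by column name. vdsm_task_id may be a string or a uuid.UUID.
    """
    return [
        row
        for row in rows
        if row.get("status") == STATUS
        and row.get("action_type") == ACTION_TYPE
        and str(row.get("vdsm_task_id")) == NULL_TASK_ID
    ]


# Hypothetical sample rows, not taken from the attached logs.
sample = [
    {"status": 2, "action_type": 1010, "vdsm_task_id": NULL_TASK_ID},
    {"status": 2, "action_type": 1010, "vdsm_task_id": "11111111-2222-3333-4444-555555555555"},
    {"status": 4, "action_type": 1010, "vdsm_task_id": NULL_TASK_ID},
]

if __name__ == "__main__":
    print(len(find_suspect_tasks(sample)))
```

Only the first sample row matches all three values; a non-zero count before a maintenance attempt would point to the same situation described here.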
Comment 1 Nelly Credi 2016-03-09 01:59 EST
Created attachment 1134397
engine logs
Comment 2 Nelly Credi 2016-03-09 02:00 EST
Created attachment 1134398
hosts logs
Comment 3 Nelly Credi 2016-03-09 04:23:34 EST
*** Bug 1315959 has been marked as a duplicate of this bug. ***
Comment 4 Ala Hino 2016-03-10 04:53:14 EST
Hi Nelly,

Please note that it is impossible to put a domain into maintenance while there are running tasks.
Action type 1010 with status 2 means that there are 123 *running* Live Migrate Disk tasks.
You (the CI environment) have to make sure there are no running tasks before trying to put the domain into maintenance.
Comment 5 Nelly Credi 2016-03-27 03:55:44 EDT
The tasks were stuck. There was no indication of any live migration anywhere except in the DB.
Comment 6 Allon Mureinik 2016-04-03 08:05:55 EDT
As bug 1312741 is ON_QA and there seems to be no other issue here, setting this one to ON_QA too for QA to verify.
Comment 7 Allon Mureinik 2016-04-03 08:07:25 EDT
(In reply to Allon Mureinik from comment #6)
> As bug 1312741 is ON_QA and there seems to be no other issue here, setting
> this one to ON_QA too for QA to verify.

Correction - moving to MODIFIED, as there is no 3.6.6 build yet. Once there is one, this bug should be moved to ON_QA and verified against it.
Comment 8 Elad 2016-05-02 08:56:22 EDT
Ala, are the verification steps here similar to the ones for bug 1312741? If so, since bug 1312741 is CLOSED CURRENTRELEASE and was verified, can we move this one to VERIFIED as well?
Comment 9 Ala Hino 2016-05-02 16:32:09 EDT
Elad, the BZs are different. I'd suggest verifying this one too.

I'd recommend first running a live migration, then trying to move the domain to maintenance and seeing what happens. Keep in mind that while there are running jobs, the domain cannot be moved to maintenance. So, as long as the migration is running, the user cannot move the domain to maintenance; however, once the operation completes, the user should be able to do so.
Comment 10 Elad 2016-05-03 05:03:35 EDT
During live migration, moving the master domain to maintenance is not allowed. Once the live migration tasks are completed, moving the domain to maintenance is allowed and works well.


Verified using:
rhevm-3.6.6-0.1.el6.noarch
vdsm-4.17.27-0.el7ev.noarch
