Bug 1664045

Summary: A failure to deactivate SPM due to uncleared tasks is not reported via any API
Product: [oVirt] ovirt-engine
Reporter: Elad <ebenahar>
Component: BLL.Storage
Assignee: Ahmad Khiet <akhiet>
Status: CLOSED CURRENTRELEASE
QA Contact: Ahmad Khiet <akhiet>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: aefrat, akhiet, bugs, tnisan
Target Milestone: ovirt-4.3.4
Flags: rule-engine: ovirt-4.3+
Target Release: 4.3.4
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.3.4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-20 11:48:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (Description / Flags):
  engine and vdsm logs / none
  screenshot for the change / none

Description Elad 2019-01-07 15:22:07 UTC
Created attachment 1519041 [details]
engine and vdsm logs

Description of problem:
An attempt to deactivate the SPM host while it has running tasks fails and the failure is not propagated to the user.


Version-Release number of selected component (if applicable):
ovirt-engine-4.3.0-0.6.alpha2.el7.noarch
vdsm-4.30.4-1.el7ev.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Have a task in state 'finished' on SPM
2. Try to deactivate the SPM host (see the SDK sketch below for locating the current SPM)
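
The SPM host for step 2 can be located through the engine API. A minimal sketch follows, assuming ovirt-engine-sdk4 is installed; the engine URL and credentials are placeholders and do not come from this report:

    # Sketch: find the current SPM host via the oVirt Python SDK.
    # The URL and credentials below are placeholders, not values from this bug.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,  # skip TLS verification; lab use only
    )

    hosts_service = connection.system_service().hosts_service()

    # A host reports spm.status == SPM while it holds the SPM role.
    spm_hosts = [
        h for h in hosts_service.list()
        if h.spm is not None and h.spm.status == types.SpmStatus.SPM
    ]
    for h in spm_hosts:
        print('SPM host:', h.name, h.id)

    connection.close()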


Actual results:
Host maintenance fails:

2019-01-07 16:57:22,077+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-204) [hosts_syncAction_137c8740-e9aa-4b8d] Running command: MaintenanceNumberOfVdssCommand internal: false. Entities affected :  ID: f288cfa3-a78f-4d70-91ed-607fb197d47a Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-01-07 16:57:22,086+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-204) [hosts_syncAction_137c8740-e9aa-4b8d] START, SetVdsStatusVDSCommand(HostName = host_mixed_1, SetVdsStatusVDSCommandParameters:{hostId='f288cfa3-a78f-4d70-91ed-607fb197d47a', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 3b8e3077
2019-01-07 16:57:22,086+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-204) [hosts_syncAction_137c8740-e9aa-4b8d] VDS 'host_mixed_1' is spm and moved from up calling resetIrs.
2019-01-07 16:57:22,088+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-204) [hosts_syncAction_137c8740-e9aa-4b8d] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='2288b8b4-06e6-44cc-8294-3f6ceec565f5', ignoreFailoverLimit='false', vdsId='f288cfa3-a78f-4d70-91ed-607fb197d47a', ignoreStopFailed='false'}), log id: 6dcea640
2019-01-07 16:57:22,098+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-204) [hosts_syncAction_137c8740-e9aa-4b8d] START, SpmStopVDSCommand(HostName = host_mixed_1, SpmStopVDSCommandParameters:{hostId='f288cfa3-a78f-4d70-91ed-607fb197d47a', storagePoolId='2288b8b4-06e6-44cc-8294-3f6ceec565f5'}), log id: 1cb04123
2019-01-07 16:57:22,114+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-204) [hosts_syncAction_137c8740-e9aa-4b8d] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_1', pool id '2288b8b4-06e6-44cc-8294-3f6ceec565f5' as there are uncleared tasks 'Task 'af6d25f9-85ae-43fa-bbe8-55de984f215f', status 'finished''



But the failure is not reported to the user.
In webadmin, no error appears in the Events tab and no error message is shown.
Via the REST API, the deactivation response code is OK:


url:/ovirt-engine/api/hosts/f288cfa3-a78f-4d70-91ed-607fb197d47a/deactivate body:<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
</action>


2019-01-07 16:57:21,816 - MainThread - hosts - INFO - Using Correlation-Id: hosts_syncAction_137c8740-e9aa-4b8d
2019-01-07 16:57:22,205 - MainThread - hosts - DEBUG - Cleaning Correlation-Id: hosts_syncAction_137c8740-e9aa-4b8d
2019-01-07 16:57:22,206 - MainThread - hosts - DEBUG - Response code is valid: [200, 201]
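

For reference, the same deactivation can be issued from a client script. The sketch below assumes ovirt-engine-sdk4; the connection details are placeholders, and the host id is the one from the log above. It shows what an API consumer observes: with the behaviour reported here the call returns cleanly, while the expected behaviour is an error raised back to the caller.

    # Sketch of the client-side view of the deactivate action
    # (equivalent to POST /ovirt-engine/api/hosts/<id>/deactivate).
    # Connection details are placeholders; the host id comes from the log above.
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,
    )

    host_service = connection.system_service().hosts_service().host_service(
        'f288cfa3-a78f-4d70-91ed-607fb197d47a'
    )

    try:
        host_service.deactivate()
        # With this bug, we reach this point even though SpmStop was refused.
        print('deactivate returned HTTP 200 with no error')
    except sdk.Error as error:
        # Expected once the failure is propagated to the caller.
        print('deactivate failed:', error)

    connection.close()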



Expected results:
In case of a failure to deactivate the SPM due to uncleared tasks, the failure should be propagated to the user.
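
One way to check that the failure reaches the user is to look at the audit log (the Events tab in webadmin) right after the failed deactivation. A hedged sketch follows, again assuming ovirt-engine-sdk4 with placeholder connection details; matching the host name in the event description is illustrative, not the exact wording the engine would use:

    # Sketch: list recent audit-log events and look for an error mentioning the host.
    # Connection details are placeholders; 'host_mixed_1' is the host from this report.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,
    )

    events_service = connection.system_service().events_service()
    for event in events_service.list(max=20):
        description = event.description or ''
        if event.severity == types.LogSeverity.ERROR and 'host_mixed_1' in description:
            print(event.time, description)

    connection.close()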


Additional info:

Comment 1 Avihai 2019-06-02 14:07:38 UTC
Hi Ahmad,

We need a way to reach this state (deactivate the SPM host while it has running tasks) in order to verify this bug.

Currently, the issue does not reproduce, and we need your help to trigger it.

Please help.

Comment 2 Ahmad Khiet 2019-06-02 14:19:12 UTC
Hi Avihai, 

I reproduced this issue on a development environment by forcing a value change so the code takes the error branch.
I'll attach a screenshot of the resulting status.

Comment 3 Ahmad Khiet 2019-06-02 14:22:44 UTC
Created attachment 1576299 [details]
screenshot for the change

screenshot for the change

Comment 4 Avihai 2019-06-02 15:16:50 UTC
(In reply to Ahmad Khiet from comment #3)
> Created attachment 1576299 [details]
> screenshot for the change
> 
> screenshot for the change

Unfortunately, I can't verify it based only on your verification in your dev env.
Alternatively, you could verify this bug yourself - which is fine by me :)

For QE to verify this bug we need a way to hit the same issue (deactivate the SPM host while it has running tasks) and see the fix in the official build 4.3.4.2.
Can you connect to my env 'storage-ge-04.scl.lab.tlv.redhat.com' and try it there, or tell me how to trigger it, either via a scenario or via a change in the VDSM code?

Comment 5 Ahmad Khiet 2019-06-04 06:52:42 UTC
(In reply to Avihai from comment #4)
> (In reply to Ahmad Khiet from comment #3)
> > Created attachment 1576299 [details]
> > screenshot for the change
> > 
> > screenshot for the change
> 
> Unfortunately, I can't verify it according to your verification on your dev
> env.
> Either you verify this bug - which is fine by me :) 
> 
> For QE to verify this bug we need a way to hit the same issue(deactivate the
> SPM host while it has running tasks) and see the fix at the official build
> 4.3.4.2.
> Can you connect to my env 'storage-ge-04.scl.lab.tlv.redhat.com' and try it
> there or tell me how to do it via a scenario or change in the VDSM code
> somehow?

This bug was reported by QE. I tried to reproduce it in the usual way, but it did not reproduce, so I forced the code into the error branch to get this result.
I have tried to reproduce it over the last several days (and earlier), but I have no idea how to trigger it.

What do you think?

Comment 6 Avihai 2019-06-04 08:23:11 UTC
(In reply to Ahmad Khiet from comment #5)
> (In reply to Avihai from comment #4)
> > (In reply to Ahmad Khiet from comment #3)
> > > Created attachment 1576299 [details]
> > > screenshot for the change
> > > 
> > > screenshot for the change
> > 
> > Unfortunately, I can't verify it according to your verification on your dev
> > env.
> > Either you verify this bug - which is fine by me :) 
> > 
> > For QE to verify this bug we need a way to hit the same issue(deactivate the
> > SPM host while it has running tasks) and see the fix at the official build
> > 4.3.4.2.
> > Can you connect to my env 'storage-ge-04.scl.lab.tlv.redhat.com' and try it
> > there or tell me how to do it via a scenario or change in the VDSM code
> > somehow?
> 
> This bug reported by QE, and I tried to reproduce it as usual, but the bug
> did not re-produce and forced it to enter a branch to get this result.
Yes, it was reported by Elad while running a random test suite, so the exact steps are unknown.
Also, manual/automation efforts did not yield a reproduction, so verification assurance is low.

> I tried to reproduce within the last several days and before but I have no
> idea how to get it.
> 
> what do you think?
As we discussed, verifying this the same way you did before, but on a 4.3.4.2 downstream build instead of a master build, is the closest thing to verification in this case.
Please move it to VERIFIED once you're done.

Comment 8 Sandro Bonazzola 2019-06-20 11:48:02 UTC
This bugzilla is included in the oVirt 4.3.4 release, published on June 11th 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.