Bug 1519497 - downloadImage task was finished but not cleared, thus preventing moving the SPM to maintenance
Summary: downloadImage task was finished but not cleared, thus preventing moving the S...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ovirt-4.2.2
: 4.2.2.2
Assignee: shani
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-30 19:00 UTC by Raz Tamir
Modified: 2018-03-29 10:59 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-29 10:59:55 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments (Terms of Use)
engine and vdsm logs (2.74 MB, application/x-gzip)
2017-11-30 19:00 UTC, Raz Tamir
no flags Details

Description Raz Tamir 2017-11-30 19:00:38 UTC
Created attachment 1361117 [details]
engine and vdsm logs

Description of problem:

Moving the SPM to maintenance fails with (engine.log):

2017-11-30 20:50:14,119+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatusVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 76a9336b
2017-11-30 20:50:14,120+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] VDS 'host_mixed_2' is spm and moved from up calling resetIrs.
2017-11-30 20:50:14,126+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0', ignoreFailoverLimit='false', vdsId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', ignoreStopFailed='false'}), log id: 3df0464a
2017-11-30 20:50:14,136+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0'}), log id: 46461564
2017-11-30 20:50:14,179+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks
2017-11-30 20:50:14,179+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] FINISH, SpmStopVDSCommand, log id: 46461564
2017-11-30 20:50:14,179+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] FINISH, ResetIrsVDSCommand, log id: 3df0464a
2017-11-30 20:50:14,189+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] EVENT_ID: VDS_STATUS_CHANGE_FAILED_DUE_TO_STOP_SPM_FAILURE(27), Failed to change status of host host_mixed_2 due to a failure to stop the spm.
2017-11-30 20:50:14,190+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] FINISH, SetVdsStatusVDSCommand, log id: 76a9336b


** Nothing in relevant vdsm.log (attached)

Version-Release number of selected component (if applicable):
rhvm-4.2.0-0.5.master.el7
vdsm-4.20.8-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Move SPM to maintenance
2.
3.

Actual results:
No ERROR in UI but the SPM stays UP

Expected results:
SPM should move to maintenance and new SPM should be elected

Additional info:

Comment 1 Allon Mureinik 2017-12-03 17:37:14 UTC
Something here doesn't add up.

Ala - can you please take a look?
Raz - can we get the same automation run in DEBUG level please?

Comment 2 Raz Tamir 2017-12-03 18:32:29 UTC
Allon,

I've managed to reproduce it before changing the log to DEBUG level but after running engine-setup (to ensure the change will take place) that includes ovirt-engine service restart this bug is not reproducible.

The logs prior restart ovirt-engine:

2017-12-03 20:21:27,763+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] Running command: MaintenanceNumberOfVdssCommand internal: false. 
Entities affected :  ID: 93b9001c-7c1a-4db1-86bc-a676ceaebcdb Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2017-12-03 20:21:27,773+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatus
VDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 1f488a1
2017-12-03 20:21:27,773+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] VDS 'host_mixed_2' is spm and moved from up calling resetIrs.
2017-12-03 20:21:27,779+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{stor
agePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0', ignoreFailoverLimit='false', vdsId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', ignoreStopFailed='false'}), log id: 5ca12c21
2017-12-03 20:21:27,791+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCo
mmandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0'}), log id: 2a559f6d
2017-12-03 20:21:27,828+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', poo
l id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks
2017-12-03 20:21:27,829+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] FINISH, SpmStopVDSCommand, log id: 2a559f6d
2017-12-03 20:21:27,829+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] FINISH, ResetIrsVDSCommand, log id: 5ca12c21
2017-12-03 20:21:27,837+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] EVENT_ID: VDS_STATUS_CHANGE_FAILED_DUE_TO_STOP_SPM_FA
ILURE(27), Failed to change status of host host_mixed_2 due to a failure to stop the spm.
2017-12-03 20:21:27,837+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] FINISH, SetVdsStatusVDSCommand, log id: 1f488a1
2017-12-03 20:21:27,838+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks=
'[4eabbc3f-aae9-404f-bde3-262d505fa6d0=POOL]'}'
2017-12-03 20:23:32,494+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLoc
ks='[4eabbc3f-aae9-404f-bde3-262d505fa6d0=POOL]'}'
2017-12-03 20:23:32,651+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] Running command: MaintenanceNumberOfVdssCommand internal: false. 
Entities affected :  ID: 93b9001c-7c1a-4db1-86bc-a676ceaebcdb Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2017-12-03 20:23:32,661+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatus
VDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 7da9246a
2017-12-03 20:23:32,661+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] VDS 'host_mixed_2' is spm and moved from up calling resetIrs.
2017-12-03 20:23:32,667+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{stor
agePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0', ignoreFailoverLimit='false', vdsId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', ignoreStopFailed='false'}), log id: 615ab2c3
2017-12-03 20:23:32,678+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCo
mmandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0'}), log id: 3961a20a
2017-12-03 20:23:32,712+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', poo
l id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks
2017-12-03 20:23:32,712+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] FINISH, SpmStopVDSCommand, log id: 3961a20a
2017-12-03 20:23:32,712+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] FINISH, ResetIrsVDSCommand, log id: 615ab2c3
2017-12-03 20:23:32,724+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] EVENT_ID: VDS_STATUS_CHANGE_FAILED_DUE_TO_STOP_SPM_FAILURE(27), Failed to change status of host host_mixed_2 due to a failure to stop the spm.


* Lowering severity for now

Comment 3 Raz Tamir 2017-12-05 16:59:25 UTC
Seen in our automation many times (raising severity).

Comment 4 Ala Hino 2017-12-06 08:20:51 UTC
The reason for failing to stop the SPM is tasks that are not cleared yet:

2017-11-30 16:35:51,637+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-20) [hosts_syncAction_32e8cdc8-2f8f-45d0] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks

Cannot tell why this happens now, and what the tasks that fail are.
Maybe add some sleep time before stopping the SPM?

Comment 5 Raz Tamir 2017-12-06 10:04:32 UTC
We do wait for the tasks to finish but this is not the case.
In case there are running tasks I should get CDA explaining that there are running tasks ...
In this bug, the issue is that the SPM can't move to maintenance - never unless I restart the ovirt-engine service

Comment 6 Raz Tamir 2017-12-17 09:49:18 UTC
Ala,

I see that there is a task that is not cleared:

[root@storage-ge10-vdsm1 ~]# vdsm-client Host getAllTasksInfo
{
    "52178810-4d00-4967-9361-a00793145164": {
        "verb": "downloadImage", 
        "id": "52178810-4d00-4967-9361-a00793145164"
    }
}
[root@storage-ge10-vdsm1 ~]# vdsm-client Task getInfo taskID=52178810-4d00-4967-9361-a00793145164
{
    "verb": "downloadImage", 
    "id": "52178810-4d00-4967-9361-a00793145164"
}
[root@storage-ge10-vdsm1 ~]# vdsm-client Task getStatus taskID=52178810-4d00-4967-9361-a00793145164
{
    "message": "1 jobs completed successfully", 
    "code": 0, 
    "taskID": "52178810-4d00-4967-9361-a00793145164", 
    "taskResult": "success", 
    "taskState": "finished"
}

Maybe this is the issue - task/s are not cleared?

Comment 7 Raz Tamir 2017-12-17 09:52:58 UTC
I cleared this task and indeed, the host now successfully moved to maintenance

Comment 8 Ala Hino 2017-12-17 12:27:01 UTC
The uncleared task is for sure the reason for failing to move the SPM to maintenance.

Can you please describe the test case that fails? Mostly I am mainly interested to see what's done before and after moving the SPM to maintenance. In addition, how do you check that moving to maintenance passed or failed?

Comment 9 Raz Tamir 2017-12-17 16:16:26 UTC
(In reply to Ala Hino from comment #8)
> The uncleared task is for sure the reason for failing to move the SPM to
> maintenance.
> 
> Can you please describe the test case that fails? Mostly I am mainly
> interested to see what's done before and after moving the SPM to
> maintenance. In addition, how do you check that moving to maintenance passed
> or failed?

Not sure if some case has failed - the task finished successfully but not cleared.
The environment was idle for the weekend and I just log into it and tried to move the SPM to maintenance so nothing unusual before.
In UI you'll see nothing unless you'll check the events tab and in our automation, we just wait for the SPM status to return False for the SPM we just put into maintenance.

Comment 10 Ala Hino 2018-01-09 12:16:04 UTC
Changing the title of the bug to better describe the issue.

Comment 11 shani 2018-02-22 10:17:04 UTC
I've tried to reproduce this bug (by trying to upload some glance images, also as templates), but tasks seem to be cleaned after execution.
 
We aware of some issues CoCo had on 4.2.2, and maybe as a result of Ravi's fixes this bug was fixed.

Following Elad's suggestion, bug's status has changed.

Comment 15 Sandro Bonazzola 2018-03-29 10:59:55 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.