Created attachment 1361117 [details]
engine and vdsm logs

Description of problem:

Moving the SPM to maintenance fails with (engine.log):

2017-11-30 20:50:14,119+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatusVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 76a9336b
2017-11-30 20:50:14,120+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] VDS 'host_mixed_2' is spm and moved from up calling resetIrs.
2017-11-30 20:50:14,126+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0', ignoreFailoverLimit='false', vdsId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', ignoreStopFailed='false'}), log id: 3df0464a
2017-11-30 20:50:14,136+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0'}), log id: 46461564
2017-11-30 20:50:14,179+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks
2017-11-30 20:50:14,179+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] FINISH, SpmStopVDSCommand, log id: 46461564
2017-11-30 20:50:14,179+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] FINISH, ResetIrsVDSCommand, log id: 3df0464a
2017-11-30 20:50:14,189+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] EVENT_ID: VDS_STATUS_CHANGE_FAILED_DUE_TO_STOP_SPM_FAILURE(27), Failed to change status of host host_mixed_2 due to a failure to stop the spm.
2017-11-30 20:50:14,190+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-18) [f0565c72-a6e4-4406-b6c3-630e921e286e] FINISH, SetVdsStatusVDSCommand, log id: 76a9336b

** Nothing relevant in vdsm.log (attached)

Version-Release number of selected component (if applicable):
rhvm-4.2.0-0.5.master.el7
vdsm-4.20.8-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Move the SPM to maintenance

Actual results:
No error in the UI, but the SPM stays Up.

Expected results:
The SPM should move to maintenance and a new SPM should be elected.

Additional info:
Something here doesn't add up. Ala - can you please take a look? Raz - can we get the same automation run at DEBUG log level, please?
Allon, I managed to reproduce it before changing the log level to DEBUG, but after running engine-setup (to ensure the change takes effect), which includes an ovirt-engine service restart, the bug is no longer reproducible. The logs prior to restarting ovirt-engine:

2017-12-03 20:21:27,763+02 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] Running command: MaintenanceNumberOfVdssCommand internal: false. Entities affected : ID: 93b9001c-7c1a-4db1-86bc-a676ceaebcdb Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2017-12-03 20:21:27,773+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatusVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 1f488a1
2017-12-03 20:21:27,773+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] VDS 'host_mixed_2' is spm and moved from up calling resetIrs.
2017-12-03 20:21:27,779+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0', ignoreFailoverLimit='false', vdsId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', ignoreStopFailed='false'}), log id: 5ca12c21
2017-12-03 20:21:27,791+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0'}), log id: 2a559f6d
2017-12-03 20:21:27,828+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks
2017-12-03 20:21:27,829+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] FINISH, SpmStopVDSCommand, log id: 2a559f6d
2017-12-03 20:21:27,829+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] FINISH, ResetIrsVDSCommand, log id: 5ca12c21
2017-12-03 20:21:27,837+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] EVENT_ID: VDS_STATUS_CHANGE_FAILED_DUE_TO_STOP_SPM_FAILURE(27), Failed to change status of host host_mixed_2 due to a failure to stop the spm.
2017-12-03 20:21:27,837+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] FINISH, SetVdsStatusVDSCommand, log id: 1f488a1
2017-12-03 20:21:27,838+02 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-16) [c3e4577c-f333-4b6a-9a45-c4a6f029d250] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[4eabbc3f-aae9-404f-bde3-262d505fa6d0=POOL]'}'
2017-12-03 20:23:32,494+02 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[4eabbc3f-aae9-404f-bde3-262d505fa6d0=POOL]'}'
2017-12-03 20:23:32,651+02 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] Running command: MaintenanceNumberOfVdssCommand internal: false. Entities affected : ID: 93b9001c-7c1a-4db1-86bc-a676ceaebcdb Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2017-12-03 20:23:32,661+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatusVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 7da9246a
2017-12-03 20:23:32,661+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] VDS 'host_mixed_2' is spm and moved from up calling resetIrs.
2017-12-03 20:23:32,667+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0', ignoreFailoverLimit='false', vdsId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', ignoreStopFailed='false'}), log id: 615ab2c3
2017-12-03 20:23:32,678+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCommandParameters:{hostId='93b9001c-7c1a-4db1-86bc-a676ceaebcdb', storagePoolId='4eabbc3f-aae9-404f-bde3-262d505fa6d0'}), log id: 3961a20a
2017-12-03 20:23:32,712+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks
2017-12-03 20:23:32,712+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] FINISH, SpmStopVDSCommand, log id: 3961a20a
2017-12-03 20:23:32,712+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] FINISH, ResetIrsVDSCommand, log id: 615ab2c3
2017-12-03 20:23:32,724+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-31) [1e7f19f9-800c-45f1-954c-88e9a4ad5348] EVENT_ID: VDS_STATUS_CHANGE_FAILED_DUE_TO_STOP_SPM_FAILURE(27), Failed to change status of host host_mixed_2 due to a failure to stop the spm.

* Lowering severity for now
Seen in our automation many times (raising severity).
The reason for failing to stop the SPM is tasks that are not cleared yet:

2017-11-30 16:35:51,637+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-20) [hosts_syncAction_32e8cdc8-2f8f-45d0] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '4eabbc3f-aae9-404f-bde3-262d505fa6d0' as there are uncleared tasks

I cannot tell why this happens now, or which tasks fail to clear. Maybe add some sleep time before stopping the SPM?
We do wait for the tasks to finish, but that is not the case here. If there were running tasks, I should get a CDA message explaining that there are running tasks. In this bug, the issue is that the SPM can't move to maintenance - ever - unless I restart the ovirt-engine service.
Ala, I see that there is a task that is not cleared:

[root@storage-ge10-vdsm1 ~]# vdsm-client Host getAllTasksInfo
{
    "52178810-4d00-4967-9361-a00793145164": {
        "verb": "downloadImage",
        "id": "52178810-4d00-4967-9361-a00793145164"
    }
}

[root@storage-ge10-vdsm1 ~]# vdsm-client Task getInfo taskID=52178810-4d00-4967-9361-a00793145164
{
    "verb": "downloadImage",
    "id": "52178810-4d00-4967-9361-a00793145164"
}

[root@storage-ge10-vdsm1 ~]# vdsm-client Task getStatus taskID=52178810-4d00-4967-9361-a00793145164
{
    "message": "1 jobs completed successfully",
    "code": 0,
    "taskID": "52178810-4d00-4967-9361-a00793145164",
    "taskResult": "success",
    "taskState": "finished"
}

Maybe this is the issue - task/s are not cleared?
I cleared this task and indeed, the host now successfully moved to maintenance
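For reference, the "finished but never cleared" state seen above can be detected mechanically by combining the two vdsm-client calls. This is a minimal sketch, not part of the product: the JSON shapes mirror the output shown in this bug, and the `status_for` callable is a hypothetical hook that would, in practice, shell out to `vdsm-client Task getStatus` for each task ID.

```python
import json


def finished_uncleared_tasks(all_tasks_json, status_for):
    """Return IDs of tasks vdsm still lists although they already finished.

    all_tasks_json: raw JSON string as printed by `vdsm-client Host getAllTasksInfo`
    status_for: hypothetical callable mapping a task ID to its getStatus dict
    """
    tasks = json.loads(all_tasks_json)
    stuck = []
    for task_id in tasks:
        status = status_for(task_id)
        # A task that getAllTasksInfo still reports but whose state is
        # already "finished" is exactly the uncleared case in this bug.
        if status.get("taskState") == "finished":
            stuck.append(task_id)
    return stuck
```

Any task ID this returns is a candidate for manual clearing (the comments here suggest clearing it unblocks the move to maintenance).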
The uncleared task is certainly the reason for failing to move the SPM to maintenance.

Can you please describe the test case that fails? I am mainly interested in what's done before and after moving the SPM to maintenance. In addition, how do you check whether moving to maintenance passed or failed?
(In reply to Ala Hino from comment #8)
> The uncleared task is for sure the reason for failing to move the SPM to
> maintenance.
>
> Can you please describe the test case that fails? Mostly I am mainly
> interested to see what's done before and after moving the SPM to
> maintenance. In addition, how do you check that moving to maintenance passed
> or failed?

Not sure that any case failed - the task finished successfully but was not cleared. The environment was idle over the weekend and I just logged into it and tried to move the SPM to maintenance, so nothing unusual happened beforehand. In the UI you'll see nothing unless you check the Events tab; in our automation, we just wait for the SPM status to return False for the host we just put into maintenance.
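The automation check described in this comment amounts to a bounded poll on the host's SPM status. A minimal generic sketch of that wait (the `is_spm` callable is hypothetical - in the real automation it would query the engine via the REST API or SDK):

```python
import time


def wait_until_not_spm(is_spm, timeout=300, interval=5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until is_spm() returns False or the timeout expires.

    is_spm: hypothetical callable returning True while the host is still SPM.
    Returns True if the host stopped being SPM within the timeout,
    False otherwise - which is how this bug surfaces in automation.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if not is_spm():
            return True
        sleep(interval)
    return False
```

In the scenario reported here, the poll would always time out and return False, since the SPM never leaves the Up state until ovirt-engine is restarted.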
Changing the title of the bug to better describe the issue.
I've tried to reproduce this bug (by uploading some glance images, also as templates), but the tasks seem to be cleared after execution. We are aware of some issues CoCo had in 4.2.2, and this bug may have been fixed as a result of Ravi's fixes. Following Elad's suggestion, the bug's status has been changed.
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.