Created attachment 939974 [details]
Engine and VDSM logs

Description of problem:
Handling non responsive Host task doesn't complete when SPM host network goes down or system is rebooted

Version-Release number of selected component (if applicable):
3.5 vt3.1

How reproducible:
100%

Steps to Reproduce:
1. On a Data Center with one or more hosts and at least one Storage domain defined, reboot the SPM host or bring down its rhevm network
2. Check the Tasks pane for tasks related to handling when hosts are unresponsive

Actual results:
The Handling non responsive Host <hostName> task never completes

Expected results:
The task should complete (in this case it sounds like with a failure)

Additional info:
The DB shows these tasks never complete:

engine=# SELECT action_type,description, status,start_time,end_time from job;
 action_type               | description                                                        | status  | start_time                 | end_time
---------------------------+--------------------------------------------------------------------+---------+----------------------------+----------------------------
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 11:02:30.889+03 |
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 10:24:57.99+03  |
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | FAILED  | 2014-09-22 10:43:27.282+03 | 2014-09-22 10:43:27.317+03
 SshSoftFencing            | Executing SSH Soft Fencing on host gold-vdsc.qa.lab.tlv.redhat.com | FAILED  | 2014-09-22 10:51:22.713+03 | 2014-09-22 10:52:25.807+03
 SshSoftFencing            | Executing SSH Soft Fencing on host gold-vdsd.qa.lab.tlv.redhat.com | FAILED  | 2014-09-22 11:01:27.786+03 | 2014-09-22 11:02:30.877+03
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 10:06:44.817+03 |
 SshSoftFencing            | Executing SSH Soft Fencing on host gold-vdsd.qa.lab.tlv.redhat.com | FAILED  | 2014-09-22 10:42:24.157+03 | 2014-09-22 10:43:27.269+03
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsc.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 10:52:25.82+03  |
(8 rows)
Created attachment 939975 [details] Tasks not completing
The host flow belongs to infra.
Patch posted: https://gerrit.ovirt.org/#/c/44136/3

The problem occurs when the VdsNotRespondingTreatment command invokes the SetSpmStatus command. The execution context gets mixed up between the two, so SetSpmStatus is marked as completed twice and VdsNotRespondingTreatment is never marked as completed. This was fixed locally, but the same problem probably affects every monitored command that is invoked by another command. A general fix is needed, but it requires extensive verification and will be done in the future. The fix is to make CommandContext.clone() clone the ExecutionContext too.
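To illustrate why cloning the ExecutionContext matters, here is a minimal, self-contained sketch. It is not the engine code: the classes below are simplified, hypothetical stand-ins that only borrow the names CommandContext and ExecutionContext, and the methods shallowClone(), deepClone(), copy(), and run(), as well as the jobName field, are assumptions made for the example. It assumes the mixup comes from the parent and child commands sharing one ExecutionContext object, which would produce the symptom described above.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextCloneSketch {

    // Hypothetical stand-in: records which job a command marks as finished.
    static class ExecutionContext {
        String jobName;

        ExecutionContext(String jobName) {
            this.jobName = jobName;
        }

        ExecutionContext copy() {
            return new ExecutionContext(jobName);
        }
    }

    // Hypothetical stand-in for the context passed from a parent command to a child.
    static class CommandContext {
        final ExecutionContext executionContext;

        CommandContext(ExecutionContext executionContext) {
            this.executionContext = executionContext;
        }

        // Shallow clone: the new context shares the same ExecutionContext object.
        CommandContext shallowClone() {
            return new CommandContext(executionContext);
        }

        // Deep clone: the new context gets its own copy of the ExecutionContext.
        CommandContext deepClone() {
            return new CommandContext(executionContext.copy());
        }
    }

    // Simulates a parent command invoking a child command and returns the
    // job names that end up marked as completed, in order.
    static List<String> run(boolean deep) {
        List<String> completed = new ArrayList<>();
        CommandContext parent =
                new CommandContext(new ExecutionContext("VdsNotRespondingTreatment"));
        CommandContext child = deep ? parent.deepClone() : parent.shallowClone();

        // The child points its execution context at its own job.
        child.executionContext.jobName = "SetSpmStatus";

        completed.add(child.executionContext.jobName);  // child finishes
        completed.add(parent.executionContext.jobName); // parent finishes
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("shallow clone: " + run(false));
        System.out.println("deep clone:    " + run(true));
    }
}
```

With the shallow clone, the child's update leaks into the parent, so the child's job is completed twice and the parent's job is never completed. With the deep clone, each command completes its own job.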
Verified on 3.6.2-10