Bug 1145099 - Engine never completes task VdsNotRespondingTreatmentCommand (Handling non responsive Host <hostName>) in case of SPM host reboot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Ori Liel
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On: 1257610
Blocks:
 
Reported: 2014-09-22 11:37 UTC by Gilad Lazarovich
Modified: 2016-06-23 04:52 UTC (History)

Fixed In Version: 3.6.0-9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-20 01:11:49 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
Engine and VDSM logs (979.17 KB, application/x-gzip)
2014-09-22 11:37 UTC, Gilad Lazarovich
Tasks not completing (39.40 KB, image/png)
2014-09-22 11:38 UTC, Gilad Lazarovich


Links
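 System       | ID    | Private | Priority         | Status | Summary                                               | Last Updated
--------------+-------+---------+------------------+--------+-------------------------------------------------------+--------------
 oVirt gerrit | 44136 | 0       | master           | MERGED | engine: Completed task still shown as 'running' in UI | Never
 oVirt gerrit | 44595 | 0       | ovirt-engine-3.6 | MERGED | engine: Completed task still shown as 'running' in UI | Never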

Description Gilad Lazarovich 2014-09-22 11:37:27 UTC
Created attachment 939974
Engine and VDSM logs

Description of problem:
The "Handling non responsive Host" task does not complete when the SPM host's network goes down or the host is rebooted.

Version-Release number of selected component (if applicable):
3.5 vt3.1

How reproducible:
100%

Steps to Reproduce:
1. On a Data Center with one or more hosts and at least one Storage Domain defined, reboot the SPM host or bring down its rhevm network.
2. Check the Tasks pane for tasks related to handling unresponsive hosts.

Actual results:
The Handling non responsive Host <hostName> task never completes

Expected results:
The task should complete (in this case, presumably with a failure).

Additional info:
The DB shows these tasks never complete:
engine=# SELECT action_type,description, status,start_time,end_time from job;
        action_type        |                            description                             | status  |         start_time         |          end_time          
---------------------------+--------------------------------------------------------------------+---------+----------------------------+----------------------------
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 11:02:30.889+03 | 
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 10:24:57.99+03  | 
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | FAILED  | 2014-09-22 10:43:27.282+03 | 2014-09-22 10:43:27.317+03
 SshSoftFencing            | Executing SSH Soft Fencing on host gold-vdsc.qa.lab.tlv.redhat.com | FAILED  | 2014-09-22 10:51:22.713+03 | 2014-09-22 10:52:25.807+03
 SshSoftFencing            | Executing SSH Soft Fencing on host gold-vdsd.qa.lab.tlv.redhat.com | FAILED  | 2014-09-22 11:01:27.786+03 | 2014-09-22 11:02:30.877+03
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsd.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 10:06:44.817+03 | 
 SshSoftFencing            | Executing SSH Soft Fencing on host gold-vdsd.qa.lab.tlv.redhat.com | FAILED  | 2014-09-22 10:42:24.157+03 | 2014-09-22 10:43:27.269+03
 VdsNotRespondingTreatment | Handling non responsive Host gold-vdsc.qa.lab.tlv.redhat.com       | STARTED | 2014-09-22 10:52:25.82+03  | 
(8 rows)

Comment 1 Gilad Lazarovich 2014-09-22 11:38:22 UTC
Created attachment 939975
Tasks not completing

Comment 2 Michal Skrivanek 2015-06-02 09:39:38 UTC
host flow is infra

Comment 3 Ori Liel 2015-07-30 08:06:25 UTC
Patch posted: 

  https://gerrit.ovirt.org/#/c/44136/3

The problem happens when the VdsNotRespondingTreatment command invokes the SetSpmStatus command. There's a mix-up with the execution context, which results in SetSpmStatus being marked as completed twice and VdsNotRespondingTreatment never being marked as completed.

This was fixed locally, but the problem probably affects every monitored command that is invoked by another command. A general fix is needed, but it requires a lot of verification and will be done in the future. The fix is to make CommandContext.clone() clone the ExecutionContext too.
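
For illustration only, here is a minimal sketch of how a shared execution context could produce the symptom described above, and why cloning it would help. The classes below (Job, ExecutionContext, CommandContext, ContextDemo) and their methods are simplified, hypothetical stand-ins, not the real ovirt-engine code, and the assumption that the child command overwrites the job reference on the shared context is mine rather than something taken from the patch.

```java
import java.util.Arrays;

// Hypothetical stand-in for a row in the "job" table.
final class Job {
    final String name;
    int completions; // how many times this job was marked completed
    Job(String name) { this.name = name; }
}

// Hypothetical stand-in: holds the job that the owning command reports on.
final class ExecutionContext {
    Job job;
    ExecutionContext(Job job) { this.job = job; }
    ExecutionContext copy() { return new ExecutionContext(job); }
}

// Hypothetical stand-in for the command context passed from parent to child.
final class CommandContext {
    final ExecutionContext executionContext;
    CommandContext(ExecutionContext executionContext) {
        this.executionContext = executionContext;
    }
    // Assumed problematic behavior: the clone shares the same ExecutionContext.
    CommandContext shallowClone() {
        return new CommandContext(executionContext);
    }
    // Behavior named in the comment: the clone gets its own ExecutionContext.
    CommandContext deepClone() {
        return new CommandContext(executionContext.copy());
    }
}

final class ContextDemo {
    // Runs a parent command that invokes a child command and returns
    // {parent completions, child completions}.
    static int[] run(boolean cloneExecutionContext) {
        Job parentJob = new Job("VdsNotRespondingTreatment");
        Job childJob = new Job("SetSpmStatus");

        CommandContext parentCtx = new CommandContext(new ExecutionContext(parentJob));
        CommandContext childCtx = cloneExecutionContext
                ? parentCtx.deepClone()
                : parentCtx.shallowClone();

        // The child registers its own job on the context it was given.
        childCtx.executionContext.job = childJob;

        // Each command marks "the job in my context" as completed when it ends.
        childCtx.executionContext.job.completions++;
        parentCtx.executionContext.job.completions++;

        return new int[] { parentJob.completions, childJob.completions };
    }

    public static void main(String[] args) {
        System.out.println("shared: " + Arrays.toString(run(false))); // shared: [0, 2]
        System.out.println("cloned: " + Arrays.toString(run(true)));  // cloned: [1, 1]
    }
}
```

With the shared context, the child job is completed twice and the parent job is left untouched, which would match the STARTED rows with an empty end_time in the job table. With a cloned context, each job is completed exactly once.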

Comment 5 Petr Matyáš 2016-01-21 12:56:24 UTC
Verified on 3.6.2-10

