Bug 1218548

Summary: VdsNotRespondingTreatment Job remains in status STARTED even after Manual fencing the host
Product: Red Hat Enterprise Virtualization Manager
Reporter: sefi litmanovich <slitmano>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE
QA Contact: sefi litmanovich <slitmano>
Severity: unspecified
Priority: unspecified
Version: 3.5.1
CC: bazulay, daniel.helgenberger, gklein, lpeer, lsurette, mburman, mshira, oourfali, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-04-20 01:10:12 UTC
Type: Bug
oVirt Team: Infra
Attachments:
- engine log
- screenshot
- engine logs

Description sefi litmanovich 2015-05-05 08:57:14 UTC
Created attachment 1022112
engine log

Description of problem:

My environment had a single host in a cluster with an NFS storage domain. The host was the SPM and was connected to storage.
The host is running RHEL 7 with vdsm-4.16.13-1.el7ev and has no power management (PM) agent configured.
At some point the host lost connectivity, became non-responsive, and the fencing flow was initiated.
After SshSoftFencing failed (because the host was unreachable), 'VdsNotRespondingTreatment' was initiated with the message:
Handling non responsive Host {hostname}.

At this point the host remained unreachable, and after several hours I chose to 'confirm manual reboot' to release the host from the SPM role and then put it into maintenance.

However, the 'VdsNotRespondingTreatment' job still shows as STARTED in the DB, and the message still persists in the task list in the UI.
After an engine restart, the job's status became UNKNOWN.

Version-Release number of selected component (if applicable):

rhevm-3.5.1-0.4.el6ev.noarch


Actual results:

The job gets stuck in STARTED status and the message remains in the task list.

Expected results:

Either the job should change to status FAILED after fencing fails, or to FINISHED after the manual reboot of the host is confirmed.
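The expected behavior amounts to a rule that the treatment job must always leave STARTED once the flow is over. Below is a minimal Python sketch of that rule under stated assumptions, not the engine's actual code: the status names mirror the ones mentioned in this report, while the `TreatmentJob` class and its method names are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


# Statuses after which the job is considered closed.
TERMINAL = {JobStatus.FINISHED, JobStatus.FAILED}


class TreatmentJob:
    """Hypothetical model of a VdsNotRespondingTreatment job."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.status = JobStatus.STARTED
        self.message = f"Handling non responsive Host {host}."

    def _end(self, status: JobStatus) -> None:
        # Leave STARTED only once; a job that already ended keeps its status.
        if self.status not in TERMINAL:
            self.status = status

    def on_fencing_failed(self) -> None:
        # All fencing attempts failed: close the job as FAILED.
        self._end(JobStatus.FAILED)

    def on_manual_reboot_confirmed(self) -> None:
        # The admin confirmed a manual reboot: close the job as FINISHED.
        self._end(JobStatus.FINISHED)

    def is_stuck(self) -> bool:
        # The reported symptom: the job is still STARTED after the flow ends.
        return self.status is JobStatus.STARTED


job = TreatmentJob("host1")
job.on_fencing_failed()
print(job.status.value)  # prints FAILED
```

In this sketch, confirming a manual reboot after the job has already FAILED leaves it FAILED rather than reopening it; that is a design choice of the sketch, not something stated in the report.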

Comment 1 Eli Mesika 2015-06-17 08:04:54 UTC
*** Bug 1203143 has been marked as a duplicate of this bug. ***

Comment 2 Eli Mesika 2015-06-21 11:12:37 UTC
*** Bug 1228992 has been marked as a duplicate of this bug. ***

Comment 3 Max Kovgan 2015-06-28 14:13:33 UTC
ovirt-3.6.0-3 release

Comment 4 Michael Burman 2015-07-01 06:07:27 UTC
Please note, I still see this issue in the new 3.6.0-3 engine --> 3.6.0-0.0.master.20150627185750.git6f063c1.el6

Tasks remain in 'Adding' status, for example ones from yesterday.
Attaching screen shot.

Comment 5 Michael Burman 2015-07-01 06:09:00 UTC
Created attachment 1044902
screenshot

Comment 6 Oved Ourfali 2015-07-01 06:43:19 UTC
Liran - is this related to the job/step issue?
Eli - can you verify it works with the latest patches?

Comment 7 Eli Mesika 2015-07-01 09:11:09 UTC
(In reply to Oved Ourfali from comment #6)
> Liran - is this related to the job/step issue?
> Eli - can you verify it works with the latest patches?

Rebased on master and tested again, works fine

Comment 8 sefi litmanovich 2015-07-13 16:03:07 UTC
Verified with ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch.

Steps:

1. Have a host with no PM configured.
2. Block the connection between the host and the engine.

Result: the host goes to Connecting state and then to Non Responsive state.
In DB: job VdsNotRespondingTreatment is STARTED -> after several failed attempts to connect to the host, VdsNotRespondingTreatment is FAILED, as expected.

3. Confirm the host has been rebooted.
4. Put the host into maintenance.
5. Restore connectivity between the host and the engine.
6. Activate the host.

Result: the host is up.

Michael - can you try to reproduce the same flow? It seems we got different results on the same version; maybe you can specify your steps?

Comment 9 Michael Burman 2015-07-14 05:12:29 UTC
Hi Sefi, Eli, Oved

Not sure about the steps, but I have tasks stuck in the task log in the UI.
For example, 'Adding new Host' from July 09, on 3.6.0-0.0.master.20150627185750.git6f063c1.el6:

2015-Jul-09, 11:07 Adding new Host puma22.scl.lab.tlv.redhat.com to Cluster mburman_1

The task just stays there and looks like it is still trying to resolve,
when actually the puma22 server was installed successfully.

Feel free to contact me if you would like to enter my setup.

Comment 10 Oved Ourfali 2015-07-14 05:34:51 UTC
Liran - please take a look and make sure this is covered with recent master and your recent additions.

Comment 11 Liran Zelkha 2015-07-14 05:50:51 UTC
Michael - can you send server logs (server.log and engine.log)? 
I'm adding hosts and it works fine.

Comment 12 Michael Burman 2015-07-14 06:32:23 UTC
Created attachment 1051628
engine logs