Created attachment 1022112 [details]
engine log

Description of problem:
My environment had a single host in a cluster with an NFS storage domain (SD). The host was the SPM and was connected to storage. The host runs RHEL 7 with vdsm-4.16.13-1.el7ev and has no PM agent configured. At some point the host lost connectivity and became non-responsive, and the fence flow was initiated. After SshSoftFencing failed (because the host was unreachable), 'VdsNotRespondingTreatment' was initiated with the message: Handling non responsive Host {hostname}. The host remained unreachable, and after several hours I chose 'confirm manual reboot' to release the host from the SPM role and then put it into maintenance. However, 'VdsNotRespondingTreatment' is still shown as STARTED in the DB, and the message still persists in the task list in the UI. After an engine restart, the job's status becomes UNKNOWN.

Version-Release number of selected component (if applicable):
rhevm-3.5.1-0.4.el6ev.noarch

Actual results:
The job gets stuck in STARTED status and the message remains in the task list.

Expected results:
The job should change to FAILED after fencing fails, or to FINISHED after the manual reboot of the host is confirmed.
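The expected results above amount to a small status transition table. The following is only an illustrative sketch of that expectation, not the engine's actual code: the event names (`fencing_failed`, `manual_reboot_confirmed`) and the `next_status` helper are hypothetical, and only the status names come from this report.

```python
from enum import Enum


class JobStatus(Enum):
    # Status names as they appear in this report.
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


# Hypothetical event names, used here only to illustrate the expected results.
EXPECTED_TRANSITIONS = {
    (JobStatus.STARTED, "fencing_failed"): JobStatus.FAILED,
    (JobStatus.STARTED, "manual_reboot_confirmed"): JobStatus.FINISHED,
}


def next_status(current: JobStatus, event: str) -> JobStatus:
    """Return the expected status after an event; unlisted pairs leave it unchanged."""
    return EXPECTED_TRANSITIONS.get((current, event), current)
```

Under this sketch, a job in STARTED never remains there once either event has occurred, which is the behavior the report asks for.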
*** Bug 1203143 has been marked as a duplicate of this bug. ***
*** Bug 1228992 has been marked as a duplicate of this bug. ***
ovirt-3.6.0-3 release
Please note, I still see this issue in the new 3.6.0-3 engine (3.6.0-0.0.master.20150627185750.git6f063c1.el6). Tasks remain in 'adding' status, for example from yesterday. Attaching a screenshot.
Created attachment 1044902 [details] screenshot
Liran - is this related to the job/step issue? Eli - can you verify it works with the latest patches?
(In reply to Oved Ourfali from comment #6)
> Liran - is this related to the job/step issue?
> Eli - can you verify it works with the latest patches?

Rebased on master and tested again, works fine.
Verified with ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch.

Steps:
1. Have a host with no PM configured.
2. Block the connection between the host and the engine.
   Result: the host goes to connecting state and then to non-responsive state. In the DB: job VdsNotRespondingTreatment is STARTED; after several failed attempts to connect to the host, VdsNotRespondingTreatment is FAILED, as expected.
3. Confirm the host has been rebooted.
4. Put the host into maintenance.
5. Restore connectivity between the host and the engine.
6. Activate the host.
   Result: the host is up.

Michael - can you try to reproduce the same flow? It seems we got different results on the same version; maybe you can specify your steps?
Hi Sefi, Eli, Oved,

I'm not sure about the steps, but I have tasks stuck in the tasks log in the UI. For example, adding a new host on July 09 to 3.6.0-0.0.master.20150627185750.git6f063c1.el6:

2015-Jul-09, 11:07 Adding new Host puma22.scl.lab.tlv.redhat.com to Cluster mburman_1

The task just stays there and looks like it is still trying to resolve, when actually the puma22 server was installed successfully. Feel free to contact me if you would like to access my setup.
Liran - please take a look and make sure this is covered with recent master and your recent additions.
Michael - can you send server logs (server.log and engine.log)? I'm adding hosts and it works fine.
Created attachment 1051628 [details] engine logs