Bug 1175824

Summary: [JSON RPC] shutting down/rebooting a host in state 'up' results in faulty behaviour which is resolved only by engine restart
Product: Red Hat Enterprise Virtualization Manager Reporter: sefi litmanovich <slitmano>
Component: vdsm-jsonrpc-java Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE QA Contact: sefi litmanovich <slitmano>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: unspecified CC: alukiano, amureini, bazulay, cshao, ecohen, gklein, hadong, huiwa, iheim, leiwang, lsurette, oourfali, pstehlik, Rhev-m-bugs, slitmano, smizrahi, yaniwang, ycui
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: vt13.5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-17 17:07:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1164308, 1164311    
Attachments:
Description Flags
engine log none

Description sefi litmanovich 2014-12-18 16:37:00 UTC
Created attachment 970654
engine log

Description of problem:

Faulty behaviour when hosts work with JSON-RPC.

Rebooting/shutting down a host in 'up' state resulted in a few wrong behaviours:

a. Rebooting/shutting down a host which is SPM - the host becomes "connecting", I can't 'confirm host has been rebooted', and it stays stuck that way until the engine is restarted.

b. Rebooting/shutting down a host which is HSM - the host stays 'up' even though it should become non responsive. Its status changes only after an engine restart.

This behaviour doesn't occur after switching the hosts to XML-RPC and re-installing them.

Version-Release number of selected component (if applicable):

rhevm-3.5.0-0.25.el6ev.noarch
vdsm-jsonrpc-java-1.0.12-1.el6ev.noarch (this is the version from the next release, vt13.4, which was patched for testing bz https://bugzilla.redhat.com/show_bug.cgi?id=1149832)

vdsm-4.16.8.1-4.el7ev.x86_64 (all the hosts are rh7)


Steps to Reproduce:
1. Set up rh7 hosts on a 3.5 engine, configured to work with JSON-RPC.
2. (In my setup I have two hosts in one DC and a third in another; this doesn't seem crucial.)
3. Choose the SPM host and shut it down manually (make sure no power management is configured) - scenario (a).
4. After all hosts are back up again, choose an HSM host and do the same - scenario (b).

Actual results:

As described above.
In both cases the only resolution is 'service ovirt-engine restart'.


Expected results:

The host becomes non responsive upon shutdown.
Switching the host on and choosing the 'confirm host has been rebooted' option in the UI, to invoke the manual fencing flow, returns the host to 'up' state.
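
The expected flow can be written down as a small state table. This is only an illustrative sketch, not engine code: the status names 'up', 'connecting' and 'non responsive' come from this report, while the event names, the idea that 'connecting' is a transient step before 'non responsive', and the helper functions are assumptions made for the example.

```python
from enum import Enum


class HostStatus(Enum):
    UP = "up"
    CONNECTING = "connecting"
    NON_RESPONSIVE = "non responsive"


# Hypothetical event names; the real engine does not necessarily use these.
EXPECTED_TRANSITIONS = {
    # Assumption: the host passes through 'connecting' once the connection drops.
    (HostStatus.UP, "connection_lost"): HostStatus.CONNECTING,
    # Assumption: after some grace period the host is marked non responsive.
    (HostStatus.CONNECTING, "grace_period_expired"): HostStatus.NON_RESPONSIVE,
    # Host was switched back on and the admin chose 'confirm host has been rebooted'.
    (HostStatus.NON_RESPONSIVE, "confirmed_rebooted"): HostStatus.UP,
}


def next_status(status, event):
    """Return the expected status after an event; unknown events leave it unchanged."""
    return EXPECTED_TRANSITIONS.get((status, event), status)


def replay(status, events):
    """Apply a sequence of events and return the final expected status."""
    for event in events:
        status = next_status(status, event)
    return status
```

In these terms, scenario (a) is a host that never leaves CONNECTING, and scenario (b) is a host that never leaves UP, even though the connection was lost.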


Additional info:

Comment 1 Piotr Kliczewski 2014-12-19 07:51:09 UTC
Can you please retest with newer build?

Comment 3 sefi litmanovich 2014-12-22 10:29:10 UTC
Verified with vt13.4 + the http://gerrit.ovirt.org/#/c/36332/ patch, so in practice this will work in the next release.

Verified according to the same two scenarios as in the description.
Waiting for you guys to move the bz to Post and then on_qa, and I will verify.

Comment 4 Piotr Kliczewski 2014-12-22 12:17:31 UTC
*** Bug 1176527 has been marked as a duplicate of this bug. ***

Comment 5 sefi litmanovich 2014-12-31 12:28:49 UTC
Verified with rhevm-3.5.0-0.27.el6ev.noarch according to scenario in description.

on hosts:
vdsm-4.16.8.1-4.el7ev.x86_64

Comment 6 Eyal Edri 2015-02-17 17:07:05 UTC
rhev 3.5.0 was released. closing.