Bug 871768

Summary: power management: Fence Host fails if something went wrong in FenceQuietTimeBetweenOperationsInSec window [180seconds]
Product: Red Hat Enterprise Virtualization Manager
Reporter: Tareq Alayan <talayan>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE
QA Contact: Tareq Alayan <talayan>
Severity: high
Docs Contact:
Priority: high
Version: 3.1.0
CC: bazulay, cpelland, dyasny, hateya, iheim, lpeer, mkenneth, oramraz, Rhev-m-bugs, talayan, yeylon, ykaul
Target Milestone: ---
Keywords: ZStream
Target Release: 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: sf1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 879719
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 879719, 915537
Attachments:
engin.log (flags: none)

Description Tareq Alayan 2012-10-31 10:55:09 UTC
Description of problem:
The host stays unresponsive indefinitely unless it is restarted manually.

Version-Release number of selected component (if applicable):
si22.1

Steps to Reproduce:
1. Assume you have two hosts, aqua1 and aqua2.
2. Restart aqua1 via power management. [Result: aqua1 is rebooted and is up again within 90 sec]
3. VDSMD on aqua1 crashes or is stopped. [Result: aqua2 sends a pmCommand reboot to aqua1]
The reboot attempt fails because 180 sec have not yet passed since the previous fence operation [FenceQuietTimeBetweenOperationsInSec=180sec]
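
The timing in the steps above can be illustrated with a minimal sketch. This is not the engine's code: the function name fence_allowed, the crash time of 100 seconds, and the treatment of the exact 180-second boundary as "allowed" are all assumptions made for illustration. Only the 180-second value of FenceQuietTimeBetweenOperationsInSec comes from this report.

```python
# Value of FenceQuietTimeBetweenOperationsInSec as given in this report.
FENCE_QUIET_TIME_SEC = 180


def fence_allowed(last_fence_at, now, quiet_time=FENCE_QUIET_TIME_SEC):
    """Return True if a new fence operation may start at `now` (in seconds).

    Hypothetical helper: a request is rejected while fewer than
    `quiet_time` seconds have passed since the previous fence operation.
    """
    if last_fence_at is None:
        # No previous fence operation, so there is no window to respect.
        return True
    return now - last_fence_at >= quiet_time


# Step 2: aqua1 is restarted via power management at t=0.
restart_at = 0
# Step 3: vdsmd goes down at t=100 (assumed value inside the window).
crash_at = 100

print(fence_allowed(restart_at, crash_at))  # rejected: still inside the window
print(fence_allowed(restart_at, 180))       # allowed once the window has passed
```

If the rejected request at t=100 is never repeated, nothing triggers a reboot after the window closes, which matches the behaviour described under Actual results.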
  
Actual results:
aqua1 is unresponsive and vdsmd is down

Expected results:
Consider sending a 2nd or 3rd reboot attempt to make sure the other host comes up
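
One way to read this request is sketched below: instead of dropping a reboot that falls inside the quiet window, wait out the remaining time and retry a bounded number of times. This is only an illustration under stated assumptions and not the fix that was merged; fence_with_retry, send_reboot, the limit of 3 attempts, and the choice to restart the window after every attempt are all hypothetical.

```python
import time

# Value of FenceQuietTimeBetweenOperationsInSec as given in this report.
QUIET_TIME_SEC = 180


def fence_with_retry(send_reboot, last_fence_at, clock=time.monotonic,
                     sleep=time.sleep, max_attempts=3,
                     quiet_time=QUIET_TIME_SEC):
    """Try to reboot a host, deferring attempts that fall in the quiet window.

    send_reboot: hypothetical callable, returns True if the reboot succeeded.
    last_fence_at: clock reading of the previous fence operation, or None.
    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if last_fence_at is not None:
            remaining = last_fence_at + quiet_time - clock()
            if remaining > 0:
                # Wait for the window to close instead of giving up.
                sleep(remaining)
        # Assumption: every attempt counts as a fence operation and
        # restarts the quiet window, whether or not it succeeds.
        last_fence_at = clock()
        if send_reboot():
            return attempt
    return 0
```

The clock and sleep parameters are injectable so the behaviour can be checked with a fake clock rather than by waiting for real time to pass.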

Comment 1 Tareq Alayan 2012-10-31 10:59:54 UTC
Created attachment 636019 [details]
engin.log

Comment 2 Eli Mesika 2012-11-13 12:01:15 UTC
http://gerrit.ovirt.org/#/c/9211/1

Comment 3 Eli Mesika 2012-11-20 09:37:58 UTC
Fixed in commit cb564a3

Comment 5 Tareq Alayan 2013-01-08 14:23:57 UTC
Verified.
The reboot is performed when vdsmd goes down within the 180 sec window.

Comment 6 Itamar Heim 2013-06-11 08:45:30 UTC
3.2 has been released
