Bug 1520424 - [RFE] Fence hosts which became NonResponsive right after engine startup
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.2.2
Target Release: 4.2.2
Assigned To: Eli Mesika
QA Contact: Petr Matyáš
Keywords: FutureFeature
Depends On:
Blocks: 1506217 1549899
Reported: 2017-12-04 08:13 EST by Martin Perina
Modified: 2018-07-31 13:13 EDT
CC List: 7 users

See Also:
Fixed In Version: ovirt-engine-4.2.2
Doc Type: Enhancement
Doc Text:
Once the configurable quiet time (5 minutes by default) has elapsed after startup, the Manager automatically attempts to fence unresponsive hosts that have power management enabled. Previously, the user needed to fence them manually.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-29 07:16:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
mperina: ovirt-4.2?
pmatyas: testing_plan_complete-
mperina: planning_ack?
mperina: devel_ack+
pstehlik: testing_ack+


Attachments


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 87182 | master | MERGED | core:restart non responding hosts after quite time | 2018-02-13 07:52 EST
oVirt gerrit | 87564 | ovirt-engine-4.2 | MERGED | core:restart non responding hosts after quite time | 2018-02-13 09:16 EST

Description Martin Perina 2017-12-04 08:13:50 EST
Fencing is disabled during the first 5 minutes after engine startup (the interval can be changed using the engine-config option DisableFenceAtStartupInSec). If a host becomes NonResponsive during that interval, it is not fenced automatically: administrators must either fence it manually (an audit log error message is displayed for that) or wait for the host to become responsive again by itself.
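The startup quiet window boils down to a time comparison against the engine start time. The following is a minimal, hypothetical Java sketch, not ovirt-engine code: the class `StartupFenceWindow` and its methods are invented for illustration, and only the option name DisableFenceAtStartupInSec and its 5-minute (300 s) default come from this report.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of the startup quiet window controlled by
 * DisableFenceAtStartupInSec. Names are illustrative, not from ovirt-engine.
 */
public final class StartupFenceWindow {

    /** Default taken from this report: fencing is disabled for 5 minutes after startup. */
    public static final long DEFAULT_DISABLE_FENCE_AT_STARTUP_SEC = 300;

    private final Instant engineStartTime;
    private final Duration quietTime;

    public StartupFenceWindow(Instant engineStartTime, long disableFenceAtStartupInSec) {
        if (disableFenceAtStartupInSec < 0) {
            throw new IllegalArgumentException("quiet time must not be negative");
        }
        this.engineStartTime = engineStartTime;
        this.quietTime = Duration.ofSeconds(disableFenceAtStartupInSec);
    }

    /** Instant at which automatic fencing becomes possible again. */
    public Instant quietTimeEnd() {
        return engineStartTime.plus(quietTime);
    }

    /** Returns true if automatic fencing is allowed at the given instant. */
    public boolean isFencingAllowed(Instant now) {
        // Fencing stays suppressed while 'now' is still before the end of the quiet window.
        return !now.isBefore(quietTimeEnd());
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2018-02-19T10:00:00Z");
        StartupFenceWindow window =
                new StartupFenceWindow(start, DEFAULT_DISABLE_FENCE_AT_STARTUP_SEC);
        System.out.println(window.isFencingAllowed(start.plusSeconds(120))); // false: inside the window
        System.out.println(window.isFencingAllowed(start.plusSeconds(300))); // true: window has elapsed
    }
}
```

Without a follow-up action, a host that turns NonResponsive while `isFencingAllowed` would return false is simply skipped, which is the gap this report describes.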

The DisableFenceAtStartupInSec option has existed since 3.1 to prevent fencing storms after a whole data center outage: hosts usually take much longer to boot than the engine, so we need to give them time to recover rather than fence them while they are still booting.

Unfortunately, this option does not work well with hosted engine, especially in the scenario described in [1].

To solve this issue, we will schedule a job that starts once the DisableFenceAtStartupInSec interval is over and executes fencing on all NonResponsive hosts.
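The proposed fix can be pictured as a one-shot job scheduled for the end of the quiet time. This is a hypothetical Java sketch, not the code merged in gerrit 87182/87564: the types `Host` and `HostStatus`, the helper names and the fencing callback are invented for illustration. Only the option DisableFenceAtStartupInSec, the NonResponsive status, and the power-management condition from the Doc Text come from this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: after the DisableFenceAtStartupInSec quiet time, run a
 * one-shot job that fences every host that is still NonResponsive and has
 * power management enabled. Host and HostStatus are illustrative stand-ins.
 */
public final class FenceNonResponsiveHostsAfterStartup {

    enum HostStatus { UP, NON_RESPONSIVE, MAINTENANCE }

    static final class Host {
        final String name;
        final HostStatus status;
        final boolean powerManagementEnabled;

        Host(String name, HostStatus status, boolean powerManagementEnabled) {
            this.name = name;
            this.status = status;
            this.powerManagementEnabled = powerManagementEnabled;
        }
    }

    /** Selects the hosts the job should fence: NonResponsive with power management enabled. */
    static List<Host> hostsToFence(List<Host> hosts) {
        List<Host> result = new ArrayList<>();
        for (Host host : hosts) {
            if (host.status == HostStatus.NON_RESPONSIVE && host.powerManagementEnabled) {
                result.add(host);
            }
        }
        return result;
    }

    /** Schedules the one-shot job; the host list is read only when the job fires. */
    static void scheduleAfterQuietTime(ScheduledExecutorService scheduler,
                                       long disableFenceAtStartupInSec,
                                       Supplier<List<Host>> currentHosts,
                                       Consumer<Host> fence) {
        scheduler.schedule(
                () -> hostsToFence(currentHosts.get()).forEach(fence),
                disableFenceAtStartupInSec,
                TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Host> hosts = List.of(
                new Host("host1", HostStatus.UP, true),
                new Host("host2", HostStatus.NON_RESPONSIVE, true),
                new Host("host3", HostStatus.NON_RESPONSIVE, false));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // A 1-second delay stands in for the 300-second default so the demo finishes quickly.
        scheduleAfterQuietTime(scheduler, 1, () -> hosts,
                host -> System.out.println("Fencing " + host.name)); // prints "Fencing host2"
        scheduler.shutdown(); // already-scheduled delayed tasks still run by default
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Reading the host list when the job fires, rather than when it is scheduled, leaves alone any host that recovered by itself during the quiet time, which keeps the original purpose of the option (avoiding fencing storms) intact.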

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1506217#c4
Comment 1 Yaniv Kaul 2017-12-05 00:39:12 EST
We could easily change the default on hosted engine?
Comment 2 Martin Perina 2017-12-05 03:53:13 EST
(In reply to Yaniv Kaul from comment #1)
> We could easily change the default on hosted engine?

What do you mean by that? Enable that feature only on hosted engine? If so, then yes, we could introduce an option to enable/disable that feature, so that the HE setup can change the default if needed.
Comment 3 Petr Matyáš 2018-02-19 11:04:23 EST
Verified on ovirt-engine-4.2.2-0.1.el7.noarch

Non-responsive hosts are fenced once the grace period following the engine startup sequence has elapsed.
Comment 4 Sandro Bonazzola 2018-03-29 07:16:36 EDT
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
