Bug 1520424 - [RFE] Fence hosts which became NonResponsive right after engine startup
Summary: [RFE] Fence hosts which became NonResponsive right after engine startup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.2.2
: 4.2.2
Assignee: Eli Mesika
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks: 1506217 1549899
Reported: 2017-12-04 13:13 UTC by Martin Perina
Modified: 2022-03-13 15:13 UTC
7 users

Fixed In Version: ovirt-engine-4.2.2
Doc Type: Enhancement
Doc Text:
After startup, once the configurable quiet time (5 minutes by default) has elapsed, the Manager automatically attempts to fence unresponsive hosts that have power management enabled. Previously, the user needed to fence them manually.
Clone Of:
Environment:
Last Closed: 2018-03-29 11:16:36 UTC
oVirt Team: Infra
Embargoed:
mperina: ovirt-4.2?
pmatyas: testing_plan_complete-
mperina: planning_ack?
mperina: devel_ack+
pstehlik: testing_ack+




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHV-45187 | 0 | None | None | None | 2022-03-13 15:13:35 UTC
oVirt gerrit | 87182 | 0 | master | MERGED | core:restart non responding hosts after quite time | 2018-02-13 12:52:25 UTC
oVirt gerrit | 87564 | 0 | ovirt-engine-4.2 | MERGED | core:restart non responding hosts after quite time | 2018-02-13 14:16:14 UTC

Description Martin Perina 2017-12-04 13:13:50 UTC
Fencing is disabled for the first 5 minutes after engine startup (the interval can be changed using the engine-config option DisableFenceAtStartupInSec). If a host becomes NonResponsive during that interval, it is not fenced automatically. Administrators must then either fence it manually (an audit log error message is displayed in that case) or wait for the host to become responsive again on its own.
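The startup window described above can be sketched as a simple time check. This is a minimal illustration, not ovirt-engine code. The class `FenceStartupGate` and its methods are hypothetical names. The only value taken from this report is the `DisableFenceAtStartupInSec` default of 300 seconds (5 minutes).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not an ovirt-engine class): models the rule that
// fencing is disabled for DisableFenceAtStartupInSec seconds after engine startup.
final class FenceStartupGate {
    // Default of DisableFenceAtStartupInSec as stated in this report (5 minutes).
    static final long DEFAULT_DISABLE_FENCE_AT_STARTUP_SEC = 300;

    private final Instant engineStartTime;
    private final Duration quietTime;

    FenceStartupGate(Instant engineStartTime, long disableFenceAtStartupInSec) {
        if (disableFenceAtStartupInSec < 0) {
            throw new IllegalArgumentException("quiet time must be non-negative");
        }
        this.engineStartTime = engineStartTime;
        this.quietTime = Duration.ofSeconds(disableFenceAtStartupInSec);
    }

    // Instant at which automatic fencing becomes possible again.
    Instant quietTimeEnd() {
        return engineStartTime.plus(quietTime);
    }

    // True while 'now' is still inside the startup quiet time; a host that
    // becomes NonResponsive in this window is not fenced automatically.
    boolean isFencingDisabled(Instant now) {
        return now.isBefore(quietTimeEnd());
    }

    // Time left until the quiet time is over (zero once it has elapsed).
    Duration remaining(Instant now) {
        Duration left = Duration.between(now, quietTimeEnd());
        return left.isNegative() ? Duration.ZERO : left;
    }
}
```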

The DisableFenceAtStartupInSec option has existed since 3.1 to prevent fencing storms after a whole data center outage. Hosts usually take much longer to boot than the engine, so we need to give them time to recover rather than fence them while they are still booting.

Unfortunately, this option doesn't work well with hosted engine, especially in the scenario described in [1].

To solve this issue, we will schedule a job that starts once the DisableFenceAtStartupInSec interval is over and executes fencing on all NonResponsive hosts.
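The proposed job can be sketched as a one-shot task scheduled for the end of the quiet time. This is a minimal sketch that assumes a plain `ScheduledExecutorService` stands in for the engine's own scheduler. `Host`, `HostStatus`, `NonResponsiveHostsFencer` and the two callbacks are hypothetical names, not ovirt-engine classes. The power-management filter follows the Doc Text of this bug.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical host model; the real engine entities are much richer.
enum HostStatus { UP, NON_RESPONSIVE, MAINTENANCE }

record Host(String name, HostStatus status, boolean powerManagementEnabled) {}

// Hypothetical one-shot job: when the quiet time is over, fence every host
// that is still NonResponsive and has power management enabled.
final class NonResponsiveHostsFencer {
    private final Supplier<List<Host>> hostLister; // reads current host statuses
    private final Consumer<Host> fenceAction;      // e.g. a power-management restart

    NonResponsiveHostsFencer(Supplier<List<Host>> hostLister, Consumer<Host> fenceAction) {
        this.hostLister = hostLister;
        this.fenceAction = fenceAction;
    }

    // Statuses are read when the job runs, so hosts that recovered on their
    // own during the quiet time are skipped.
    List<Host> fenceCandidates() {
        return hostLister.get().stream()
                .filter(h -> h.status() == HostStatus.NON_RESPONSIVE)
                .filter(Host::powerManagementEnabled)
                .collect(Collectors.toList());
    }

    // Job body: fence each candidate and return the hosts that were fenced.
    List<Host> run() {
        List<Host> candidates = fenceCandidates();
        candidates.forEach(fenceAction);
        return candidates;
    }

    // Schedules run() to fire once, disableFenceAtStartupInSec seconds from now.
    ScheduledFuture<List<Host>> scheduleAfterQuietTime(
            ScheduledExecutorService scheduler, long disableFenceAtStartupInSec) {
        Callable<List<Host>> job = this::run;
        return scheduler.schedule(job, disableFenceAtStartupInSec, TimeUnit.SECONDS);
    }
}
```

Calling `scheduleAfterQuietTime(scheduler, 300)` once at startup would match the default 5-minute delay. Re-reading host statuses at the moment the job fires is a design assumption of this sketch. It keeps the intent of DisableFenceAtStartupInSec, because hosts that finish booting within the quiet time are left alone.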

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1506217#c4

Comment 1 Yaniv Kaul 2017-12-05 05:39:12 UTC
We could easily change the default on hosted engine?

Comment 2 Martin Perina 2017-12-05 08:53:13 UTC
(In reply to Yaniv Kaul from comment #1)
> We could easily change the default on hosted engine?

What do you mean by that? Enable that feature only on hosted engine? If so, then yes, we could introduce an option to enable/disable that feature, so the HE setup can change the default if needed.

Comment 3 Petr Matyáš 2018-02-19 16:04:23 UTC
Verified on ovirt-engine-4.2.2-0.1.el7.noarch

NonResponsive hosts are fenced after the grace period that follows the engine startup sequence.

Comment 4 Sandro Bonazzola 2018-03-29 11:16:36 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

