Bug 1030441 - Handle crash of both ha services: agent and broker.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assigned To: Martin Sivák
QA Contact: Artyom
Keywords: Triaged
Duplicates: 1275606
Depends On:
Blocks:
Reported: 2013-11-14 08:03 EST by Leonid Natapov
Modified: 2016-03-09 14:48 EST

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
With this update, systemd is configured to restart the HA services (ovirt-ha-agent and ovirt-ha-broker) if they crash. The HA services are part of the high availability solution for the Manager virtual machine and must be highly available themselves.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 14:48:13 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 47212 | ovirt-hosted-engine-ha-1.3 | MERGED | Restart hosted engine services after crash | Never
oVirt gerrit | 47227 | ovirt-hosted-engine-ha-1.3 | MERGED | Restart hosted engine services after crash | Never
oVirt gerrit | 48270 | master | MERGED | Restart hosted engine services after unclean shutdown too | Never

Description Leonid Natapov 2013-11-14 08:03:34 EST
Currently we don't really handle the situation where both services (agent and broker) have crashed. We should handle this scenario.
Comment 1 Doron Fediuck 2013-11-14 11:45:25 EST
HA-wise, the VM will keep running, so it should be fine.
What we need is a way to improve it and notify the user / admin.
Comment 2 Itamar Heim 2013-11-15 01:09:36 EST
the services don't have/need a watchdog?
Comment 4 Greg Padgett 2013-12-03 10:30:04 EST
(In reply to Itamar Heim from comment #2)
> the services don't have/need a watchdog?

Probably need one, and don't have one yet. Using watchdog.d and/or systemd could fill in some gaps. We'd then need notifications, for which I think we can leverage the broker's notification system (with some self-monitoring).
Comment 6 Martin Sivák 2015-10-06 07:20:16 EDT
This might already be fixed by our use of systemd, without requiring any code change.
Comment 7 Artyom 2015-10-11 08:52:14 EDT
The description of the bug is very informative, so how must we handle a crash of both services?

Checked on ovirt-hosted-engine-ha-1.3.0-1.el7ev.noarch
1) Finish deployment of hosted-engine
2) Kill both services, ovirt-ha-agent and ovirt-ha-broker
3) Wait 5 minutes
4) Services are still down
Comment 8 Martin Sivák 2015-10-12 04:15:01 EDT
Artyom: How exactly did you kill those services?
Comment 9 Artyom 2015-10-12 04:39:16 EDT
kill -9 pid_of_ovirt-ha-broker pid_of_ovirt-ha-agent
Comment 10 Sandro Bonazzola 2015-10-26 08:44:20 EDT
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high
Comment 11 Martin Sivák 2015-10-26 09:59:18 EDT
This was already merged; however, we might have a small issue on CentOS 7, where systemd v208 does not support on-abnormal. This will be remedied once CentOS 7.2 is released with a newer systemd.
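
Editorial note: one way to see what a given host is actually running, as a minimal sketch. It assumes the units are named after the services in this report (ovirt-ha-agent.service and ovirt-ha-broker.service) and that systemctl is available. The helper names prop_value and show_restart_policy are illustrative and not part of ovirt-hosted-engine-ha.

```shell
#!/bin/sh
# Extract the value from a "Key=Value" line as printed by `systemctl show -p`.
prop_value() {
    printf '%s\n' "${1#*=}"
}

# Print the restart policy systemd has loaded for each HA service.
# Unit names are assumed from the service names used in this report.
show_restart_policy() {
    for unit in ovirt-ha-agent ovirt-ha-broker; do
        line=$(systemctl show -p Restart "$unit.service")
        printf '%s: %s\n' "$unit" "$(prop_value "$line")"
    done
}

# Usage (on the host):
#   systemctl --version | head -n 1    # first line reports the systemd version
#   show_restart_policy
```

If the reported policy is not the one the unit file asks for, the installed systemd may simply not understand that setting, which is the situation described for v208.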
Comment 12 Artyom 2015-11-05 06:17:55 EST
I checked it on ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch
The problem still exists
Comment 13 Martin Sivák 2015-11-18 04:50:36 EST
*** Bug 1275606 has been marked as a duplicate of this bug. ***
Comment 14 Artyom 2015-11-26 10:37:05 EST
Verified on ovirt-hosted-engine-ha-1.3.3-1.el7ev.noarch
1) pkill -9 ovirt-ha-agent ovirt-ha-broker
2) Check the services after a minute: both services are up
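
Editorial note: the verification steps above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure QA used: the unit names are taken from the service names in this report, the function names (main_pid, kill_main, wait_active) are hypothetical, and the 60-second default only mirrors the "after a minute" check.

```shell
#!/bin/sh
# Return the main PID of a unit, parsed from `systemctl show -p MainPID`.
main_pid() {
    line=$(systemctl show -p MainPID "$1")
    printf '%s\n' "${line#*=}"
}

# Send SIGKILL to the unit's main process to simulate a crash.
kill_main() {
    pid=$(main_pid "$1")
    # MainPID=0 means no main process is running; `kill 0` would signal the
    # caller's whole process group, so refuse it.
    [ "${pid:-0}" -gt 0 ] || return 1
    kill -9 "$pid"
}

# Poll `systemctl is-active` until the unit is active or the timeout expires.
# $1 = unit, $2 = timeout in seconds (default 60), $3 = poll interval (default 5)
wait_active() {
    unit=$1; timeout=${2:-60}; interval=${3:-5}; waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if systemctl is-active --quiet "$unit"; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    # One final check once the timeout has been reached.
    systemctl is-active --quiet "$unit"
}

# Usage (as root on a deployed hosted-engine host):
#   kill_main ovirt-ha-broker.service; kill_main ovirt-ha-agent.service
#   wait_active ovirt-ha-broker.service && wait_active ovirt-ha-agent.service \
#       && echo "both services restarted"
```

Killing by the unit's MainPID rather than by process name avoids depending on how the process name appears to pkill.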
Comment 16 errata-xmlrpc 2016-03-09 14:48:13 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0422.html
