Bug 1030441
Summary: | Handle crash of both ha services: agent and broker. | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Leonid Natapov <lnatapov> |
Component: | ovirt-hosted-engine-ha | Assignee: | Martin Sivák <msivak> |
Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | unspecified | CC: | alukiano, daniel.helgenberger, dfediuck, ebenahar, gpadgett, juwu, mavital, msivak, rgolan, sbonazzo, scohen, sherold |
Target Milestone: | ovirt-3.6.1 | Keywords: | Triaged |
Target Release: | 3.6.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Enhancement | |
Doc Text: | With this update, systemd is configured to restart the HA services (ovirt-ha-agent and ovirt-ha-broker) if they crash. The HA services are part of the high availability solution for the Manager virtual machine and must be highly available themselves. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-03-09 19:48:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Leonid Natapov
2013-11-14 13:03:34 UTC
HA-wise, the VM will keep running, so it should be fine. What we need is a way to improve it and notify the user/admin.

The services don't have/need a watchdog?

(In reply to Itamar Heim from comment #2)
> the services don't have/need a watchdog?

Probably need one, and don't have one yet. Using watchdog.d and/or systemd could fill in some gaps. We'd then need notifications, for which I think we can leverage the broker's notification system (with some self-monitoring).

This might be fixed by us using systemd now, without requiring any code change.

Description of the bug is very informative, so how must we handle a crash of both services?

Checked on ovirt-hosted-engine-ha-1.3.0-1.el7ev.noarch:
1) Finish deployment of hosted-engine
2) Kill both services, ovirt-ha-agent and ovirt-ha-broker
3) Wait 5 minutes
4) Services are still down

Artyom: How exactly did you kill those services?

kill -9 pid_of_ovirt-ha-broker pid_of_ovirt-ha-agent

This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015. Please review this bug and, if it is not a blocker, please postpone it to a later release. All bugs not postponed on GA release will be automatically re-targeted to
- 3.6.1 if severity >= high
- 4.0 if severity < high

This was already merged; however, we might have a small issue on CentOS 7, where systemd v. 208 does not support on-abnormal. This will be remedied once CentOS 7.2 is released with a new systemd.

I checked it on ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch. The problem still exists.

*** Bug 1275606 has been marked as a duplicate of this bug. ***

Verified on ovirt-hosted-engine-ha-1.3.3-1.el7ev.noarch:
1) pkill -9 ovirt-ha-agent ovirt-ha-broker
2) Check the services after a minute; both services are up

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0422.html
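
For readers unfamiliar with the mechanism discussed in the comments, the following is a minimal sketch of how a systemd unit can be told to restart a service after an abnormal exit. It is not the unit file shipped in ovirt-hosted-engine-ha. The only detail taken from this report is the `on-abnormal` restart policy; the `RestartSec` value is an assumption chosen purely for illustration.

```ini
# Sketch only: illustrates the Restart= policy named in the comments,
# not the actual packaged ovirt-ha-agent or ovirt-ha-broker unit.
[Service]
# Restart when the process is terminated by an unclean signal (such as
# the SIGKILL sent by "kill -9"), by a timeout, or by a watchdog expiry.
Restart=on-abnormal
# Assumed value: wait a few seconds before restarting.
RestartSec=10
```

As noted in the comments, systemd v. 208 does not recognize `on-abnormal`, so a unit using this policy needs a newer systemd to take effect.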