1479292 – [HA] Opendaylight service is not restarted when stopped abnormally

Summary: [HA] Opendaylight service is not restarted when stopped abnormally

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	opendaylight
Sub Component:
Version:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	12.0 (Pike)
Assignee:	Stephen Kitt
QA Contact:	Itzik Brown
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-08-08 09:41 UTC by Itzik Brown
Modified:	2018-03-05 14:12 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-03-05 14:12:22 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenDaylight gerrit	62258	0	None	None	None	2017-08-24 09:49:59 UTC

Description Itzik Brown 2017-08-08 09:41:56 UTC

Description of problem:
When the opendaylight service stops abnormally (for example, when its process is killed), the service is not restarted.

Version-Release number of selected component (if applicable):
opendaylight-6.1.0-2.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Identify the PID of the opendaylight service (ps -ef | grep opendaylight).
2. Kill the process: kill -9 PID
3. Verify that the service is not restarted (ps -ef | grep opendaylight).
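
The steps above can be scripted roughly as follows. This is a sketch, not part of the original report: it assumes procps pgrep is available and that the Karaf JVM's command line contains "opendaylight". The function names, the default pattern and the 30-second wait are hypothetical choices. Unlike ps -ef | grep, pgrep does not match its own process.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions automating the reproduction steps.
# Assumes procps pgrep; PATTERN is matched against full command lines.

# find_pid PATTERN: print the PID of the oldest process whose full command
# line matches PATTERN (step 1). Returns non-zero if nothing matches.
find_pid() {
    pgrep -o -f "$1"
}

# kill_hard PID: terminate the process abruptly with SIGKILL (step 2).
kill_hard() {
    kill -9 "$1"
}

# wait_for_restart PATTERN TIMEOUT OLD_PID: poll once per second for up to
# TIMEOUT seconds for a matching process whose PID differs from OLD_PID
# (step 3). Returns 0 if the service came back, 1 otherwise.
wait_for_restart() {
    local pattern=$1 timeout=$2 old_pid=$3 pid i
    for ((i = 0; i < timeout; i++)); do
        pid=$(pgrep -o -f "$pattern")
        if [[ -n $pid && $pid != "$old_pid" ]]; then
            echo "restarted as PID $pid"
            return 0
        fi
        sleep 1
    done
    echo "not restarted within ${timeout}s"
    return 1
}

# reproduce [PATTERN] [TIMEOUT]: run all three steps against the service.
reproduce() {
    local pattern=${1:-opendaylight} timeout=${2:-30} pid
    pid=$(find_pid "$pattern") || { echo "no process matches '$pattern'"; return 2; }
    echo "killing PID $pid"
    kill_hard "$pid" || return 2
    wait_for_restart "$pattern" "$timeout" "$pid"
}
```

For example, source the file in an interactive root shell and run reproduce. On an affected host it prints the killed PID followed by "not restarted within 30s". If you run it as a script instead, keep "opendaylight" out of the script's file name and arguments. Otherwise pgrep -f would match the script itself.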

Actual results:
The opendaylight service is not restarted.

Expected results:
The opendaylight service is restarted automatically.

Additional info:

Comment 1 Michael Vorburger 2017-08-24 09:50:00 UTC

I've just raised https://git.opendaylight.org/gerrit/#/c/62258/ as a first step towards fixing this, but I'm not 100% sure which Restart= policy we want; see https://www.freedesktop.org/software/systemd/man/systemd.service.html :

Tim wrote in an internal email thread: "For ha-lite arch with OOO we set the service to always restart on failure." So is Restart=on-failure what we really want, instead of Restart=always as in my initial proposal? My hesitation is that I don't know what exit code the JVM returns in case of (a) an OOM (OutOfMemoryError) or (b) the suicide thing. Is there really any disadvantage to just using "always"? I can't think of a case where you would NOT want it to restart; can you? If you want to stop it, you just run systemctl stop opendaylight, right?
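
For reference, here is a minimal sketch of what an on-failure policy could look like as a systemd drop-in, which leaves the packaged unit file untouched. The drop-in file name 10-restart.conf, the helper function and RestartSec=10 are hypothetical choices. According to the systemd.service documentation, on-failure restarts the service in these cases: the main process exits with a non-zero code, it is killed by a signal other than SIGHUP, SIGINT, SIGTERM or SIGPIPE (so kill -9 qualifies), or an operation or the watchdog times out. The JVM's exact exit code therefore matters only in that it must be non-zero. Under both on-failure and always, a stop issued through systemctl does not trigger a restart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add Restart=on-failure to opendaylight.service
# through a drop-in instead of editing the packaged unit file.

# write_restart_dropin DIR
# For the real unit, DIR is /etc/systemd/system/opendaylight.service.d
# (writing there needs root).
write_restart_dropin() {
    local dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/10-restart.conf" <<'EOF' || return 1
[Service]
# Restart when the main process exits non-zero, is killed by a signal other
# than SIGHUP/SIGINT/SIGTERM/SIGPIPE (e.g. kill -9), or times out.
# "systemctl stop opendaylight" does not trigger a restart.
Restart=on-failure
# Wait 10 s between attempts instead of the 100 ms default.
RestartSec=10
EOF
}
```

As root, call write_restart_dropin /etc/systemd/system/opendaylight.service.d and then run systemctl daemon-reload. After that, systemctl show -p Restart opendaylight should report Restart=on-failure.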

One problem with "always" could be that if there is a real fatal error (such as an installation problem), it will KEEP TRYING and cycle forever. Is that accepted and "normal" for such services? Can systemd somehow be told to "back off" after N restart attempts?
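
On the back-off question, systemd's start rate limiting can make it give up after N attempts. The sketch below writes such a drop-in; the file name, helper name and numbers are hypothetical. It uses the [Service] spellings StartLimitInterval= and StartLimitBurst=, which older systemd releases such as the one in RHEL 7 expect. Newer releases document these as StartLimitIntervalSec= and StartLimitBurst= in the [Unit] section, and should still accept the older form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart opendaylight on failure, but stop retrying
# once it has been started too often within a short window.

# write_startlimit_dropin DIR
# For the real unit, DIR is /etc/systemd/system/opendaylight.service.d.
write_startlimit_dropin() {
    local dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/20-start-limit.conf" <<'EOF' || return 1
[Service]
Restart=on-failure
RestartSec=10
# Allow at most 5 starts within any 300 s window. Restarts triggered by
# Restart= count too, so with RestartSec=10 a tight crash loop reaches the
# limit quickly and the unit is then left in the "failed" state.
StartLimitInterval=300
StartLimitBurst=5
EOF
}
```

After a crash loop trips the limit, systemctl reset-failed opendaylight clears the counter, and systemctl start opendaylight tries again. Keep RestartSec multiplied by StartLimitBurst well below StartLimitInterval. Otherwise the limit can never be reached.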

Also, one last point: Stephen once mentioned in passing that we could have Karaf send systemd a "heartbeat", so that systemd would restart it even if the process is still up and running but has got "stuck" somehow. I would suggest that we get a basic, simple Restart policy into the service first, and then do that idea as a future enhancement.
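
The heartbeat idea maps onto systemd's service watchdog. The sketch below covers only the unit side, and its file name and 60-second timeout are hypothetical. With WatchdogSec= set, systemd expects the main process to send WATCHDOG=1 notifications to the socket named in $NOTIFY_SOCKET, and sd_watchdog_enabled(3) recommends doing so every half of the timeout. If no notification arrives in time, systemd kills the service, and Restart=on-failure counts the watchdog timeout as a failure. This is only safe once the Karaf JVM actually sends those notifications. Nothing in this report says it does, and without them systemd would kill ODL every WatchdogSec.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the systemd watchdog for opendaylight.
# Only safe once the Karaf JVM itself sends WATCHDOG=1 to $NOTIFY_SOCKET.

# write_watchdog_dropin DIR
# For the real unit, DIR is /etc/systemd/system/opendaylight.service.d.
write_watchdog_dropin() {
    local dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/30-watchdog.conf" <<'EOF' || return 1
[Service]
# Accept notifications only from the main process (the JVM).
NotifyAccess=main
# Treat the service as hung if no WATCHDOG=1 arrives for 60 s; systemd then
# kills it (SIGABRT by default) and Restart=on-failure restarts it.
WatchdogSec=60
Restart=on-failure
EOF
}
```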

Comment 2 Sai Sindhur Malleni 2017-08-30 03:12:42 UTC

+1. We hit this in perf/scale testing with OSP12 + Carbon.

Comment 5 Mike Kolesnik 2018-02-19 12:50:56 UTC

Can you please check whether this is still relevant, given that ODL is now running inside a container?

Comment 6 Itzik Brown 2018-03-05 14:12:22 UTC

I think we can close it.
