Bug 1075013

Summary:	'service ovirt-engine start' should return only after it is started
Product:	[oVirt] ovirt-engine	Reporter:	Yedidyah Bar David <didi>
Component:	Services	Assignee:	Yedidyah Bar David <didi>
Status:	CLOSED WONTFIX	QA Contact:	Petr Matyáš <pmatyas>
Severity:	low	Docs Contact:
Priority:	low
Version:	3.5.4	CC:	bugs, dfediuck, didi, lsvaty, lveyde, sbonazzo, srevivo
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-03-03 12:56:46 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Integration	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2014-03-11 11:09:31 UTC

Description of problem:

Currently, if 'service ovirt-engine start' is followed by a loop of 'service ovirt-engine status', the status check returns '0' (started) only after some time (a few seconds, or more on a busy system).

This:
1. Is sometimes annoying - you manually start the engine, then have to wait some time until it's really started
2. Might cause problems in scripts that do not expect that
3. Requires various workarounds in existing code that wait till the engine is up - e.g. allinone plugin, dwh setup in 3.3.

Instead, the command should return only after the service is started, working, ready to serve requests.
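
For illustration, the kind of workaround mentioned in item 3 (start, then poll the status until it reports success) might look roughly like the sketch below. This is not taken from the allinone plugin or the dwh setup code. The function name start_and_wait and its timeout and interval parameters are hypothetical, and it assumes that an exit code of 0 from the status command means "started".

```python
import subprocess
import time


def start_and_wait(start_cmd, status_cmd, timeout=60.0, interval=1.0):
    """Run start_cmd, then poll status_cmd until it exits 0 or timeout expires.

    Returns True if the status command reported success in time, else False.
    (Hypothetical helper, for illustration only.)
    """
    # If the start command itself fails, there is nothing to wait for.
    if subprocess.run(start_cmd).returncode != 0:
        return False

    deadline = time.monotonic() + timeout
    while True:
        # Assumption: exit code 0 from the status command means "started".
        if subprocess.run(status_cmd).returncode == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

It could be called as start_and_wait(["service", "ovirt-engine", "start"], ["service", "ovirt-engine", "status"]). The upper bound on waiting is what keeps a failed start from blocking the caller forever.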

Comment 1 Doron Fediuck 2014-03-12 04:16:40 UTC

Didi how would you define really started?
What happens if the DB is down or a port is taken? This leads to a stuck
service. So blocking the daemon is not considered as the right way to go,
since it may slow down the boot process.

Comment 2 Yedidyah Bar David 2014-03-12 07:16:48 UTC

(In reply to Doron Fediuck from comment #1)
> Didi how would you define really started?

I defined in the description "working, ready to serve requests". 'service ovirt-engine status' already checks something - perhaps that's enough. Perhaps better to also check the health page (if not done already by 'status').

> What happens if the DB is down or a port is taken?

Then we simply fail. I didn't say we must succeed always, just return when we know what our status is.

I don't know the relevant code well, so not sure the service knows very early if it succeeded starting or not. In principle it can take a long time - e.g. suppose that it only tries to connect to the db on the first request that actually needs db access. So we might need to add a loop of attempts with some maximum. Not sure.

> This leads to a stuck
> service. So blocking the daemon is not considered as the right way to go,
> since it may slow down the boot process.

Modern init systems start services in parallel, so I do not expect a significant impact here. Obviously this will have to be tested.

Comment 3 Sandro Bonazzola 2014-03-12 15:54:05 UTC

(In reply to Yedidyah Bar David from comment #2)
> (In reply to Doron Fediuck from comment #1)
> > Didi how would you define really started?
> 
> I defined in the description "working, ready to serve requests". 'service
> ovirt-engine status' already checks something - perhaps that's enough.
> Perhaps better to also check the health page (if not done already by
> 'status').


If not, please avoid using the health page; it's deprecated. Check API availability instead.
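
As a rough sketch of what "check API availability" could mean in practice: poll an HTTP URL until the server answers. The function name wait_for_http, its parameters, and whatever URL is passed in are all hypothetical; this thread does not name the API endpoint. The sketch assumes that any non-5xx response (including 401 without credentials) means the server is up. TLS verification is left at the Python default, so a self-signed certificate would count as "not reachable".

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0, request_timeout=5.0):
    """Poll url until it gives a non-5xx HTTP response, or timeout expires.

    Returns True once the server answers, False if the deadline passes.
    (Hypothetical helper, for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError as err:
            # The server answered. Assumption: 4xx (e.g. 401) still means
            # "up", while 5xx means the application is not ready yet.
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # Not reachable yet; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```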

> 
> > What happens if the DB is down or a port is taken?
> 
> Then we simply fail. I didn't say we must succeed always, just return when
> we know what our status is.
> 
> I don't know the relevant code well, so not sure the service knows very
> early if it succeeded starting or not. In principle it can take a long time
> - e.g. suppose that it only tries to connect to the db on the first request
> that actually needs db access. So we might need to add a loop of attempts
> with some maximum. Not sure.
> 
> > This leads to a stuck
> > service. So blocking the daemon is not considered as the right way to go,
> > since it may slow down the boot process.
> 
> Modern init systems start services in parallel, so I do not expect a
> significant impact here. Obviously this will have to be tested.

Comment 4 Sandro Bonazzola 2015-09-04 08:59:21 UTC

This is an automated message.
This Bugzilla report has been opened on a version which is not maintained anymore.
Please check if this bug is still relevant in oVirt 3.5.4.
If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution)
If it's an RFE please update the version to 4.0 if still relevant.

Comment 5 Yedidyah Bar David 2015-09-16 07:09:18 UTC

Still relevant IMO. Moving to 4.0 as it's considered low priority/severity - workarounds seem to mostly work...

Comment 6 Sandro Bonazzola 2015-09-16 07:17:43 UTC

Moving to new classification.

Comment 7 Red Hat Bugzilla Rules Engine 2015-10-19 10:59:13 UTC

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 8 Yaniv Kaul 2017-10-15 08:20:55 UTC

Can it be moved to MODIFIED or are there additional patches needed?

Comment 9 Lev Veyde 2017-10-15 10:07:45 UTC

(In reply to Yaniv Kaul from comment #8)
> Can it be moved to MODIFIED or are there additional patches needed?

No, we are still checking whether we can fix it for systemd.

Comment 10 Yedidyah Bar David 2019-03-03 12:56:46 UTC

Closing this. No-one other than me seems interested, and it's a bit non-trivial to get this right - and searching the net for stuff like 'systemctl start returns immediately' seems to show that this behavior is quite common.

Also, the linked patch is for the sysv init script, which is not used anymore - the engine is now supported only on el7 and fedora, meaning systemd. Porting to other init systems should be trivial, so if anyone wants to reopen this bug, it would still be nice to keep the engine no less compatible with other init systems.
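
For anyone picking this up under systemd: one general mechanism for making 'systemctl start' wait for readiness is a unit with Type=notify, where the service sends READY=1 to the datagram socket named in $NOTIFY_SOCKET once it can serve requests. Whether this fits the engine's startup is not established in this bug; the sketch below only shows the notification itself, and the function name notify_ready is hypothetical.

```python
import os
import socket


def notify_ready():
    """Send READY=1 to the socket named in $NOTIFY_SOCKET, if any.

    Returns True if the notification was sent, False if no socket is set.
    (Hypothetical helper, for illustration only.)
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # Not started by systemd with a notify socket.
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"READY=1", path)
    return True
```

The service would call notify_ready() only after its own readiness check has passed, so the definition of "really started" discussed above still has to be settled separately.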