Description of problem:
Currently, 'service ovirt-engine start' followed by a loop of 'service ovirt-engine status' returns '0' (started) only after some time (up to a few seconds or more on a busy system). This:
1. Is sometimes annoying - you manually start the engine, then have to wait some time until it's really started
2. Might cause problems in scripts that do not expect that
3. Requires various workarounds in existing code that wait till the engine is up - e.g. allinone plugin, dwh setup in 3.3.
Instead, the command should return only after the service is started, working, ready to serve requests.
Didi, how would you define "really started"? What happens if the DB is down or a port is taken? This leads to a stuck service. So blocking the daemon is not considered the right way to go, since it may slow down the boot process.
(In reply to Doron Fediuck from comment #1)
> Didi how would you define really started?
I defined in the description "working, ready to serve requests". 'service ovirt-engine status' already checks something - perhaps that's enough. Perhaps better to also check the health page (if not done already by 'status').
> What happens if the DB is down or a port is taken?
Then we simply fail. I didn't say we must succeed always, just return when we know what our status is.
I don't know well the relevant code, so not sure the service knows very early if it succeeded starting or not. In principle it can take a long time - e.g. suppose that it only tries to connect to the db on the first request that actually needs db access. So we might need to add a loop of attempts with some maximum. Not sure.
> This leads to a stuck
> service. So blocking the daemon is not considered as the right way to go,
> since it may slow down the boot process.
Modern init systems start services in parallel, so I do not expect a significant impact here. Obviously this will have to be tested.
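To make the "loop of attempts with some maximum" idea concrete, here is a minimal sketch in Python. It is not taken from the engine code or any patch: the function name `wait_until_ready`, the `probe` callable and the default values are all assumptions for illustration. The probe could wrap whatever check turns out to be appropriate (for example the exit code of 'service ovirt-engine status').

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     max_attempts: int = 30,
                     delay: float = 1.0) -> bool:
    """Call probe() until it returns True or max_attempts is used up.

    Hypothetical helper: names and defaults are illustrative only.
    Returns True as soon as the probe succeeds, False otherwise.
    """
    for attempt in range(max_attempts):
        try:
            if probe():
                return True
        except Exception:
            # A probe that raises (e.g. connection refused) is treated
            # as "not ready yet" rather than as a fatal error.
            pass
        # Do not sleep after the final attempt; just report failure.
        if attempt < max_attempts - 1:
            time.sleep(delay)
    return False
```

With a bounded loop like this, a start command would return either once the probe succeeds or after at most roughly max_attempts * delay seconds, so a DB that is down results in a reported failure rather than a service stuck forever.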
(In reply to Yedidyah Bar David from comment #2)
> (In reply to Doron Fediuck from comment #1)
> > Didi how would you define really started?
>
> I defined in the description "working, ready to serve requests". 'service
> ovirt-engine status' already checks something - perhaps that's enough.
> Perhaps better to also check the health page (if not done already by
> 'status').
If not, please avoid using the health page; it's deprecated. Check API availability instead.
> > What happens if the DB is down or a port is taken?
>
> Then we simply fail. I didn't say we must succeed always, just return when
> we know what our status is.
>
> I don't know well the relevant code, so not sure the service knows very
> early if it succeeded starting or not. In principle it can take a long time
> - e.g. suppose that it only tries to connect to the db on the first request
> that actually needs db access. So we might need to add a loop of attempts
> with some maximum. Not sure.
>
> > This leads to a stuck
> > service. So blocking the daemon is not considered as the right way to go,
> > since it may slow down the boot process.
>
> Modern init systems start services in parallel, so I do not expect a
> significant impact here. Obviously this will have to be tested.
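A rough sketch of what "check API availability" could look like, using only the Python standard library. This is an assumption about one possible check, not the engine's actual status logic: the function name `api_is_available` is made up, the URL must be supplied by the caller, and treating any non-5xx response (including 401) as "available" is a design choice of this sketch.

```python
import urllib.error
import urllib.request


def api_is_available(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on url gets a non-5xx response.

    Hypothetical helper. A 401/403 still counts as available here,
    because it shows the server is up and answering requests.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; inspect the status code.
        return err.code < 500
    except OSError:
        # Connection refused, timeout, DNS or TLS failure: not available.
        # (urllib.error.URLError is a subclass of OSError.)
        return False
```

Note that with HTTPS and a certificate the client does not trust, this sketch reports "not available"; certificate handling is left out on purpose.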
This is an automated message. This Bugzilla report has been opened on a version which is not maintained anymore. Please check if this bug is still relevant in oVirt 3.5.4. If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution). If it's an RFE, please update the version to 4.0 if still relevant.
Still relevant IMO. Moving to 4.0 as it's considered low priority/severity - workarounds seem to mostly work...
Moving to new classification.
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Can it be moved to MODIFIED or are there additional patches needed?
(In reply to Yaniv Kaul from comment #8)
> Can it be moved to MODIFIED or are there additional patches needed?
No, we are still checking if we can fix it for systemd.
Closing this. No-one other than me seems interested, and it's a bit non-trivial to get this right - and searching the net for stuff like 'systemctl start returns immediately' seems to show that this behavior is quite common. Also, the linked patch is for the sysv init script, which is not used anymore - the engine is now supported only on el7 and fedora, meaning systemd. Porting to other init systems should be trivial, so if anyone wants to reopen this bug, it would still be nice to keep the engine compatible with other init systems.