Description of problem:
In the past, ovirt-ha-agent started vdsmd directly once it was started itself. Now it simply relies on systemd dependencies for that, via the Wants directive. But Wants only pulls the services in and starts them in parallel, without imposing any activation order, so ovirt-ha-agent will stop if vdsmd is not yet up:

Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd-ovirt-ha-agent[1210]: Starting ovirt-ha-agent: [ OK ]
Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Sep 29 12:17:55 ovirt-one.thaultanklines.com ovirt-ha-agent[1377]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Service vdsmd is not running and the admin is responsible for starting it. Shutting down.
Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=254/n/a
Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.

Manually starting ovirt-ha-agent afterwards is enough to bring it up.

Version-Release number of selected component (if applicable):
oVirt 3.6 RC1

How reproducible:
Now and then; it's a race condition.

Steps to Reproduce:
1. Reboot the host.
2.
3.

Actual results:
Sometimes everything goes well, sometimes ovirt-ha-agent stops because vdsmd has not started yet.

Expected results:
ovirt-ha-agent always starts.

Additional info:
The applied patch (46831) solves the race condition on CentOS 7.1 (64-bit).
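For context, systemd's Wants= only pulls the listed unit in at boot; it does not order startup. Ordering requires an explicit After= directive as well. Below is a minimal sketch of the kind of unit-file change that addresses this race; the actual content of patch 46831 may differ:

[Unit]
Description=oVirt Hosted Engine High Availability Monitoring Agent
# Wants= pulls vdsmd.service in at boot, but does not order the two units.
Wants=vdsmd.service
# After= delays starting ovirt-ha-agent until vdsmd has finished starting.
After=vdsmd.service

With both directives in place, systemd still starts vdsmd when ovirt-ha-agent is started, and additionally waits for vdsmd before launching the agent.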
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Verified on ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch. Tried rebooting 5 times; everything works fine.
Since oVirt 3.6.0 has been released, moving from verified to closed current release.