Bug 1267511 - Race condition between ovirt-ha-agent and vdsmd startup with systemd
Summary: Race condition between ovirt-ha-agent and vdsmd startup with systemd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 1.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ovirt-3.6.0-rc3
Target Release: 1.3.1
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks: 1234906
Reported: 2015-09-30 08:52 UTC by Simone Tiraboschi
Modified: 2016-02-10 19:19 UTC (History)

Fixed In Version: ovirt-hosted-engine-ha-1.3.1
Clone Of:
Environment:
Last Closed: 2015-11-27 07:56:30 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-3.6.0+
rule-engine: blocker+
mgoldboi: planning_ack+
msivak: devel_ack+
istein: testing_ack+



Links
System        ID     Private  Priority                    Status  Summary                                                 Last Updated
oVirt gerrit  46831  0        master                      MERGED  Make sure VDSM had enough time to start before failing  Never
oVirt gerrit  46835  0        ovirt-hosted-engine-ha-1.3  MERGED  Make sure VDSM had enough time to start before failing  Never

Description Simone Tiraboschi 2015-09-30 08:52:22 UTC
Description of problem:
In the past, ovirt-ha-agent started vdsmd itself when it came up.
Now it relies on a systemd dependency instead, declared with the Wants= directive.

However, Wants= only requests that vdsmd be started as well; it says nothing about activation order, so systemd starts both services in parallel, and ovirt-ha-agent shuts down if vdsmd is not up yet:

Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd-ovirt-ha-agent[1210]: Starting ovirt-ha-agent: [  OK  ]
Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Sep 29 12:17:55 ovirt-one.thaultanklines.com ovirt-ha-agent[1377]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Service vdsmd is not running and the admin is responsible for starting it. Shutting down.
Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=254/n/a
Sep 29 12:17:55 ovirt-one.thaultanklines.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.

Manually starting ovirt-ha-agent afterwards is enough to bring it up.
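
The race described above can be tolerated by giving vdsmd a bounded amount of time to come up before the agent gives up. The following is a minimal sketch of that idea, not the code of the merged patches: the function names, the 60-second timeout and the 2-second poll interval are all assumptions made for illustration, and the check shells out to `systemctl is-active` rather than using whatever the agent does internally.

```python
import subprocess
import sys
import time


def service_is_active(name):
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    return subprocess.call(["systemctl", "is-active", "--quiet", name]) == 0


def wait_for_service(name, timeout=60.0, interval=2.0,
                     is_active=service_is_active,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until the service is active or the timeout expires.

    The timeout and interval defaults are illustrative assumptions.
    is_active, clock and sleep are injectable so the loop can be
    exercised without a running systemd.
    """
    deadline = clock() + timeout
    while True:
        if is_active(name):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Fail only after vdsmd has had a fair chance to start.
    if not wait_for_service("vdsmd"):
        sys.stderr.write("Service vdsmd is not running. Shutting down.\n")
        sys.exit(254)
```

With this kind of loop, a parallel start no longer matters: the agent simply waits for vdsmd and fails only if it is still down when the timeout expires.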

Version-Release number of selected component (if applicable):
oVirt 3.6 rc1

How reproducible:
Intermittently; it is a race condition.

Steps to Reproduce:
1. Reboot the host.

Actual results:
Sometimes everything goes well; sometimes ovirt-ha-agent stops because vdsmd has not started yet.

Expected results:
ovirt-ha-agent always starts.

Additional info:

Comment 1 Richard Neuboeck 2015-09-30 13:52:36 UTC
The applied patch (46831) solves the race condition on CentOS 7.1 (64bit).

Comment 2 Red Hat Bugzilla Rules Engine 2015-10-06 09:11:21 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Artyom 2015-11-05 14:40:58 UTC
Verified on ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch
Rebooted the host 5 times; everything works fine.

Comment 4 Sandro Bonazzola 2015-11-27 07:56:30 UTC
Since oVirt 3.6.0 has been released, moving from verified to closed current release.
