Bug 1119699

Summary: ovirt-ha-agent dead but subsys locked
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-hosted-engine-ha
Assignee: Jiri Moskovcak <jmoskovc>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: aburden, alukiano, dfediuck, gklein, iheim, jmoskovc, mavital, nsednev, sbonazzo, sherold
Target Milestone: ---
Keywords: Rebase, Triaged, ZStream
Target Release: 3.4.2
Hardware: x86_64
OS: Linux
Whiteboard: sla
Fixed In Version:
Doc Type: Rebase: Bug Fixes Only
Doc Text:
Rebase package(s) to version: 1.1.5
Highlights and important bug fixes: The rebase just drops patches that were previously carried in the rpm and are now included in the source tarball. Other bugs addressed in the rebase will be attached to the errata.
About this bug (leaving to Jiri to complete): Previously, ovirt-ha-agent did not always wait long enough before attempting to connect to storage, which resulted in a failure to connect. Now the wait time is configurable, so the agent waits long enough, and retries if necessary, to connect to storage successfully.
Story Points: ---
Clone Of: 1097767
Environment:
Last Closed: 2014-09-04 12:47:32 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1097767    
Bug Blocks: 1123858    

Comment 2 Nikolai Sednev 2014-08-12 18:04:16 UTC
Verified on these components:

2 Hosts with:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014

vdsm-4.14.13-1.el6ev.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64
ovirt-hosted-engine-setup-1.1.5-1.el6ev.noarch
ovirt-host-deploy-1.3.0-0.0.master.20140629072144.gitdc1f589.el6.noarch
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
ovirt-hosted-engine-ha-1.1.5-1.el6ev.noarch
qemu-kvm-rhev-tools-0.12.1.2-2.415.el6_5.14.x86_64
ovirt-host-deploy-java-1.3.0-0.0.master.20140629072144.gitdc1f589.el6.noarch

Engine av11:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014

rhevm-3.4.2-0.1.el6ev.noarch

Comment 4 Jiri Moskovcak 2014-09-01 08:56:09 UTC
Cause: the agent doesn't wait long enough for vdsm to connect the storage
Consequence: the agent tries to access the storage before it's ready
Fix: wait longer and retry a few times (this is configurable in case some systems need a different grace time)
Result: the agent waits long enough and the storage is successfully connected

(note: yes, this is similar to 1119702, but the fix was in a different place)
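
For illustration only, the wait-and-retry behaviour described in the Fix line could look roughly like the sketch below. This is not the actual ovirt-hosted-engine-ha code: the function name wait_for_storage, the is_connected callback, and the retries and wait_seconds parameters are all hypothetical stand-ins for whatever the agent and its configuration really use.

```python
import time


def wait_for_storage(is_connected, retries=3, wait_seconds=5.0, sleep=time.sleep):
    """Poll is_connected() until it returns True or the attempts run out.

    Hypothetical sketch: is_connected stands in for whatever check the agent
    performs against vdsm; retries and wait_seconds stand in for the
    configurable grace time mentioned in this comment.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, retries + 1):
        if is_connected():
            return attempt
        # Only wait if another attempt is still to come.
        if attempt < retries:
            sleep(wait_seconds)
    raise RuntimeError("storage not connected after %d attempts" % retries)
```

Making both the delay and the number of attempts parameters mirrors the "configurable in case some systems need a different grace time" part of the fix: a slow storage backend can be given a longer grace period without a code change.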

Comment 6 errata-xmlrpc 2014-09-04 12:47:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1155.html