
Bug 1085523

Summary: Connections to HA broker hang if NFS I/O hangs
Product: [Retired] oVirt
Reporter: Greg Padgett <gpadgett>
Component: ovirt-hosted-engine-ha
Assignee: Doron Fediuck <dfediuck>
Status: CLOSED DUPLICATE
QA Contact: Artyom <alukiano>
Severity: unspecified
Priority: unspecified
Version: 3.4
CC: alukiano, amureini, bugs, dfediuck, gklein, gpadgett, msivak, nsednev, rbalakri, yeylon
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: sla
Doc Type: Bug Fix
Last Closed: 2015-08-25 10:36:18 UTC
Type: Bug
oVirt Team: SLA

Description Greg Padgett 2014-04-08 19:57:04 UTC
(May be related to bug 1083602)

The HA broker doesn't use asynchronous I/O for NFS storage, and the libraries for communicating with the broker over a socket don't use async I/O either.

Consequently, if I/O to the NFS storage hangs, the hang will propagate to anything issuing a metadata read/write request to ovirt-ha-broker. This includes the HA agent, as well as vdsm if the HA packages are installed.
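To make the propagation concrete, here is a minimal, self-contained sketch (not ovirt-hosted-engine-ha code). A toy broker thread answers over a socketpair, but its "storage read" is simulated by waiting on a threading.Event, standing in for a read stuck on an unresponsive NFS mount. The names fake_broker, client_gets_reply and run_demo are hypothetical. While the event is unset the client receives nothing, so a client doing a plain blocking recv() would hang for exactly as long as the broker does.

```python
import select
import socket
import threading


def fake_broker(conn, storage):
    """Toy broker: read one request, do 'storage I/O', then reply."""
    conn.recv(1024)           # the client's metadata request
    storage.wait()            # synchronous I/O: blocks until storage recovers
    conn.sendall(b"stats\n")
    conn.close()


def client_gets_reply(sock, wait):
    """Send a request; report whether any reply arrived within `wait` seconds."""
    sock.sendall(b"get-stats\n")
    readable, _, _ = select.select([sock], [], [], wait)
    return bool(readable)


def run_demo(wait=0.3):
    storage = threading.Event()               # unset = NFS is hung
    client_sock, broker_sock = socket.socketpair()
    threading.Thread(target=fake_broker, args=(broker_sock, storage),
                     daemon=True).start()
    replied_while_hung = client_gets_reply(client_sock, wait)
    storage.set()                             # storage comes back
    reply = client_sock.recv(1024)            # only now does the reply arrive
    client_sock.close()
    return replied_while_hung, reply


print(run_demo())  # (False, b'stats\n')
```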

The patch at http://gerrit.ovirt.org/26580 is one possible (perhaps partial) solution, but I'm not yet convinced that it's the best place to stop the hang from propagating.

Comment 1 Nikolai Sednev 2014-04-09 08:35:20 UTC
Hi Greg,
Could you describe your system in more detail, including the versions of these components:
vdsm
libvirt
qemu-kvm-rhev
sanlock
ovirt-hosted-engine-ha
ovirt-hosted-engine-setup
rhevm

Comment 2 Greg Padgett 2014-04-09 12:15:48 UTC
(In reply to Nikolai Sednev from comment #1)

Hi Nikolai,
I reproduced this on a host that needed only vdsm and ovirt-hosted-engine-ha, not most of these, but here is the fuller list of what I was running:

vdsm-4.14.2-16.gitdev.fc19.x86_64
libvirt-1.1.3.1-1.fc19.x86_64
ovirt-hosted-engine-ha-1.1.0-0.0.master.20140408193231.fc19.noarch
 (^^ built from latest master as of yesterday)
ovirt-hosted-engine-setup-1.1.0-0.0.master.fc19.noarch
sanlock-2.8-1.fc19.x86_64
qemu-kvm-1.6.1-2.fc19.x86_64

A short Python script reproduces this, and only the ovirt-hosted-engine-ha version makes a difference, e.g.:

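#!/usr/bin/python
# Ask the HA broker for the stats of all hosts; this call blocks until
# the broker's NFS I/O returns, so it hangs while the storage is hung.
from ovirt_hosted_engine_ha.client import client
s = client.HAClient().get_all_stats()
print(s)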
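When reproducing this, it can help to turn an indefinite hang into a clear yes/no result. A minimal sketch using only the standard library: call_with_deadline is a hypothetical helper, not part of ovirt-hosted-engine-ha, that runs a call in a daemon thread and stops waiting after a deadline. The commented usage lines assume the same client.HAClient().get_all_stats() call as the script above; the 30-second deadline is an arbitrary placeholder.

```python
import threading


def call_with_deadline(func, timeout):
    """Run func() in a daemon thread and wait at most `timeout` seconds.

    Returns (True, result) if func() finished, or (False, None) if it is
    still blocked. A thread stuck in a blocking system call cannot be
    killed from Python, so on timeout the worker is simply abandoned.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome.get("value")


# Hypothetical usage against the broker (requires ovirt-hosted-engine-ha):
# from ovirt_hosted_engine_ha.client import client
# finished, stats = call_with_deadline(
#     lambda: client.HAClient().get_all_stats(), 30)
# print("broker answered" if finished else "call still blocked after 30 s")
```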

Comment 3 Jiri Moskovcak 2014-05-05 13:35:29 UTC
It will time out eventually; the timeout depends on the NFS mount options, and we can't change those. I don't think we can find a better place to stop the hang from propagating, short of opening and reading the files in a separate thread or process. Adding a timeout to the socket communication is good practice anyway, and since it solves this problem I wouldn't complicate the code any further.
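The socket-timeout approach described in this comment can be sketched as follows. This is a minimal illustration, not the code from the gerrit patch: request and BROKER_TIMEOUT are hypothetical names, and the 2-second default is an arbitrary placeholder. socket.settimeout() makes later blocking calls on that socket raise socket.timeout instead of waiting indefinitely, so a broker stuck on NFS surfaces as an error in the client rather than a hang.

```python
import socket

BROKER_TIMEOUT = 2.0  # hypothetical default, in seconds


def request(sock, payload, timeout=BROKER_TIMEOUT):
    """Send one request and wait at most `timeout` seconds for a reply."""
    sock.settimeout(timeout)          # applies to every later send/recv
    sock.sendall(payload)
    try:
        return sock.recv(4096)
    except socket.timeout:            # alias of TimeoutError on Python 3.10+
        raise RuntimeError(
            "broker did not answer within %.1f s" % timeout) from None


# Demo: the peer end of a socketpair never answers, like a hung broker.
client_sock, silent_broker = socket.socketpair()
try:
    request(client_sock, b"get-stats\n", timeout=0.2)
except RuntimeError as err:
    print(err)  # broker did not answer within 0.2 s
finally:
    client_sock.close()
    silent_broker.close()
```

The timeout only bounds the client's wait: the broker itself can still be blocked on the storage, which is the trade-off this comment accepts in exchange for keeping the code simple.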

Comment 4 Jiri Moskovcak 2014-05-05 13:41:08 UTC
*** Bug 1093364 has been marked as a duplicate of this bug. ***

Comment 5 Sandro Bonazzola 2014-05-08 13:56:17 UTC
This is an automated message.

oVirt 3.4.1 has been released.
This issue has been retargeted to 3.5.0 since it has not been marked as a high-priority or high-severity issue. Please retarget if needed.

Comment 7 Doron Fediuck 2015-08-25 10:36:18 UTC

*** This bug has been marked as a duplicate of bug 1208489 ***