Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1208489

Summary: HE active hypervisor not responding to "hosted-engine --vm-status" after "iptables -I INPUT -s 10.35.160.108 -j DROP" is applied.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Nikolai Sednev <nsednev>
Component: ovirt-hosted-engine-ha
Assignee: Dudi Maroshi <dmaroshi>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: low
Version: 3.5.1
CC: cshao, dougsland, fdeutsch, gklein, gpadgett, huiwa, istein, juwu, lsurette, rbarry, rgolan, sherold, ycui, ykaul
Target Milestone: ovirt-3.6.0-rc
Keywords: Triaged
Target Release: 3.6.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a self-hosted engine host client requested status from the Manager virtual machine (hosted-engine --vm-status) and a connection to the storage domain could not be established, the client hung indefinitely waiting for a response from ovirt-ha-broker. With this update, a connection timeout is added, and if the storage domain cannot be accessed, an appropriate error message is returned.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 19:49:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
all logs (flags: none)

Description Nikolai Sednev 2015-04-02 11:43:18 UTC
Description of problem:
HE active hypervisor not responding to "hosted-engine --vm-status" after "iptables -I INPUT -s 10.35.160.108 -j DROP" is applied.

Version-Release number of selected component (if applicable):
RHEVH6.6 20150304.0.el6ev:
sanlock-2.8-1.el6.x86_64
ovirt-node-selinux-3.2.1-9.el6.noarch
ovirt-host-deploy-offline-1.3.0-3.el6ev.x86_64
ovirt-node-plugin-vdsm-0.2.0-19.el6ev.noarch
ovirt-host-deploy-1.3.0-2.el6ev.noarch
ovirt-node-plugin-rhn-3.2.1-9.el6.noarch
ovirt-node-3.2.1-9.el6.noarch
vdsm-4.16.8.1-7.el6ev.x86_64
ovirt-hosted-engine-ha-1.2.5-1.el6ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-9.0.el6ev.x86_64
ovirt-node-plugin-cim-3.2.1-9.el6.noarch
ovirt-node-branding-rhev-3.2.1-9.el6.noarch
libvirt-0.10.2-46.el6_6.3.x86_64
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
ovirt-hosted-engine-setup-1.2.2-1.el6ev.noarch
ovirt-node-plugin-snmp-3.2.1-9.el6.noarch

Engine RHEL 6.6:
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-3.5.1-0.2.el6ev.noarch

How reproducible:


Steps to Reproduce:
1. Assemble a setup of two RHEV-H hosts with an NFS SD for the HE only.
2. On the active hypervisor, run "iptables -I INPUT -s 10.35.160.108 -j DROP"; the IP here is your SD IP.
3. Run "hosted-engine --vm-status" on the host and see that it is stuck.

Actual results:
The HE VM is shifted to the second host, which is expected, but "hosted-engine --vm-status" is stuck and does not reply with anything on the initially active host.

Expected results:
"hosted-engine --vm-status" should reply with the status results.

Additional info:
Logs from both hosts and the engine are attached.

Comment 1 Nikolai Sednev 2015-04-02 12:06:36 UTC
Created attachment 1010128 [details]
all logs

Comment 3 Doron Fediuck 2015-04-14 16:37:04 UTC
The status verb reads the current stats from storage.
If storage is blocked, the utility waits for it.

We can add a timeout to the utility. If it expires, the utility will report
that it cannot access the shared storage.
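
For illustration only (not the actual patch): a minimal Python sketch of that idea, assuming the client talks to ovirt-ha-broker over a Unix socket. The socket path, wire format, and timeout value below are placeholders; the point is simply that settimeout() bounds the wait so the utility can report unreachable storage instead of blocking forever.

import socket

# Placeholder values for illustration; the real socket path and protocol
# belong to ovirt-ha-broker and are not shown in this bug.
BROKER_SOCKET = '/var/run/ovirt-hosted-engine-ha/broker.socket'
QUERY_TIMEOUT = 30  # seconds

def query_broker(request, timeout=QUERY_TIMEOUT):
    """Send one request to the broker, failing fast if storage is blocked."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # bounds connect() and recv() instead of blocking forever
    try:
        sock.connect(BROKER_SOCKET)
        sock.sendall(request + '\n')
        return sock.recv(4096)
    except socket.timeout:
        raise RuntimeError(
            'Cannot access the shared storage: no answer from the broker '
            'within %d seconds' % timeout)
    finally:
        sock.close()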

Comment 4 Roy Golan 2015-04-26 13:23:46 UTC
Just a note on the reproducer: it doesn't need to be a two-host setup. One host and an iptables rule to drop the packets will suffice.

Comment 5 Nikolai Sednev 2015-04-29 10:59:13 UTC
(In reply to Roy Golan from comment #4)
> Just a note on the reproducer: it doesn't need to be a two-host setup. One
> host and an iptables rule to drop the packets will suffice.

Yep, that's known; it was also tested with a single host, but the second host was required to check that HA hands the HE VM over properly.

Comment 6 Dudi Maroshi 2015-04-29 11:02:19 UTC
Problem reproduced and confirmed.

Diagnostics:
------------
When a hosted-engine client requests status from the hosted engine and there is
no connection to the storage domain, the client hangs indefinitely, waiting
for a response from the ovirt-ha-broker.

Solution:
---------
The fix is to add a timeout to the call that retrieves the storage domain
information, in lib.brokerlink.set_storage_domain.
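
A sketch of the general shape of such a fix, for illustration only: set_storage_domain and RequestError are names taken from this bug, while the BrokerLinkSketch class, its _socket attribute, and the wire format are hypothetical stand-ins for the real brokerlink internals.

import socket

from ovirt_hosted_engine_ha.lib.exceptions import RequestError

DEFAULT_TIMEOUT = 30  # seconds; illustrative value, not the shipped default


class BrokerLinkSketch(object):
    """Hypothetical stand-in for the real brokerlink connection class."""

    def __init__(self, sock):
        self._socket = sock  # an already-connected socket to ovirt-ha-broker (assumed)

    def set_storage_domain(self, sd_type, **options):
        try:
            # Bound the wait on the broker reply instead of blocking forever.
            self._socket.settimeout(DEFAULT_TIMEOUT)
            self._socket.sendall('set-storage-domain %s %r\n' % (sd_type, options))
            return self._socket.recv(4096)
        except (socket.timeout, socket.error) as e:
            # Error text mirrors the RequestError later reported in this bug.
            raise RequestError('Failed to set storage domain {0}, options {1}: {2}'
                               .format(sd_type, options, e))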

Comment 7 Doron Fediuck 2015-08-25 10:36:18 UTC
*** Bug 1085523 has been marked as a duplicate of this bug. ***

Comment 9 Nikolai Sednev 2015-11-05 16:59:20 UTC
Now the previously active host is no longer stuck after running the "hosted-engine --vm-status" command on it, but it takes a few seconds to respond:
# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 117, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 60, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'iscsi', 'sd_uuid': 'df2356f7-8272-401a-97f7-63c14f37ec7a'}: Connection timed out
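
For scripts that call the client library directly, the timeout now surfaces as a RequestError instead of a hang. A minimal usage sketch, assuming an HAClient class and a dict-like return value (both inferred from the traceback above, not confirmed here):

from ovirt_hosted_engine_ha.client import client
from ovirt_hosted_engine_ha.lib.exceptions import RequestError

ha_cli = client.HAClient()  # class name assumed; the traceback only shows client.py
try:
    all_host_stats = ha_cli.get_all_host_stats()  # the same call vm_status.py makes above
except RequestError as e:
    # With the fix, an unreachable storage domain fails fast with this error
    # instead of hanging indefinitely.
    print('Cannot read hosted-engine status: %s' % e)
else:
    for host_id, stats in all_host_stats.items():
        print('%s: %s' % (host_id, stats))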

After removing the iptables rule, the host responds correctly:
# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : d69bf92a
Host timestamp                     : 5992


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3d75f9e9
Host timestamp                     : 3790

Comment 11 errata-xmlrpc 2016-03-09 19:49:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0422.html