Bug 1353608

Summary: When a host is set into maintenance via the web UI, hosted-engine --vm-status returns errors in the CLI for the first 40 seconds
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Nikolai Sednev <nsednev>
Component: Agent Assignee: Denis Chaplygin <dchaplyg>
Status: CLOSED CURRENTRELEASE QA Contact: Nikolai Sednev <nsednev>
Severity: low Docs Contact:
Priority: medium    
Version: 2.0.0 CC: bugs, dchaplyg, mgoldboi, michal.skrivanek, nsednev, sbonazzo, stirabos, ylavi
Target Milestone: ovirt-4.2.1 Flags: rule-engine: ovirt-4.2+
Target Release: 2.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-ha-2.2.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-22 10:01:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
sosreport from host that being set to maintenance none
sosreport from the engine none

Description Nikolai Sednev 2016-07-07 14:40:39 UTC
Description of problem:
hosted-engine --vm-status returns errors in the CLI when the host is being set into maintenance via the web UI.

[root@alma03 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 170, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 104, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 74, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '29d459ea-989d-4127-b996-248928adf543'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on two hosts over NFS.
2. Set one of the hosts into maintenance via the web UI.
3. Run "hosted-engine --vm-status" on the host being set into maintenance.

Actual results:
Errors are printed in the host's CLI, and hosted-engine --vm-status does not respond properly on the host in maintenance.

Expected results:
"hosted-engine --vm-status" should return correct status without errors.

Additional info:

Comment 1 Nikolai Sednev 2016-07-07 14:41:38 UTC
Forgot to add component versions, so adding them here:
Version-Release number of selected component (if applicable):
Host:
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.17.x86_64
mom-0.5.5-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
vdsm-4.18.5.1-1.el7ev.x86_64
rhev-release-4.0.1-1-001.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
Linux version 3.10.0-327.28.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jun 27 14:48:28 EDT 2016
Linux 3.10.0-327.28.2.el7.x86_64 #1 SMP Mon Jun 27 14:48:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-4.0.2-0.2.rc1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
rhev-release-4.0.1-1-001.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-branding-rhev-4.0.0-2.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Comment 2 Nikolai Sednev 2016-07-07 14:43:13 UTC
Created attachment 1177353 [details]
sosreport from host that being set to maintenance

Comment 3 Nikolai Sednev 2016-07-07 14:45:04 UTC
Created attachment 1177354 [details]
sosreport from the engine

Comment 4 Nikolai Sednev 2016-07-07 14:45:35 UTC
Setting maintenance mode from the engine also disconnects the HE shared storage domain, so hosted-engine --vm-status will fail.

Comment 5 Simone Tiraboschi 2016-07-07 15:14:28 UTC
ovirt-ha-agent tries to reconnect the HE storage about every 35 seconds, including while in maintenance mode, so after about 40 seconds hosted-engine --vm-status recovers by itself even though the host is still in maintenance mode.

Not that serious, but still worth fixing since it introduces a lot of errors in the logs.
Lowering the severity.
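
Until a fix is available, a caller could simply poll through that window. The following is a minimal sketch, not part of ovirt-hosted-engine-ha: it assumes Python 3 and assumes that hosted-engine --vm-status exits non-zero when it fails with the traceback above. The name wait_for_vm_status and its parameters are hypothetical.

```python
import subprocess
import time


def wait_for_vm_status(run=None, timeout=60.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Retry the status command until it exits 0 or the timeout expires.

    Returns the command output on success, or None on timeout.
    `run` is a hypothetical hook returning (returncode, output); by
    default it invokes `hosted-engine --vm-status`.
    """
    if run is None:
        def run():
            proc = subprocess.run(
                ["hosted-engine", "--vm-status"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            return proc.returncode, proc.stdout

    deadline = clock() + timeout
    while True:
        returncode, output = run()
        if returncode == 0:
            return output
        if clock() >= deadline:
            # Storage did not come back within the allowed window.
            return None
        sleep(interval)
```

With the defaults, the command is retried every 5 seconds for up to a minute, which comfortably covers the roughly 40 seconds the agent needs to reconnect the storage.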

Comment 8 Simone Tiraboschi 2018-02-12 09:52:50 UTC
The fix has been in place since ovirt-hosted-engine-ha-2.2.0.

Comment 9 Nikolai Sednev 2018-02-12 11:59:34 UTC
To reproduce: put an HA host that is not running the SHE VM into local maintenance in the UI, wait until the UI shows it in local maintenance, then activate the host again via the UI and run "hosted-engine --vm-status" in the CLI. The command hangs for a moment; aborting it with Ctrl+C then shows the same errors.
After ~40 seconds the errors are gone and everything works fine.
I saw these errors when activating the host again:
[root@alma03 ~]# hosted-engine --vm-status
^CTraceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 213, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 110, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 75, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 148, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 93, in get_all_stats
    stats = broker.get_stats_from_storage()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
    response = h.getresponse(buffering=True)
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt

Comment 10 Red Hat Bugzilla Rules Engine 2018-02-12 11:59:40 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 11 Sandro Bonazzola 2018-02-19 08:32:29 UTC
The original issue was an exception while connecting:
ovirt_hosted_engine_ha.lib.exceptions.RequestError:

not a keyboard interrupt. Please wait instead of interrupting; the solution was about avoiding the request error.
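
To tell the two failures apart in captured output, it is enough to look at the last line of the traceback. A minimal sketch, assuming Python 3; traceback_exception_name is a hypothetical helper, not something shipped with hosted-engine:

```python
def traceback_exception_name(text):
    """Return the exception name from the last line of a Python traceback.

    Returns None if the text does not contain a traceback at all.
    """
    if "Traceback (most recent call last):" not in text:
        return None
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # The final line is "ExceptionName: message" or just "ExceptionName".
    return lines[-1].split(":", 1)[0]
```

Applied to the traceback in the description this yields ovirt_hosted_engine_ha.lib.exceptions.RequestError, while the traceback from comment 9 yields KeyboardInterrupt.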

Comment 12 Nikolai Sednev 2018-02-19 10:32:20 UTC
(In reply to Sandro Bonazzola from comment #11)
> The original issue was an exception while connecting:
> ovirt_hosted_engine_ha.lib.exceptions.RequestError:
> 
> not a keyboard interrupt. Please wait instead of interrupting; the solution
> was about avoiding the request error.
The original issue was not reproduced by following its original reproduction steps.
Moving this specific bug to verified and opening a new one to cover comment #9: https://bugzilla.redhat.com/show_bug.cgi?id=1546679

Comment 13 Sandro Bonazzola 2018-02-22 10:01:14 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.