Bug 1353608 - When a host is being set into maintenance via the web UI, hosted-engine --vm-status returns errors in the CLI for the first 40 seconds
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 2.0.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ovirt-4.2.1
Target Release: 2.2.0
Assignee: Denis Chaplygin
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-07 14:40 UTC by Nikolai Sednev
Modified: 2018-02-22 10:01 UTC (History)

Fixed In Version: ovirt-hosted-engine-ha-2.2.0
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-22 10:01:14 UTC
oVirt Team: Integration
rule-engine: ovirt-4.2+


Attachments
sosreport from host that being set to maintenance (7.37 MB, application/x-xz)
2016-07-07 14:43 UTC, Nikolai Sednev
sosreport from the engine (18.93 MB, application/x-xz)
2016-07-07 14:45 UTC, Nikolai Sednev


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1263602 high CLOSED [RFE] Ability to set different mount options for hosted_engine nfs storage than the default 2020-10-14 00:28:05 UTC
oVirt gerrit 82590 master MERGED storage: 'clean' handler should not connect storage. 2017-10-11 12:24:46 UTC
oVirt gerrit 82591 master MERGED storage: Removed storage initialization code from the agent. 2017-10-11 12:24:58 UTC
oVirt gerrit 82593 master MERGED storage: Added domain monitoring submonitor 2017-10-11 12:24:50 UTC
oVirt gerrit 82613 master MERGED storage: Agent must use broker for storage domain monitoring. 2017-10-11 12:24:55 UTC
oVirt gerrit 82618 master ABANDONED monitoring: Added check for submonitor presence. 2017-10-11 08:51:31 UTC
oVirt gerrit 82619 master MERGED monitoring: Storage domain status must be verified by the broker. 2017-10-11 12:25:01 UTC
oVirt gerrit 82620 master ABANDONED storage: Do not restart storage domain monitor on every loop iteration. 2017-10-11 08:57:17 UTC
oVirt gerrit 82626 master MERGED storage: Added domain monitoring management to the broker 2017-10-11 12:24:52 UTC

Internal Links: 1263602

Description Nikolai Sednev 2016-07-07 14:40:39 UTC
Description of problem:
hosted-engine --vm-status returns errors in the CLI when the host is being set into maintenance via the web UI.

[root@alma03 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 170, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 104, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 74, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '29d459ea-989d-4127-b996-248928adf543'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on two hosts over NFS.
2. Set one of the hosts into maintenance via the web UI.
3. Run "hosted-engine --vm-status" on that host.

Actual results:
Errors are returned in the host's CLI, and hosted-engine --vm-status does not respond properly on the host in maintenance.

Expected results:
"hosted-engine --vm-status" should return correct status without errors.

Additional info:

Comment 1 Nikolai Sednev 2016-07-07 14:41:38 UTC
Forgot to add component versions, so adding them here:
Version-Release number of selected component (if applicable):
Host:
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.17.x86_64
mom-0.5.5-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
vdsm-4.18.5.1-1.el7ev.x86_64
rhev-release-4.0.1-1-001.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
Linux version 3.10.0-327.28.2.el7.x86_64 (mockbuild@x86-017.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jun 27 14:48:28 EDT 2016
Linux 3.10.0-327.28.2.el7.x86_64 #1 SMP Mon Jun 27 14:48:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-4.0.2-0.2.rc1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
rhev-release-4.0.1-1-001.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-branding-rhev-4.0.0-2.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild@x86-030.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Comment 2 Nikolai Sednev 2016-07-07 14:43:13 UTC
Created attachment 1177353 [details]
sosreport from host that being set to maintenance

Comment 3 Nikolai Sednev 2016-07-07 14:45:04 UTC
Created attachment 1177354 [details]
sosreport from the engine

Comment 4 Nikolai Sednev 2016-07-07 14:45:35 UTC
Setting maintenance mode from the engine also disconnects the HE shared storage domain, so hosted-engine --vm-status fails.

Comment 5 Simone Tiraboschi 2016-07-07 15:14:28 UTC
ovirt-ha-agent tries to reconnect the HE storage about every 35 seconds, even while in maintenance mode, so after about 40 seconds hosted-engine --vm-status recovers by itself, also while in maintenance mode.

Not that serious, but still worth fixing since it introduces a lot of errors in the logs.
Lowering the severity.
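
For scripts that call the status command during this window, one possible workaround is to retry until the command exits cleanly instead of failing on the first error. The following is only a minimal sketch: the helper name `wait_for_command` and the default timeout and interval are assumptions for illustration and are not part of ovirt-hosted-engine-ha. It also assumes the command exits with a non-zero status on failure, which is what an uncaught Python exception produces.

```python
import subprocess
import time


def wait_for_command(cmd, timeout=60.0, interval=5.0):
    """Run cmd repeatedly until it exits with status 0 or the timeout expires.

    Returns a tuple (succeeded, attempts).
    Hypothetical helper, not part of ovirt-hosted-engine-ha.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        # Capture output so failed attempts do not flood the terminal.
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        if result.returncode == 0:
            return True, attempts
        # Give up if another sleep would take us past the deadline.
        if time.monotonic() + interval > deadline:
            return False, attempts
        time.sleep(interval)
```

On a hosted-engine host this could be called as `wait_for_command(["hosted-engine", "--vm-status"])`; a 60 second timeout is chosen here only because it comfortably covers the roughly 40 second window described above.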

Comment 8 Simone Tiraboschi 2018-02-12 09:52:50 UTC
It has been there since ovirt-hosted-engine-ha-2.2.0.

Comment 9 Nikolai Sednev 2018-02-12 11:59:34 UTC
To reproduce, set an HA host that is not running the SHE VM into local maintenance in the UI, wait until the UI shows it in local maintenance, then activate the host again via the UI and run "hosted-engine --vm-status" in the CLI. The command hangs for a moment; abort it with Ctrl+C and you will see the same errors.
After ~40 seconds the errors are gone and everything works fine.
I've seen these errors when activating the host again:
[root@alma03 ~]# hosted-engine --vm-status
^CTraceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 213, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 110, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 75, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 148, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 93, in get_all_stats
    stats = broker.get_stats_from_storage()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
    response = h.getresponse(buffering=True)
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt

Comment 10 Red Hat Bugzilla Rules Engine 2018-02-12 11:59:40 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 11 Sandro Bonazzola 2018-02-19 08:32:29 UTC
The original issue was an exception while connecting:
ovirt_hosted_engine_ha.lib.exceptions.RequestError:

not a keyboard interrupt. Please wait instead of interrupting; the solution was about avoiding the request error.
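
The distinction between the two failure modes can be checked mechanically from the last line of a traceback. This is only an illustrative sketch: the function names `final_exception` and `is_original_issue` are hypothetical, and the check relies solely on the exception names that appear in the tracebacks in this report.

```python
def final_exception(traceback_text):
    """Return the exception name from the last non-empty traceback line."""
    lines = [line.strip() for line in traceback_text.splitlines() if line.strip()]
    if not lines:
        return None
    # The last line looks like "package.Error: message" or a bare
    # exception name such as "KeyboardInterrupt".
    return lines[-1].split(":", 1)[0].strip()


def is_original_issue(traceback_text):
    """True if the traceback ends in RequestError, as in the original report."""
    name = final_exception(traceback_text)
    return name is not None and name.endswith("RequestError")
```

Applied to the tracebacks in this report, the one in the description ends in `RequestError`, while the one in comment 9 ends in `KeyboardInterrupt`.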

Comment 12 Nikolai Sednev 2018-02-19 10:32:20 UTC
(In reply to Sandro Bonazzola from comment #11)
> The original issue was an exception while connecting:
> ovirt_hosted_engine_ha.lib.exceptions.RequestError:
> 
> not a keyboard interrupt. Please wait instead of interrupting, the solution
> was around avoiding the request error.
The original issue was not reproduced by following its original reproduction steps.
Moving this specific bug to verified and opening a new one to cover the comment #9: https://bugzilla.redhat.com/show_bug.cgi?id=1546679

Comment 13 Sandro Bonazzola 2018-02-22 10:01:14 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

