Description of problem:
hosted-engine --vm-status returns errors in the CLI when the host is set into maintenance via the web UI.

[root@alma03 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 170, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 104, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 74, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '29d459ea-989d-4127-b996-248928adf543'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on two hosts over NFS.
2. Set one of the hosts into maintenance via the web UI.
3.

Actual results:
Errors are received in the host's CLI and hosted-engine --vm-status does not respond properly on the host in maintenance.

Expected results:
"hosted-engine --vm-status" should return the correct status without errors.

Additional info:
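For reference, the failure surfaces below the CLI in the HA client library. A minimal sketch of the call that fails follows; it assumes the client class exposed by ovirt_hosted_engine_ha.client.client is named HAClient (the traceback above only shows the module path and the get_all_host_stats() call):

# Minimal sketch: mirror the library call that hosted-engine --vm-status makes.
# Assumption: the client class in ovirt_hosted_engine_ha.client.client is HAClient.
from ovirt_hosted_engine_ha.client import client
from ovirt_hosted_engine_ha.lib.exceptions import RequestError

ha_cli = client.HAClient()
try:
    stats = ha_cli.get_all_host_stats()
    print(stats)
except RequestError as e:
    # While the host is in maintenance the broker cannot set the storage
    # domain, which is where the "Failed to set storage domain" error lands.
    print("HA broker request failed: %s" % e)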
Forgot to add component versions, so adding them here:

Version-Release number of selected component (if applicable):

Host:
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.17.x86_64
mom-0.5.5-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
vdsm-4.18.5.1-1.el7ev.x86_64
rhev-release-4.0.1-1-001.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
Linux version 3.10.0-327.28.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jun 27 14:48:28 EDT 2016
Linux 3.10.0-327.28.2.el7.x86_64 #1 SMP Mon Jun 27 14:48:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-4.0.2-0.2.rc1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
rhev-release-4.0.1-1-001.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-branding-rhev-4.0.0-2.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Created attachment 1177353 [details] sosreport from the host that was set to maintenance
Created attachment 1177354 [details] sosreport from the engine
Setting maintenance mode from the engine also disconnects the HE shared storage domain, so hosted-engine --vm-status will fail.
ovirt-ha-agent tries to reconnect the HE storage about every 35 seconds, even while in maintenance mode, so after about 40 seconds hosted-engine --vm-status recovers by itself, also while in maintenance mode. Not that serious, but still worth fixing since it introduces a lot of errors in the logs. Lowering the severity.
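Since the agent reconnects on its own, a simple client-side workaround is to retry the status command for a short while instead of treating the first failure as fatal. A minimal sketch; the 60-second budget and 5-second poll interval are assumptions based on the ~35-second reconnect cycle described above:

# Minimal sketch: poll "hosted-engine --vm-status" until it succeeds or a
# small time budget runs out. Budget and interval are assumptions.
import subprocess
import time

deadline = time.time() + 60
while True:
    rc = subprocess.call(['hosted-engine', '--vm-status'])
    if rc == 0:
        break
    if time.time() > deadline:
        raise RuntimeError("hosted-engine --vm-status still failing after 60s")
    time.sleep(5)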
It's there since ovirt-hosted-engine-ha-2.2.0
For reproduction, set an ha-host that is not running the SHE VM into local maintenance via the UI, wait until the UI shows it in local maintenance, then activate the host back via the UI and run "hosted-engine --vm-status" in the CLI. The command will hang for a while, so abort it with Ctrl+C; you will then see the same errors. After ~40 seconds the errors are gone and everything works fine again.

I've seen these errors when activating the host back:

[root@alma03 ~]# hosted-engine --vm-status
^CTraceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 213, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 110, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 75, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 148, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 93, in get_all_stats
    stats = broker.get_stats_from_storage()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
    response = h.getresponse(buffering=True)
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
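Rather than interrupting the hung command interactively with Ctrl+C, it can be run with a bounded wait so it is terminated cleanly if it does not answer in time. A minimal sketch, assuming a 120-second limit is acceptable:

# Minimal sketch: run "hosted-engine --vm-status" with a bounded wait instead
# of killing it with Ctrl+C. The 120-second limit is an assumption.
import subprocess
import time

proc = subprocess.Popen(['hosted-engine', '--vm-status'])
deadline = time.time() + 120
while proc.poll() is None and time.time() < deadline:
    time.sleep(1)
if proc.poll() is None:
    # Still hanging (e.g. the broker is blocked on storage); terminate it.
    proc.terminate()
    print("hosted-engine --vm-status timed out after 120s")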
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
The original issue was an exception while connecting:

ovirt_hosted_engine_ha.lib.exceptions.RequestError

not a keyboard interrupt. Please wait instead of interrupting; the fix was about avoiding the request error.
(In reply to Sandro Bonazzola from comment #11)
> The original issue was an exception while connecting:
>
> ovirt_hosted_engine_ha.lib.exceptions.RequestError
>
> not a keyboard interrupt. Please wait instead of interrupting; the fix was
> about avoiding the request error.

The original issue was not reproduced by following its original reproduction steps. Moving this specific bug to verified and opening a new one to cover comment #9: https://bugzilla.redhat.com/show_bug.cgi?id=1546679
This bug is included in the oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.