Bug 1460982
Summary: | [downstream clone - 4.1.5] [TEXT] Error message is confusing when hosted-engine Storage Domain can't be mounted | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | rhev-integ | ||||
Component: | ovirt-hosted-engine-setup | Assignee: | Jenny Tokar <jtokar> | ||||
Status: | CLOSED WORKSFORME | QA Contact: | Nikolai Sednev <nsednev> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 4.0.6 | CC: | dfediuck, gveitmic, jtokar, lsurette, mavital, mgoldboi, rmcswain, ykaul, ylavi | ||||
Target Milestone: | ovirt-4.1.5 | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | ovirt-hosted-engine-setup-2.1.3.1-1.el7ev | Doc Type: | No Doc Update | ||||
Doc Text: |
|
Story Points: | --- | ||||
Clone Of: | 1434209 | Environment: | |||||
Last Closed: | 2017-08-20 10:50:39 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1434209 | ||||||
Bug Blocks: | 1455341 | ||||||
Attachments: |
|
Description
rhev-integ
2017-06-13 10:08:54 UTC
I'm getting this error:

ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '2e359244-de14-454c-b4aa-5f289da01e92'}: Connection timed out

1) On alma04, run "iptables -A OUTPUT -p udp -d <ip address of storage> --dport 2049 -j DROP" and "iptables -A OUTPUT -p tcp -d <ip address of storage> --dport 2049 -j DROP".
2) On alma04, run "systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent". This left the host unresponsive for several minutes (it looked like it was stuck).
3) Run "hosted-engine --vm-status":

alma04 ~]# hosted-engine --vm-status
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 173, in <module>
if not status_checker.print_status():
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 103, in print_status
all_host_stats = self._get_all_host_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 73, in _get_all_host_stats
all_host_stats = ha_cli.get_all_host_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
return self.get_all_stats(self.StatModes.HOST)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
self._configure_broker_conn(broker)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
dom_type=dom_type)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
.format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '2e359244-de14-454c-b4aa-5f289da01e92'}: Connection timed out

For a detailed explanation of NFS and its ports, please refer to https://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-nfs.html.

Components on hosts:
qemu-kvm-rhev-2.9.0-14.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.3-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
vdsm-4.19.20-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.4-1.el7ev.noarch
libvirt-client-3.2.0-14.el7.x86_64
ovirt-hosted-engine-setup-2.1.3.2-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-host-deploy-1.6.6-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
Linux version 3.10.0-663.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-14) (GCC) ) #1 SMP Tue May 2 16:00:29 EDT 2017
Linux 3.10.0-663.el7.x86_64 #1 SMP Tue May 2 16:00:29 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

I did not see anything like: "The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable." Could you please provide more information on this? Moving back to ASSIGNED, as the error received differs from the expected one.

The tested scenario is a bit different: here /var/run/ovirt-hosted-engine-ha/vm.conf exists on the system, so the check that shows the error passes successfully and the error is thrown from another place in the code.

(In reply to Jenny Tokar from comment #7)
> The tested scenario is a bit different, here
> /var/run/ovirt-hosted-engine-ha/vm.conf exists in the system so the check
> that shows the error is passed successfully and the error is thrown from
> another place in the code.
The reproduction was done correctly. I caused a storage outage using iptables, which is a perfectly valid imitation of a storage disconnection, something that could happen either somewhere within the network or by physically disconnecting the storage, and the error thrown was not the expected one. The same errors appear if you disable the NIC, then run "systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent", and then run "hosted-engine --vm-status" on the host with the HE-VM.

Created attachment 1294501 [details]
Screenshot from 2017-07-05 11-43-24.png

I didn't say it was incorrect, just that it causes a different reaction. This will be fixed by a different error message.

I'm still getting the same error:
# iptables -A OUTPUT -p udp -d 10.35.80.5 --dport 2049 -j DROP
# iptables -A OUTPUT -p tcp -d 10.35.80.5 --dport 2049 -j DROP
# hosted-engine --vm-statussystemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent &&
>
>
>
>
> ^C
[root@puma18 ~]# hosted-engine --vm-status && systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent && hosted-engine --vm-status
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 180, in <module>
if not status_checker.print_status():
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 104, in print_status
all_host_stats = self._get_all_host_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 73, in _get_all_host_stats
all_host_stats = ha_cli.get_all_host_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
return self.get_all_stats(self.StatModes.HOST)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
self._configure_broker_conn(broker)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
dom_type=dom_type)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
.format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '22488d8d-6f11-4c9d-b129-9a6e492d7a16'}: Connection timed out
Version-Release number of selected component:
ovirt-hosted-engine-setup-2.1.3.6-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.5-1.el7ev.noarch
vdsm-4.19.27-1.el7ev.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
ovirt-host-deploy-1.6.6-1.el7ev.noarch
libvirt-client-3.2.0-14.el7_4.2.x86_64
libvirt-lock-sanlock-3.2.0-14.el7_4.2.x86_64

I've just tested this on ovirt-hosted-engine-setup-2.2.0-0.0.master.20170814052558.git066c94c.el7.centos.noarch and there it appears to be working fine, although with a huge delay:

[root@alma03 ~]# iptables -A OUTPUT -p udp -d <ip of SHE's storage server here> --dport 2049 -j DROP && iptables -A OUTPUT -p tcp -d 10.35.80.5 --dport 2049 -j DROP && systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent
.
Some delay of several minutes happens here...
.
[root@alma03 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

I've also noticed these error messages in /var/log/messages:
Aug 17 10:42:01 alma03 kernel: watchdog watchdog0: watchdog did not stop!
Aug 17 10:42:01 alma03 wdmd[698]: /dev/watchdog0 closed unclean

Please see my reproduction on the latest upstream build.

I do not see the point in investing more time in an error message that mostly works.

Doron, is this working as expected if I'm getting this on the latest 4.1.5? Wasn't this bug about fixing the error message received after running the "hosted-engine --vm-status" command on a host without connectivity to its hosted-engine storage domain?
The error message I'm getting is still the same as when this bug was opened:

[root@puma18 ~]# hosted-engine --vm-status
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 180, in <module>
if not status_checker.print_status():
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 104, in print_status
all_host_stats = self._get_all_host_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 73, in _get_all_host_stats
all_host_stats = ha_cli.get_all_host_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
return self.get_all_stats(self.StatModes.HOST)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
self._configure_broker_conn(broker)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
dom_type=dom_type)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
.format(sd_type, options, e))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '22488d8d-6f11-4c9d-b129-9a6e492d7a16'}: Connection timed out

(In reply to Nikolai Sednev from comment #18)
> Doron, is this working as expected if I'm getting this on latest 4.1.5?
> Wasn't this bug on fixing for the error message received after running
> "hosted-engine --vm-status" command on host without connectivity to its
> hosted-engine storage domain?
> The error message which I'm getting is still the same as when this bug was
> opened:

I didn't say it works as expected; I said we have invested too much time in it and we can live with the message below, so it works for me:

> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage
> domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid':
> '22488d8d-6f11-4c9d-b129-9a6e492d7a16'}: Connection timed out |
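
For reference, a minimal sketch of the behavior this bug asks for: catch the broker error where the status is printed and show the readable message instead of a traceback. This is not the actual ovirt-hosted-engine-setup code and not the fix that shipped. The RequestError class below is a local stand-in for ovirt_hosted_engine_ha.lib.exceptions.RequestError, and print_status, failing_stats and FRIENDLY_MESSAGE are hypothetical names used only for illustration. It is written for Python 3, while the tracebacks above come from Python 2.7.

```python
import sys


class RequestError(Exception):
    """Local stand-in (assumption) for the broker's RequestError."""


# Message text quoted from the comments above.
FRIENDLY_MESSAGE = (
    "The hosted engine configuration has not been retrieved from shared "
    "storage. Please ensure that ovirt-ha-agent is running and the storage "
    "server is reachable."
)


def print_status(get_all_host_stats, out=sys.stdout):
    """Print per-host stats; return False if the broker request failed.

    get_all_host_stats is any callable returning a dict of host stats.
    It may raise RequestError when the storage domain is unreachable.
    """
    try:
        stats = get_all_host_stats()
    except RequestError as exc:
        # Readable message plus the underlying cause, without a traceback.
        out.write("{0}\nDetails: {1}\n".format(FRIENDLY_MESSAGE, exc))
        return False
    for host_id, host_stats in sorted(stats.items()):
        out.write("host {0}: {1}\n".format(host_id, host_stats))
    return True


def failing_stats():
    """Simulate the broker timing out while the NFS port is blocked."""
    raise RequestError(
        "Failed to set storage domain FilesystemBackend: Connection timed out"
    )


if __name__ == "__main__":
    # Exit non-zero on failure, as a status command would be expected to.
    sys.exit(0 if print_status(failing_stats) else 1)
```

Catching only the specific exception type keeps unrelated failures visible as tracebacks, which may still be what developers want for unexpected errors.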