Description of problem:
The default timeout in the oVirt Python SDK is infinite. If an API call gets stuck for any reason, the SDK waits forever, and the calling application, ovirt-hosted-engine-setup, is stuck forever too.

We saw this only once, in a Lago environment. ovirt-hosted-engine-setup asked the engine to add the host through the REST API. It then started polling the REST API with the oVirt Python SDK, waiting for the engine to report the host as up before continuing. hosts.add triggered host-deploy, which reconfigured and restarted iptables on the host while ovirt-hosted-engine-setup was still polling the REST API.

We found ovirt-hosted-engine-setup stuck at:
2017-01-05 06:22:44 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:96 VDSM host in installing state
2017-01-05 06:22:45 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:96 VDSM host in installing state
2017-01-05 06:22:46 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:96 VDSM host in installing state
Two hours later it was still there.

The iptables status shows that host-deploy restarted iptables at exactly:
gen 05 06:22:47 lago-he-basic-suite-3-6-host0 systemd[1]: Starting IPv4 firewall with iptables...
gen 05 06:22:47 lago-he-basic-suite-3-6-host0 iptables.init[15013]: iptables: Applying firewall rules: [ OK ]
gen 05 06:22:47 lago-he-basic-suite-3-6-host0 systemd[1]: Started IPv4 firewall with iptables.

The iptables configuration contains:
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED

Finally, on the host we can see:
ESTAB 0 0 192.168.202.3:60572 192.168.202.99:https
There is no sign of the counterpart connection on the engine VM. With no timeout at all, the call stayed stuck forever.
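The sketch below shows why a missing timeout turns a silent peer into a permanent hang. It uses only Python's standard socket module, not the oVirt SDK, and it is a stand-in rather than a reproduction of the exact half-open state described above. A local listener completes the TCP handshake but never answers, playing the role of the vanished server side. With a timeout set, the blocking recv() raises socket.timeout instead of waiting forever. The helper name read_with_timeout and the request bytes are hypothetical, chosen only for illustration.

```python
import socket


def read_with_timeout(timeout):
    """Hypothetical helper: connect to a peer that never replies and try to read.

    Returns "timed out" if recv() gave up after `timeout` seconds.
    With timeout=None the same recv() would block indefinitely.
    """
    # A local listener stands in for the unresponsive server. The kernel
    # completes the TCP handshake even though accept() is never called.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    # create_connection() applies the timeout to connect() and to every
    # later blocking operation on the returned socket.
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
        data = client.recv(4096)  # no reply will ever arrive
        return "received %d bytes" % len(data)
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(read_with_timeout(0.5))  # prints "timed out" after about 0.5 s
```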
Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup.noarch 1.3.7.4-0.0.master.20160823094509.git7add02e.el7.centos

How reproducible:
Very difficult. It is a race condition between opening the TCP connection and restarting iptables. AFAIK the only way it can happen is that SYN and SYN-ACK are delivered correctly but the final ACK is lost. The client (the oVirt Python SDK used by ovirt-hosted-engine-setup on the host) then considers the connection ESTABLISHED, while the server (httpd on the engine VM) does not.

Steps to Reproduce:
1. Deploy hosted-engine.

Actual results:
ovirt-hosted-engine-setup is stuck forever on an API call. The last line in the log is "VDSM host in installing state".

Expected results:
The timeout triggers and ovirt-hosted-engine-setup tries polling again.
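The expected behavior can be sketched as a polling loop that treats a per-request timeout as a transient failure and polls again, bounded by an overall deadline. This is a minimal sketch under stated assumptions, not the actual add_host._wait_host_ready code. poll_until_host_up, HostStatusTimeout and the get_status callable are hypothetical names. It assumes the per-request timeout is enforced elsewhere, for example by the SDK's connection settings, which are not shown here. The clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time


class HostStatusTimeout(Exception):
    """Hypothetical: raised by a status request that exceeded its timeout."""


def poll_until_host_up(get_status, poll_interval=1.0, deadline=600.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "up" or the overall deadline passes.

    A HostStatusTimeout is treated as transient: the loop simply polls again
    instead of hanging on a dead connection.
    Returns True if the host came up, False if the deadline expired.
    """
    end = clock() + deadline
    while clock() < end:
        try:
            status = get_status()
        except HostStatusTimeout:
            status = None  # stuck request was aborted; retry next iteration
        if status == "up":
            return True
        sleep(poll_interval)
    return False


if __name__ == "__main__":
    # Simulated responses: one request times out, then the host comes up.
    responses = iter(["installing", HostStatusTimeout(), "installing", "up"])

    def fake_status():
        item = next(responses)
        if isinstance(item, Exception):
            raise item
        return item

    print(poll_until_host_up(fake_status, poll_interval=0.01, deadline=5))  # True
```

The overall deadline matters as much as the per-request timeout. Without it, a host that never leaves the installing state would still keep the setup waiting forever, only now in a loop instead of in a single blocked call.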
*** Bug 1406486 has been marked as a duplicate of this bug. ***
I'd expect Doc Type: Bug Fix here, no?
Works for me with these components.

On the host:
rhvm-appliance-4.1.20170119.1-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-2.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
mom-0.5.8-1.el7ev.noarch
vdsm-4.19.2-2.el7ev.x86_64
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On the engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-doc-4.1.0-1.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.0.1-0.1.el7.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-branding-rhev-4.1.0-0.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)