Created attachment 940438: engine log

Description of problem:
When a host is not accessible using its DNS name (e.g. voodoo1.example.com), the engine fails to connect to the host with java.net.UnknownHostException.

In the event log, we see:

    Host voodoo1 is non responsive.

And later:

    Host voodoo1 is not responding. It will stay in Connecting state for a grace period of 120 seconds and after that an attempt to fence the host will be issued.

Since the host cannot be accessed, fencing the host fails, and the host stays in the non-operational state.

Version-Release number of selected component (if applicable):
3.5.0-0.0.master.20140911091402.gite1c5ffd.fc20

How reproducible:
100%

Steps to Reproduce:
1. Cause DNS resolution of the host to fail (somehow)
2. Activate the host

Actual results:
Host stays in non-responsive mode forever

Expected results:
Complain about inaccessible host

Additional info:
I don't know why the host was not accessible using the DNS name. While the engine was failing pathetically with java.net.UnknownHostException, I could ping this host and access it using ssh from the same machine the engine was running on.

Understanding the root cause of this exception is another issue; I'm not sure there is enough information here to resolve it.
I can't reproduce the problem with step 1 above. I edited /etc/resolv.conf to contain only 127.0.0.1 and could no longer ping the hostname, but I still managed to activate the host successfully in RHEV-M.
Try executing a new action on the host. For example, opening the Setup Networks dialog and changing something seems to work.
Hi Marcin, I had no issues executing actions on the host. The network changes were saved.
Did you change any networks? Try adding a new network to a NIC, or change its properties. When I just click 'OK', the connection is OK.
Yes, I removed a network from a NIC successfully.
This behaves differently than on my setup. Restarting the engine should always work.
Hi again Marcin,

I performed the following steps:
1) RHEV-M 3.6.0-0.11.master.el6 with 5 servers installed (vdsm-4.17.2-1.el7ev.noarch).
2) Set 1 server to maintenance.
3) Edited /etc/resolv.conf to contain only "nameserver 127.0.0.1" and saved it.
4) As long as the ovirt-engine service wasn't restarted, all servers in the engine stayed UP, and I managed to bring the server up from maintenance and make some network changes via Setup Networks successfully.
5) Once ovirt-engine was restarted, all servers changed to the Non-responsive state. I couldn't bring the server up from maintenance; it failed with the error:
"The address of host silver-vdsa.qa.lab.tlv.redhat.com could not be determined"
This error message was displayed in the event log for all the servers in the setup.

So Marcin, is this the fix for this bug? That is, the error message shown when the host's hostname can't be resolved: "The address of host 'name-server' could not be determined"?
Please ACK so I can move this bug to verified. Thanks.
Yes, this is the added error message which should be appearing in this scenario.
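For readers following along, the behaviour confirmed here (resolve the host name first and report a clear message when that fails) can be sketched roughly as below. This is only an illustration in Python, not the engine's actual Java code: the function name check_host_address and its injectable resolver parameter are hypothetical, and only the message wording is taken from the event log quoted in this report.

```python
import socket


def check_host_address(hostname, resolver=socket.getaddrinfo):
    """Return (ok, message) describing whether `hostname` resolves.

    Hypothetical helper for illustration only. `resolver` is injectable
    so the check can be exercised without a real DNS lookup.
    """
    try:
        # Port is None because only name resolution matters here.
        resolver(hostname, None)
    except socket.gaierror:
        # Wording copied from the event log message quoted in this bug.
        return False, "The address of host %s could not be determined" % hostname
    return True, "Host %s resolved" % hostname


if __name__ == "__main__":
    # The result depends on the resolver configuration of the local machine.
    print(check_host_address("voodoo1.example.com"))
```

Failing at this point gives the user an actionable message instead of leaving the host in a non-responsive state until fencing is attempted.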
Verified on - 3.6.0-0.11.master.el6
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.