Description of problem:
When activating a hypervisor with more than one NFS storage domain configured for the data center while the connection to the storage is broken, the host flips between "Up" and "not responding". That causes the WebAdmin to show the host as connecting for a long time.

This is a problem for automation tasks. For example, activating the host via Ansible roles fails, and it is almost impossible to handle the error messages properly because the host can neither be activated nor put into maintenance mode.

Version-Release number of selected component (if applicable):
rhvm-4.2.1.6-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1) Deploy one host
2) Create two NFS SDs
3) Deactivate the host
4) Block the communication on the NFS share:
   iptables -I INPUT -s ${Hypervisor_IP} -j REJECT
5) Activate the host

Actual results:
The host stays in the connecting state for a long time.

Expected results:
The host is marked as non-operational after a defined amount of time.

Additional info:
The engine repeatedly sends ConnectStorageServerVDSCommand, which fails with VDSNetworkException because the command takes too long. The issue does not occur when only one storage domain is connected. This also happens on previous versions.
The regular timeout for NFS mount is 70 seconds. Is this the time the host is stuck in connecting?
*** Bug 1580243 has been marked as a duplicate of this bug. ***
(In reply to Yaniv Lavi from comment #1)
> The regular timeout for NFS mount is 70 seconds.
> Is this the time the host is stuck in connecting?

Hi Yaniv,

I think the other bug, which you set as a duplicate, had a bit more info and some discussion already.

Anyway, this is not just NFS, and it is not caused only by the NFS mount timeout. ConnectStorageServerVDSCommand can take longer due to network (TCP), storage, or other delays. If this happens, the engine throws VDSNetworkException and tries again. And again, and again, in a loop. The host keeps cycling through Connecting -> NotResponding -> Connecting.

The correct status would be Non-Operational, not the Connecting -> NotResponding dance. Or, as Nir suggested on the other bug, this could be async.
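
To illustrate the expected behavior, here is a minimal sketch of a retry loop that gives up after a defined amount of time and reports Non-Operational instead of retrying forever. This is not engine code: `activate_host`, `StorageConnectTimeout`, and all parameters are hypothetical names chosen for the illustration, and the exception merely stands in for VDSNetworkException.

```python
import time
from typing import Callable


class StorageConnectTimeout(Exception):
    """Hypothetical stand-in for the engine's VDSNetworkException."""


def activate_host(connect_storage: Callable[[], None],
                  deadline_seconds: float,
                  retry_delay_seconds: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry the storage connection until it succeeds or the deadline passes.

    Returns "Up" on success and "NonOperational" once the deadline is
    exceeded, so the caller never sees an endless Connecting loop.
    """
    start = clock()
    while True:
        try:
            connect_storage()
            return "Up"
        except StorageConnectTimeout:
            # Stop retrying once the overall time budget is used up.
            if clock() - start >= deadline_seconds:
                return "NonOperational"
            sleep(retry_delay_seconds)
```

The `clock` and `sleep` parameters are injectable only so the loop can be exercised without real waiting; the deadline value itself would be a policy decision.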
*** This bug has been marked as a duplicate of bug 1580243 ***