Created attachment 1239364 [details]
engine logs

Description of problem:
Hosts move to the connecting state if one of the servers in the DC is in the non-responsive state.

Version-Release number of selected component (if applicable):
4.1.0-0.4.master.20170110134514.git1586fd4.el7.centos
vdsm-4.19.1-26.gitc25fa08.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a few hosts in a DC.
2. Make one host non-responsive (stop vdsmd), or try to add a host and fail.
3. All servers and the storage domain go down, the DC goes down, and all servers stay stuck in the connecting state forever. Only an engine restart brings them UP again.
Created attachment 1239365 [details] new engine log
The wrong version of the library is used, so I am changing the version.
When I stop vdsm on one of the hosts (with PM), it stays in connecting for 60s and doesn't do anything to the other hosts. But after that, it goes to non-responsive and isn't fenced. Should I report this as a new bug or move this one to assigned?
Petr, fencing is not part of this patch. I suggest opening a new BZ for it.
In that case, verified on 4.1.0-8.