Bug 1412092

Summary: Hosts moving to connecting state if one of the servers in the DC is in non-responsive state
Product: [oVirt] ovirt-engine Reporter: Michael Burman <mburman>
Component: BLL.Infra Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Matyáš <pmatyas>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: future CC: bugs, gklein, mperina, oourfali, pkliczew
Target Milestone: ovirt-4.1.0-rc Keywords: Regression
Target Release: 4.1.0 Flags: rule-engine: ovirt-4.1+, rule-engine: blocker+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-01 14:59:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine logs none
new engine log none

Description Michael Burman 2017-01-11 09:01:55 UTC
Created attachment 1239364
engine logs

Description of problem:
All hosts in the DC move to the Connecting state when one of the servers in the DC becomes non-responsive.

Version-Release number of selected component (if applicable):
4.1.0-0.4.master.20170110134514.git1586fd4.el7.centos
vdsm-4.19.1-26.gitc25fa08.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a few hosts in a DC.
2. Make one host non-responsive (stop vdsmd), or try to add a host and have it fail.
3. All servers and the storage domain go down, the DC is down, and all servers are stuck in the Connecting state forever.
Only an engine restart brings them back UP.
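
The symptom in step 3 can be expressed as a simple check over host status samples. This is a minimal sketch only: the `HostSample` type, the status strings, and the 60-second grace period are assumptions made for illustration and are not taken from the engine code or its API.

```python
from dataclasses import dataclass

# Hypothetical status string for this sketch; not the engine's actual enum.
CONNECTING = "connecting"


@dataclass
class HostSample:
    name: str                 # host name as shown in the DC
    status: str               # status observed for the host
    seconds_in_status: float  # how long the host has been in that status


def wrongly_connecting(samples, stopped_host, grace_seconds=60.0):
    """Return names of hosts, other than stopped_host, stuck in 'connecting'.

    A host counts as stuck once it has stayed in 'connecting' for longer
    than grace_seconds (an assumed threshold).
    """
    return sorted(
        s.name
        for s in samples
        if s.name != stopped_host
        and s.status == CONNECTING
        and s.seconds_in_status > grace_seconds
    )


samples = [
    HostSample("host1", "non_responsive", 300.0),  # vdsmd was stopped here
    HostSample("host2", "connecting", 300.0),      # affected by this bug
    HostSample("host3", "up", 300.0),              # unaffected
]
print(wrongly_connecting(samples, stopped_host="host1"))  # prints ['host2']
```

With the bug present, every host other than the stopped one would be expected to appear in the result; an empty list means the other hosts were not affected.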

Comment 1 Michael Burman 2017-01-11 09:02:48 UTC
Created attachment 1239365
new engine log

Comment 2 Piotr Kliczewski 2017-01-11 10:42:19 UTC
The wrong version of the library is being used, so changing the version.

Comment 3 Petr Matyáš 2017-01-23 11:47:19 UTC
When I stop vdsm on one of the hosts (with PM), it stays in Connecting for 60s and the other hosts are not affected. After that, however, it goes to non-responsive and isn't fenced. Should I report this as a new bug or move this one to ASSIGNED?

Comment 4 Piotr Kliczewski 2017-01-23 11:51:40 UTC
Petr, fencing is not part of this patch. I suggest opening a new BZ for it.

Comment 5 Petr Matyáš 2017-01-23 11:56:33 UTC
In that case, verified on 4.1.0-8