Bug 1415677

Summary: Host is not fenced after stopping vdsm service
Product: [oVirt] ovirt-engine
Reporter: Petr Matyáš <pmatyas>
Component: BLL.Infra
Assignee: Oved Ourfali <oourfali>
Status: CLOSED NOTABUG
QA Contact: Petr Matyáš <pmatyas>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: bugs, mperina, pstehlik
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-23 13:54:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description             Flags
engine and vdsm logs    none
correct engine log      none

Description Petr Matyáš 2017-01-23 12:17:15 UTC
Created attachment 1243568 [details]
engine and vdsm logs

Description of problem:
I have a host with PM set up and working (there are other hosts as well). After stopping the vdsmd service, the host stays in Connecting for 60s and then goes to Non Responsive. It should be soft fenced, but it is not.

Version-Release number of selected component (if applicable):
4.1.0-8

How reproducible:
always

Steps to Reproduce:
1. stop vdsmd service on hosts with PM
2. wait for at least 60s
3. host should be fenced

Actual results:
host is non responsive

Expected results:
host should be soft fenced

Additional info:

Comment 1 Petr Matyáš 2017-01-23 13:40:52 UTC
Created attachment 1243604 [details]
correct engine log

Comment 2 Martin Perina 2017-01-23 13:54:18 UTC
Fencing is not executed because there are too many hosts with connection issues (the percentage of hosts with connection issues is higher than allowed by the cluster fencing policy):

2017-01-23 13:03:28,552+01 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-43) [] EVENT_ID: VDS_ALERT_FENCE_OPERATION_SKIPPED_BROKEN_CONNECTIVITY(9,013), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host slot-11 became non responsive and was not restarted due to Fencing Policy: 50 percents of the Hosts in the Cluster have connectivity issues.
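
The policy check described in comment 2 can be sketched roughly as follows. This is an illustration only, not ovirt-engine code: the function name, its parameters, and the strict "higher than" comparison are assumptions based on the wording of the comment, and the numbers in the usage lines are made up rather than taken from this cluster.

```python
def should_skip_fencing(hosts_with_issues: int,
                        total_hosts: int,
                        threshold_percent: float) -> bool:
    """Return True if fencing would be skipped under the sketched policy.

    Hypothetical helper: fencing is skipped when the percentage of hosts
    in the cluster that have connectivity issues is higher than the
    threshold configured in the cluster fencing policy.
    """
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    if not 0 <= hosts_with_issues <= total_hosts:
        raise ValueError("hosts_with_issues must be between 0 and total_hosts")

    percent_with_issues = 100.0 * hosts_with_issues / total_hosts
    # Assumption: strict comparison, following "higher than allowed" in
    # comment 2. How the engine treats the exact boundary value is not
    # confirmed by this report.
    return percent_with_issues > threshold_percent


# Made-up example: 3 of 4 hosts have connectivity issues, threshold is 50.
print(should_skip_fencing(3, 4, 50))
# Made-up example: 1 of 4 hosts has connectivity issues, threshold is 50.
print(should_skip_fencing(1, 4, 50))
```

Under this reading, stopping vdsmd on several hosts at once raises the share of hosts with connectivity issues, so the engine logs VDS_ALERT_FENCE_OPERATION_SKIPPED_BROKEN_CONNECTIVITY instead of fencing any of them.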