Bug 1415677 - Host is not fenced after stopping vdsm service
Summary: Host is not fenced after stopping vdsm service
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Oved Ourfali
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-23 12:17 UTC by Petr Matyáš
Modified: 2017-01-23 13:54 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-23 13:54:18 UTC
oVirt Team: Infra


Attachments
engine and vdsm logs (2.59 MB, application/x-gzip)
2017-01-23 12:17 UTC, Petr Matyáš
correct engine log (509.58 KB, text/plain)
2017-01-23 13:40 UTC, Petr Matyáš

Description Petr Matyáš 2017-01-23 12:17:15 UTC
Created attachment 1243568
engine and vdsm logs

Description of problem:
I have a host with power management (PM) set up and working (along with other hosts). After I stop the vdsmd service, the host stays in Connecting for 60s and then goes to Non Responsive. At that point it should be soft fenced, but it is not.

Version-Release number of selected component (if applicable):
4.1.0-8

How reproducible:
always

Steps to Reproduce:
1. Stop the vdsmd service on a host with PM configured
2. Wait for at least 60s
3. Host should be fenced

Actual results:
host is non responsive

Expected results:
host should be soft fenced

Additional info:

Comment 1 Petr Matyáš 2017-01-23 13:40:52 UTC
Created attachment 1243604
correct engine log

Comment 2 Martin Perina 2017-01-23 13:54:18 UTC
Fencing is not executed because there are too many hosts with connection issues (the percentage of hosts with connection issues is higher than allowed by the cluster fencing policy):

2017-01-23 13:03:28,552+01 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-43) [] EVENT_ID: VDS_ALERT_FENCE_OPERATION_SKIPPED_BROKEN_CONNECTIVITY(9,013), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host slot-11 became non responsive and was not restarted due to Fencing Policy: 50 percents of the Hosts in the Cluster have connectivity issues.
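
The policy check described above can be sketched roughly as follows. This is only an illustration in Python, not the engine's actual code: the function name `should_skip_fencing`, its parameters, the default threshold of 50, and the use of a `>=` comparison are all assumptions made for this example. The real threshold comes from the cluster's fencing policy settings, which are not shown in this bug.

```python
def should_skip_fencing(hosts_with_issues: int,
                        total_hosts: int,
                        threshold_percent: int = 50) -> bool:
    """Return True if fencing should be skipped for a non-responsive host.

    Hypothetical helper: skips fencing when the share of hosts in the
    cluster with connectivity issues reaches the configured threshold.
    The default of 50 and the >= comparison are assumptions.
    """
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    if not 0 <= hosts_with_issues <= total_hosts:
        raise ValueError("hosts_with_issues must be between 0 and total_hosts")

    # Percentage of hosts in the cluster that currently have connectivity issues.
    percent_with_issues = hosts_with_issues * 100 / total_hosts

    # Too many hosts are unreachable at once: the problem is more likely the
    # network (or the engine's view of it) than the single host, so restarting
    # hosts would not help and fencing is skipped.
    return percent_with_issues >= threshold_percent


if __name__ == "__main__":
    # Two-host cluster, one host unreachable: 50 percent have issues.
    print(should_skip_fencing(hosts_with_issues=1, total_hosts=2))
    # Four-host cluster, one host unreachable: 25 percent have issues.
    print(should_skip_fencing(hosts_with_issues=1, total_hosts=4))
```

Under these assumptions, a small cluster hits the limit quickly: stopping vdsmd on one of two hosts already puts 50 percent of the cluster in a connectivity-issue state, which matches the message in the log line above.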

