Bug 1312039 - Fencing via power management failed on hosted-engine host with HE vm
Summary: Fencing via power management failed on hosted-engine host with HE vm
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 3.6.3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Martin Perina
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-25 15:27 UTC by Artyom
Modified: 2016-03-17 11:36 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-03-17 11:36:36 UTC
oVirt Team: Infra
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine and vdsm of hosted_engine_1 logs (4.32 MB, application/zip)
2016-02-25 15:27 UTC, Artyom

Description Artyom 2016-02-25 15:27:00 UTC
Created attachment 1130585
engine and vdsm of hosted_engine_1 logs

Description of problem:
Fencing via power management failed on hosted-engine host with HE vm

Version-Release number of selected component (if applicable):
rhevm-backend-3.6.3.2-0.1.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy HE on two hosts
2. Configure PM on the host running the HE VM
3. Stop the network on the host running the HE VM
4. Wait until the HE VM starts on the second host
5. Wait for the first host to be up

Actual results:
The first host stays in the non-responsive state indefinitely

Expected results:
The first host must be fenced via PM

Additional info:
From the engine log I can see:
2016-02-25 15:33:34,353 ERROR [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-34) [] Failed to run Fence script on vds 'hosted_engine_2'.
2016-02-25 15:33:34,398 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-34) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_2 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"

but the host has PM configured:
<power_management type="ipmilan">
  <enabled>true</enabled>
  <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
  <username>root</username>
  <options />
  <pm_proxies>
    <pm_proxy>
      <type>cluster</type>
    </pm_proxy>
    <pm_proxy>
      <type>dc</type>
    </pm_proxy>
  </pm_proxies>
  <agents>
    <agent type="ipmilan" id="afef3a9e-1f7a-405a-8e00-fd3f2bdb26d3">
      <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
      <username>root</username>
      <options />
      <order>1</order>
    </agent>
  </agents>
  <automatic_pm_enabled>true</automatic_pm_enabled>
  <kdump_detection>true</kdump_detection>
</power_management>
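
For anyone trying to reproduce this, the PM settings quoted above can be checked mechanically. The following is only a minimal sketch using Python's standard xml.etree.ElementTree; the helper name summarize_power_management is hypothetical (not part of any oVirt tooling), and the XML is pasted in as a string rather than fetched from the engine. It only confirms what the snippet itself says: PM is enabled, automatic PM is on, and one ipmilan agent is defined.

```python
import xml.etree.ElementTree as ET

# The power_management element exactly as quoted in this report.
PM_XML = """
<power_management type="ipmilan">
  <enabled>true</enabled>
  <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
  <username>root</username>
  <options />
  <pm_proxies>
    <pm_proxy><type>cluster</type></pm_proxy>
    <pm_proxy><type>dc</type></pm_proxy>
  </pm_proxies>
  <agents>
    <agent type="ipmilan" id="afef3a9e-1f7a-405a-8e00-fd3f2bdb26d3">
      <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
      <username>root</username>
      <options />
      <order>1</order>
    </agent>
  </agents>
  <automatic_pm_enabled>true</automatic_pm_enabled>
  <kdump_detection>true</kdump_detection>
</power_management>
"""


def summarize_power_management(xml_text):
    """Hypothetical helper: extract the PM flags, proxies and agents."""
    root = ET.fromstring(xml_text.strip())

    def flag(tag):
        # Treat a missing element as False; compare text case-insensitively.
        return (root.findtext(tag) or "").strip().lower() == "true"

    agents = [
        {
            "type": agent.get("type"),
            "address": agent.findtext("address"),
            "order": int(agent.findtext("order") or 0),
        }
        for agent in root.findall("./agents/agent")
    ]
    proxies = [p.findtext("type") for p in root.findall("./pm_proxies/pm_proxy")]
    return {
        "enabled": flag("enabled"),
        "automatic_pm_enabled": flag("automatic_pm_enabled"),
        "kdump_detection": flag("kdump_detection"),
        "proxies": proxies,
        "agents": agents,
    }


summary = summarize_power_management(PM_XML)
print(summary)
```

If the summary shows enabled set to true and at least one agent, the "It has no power management configured" audit message does not match the host definition shown here, which is the discrepancy this report describes.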

Comment 1 Roy Golan 2016-03-02 13:24:52 UTC
Why is this specific to hosted engine?

Comment 2 Martin Perina 2016-03-17 11:36:36 UTC
I haven't been able to reproduce it while testing the following fencing flows in a 2-node hosted-engine cluster (in each flow the HE VM is running on host1):

  1. Stop networking on host1
  2. Block connection from engine to host1 using iptables
  3. Execute kdump on host1

In all of the above cases host1 was properly fenced and came up afterwards.

I tested on the latest stable oVirt 3.6:

ovirt-hosted-engine-ha-1.3.4.3-1
ovirt-engine-3.6.3.4-1

I still don't understand how this bug could be opened when exactly the same case was tested and verified in BZ1266099.

Closing as WORKSFORME; feel free to reopen if you are able to reproduce it again.

