Bug 1312039 - Fencing via power management failed on hosted-engine host with HE vm
Status: CLOSED WORKSFORME
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 3.6.3.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Martin Perina
Artyom
Depends On:
Blocks:
Reported: 2016-02-25 10:27 EST by Artyom
Modified: 2016-03-17 07:36 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-17 07:36:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine and vdsm of hosted_engine_1 logs (4.32 MB, application/zip)
2016-02-25 10:27 EST, Artyom
Description Artyom 2016-02-25 10:27:00 EST
Created attachment 1130585
engine and vdsm of hosted_engine_1 logs

Description of problem:
Fencing via power management failed on hosted-engine host with HE vm

Version-Release number of selected component (if applicable):
rhevm-backend-3.6.3.2-0.1.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy HE on two hosts
2. Configure PM on the host running the HE VM
3. Stop the network on the host running the HE VM
4. Wait until the HE VM starts on the second host
5. Wait for the first host to be up

Actual results:
The first host stays in the non-responsive state indefinitely

Expected results:
First host must be fenced via PM

Additional info:
The engine log shows:
2016-02-25 15:33:34,353 ERROR [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-34) [] Failed to run Fence script on vds 'hosted_engine_2'.
2016-02-25 15:33:34,398 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-34) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_2 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"

However, the host does have PM configured:
<power_management type="ipmilan">
  <enabled>true</enabled>
  <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
  <username>root</username>
  <options />
  <pm_proxies>
    <pm_proxy>
      <type>cluster</type>
    </pm_proxy>
    <pm_proxy>
      <type>dc</type>
    </pm_proxy>
  </pm_proxies>
  <agents>
    <agent type="ipmilan" id="afef3a9e-1f7a-405a-8e00-fd3f2bdb26d3">
      <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
      <username>root</username>
      <options />
      <order>1</order>
    </agent>
  </agents>
  <automatic_pm_enabled>true</automatic_pm_enabled>
  <kdump_detection>true</kdump_detection>
</power_management>
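
For anyone who wants to cross-check the configuration against the audit message, here is a minimal sketch that parses the power_management element above and reports whether PM is enabled and which agents are defined. It is an illustration only: the function name `summarize_pm` is hypothetical, and the XML is simply copied from the output above.

```python
import xml.etree.ElementTree as ET

# XML copied verbatim from the power_management element shown above.
PM_XML = """
<power_management type="ipmilan">
  <enabled>true</enabled>
  <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
  <username>root</username>
  <options />
  <pm_proxies>
    <pm_proxy><type>cluster</type></pm_proxy>
    <pm_proxy><type>dc</type></pm_proxy>
  </pm_proxies>
  <agents>
    <agent type="ipmilan" id="afef3a9e-1f7a-405a-8e00-fd3f2bdb26d3">
      <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
      <username>root</username>
      <options />
      <order>1</order>
    </agent>
  </agents>
  <automatic_pm_enabled>true</automatic_pm_enabled>
  <kdump_detection>true</kdump_detection>
</power_management>
"""


def summarize_pm(xml_text):
    """Return a small summary of a <power_management> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    agents = root.findall("./agents/agent")
    proxies = root.findall("./pm_proxies/pm_proxy")
    return {
        "enabled": root.findtext("enabled") == "true",
        "automatic_pm_enabled": root.findtext("automatic_pm_enabled") == "true",
        "kdump_detection": root.findtext("kdump_detection") == "true",
        "agent_types": [agent.get("type") for agent in agents],
        "agent_addresses": [agent.findtext("address") for agent in agents],
        "proxies": [proxy.findtext("type") for proxy in proxies],
    }


summary = summarize_pm(PM_XML)
print(summary)
```

If `enabled` is true and at least one agent is listed, the "It has no power management configured" message in the engine log does not match the host's configuration.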
Comment 1 Roy Golan 2016-03-02 08:24:52 EST
Why is this specific to hosted engine?
Comment 2 Martin Perina 2016-03-17 07:36:36 EDT
I haven't been able to reproduce it while testing the following fencing flows in a 2-node hosted engine cluster (in each flow the HE VM is running on host1):

  1. Stop networking on host1
  2. Block connection from engine to host1 using iptables
  3. Execute kdump on host1

In all of the above cases host1 was properly fenced and came up afterwards.
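
For reference, the three flows can be sketched roughly as the commands below. This is a sketch under assumptions, not the exact commands used: the helper name `fencing_flow_commands`, the engine address `192.0.2.10`, the use of the legacy `network` service, and the sysrq crash trigger for kdump are all illustrative choices. The script only builds and prints the commands; it does not execute anything.

```python
import shlex


def fencing_flow_commands(engine_ip):
    """Map each tested flow to a command to run on host1 (all assumed, not from the bug)."""
    return {
        # Flow 1: stop networking (assumes the legacy 'network' service is in use).
        "stop_networking": ["systemctl", "stop", "network"],
        # Flow 2: drop traffic arriving from the engine (engine_ip is a placeholder).
        "block_engine": ["iptables", "-I", "INPUT", "-s", engine_ip, "-j", "DROP"],
        # Flow 3: crash the kernel so that kdump runs (requires kdump to be configured).
        "trigger_kdump": ["sh", "-c", "echo c > /proc/sysrq-trigger"],
    }


commands = fencing_flow_commands("192.0.2.10")
for flow, argv in commands.items():
    # shlex.join produces a copy-pasteable shell line; nothing is run here.
    print(f"{flow}: {shlex.join(argv)}")
```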

I tested on latest stable oVirt 3.6:

ovirt-hosted-engine-ha-1.3.4.3-1
ovirt-engine-3.6.3.4-1

I still don't get how this bug could be opened when exactly the same case was tested and verified in BZ1266099.

Closing as WORKSFORME, feel free to reopen if you are able to reproduce it again.
