Bug 1312039
Summary: | Fencing via power management failed on hosted-engine host with HE vm | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Artyom <alukiano>
Component: | Backend.Core | Assignee: | Martin Perina <mperina>
Status: | CLOSED WORKSFORME | QA Contact: | Artyom <alukiano>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 3.6.3.2 | CC: | bugs, dmoessne, mavital, mperina
Target Milestone: | --- | Flags: | rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-03-17 11:36:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: |
Why would this be specific to hosted engine? I haven't been able to reproduce it. I tested the following fencing flows in a 2-node hosted engine cluster, with the HE VM running on host1 in each flow:
1. Stop networking on host1
2. Block the connection from the engine to host1 using iptables
3. Execute kdump on host1

In all of the above cases host1 was properly fenced and came back up afterwards. I tested on the latest stable oVirt 3.6:
ovirt-hosted-engine-ha-1.3.4.3-1
ovirt-engine-3.6.3.4-1

I still don't understand how this bug could have been opened when exactly the same case was tested and verified in BZ1266099.

Closing as WORKSFORME. Feel free to reopen if you are able to reproduce it again.
Created attachment 1130585
engine and vdsm of hosted_engine_1 logs

Description of problem:
Fencing via power management failed on a hosted-engine host running the HE VM.

Version-Release number of selected component (if applicable):
rhevm-backend-3.6.3.2-0.1.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy HE on two hosts
2. Configure PM on the host running the HE VM
3. Stop the network on the host running the HE VM
4. Wait until the HE VM starts on the second host
5. Wait for the first host to come up

Actual results:
The first host stays in the not-responding state forever.

Expected results:
The first host must be fenced via PM.

Additional info:
From the engine log I can see:
2016-02-25 15:33:34,353 ERROR [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-34) [] Failed to run Fence script on vds 'hosted_engine_2'.
2016-02-25 15:33:34,398 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-34) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_2 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"

but the host does have PM configured:
<power_management type="ipmilan">
  <enabled>true</enabled>
  <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
  <username>root</username>
  <options />
  <pm_proxies>
    <pm_proxy>
      <type>cluster</type>
    </pm_proxy>
    <pm_proxy>
      <type>dc</type>
    </pm_proxy>
  </pm_proxies>
  <agents>
    <agent type="ipmilan" id="afef3a9e-1f7a-405a-8e00-fd3f2bdb26d3">
      <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
      <username>root</username>
      <options />
      <order>1</order>
    </agent>
  </agents>
  <automatic_pm_enabled>true</automatic_pm_enabled>
  <kdump_detection>true</kdump_detection>
</power_management>
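The additional info shows a contradiction. The engine's audit message says the host has no power management configured, yet the `<power_management>` block quoted above has PM enabled with an ipmilan agent. That mismatch can be checked mechanically. The following is a minimal, hypothetical diagnostic sketch and is not part of oVirt. It assumes the XML has already been obtained as a string, and the names `pm_summary`, `contradicts_config` and `NO_PM_MARKER` are illustrative only. It parses the block with Python's standard `xml.etree.ElementTree` and flags an engine log line that claims PM is missing while the configuration says otherwise.

```python
import xml.etree.ElementTree as ET

# Power-management block as quoted in this report (illustrative input).
PM_XML = """
<power_management type="ipmilan">
  <enabled>true</enabled>
  <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
  <username>root</username>
  <options />
  <pm_proxies>
    <pm_proxy><type>cluster</type></pm_proxy>
    <pm_proxy><type>dc</type></pm_proxy>
  </pm_proxies>
  <agents>
    <agent type="ipmilan" id="afef3a9e-1f7a-405a-8e00-fd3f2bdb26d3">
      <address>rose05-mgmt.qa.lab.tlv.redhat.com</address>
      <username>root</username>
      <options />
      <order>1</order>
    </agent>
  </agents>
  <automatic_pm_enabled>true</automatic_pm_enabled>
  <kdump_detection>true</kdump_detection>
</power_management>
"""

# Hypothetical marker: text copied from the engine audit-log message in this report.
NO_PM_MARKER = "It has no power management configured"


def pm_summary(xml_text):
    """Extract the PM settings that fencing would rely on from the XML block."""
    root = ET.fromstring(xml_text.strip())
    agents = [
        {
            "type": agent.get("type"),
            "address": agent.findtext("address"),
            # Agents are tried in ascending <order>; default to 0 if missing.
            "order": int(agent.findtext("order", default="0")),
        }
        for agent in root.iter("agent")
    ]
    return {
        "enabled": root.findtext("enabled") == "true",
        "automatic_pm_enabled": root.findtext("automatic_pm_enabled") == "true",
        "proxies": [proxy.findtext("type") for proxy in root.iter("pm_proxy")],
        "agents": sorted(agents, key=lambda a: a["order"]),
    }


def contradicts_config(log_line, summary):
    """True if the log claims 'no PM' although PM is enabled with at least one agent."""
    return NO_PM_MARKER in log_line and summary["enabled"] and bool(summary["agents"])


if __name__ == "__main__":
    summary = pm_summary(PM_XML)
    line = ("Message: Host hosted_engine_2 became non responsive. "
            "It has no power management configured.")
    print(summary["enabled"], summary["proxies"], summary["agents"][0]["type"])
    print(contradicts_config(line, summary))
```

Run against the XML from this report, the script prints `True ['cluster', 'dc'] ipmilan` and then `True`. This confirms that the audit message does not match the stored configuration. The check compares only the text of the two artifacts. It cannot tell whether the engine actually loaded this configuration for the host at fencing time.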