Bug 1192596 - High availability Virtual Machines are not restarted on another host during fencing.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.2
Assignee: Martin Perina
QA Contact: Jiri Belka
URL:
Whiteboard: infra
Depends On:
Blocks: 1193058
Reported: 2015-02-13 19:55 UTC by Tim Macy
Modified: 2016-09-29 10:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-29 06:18:47 UTC
oVirt Team: Infra
Embargoed:


Attachments
oVirt Engine Log. (11.38 KB, application/x-gzip)
2015-02-13 19:55 UTC, Tim Macy


Links
System        ID     Private  Priority            Status  Summary                              Last Updated
oVirt gerrit  37839  0        ovirt-engine-3.5    MERGED  core: Fix skipped status in fencing  Never
oVirt gerrit  37857  0        master              MERGED  core: Fix skipped status in fencing  Never
oVirt gerrit  38127  0        ovirt-engine-3.5.2  MERGED  core: Fix skipped status in fencing  Never

Description Tim Macy 2015-02-13 19:55:18 UTC
Created attachment 991539
oVirt Engine Log.

Description of problem:
High availability Virtual Machines fail to migrate during fencing.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.5.1.1-1.el6 / CentOS 6.6
oVirt Hosts - Release 3.5.1 / CentOS 7

How reproducible:
Unknown

Steps to Reproduce:
1. Create a 3-host cluster with Power Management enabled via DRAC (tested with both Gluster and NFS storage). The fencing policy option "Skip fencing if host has live lease on storage" is not selected.
2. Create an HA VM on Host1.
3. Disconnect the network from Host1 (the out-of-band management interface stays connected).

Actual results:  
Fencing power cycles host to attempt recovery.  HA virtual machines are not restarted on a new host.


Expected results:  
HA virtual machines should restart on another host.


Additional info:  
Supporting Logs attached.

Comment 1 Martin Perina 2015-02-17 07:35:20 UTC
Reproducing steps:

1) Create cluster1 with 2 hosts (host1 and host2)
2) Create cluster2 with 1 host (host3) in the same DC as cluster1
3) Block connection from host2 to PM interface of host1
4) Turn off host1 using its PM interface
5) host1 becomes NonResponding; the PM stop operation on host1 using host2 as proxy fails due to the blocked connection
6) The PM stop operation using host3 as proxy is skipped because host1 is already down
7) The engine misinterprets the result of the PM stop operation: instead of "skipped, because host is already turned off", it reports "skipped due to fencing policy" -> host1 is not restarted -> HA VMs that were running on host1 are not restarted on a different host
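
The distinction in steps 5-7 can be illustrated with a minimal sketch. This is not the engine's actual code: the names StopResult, fence_stop, may_restart_ha_vms and may_restart_ha_vms_buggy are hypothetical and only model the two kinds of "skipped" result described above.

```python
from enum import Enum, auto


class StopResult(Enum):
    """Hypothetical outcomes of a PM stop attempt through one proxy host."""
    SUCCESS = auto()
    FAILED = auto()               # e.g. the proxy cannot reach the PM interface
    SKIPPED_ALREADY_OFF = auto()  # target host was already powered off
    SKIPPED_BY_POLICY = auto()    # cluster fencing policy forbids fencing


def fence_stop(proxy_results):
    """Walk the per-proxy results in order; stop at the first non-failure."""
    last = StopResult.FAILED
    for result in proxy_results:
        last = result
        if result is not StopResult.FAILED:
            break
    return last


def may_restart_ha_vms(result):
    """Assumed intended rule: restart HA VMs once the host is known to be off."""
    return result in (StopResult.SUCCESS, StopResult.SKIPPED_ALREADY_OFF)


def may_restart_ha_vms_buggy(result):
    """Models the reported behaviour: every skip is treated as a policy skip."""
    return result is StopResult.SUCCESS


# Steps 5 and 6: host2 as proxy fails, host3 as proxy reports "already off".
outcome = fence_stop([StopResult.FAILED, StopResult.SKIPPED_ALREADY_OFF])
print(outcome.name, may_restart_ha_vms(outcome), may_restart_ha_vms_buggy(outcome))
```

With the two skip reasons kept apart, the step 6 result allows the HA VMs to be restarted elsewhere, while a genuine policy skip still blocks it.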

Comment 2 Martin Perina 2015-02-24 13:46:24 UTC
Moving back to POST until the patch is merged into the ovirt-engine-3.5.2 branch.

Comment 3 Jiri Belka 2015-03-19 14:40:16 UTC
ok, rhevm-backend-3.5.1-0.1.el6ev.noarch

(HA VM restarted, SPM moved, host fenced [stop->start], but I had to manually uncheck the two 'skip' options in the cluster policy)

Comment 4 Jiri Belka 2015-03-19 15:14:30 UTC
HA VM was migrated and SPM moved even while the 'skip' options were checked in the cluster policy.

Comment 5 Eyal Edri 2015-04-29 06:18:47 UTC
ovirt 3.5.2 was GA'd. closing current release.

