Created attachment 991539
oVirt Engine Log

Description of problem:
High availability virtual machines fail to migrate during fencing.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.5.1.1-1.el6 / CentOS 6.6
oVirt Hosts - Release 3.5.1 / CentOS 7

How reproducible:
Unknown

Steps to Reproduce:
1. Create a 3-host cluster with Power Management enabled via DRAC. Tested with both Gluster and NFS storage. The "Skip fencing if host has live lease on storage" option was not selected.
2. Create an HA VM on Host1.
3. Remove the network from Host1 (OOB still connected).

Actual results:
Fencing power cycles the host to attempt recovery. HA virtual machines are not restarted on a new host.

Expected results:
HA virtual machines should restart on another host.

Additional info:
Supporting logs attached.
Reproducing steps:
1) Create cluster1 with 2 hosts (host1 and host2).
2) Create cluster2 with 1 host (host3) in the same DC as cluster1.
3) Block the connection from host2 to the PM interface of host1.
4) Turn off host1 using its PM interface.
5) host1 becomes NonResponding; the PM stop operation on host1 using host2 as proxy fails because of the blocked connection.
6) The PM stop operation using host3 as proxy is skipped because host1 is already down.
7) The engine misinterprets the result of the PM stop operation: instead of "skipped, because host is already turned off", it determines the result to be "skipped due to fencing policy" -> host1 will not be restarted -> HA VMs running on host1 will not be restarted on a different host.
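The misinterpretation in step 7 can be illustrated with a minimal sketch. This is not the ovirt-engine code: `FenceStatus`, `buggy_host_is_down` and `fixed_host_is_down` are hypothetical names, and the sketch only assumes that each proxy attempt yields one status and that HA VMs are restarted elsewhere only once the host is confirmed down.

```python
from enum import Enum, auto


class FenceStatus(Enum):
    """Hypothetical outcome of one PM stop attempt through one proxy host."""
    DONE = auto()                   # proxy powered the host off
    FAILED = auto()                 # proxy could not reach the PM interface
    SKIPPED_ALREADY_OFF = auto()    # host was already powered off
    SKIPPED_DUE_TO_POLICY = auto()  # cluster fencing policy forbids fencing


def buggy_host_is_down(results):
    """Mimics the reported behaviour: every skip is read as a policy skip."""
    for status in results:
        if status is FenceStatus.DONE:
            return True
        if status in (FenceStatus.SKIPPED_ALREADY_OFF,
                      FenceStatus.SKIPPED_DUE_TO_POLICY):
            # Wrong: "already off" is treated as "policy forbids fencing".
            return False
    return False


def fixed_host_is_down(results):
    """Treats "already off" as confirmation that the host is down."""
    for status in results:
        if status in (FenceStatus.DONE, FenceStatus.SKIPPED_ALREADY_OFF):
            return True
        if status is FenceStatus.SKIPPED_DUE_TO_POLICY:
            return False
    return False


# Steps 5 and 6: host2 as proxy fails, host3 as proxy skips (host1 already off).
scenario = [FenceStatus.FAILED, FenceStatus.SKIPPED_ALREADY_OFF]
print("buggy:", buggy_host_is_down(scenario))
print("fixed:", fixed_host_is_down(scenario))
```

With the buggy reading the host is never confirmed down, so the HA VMs stay pinned to host1; with the two skip reasons kept separate, the same scenario lets the HA VMs restart on another host.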
Move back to POST until patch is merged into ovirt-engine-3.5.2 branch
OK, rhevm-backend-3.5.1-0.1.el6ev.noarch (HA VM restarted, SPM moved, host fenced [stop->start], but I had to manually uncheck the two 'skip' options in the cluster policy)
The HA VM was migrated and SPM moved even while the 'skip' options were checked in the cluster policy.
ovirt 3.5.2 was GA'd. closing current release.