Bug 1399013

Summary: Changing hypervisor DNS entry causes HA virtual guests to start on multiple hosts
Product: Red Hat Enterprise Virtualization Manager
Reporter: Marcus West <mwest>
Component: ovirt-engine
Assignee: Nobody <nobody>
Status: CLOSED DUPLICATE
QA Contact: sefi litmanovich <slitmano>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.0.4
CC: dfediuck, gklein, gveitmic, lsurette, michal.skrivanek, mwest, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-30 12:23:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Marcus West 2016-11-28 04:38:02 UTC
## Description of problem:

If I change my DNS (or /etc/hosts) entry so that host1 resolves to host2's IP address and restart ovirt-engine, any HA VMs on host1 are immediately started on host2. They are also still running on host1, so filesystem corruption will usually ensue. There doesn't seem to be any sanity check that host1 really is host1 (checking the host UUID, etc.). Nor do there seem to be any safeguards provided by sanlock (i.e., wouldn't host1 still be updating the lease for the VM device?).

## Version-Release number of selected component (if applicable):

ovirt-engine-4.0.4.4-0.1.el7ev.noarch
rhevm-4.0.4.4-0.1.el7ev.noarch

## How reproducible:

Always

## Steps to Reproduce:

1. Enable HA on a VM and start it on host1
2. Update /etc/hosts on the manager so that host1 resolves to the same IP as host2
3. Restart ovirt-engine
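Step 2 can be illustrated with a small parser for /etc/hosts-style text. This is a minimal sketch, assuming a simplified file format (an address followed by names, `#` comments) and hypothetical addresses from the 192.0.2.0/24 documentation range; `parse_hosts` and `shared_addresses` are illustrative helpers, not part of ovirt-engine. It shows that after the edit both host names resolve to one address, which is the condition a sanity check on the manager could flag.

```python
from collections import defaultdict


def parse_hosts(text):
    """Return {hostname: address} from /etc/hosts-style text.

    Simplified sketch: strips '#' comments, skips blank lines and keeps
    the first address seen for each name.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)
    return mapping


def shared_addresses(mapping, hosts):
    """Return {address: [hosts]} for addresses claimed by more than one host."""
    by_address = defaultdict(list)
    for host in hosts:
        by_address[mapping[host]].append(host)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}


# Hypothetical /etc/hosts on the manager after step 2 (addresses are examples).
after_edit = """
192.0.2.12  host1   # was 192.0.2.11
192.0.2.12  host2
"""

print(shared_addresses(parse_hosts(after_edit), ["host1", "host2"]))
# prints {'192.0.2.12': ['host1', 'host2']}
```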

## Actual results:

ovirt-engine thinks that both 'host1' and host2 are up, but it is really talking to host2 twice. It sees that the HA VM is not running, so it starts it on host2. But the VM is already running on the real host1...
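The sequence above can be modelled with a toy monitor. This is a simplified, hypothetical sketch, not ovirt-engine's actual logic: each host record is polled through whatever address its name resolves to, and an HA VM that no polled machine reports as running is restarted on the first host record. With host1's name pointing at host2, the VM on the real host1 is never seen, so it is started a second time.

```python
def ha_monitor_pass(host_records, resolve, machines, ha_vms):
    """One naive monitoring pass (hypothetical model).

    host_records: host names known to the manager.
    resolve: host name -> address the manager will actually contact.
    machines: address -> list of VMs really running there (mutated on restart).
    Returns the (vm, address) restarts performed.
    """
    # Collect every VM reported by the machines the manager actually reaches.
    seen = set()
    for name in host_records:
        seen.update(machines[resolve[name]])
    restarted = []
    for vm in ha_vms:
        if vm not in seen:
            # Naive placement: the first host record, via its (possibly wrong) address.
            target = resolve[host_records[0]]
            machines[target].append(vm)
            restarted.append((vm, target))
    return restarted


# Real machines, keyed by address; the HA VM runs on the real host1.
machines = {"192.0.2.11": ["ha-vm"], "192.0.2.12": []}
# Manager-side resolution after the /etc/hosts change: host1 -> host2's address.
resolve = {"host1": "192.0.2.12", "host2": "192.0.2.12"}

print(ha_monitor_pass(["host1", "host2"], resolve, machines, ["ha-vm"]))
# prints [('ha-vm', '192.0.2.12')]
print(machines)
# prints {'192.0.2.11': ['ha-vm'], '192.0.2.12': ['ha-vm']}
```

With correct resolution (host1 mapped to 192.0.2.11) the same pass sees the VM and restarts nothing.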

## Expected results:

Ideally, ovirt-engine should be able to tell that what it thinks is 'host1' is not really host1, and allow the administrator to take some sort of corrective action.

When ovirt-engine initiates a connection to a hypervisor, can we get it to check the host UUID against its database entry so it knows for sure it's talking to the expected host?
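The check suggested above could look roughly like this. A minimal sketch, assuming the manager stores a hardware UUID per host record and the host reports its own UUID when the connection is set up; `HostRecord`, `verify_identity`, `HostIdentityMismatch` and the UUID values are hypothetical and are not ovirt-engine or VDSM API.

```python
from dataclasses import dataclass


@dataclass
class HostRecord:
    name: str
    expected_uuid: str  # UUID stored in the manager's database (hypothetical field)


class HostIdentityMismatch(Exception):
    """Raised when the machine that answered is not the host the record describes."""


def verify_identity(record, reported_uuid):
    """Refuse to treat the connection as `record` unless the UUIDs match."""
    # Normalise case so formatting differences do not cause false mismatches.
    if reported_uuid.lower() != record.expected_uuid.lower():
        raise HostIdentityMismatch(
            f"{record.name}: expected UUID {record.expected_uuid}, "
            f"but the host at this address reports {reported_uuid}"
        )


host1 = HostRecord("host1", "4c4c4544-0000-0000-0000-000000000001")
# After the /etc/hosts change, "host1" actually reaches host2, which reports its own UUID.
try:
    verify_identity(host1, "4c4c4544-0000-0000-0000-000000000002")
except HostIdentityMismatch as err:
    print("refusing to monitor:", err)
```

In this sketch a mismatch stops the manager from trusting the record's VM list, so no HA decision is made on it, and the error message gives the administrator something concrete to act on.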

Can sanlock be used to ensure that nothing else is updating a VM resource lease?  If so, then perhaps a fencing workflow can be invoked before HA restarts happen.
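The sanlock idea amounts to gating the HA restart on lease liveness. A simplified, hypothetical sketch: it assumes each VM lease exposes the time of its holder's last renewal, and that a lease counts as held while that renewal is younger than a fixed expiry timeout. `VmLease`, `may_restart` and the 60-second value are illustrative only; they are not sanlock's API or defaults.

```python
import time
from dataclasses import dataclass

LEASE_TIMEOUT = 60.0  # seconds; illustrative value, not a sanlock default


@dataclass
class VmLease:
    vm: str
    last_renewal: float  # epoch seconds of the holder's last renewal (hypothetical field)


def lease_is_held(lease, now):
    """A lease renewed within LEASE_TIMEOUT is treated as still held."""
    return now - lease.last_renewal < LEASE_TIMEOUT


def may_restart(lease, now=None):
    """Allow an HA restart only once the previous holder has stopped renewing."""
    now = time.time() if now is None else now
    return not lease_is_held(lease, now)


now = 1_000_000.0
# The real host1 is still running the VM and renewed its lease 5 seconds ago.
print(may_restart(VmLease("ha-vm", last_renewal=now - 5), now))    # prints False
# If the holder had stopped renewing two minutes ago, a restart could proceed
# (in practice after fencing the old holder).
print(may_restart(VmLease("ha-vm", last_renewal=now - 120), now))  # prints True
```

In this sketch the restart decision rests on evidence from shared storage rather than on which address the manager happened to reach, so a misresolved hostname cannot by itself trigger a second start.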

## Additional info:

I have reproduced this by changing /etc/hosts on the manager; the same should happen with a DNS change. I don't have fencing configured in my environment, but I don't think it would help, since both hosts are actually up and the manager does not see either of them as down or non-responsive.

Comment 9 Michal Skrivanek 2016-11-30 12:23:06 UTC
bug 804272 is the solution using sanlock vm leases

*** This bug has been marked as a duplicate of bug 804272 ***