Bug 1473035

Summary: HA VM with VM lease not being restarted by engine when lease is in a separate storage domain
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gordon Watson <gwatson>
Component: ovirt-engine
Assignee: Nobody <nobody>
Status: CLOSED NOTABUG
QA Contact: meital avital <mavital>
Severity: high
Priority: unspecified
Version: 4.1.2
CC: gwatson, lsurette, michal.skrivanek, rbalakri, Rhev-m-bugs, srevivo, tjelinek, ykaul
Hardware: Unspecified
OS: Linux
Type: Bug
oVirt Team: Virt
Last Closed: 2017-08-08 15:49:51 UTC

Description Gordon Watson 2017-07-19 22:26:41 UTC
Description of problem:

I ran some tests with an HA VM that has a VM lease. In one of them, the VM lease was in a different storage domain from the VM's disk. In that case, when access to the storage domain containing the lease was blocked, the engine did not restart the VM after sanlock terminated it.

In the case where the VM's disk and VM lease were in the same storage domain, the VM was restarted after it was killed by sanlock.


Version-Release number of selected component (if applicable):

RHV 4.1.2
RHEL 7.3 hosts:
  vdsm-4.19.20-1.el7ev.x86_64
  sanlock-3.4.0-1.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Create VM with disk in one SD, call it NFS-A 
2. Configure VM as HA
3. Configure VM lease in separate SD, call it NFS-B
4. Start VM
5. When VM up, block access to NFS-B
6. VM should get killed by sanlock
7. Immediately unblock access to NFS-B
8. Check if engine tries to restart the VM

Actual results:

See below for more details.

Expected results:

The HA VM should be restarted regardless of which storage domain holds its VM lease.

Additional info:

Comment 1 Gordon Watson 2017-07-19 22:29:00 UTC
Tests performed:

-  HA VM with disk and VM lease in different SDs:

  - HA VM 
  - disk in NFS-A
  - VM lease in NFS-B

  - Started VM
  - When up, blocked access to NFS-B
  - VM was killed by sanlock
  - Immediately unblocked access to NFS-B
  - Engine reported VM as "Down"
  - Engine reported "User shut down from within the guest"
  - Engine did not attempt to restart the VM

Result: HA VM not restarted.



-  HA VM with disk and VM lease in the same SD:

  - HA VM 
  - disk in NFS-B
  - VM lease in NFS-B

  - Started VM
  - When up, blocked access to NFS-B
  - VM went into a non-responsive state, "Up*" on the host
  - Engine reported VM as "NotResponding" 
  - VM was killed by sanlock, but it took several attempts
  - Immediately unblocked access to NFS-B
  - Engine reported VM as "Down"
  - Engine reported "Lost connection with qemu process"
  - Engine then restarted the VM

Result: HA VM was restarted.



-  Summary:

The host's state was 'Up' throughout the above tests. That is, none of these tests involved the host going non-responsive, so Power Management (or the lack of it) was not a factor either.

So whether the engine attempts to restart the VM depends on how the VM's qemu process was killed.

Comment 3 Michal Skrivanek 2017-07-20 05:05:50 UTC
To correctly distinguish a SIGTERM from a user-initiated shutdown, the ovirt-guest-agent must be running in the guest. Can you check that?
See bug 1341106, which discusses the same problem.
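The distinction described here can be illustrated with a toy Python model. It is a hypothetical sketch under stated assumptions, not the ovirt-engine or vdsm logic: `ExitReason`, `classify_exit` and `should_restart` are invented names, and the rule that a clean qemu exit without a running guest agent is treated as a user shutdown is an assumption inferred from the two test cases in comment 1.

```python
from enum import Enum, auto


class ExitReason(Enum):
    # Hypothetical labels; NOT ovirt-engine or vdsm identifiers.
    USER_SHUTDOWN = auto()         # engine text seen: "User shut down from within the guest"
    LOST_QEMU_CONNECTION = auto()  # engine text seen: "Lost connection with qemu process"
    EXTERNAL_TERMINATION = auto()  # clean exit that the guest did not initiate


def classify_exit(qemu_exited_cleanly: bool,
                  agent_running: bool,
                  agent_saw_guest_shutdown: bool) -> ExitReason:
    """Classify a VM exit using assumed, simplified rules."""
    if not qemu_exited_cleanly:
        # e.g. qemu stopped responding on blocked storage before it died
        return ExitReason.LOST_QEMU_CONNECTION
    if agent_running and not agent_saw_guest_shutdown:
        # The agent can vouch that the guest did not shut itself down,
        # so the clean exit came from outside (e.g. sanlock's SIGTERM).
        return ExitReason.EXTERNAL_TERMINATION
    # Either the agent reported a guest shutdown, or there is no agent and
    # a clean exit after SIGTERM is indistinguishable from a guest shutdown.
    return ExitReason.USER_SHUTDOWN


def should_restart(highly_available: bool, reason: ExitReason) -> bool:
    """Assumption: HA VMs are restarted unless the guest shut itself down."""
    return highly_available and reason is not ExitReason.USER_SHUTDOWN


if __name__ == "__main__":
    # Lease on NFS-B, disk on NFS-A, no guest agent: clean exit on SIGTERM
    print(should_restart(True, classify_exit(True, False, False)))   # False
    # Disk and lease on NFS-B: qemu hung, reported as lost connection
    print(should_restart(True, classify_exit(False, False, False)))  # True
    # Same as the first case, but with the guest agent running
    print(should_restart(True, classify_exit(True, True, False)))    # True
```

Under these assumptions, the first test in comment 1 yields no restart, while the second test (reported as a lost connection with the qemu process) and a clean exit with the guest agent running both yield a restart.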

Comment 14 Gordon Watson 2017-08-01 22:19:13 UTC
I confirmed that the VM was restarted automatically with the following guest agents installed:

    ovirt-guest-agent-common-1.0.13-5.el7   -  RHEL guest
    rhev-guest-tools-iso-4.1-5.el7          -  Windows guest

Comment 15 Tomas Jelinek 2017-08-07 14:47:20 UTC
(In reply to Gordon Watson from comment #14)
> I confirmed that the VM got restarted automatically with the following guest
> agents installed;
> 
>     ovirt-guest-agent-common-1.0.13-5.el7   -  RHEL guest
>     rhev-guest-tools-iso-4.1-5.el7          -  Windows guest

great!
Do you need anything more or can I close this BZ?

Comment 16 Gordon Watson 2017-08-07 18:28:43 UTC
Tomas,

Yes, you can close it. Thanks very much for all the help.

Regards, GFW.

Comment 17 Tomas Jelinek 2017-08-08 15:49:51 UTC
Thank you for your great cooperation!