Bug 910746

Summary: engine: there is no vm migration timeout in engine
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Vinzenz Feenstra [evilissimo] <vfeenstr>
Status: CLOSED WONTFIX QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.3 CC: acathrow, dyasny, iheim, lpeer, michal.skrivanek, Rhev-m-bugs, scohen, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-20 16:27:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-02-13 13:21:27 UTC
Created attachment 696800
logs

Description of problem:

vdsm and libvirt have a 300-second migration timeout, but if they fail to abort the migration, the VM will also be stuck in the engine in the migrating state.
Can we add a timeout in the engine so that if the VM is stuck the user will know that there is an issue?

Version-Release number of selected component (if applicable):

si27 (3.1.3) 

How reproducible:

100%

Steps to Reproduce:
1. Create an iSCSI pool on a two-host cluster
2. Run a VM -> suspend it -> resume it
3. Try to migrate the VM
  
Actual results:

The VM migration is stuck in libvirt/qemu, and they also fail to abort the migration.
The engine just shows the VM as migrating forever, until the user powers it off (cancelling the migration does not help either).

Expected results:

We should alert the user in the engine that the migration should have ended by now and suggest investigating the issue.

Additional info: all logs and audit log from db

Comment 1 Michal Skrivanek 2013-02-14 07:22:09 UTC
I would not be in favor of adding yet another layer of timeout. But IIRC, wasn't this an issue where the engine didn't really reflect the actual libvirt state? Is this the one we were looking at together recently?

Comment 2 Dafna Ron 2013-02-14 09:02:34 UTC
This is the bug we were looking at together. 

1. There is a bug filed against vdsm for the VM state issue.
In this case, the engine is looking at the VM's state on the dst (which is paused) and will wait forever for the state to change.

2. Since the timeout for VM migration is 300 seconds in vdsm/libvirt, I am not sure why we should keep a VM in the migrating state for over an hour.
Perhaps we could sample dst and src to find out what the VM's state actually is?

3. Having said that, a warning to the user that the VM migration should have ended, and that the migration may be stuck, would be a nice compromise.
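
A minimal sketch of the warning proposed in points 2 and 3, for illustration only. It is not engine or vdsm code: MigrationRecord, check_migration and the state strings are hypothetical names, and the only value taken from this bug is the 300-second vdsm/libvirt timeout. The idea is that once a migration has run longer than the timeout, the engine reports the states sampled from src and dst instead of waiting silently.

```python
from dataclasses import dataclass
from typing import Optional

# 300 seconds is the vdsm/libvirt migration timeout mentioned in this bug.
MIGRATION_TIMEOUT_SECONDS = 300


@dataclass
class MigrationRecord:
    """Hypothetical engine-side bookkeeping for one in-flight migration."""
    vm_name: str
    started_at: float  # seconds, on the same clock as `now` below


def check_migration(
    record: MigrationRecord,
    now: float,
    src_state: str,
    dst_state: str,
    timeout: int = MIGRATION_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Return a warning message if the migration is overdue, else None.

    src_state and dst_state are whatever the two hosts report when sampled;
    the strings are passed through to the message, not interpreted.
    """
    elapsed = now - record.started_at
    if elapsed <= timeout:
        return None
    return (
        f"VM {record.vm_name} has been migrating for {int(elapsed)}s "
        f"(timeout {timeout}s); source reports '{src_state}', "
        f"destination reports '{dst_state}'. The migration may be stuck."
    )
```

A caller would run this check periodically for every VM the engine shows as migrating and write any returned message to the event log; the VM's status itself is left unchanged, which matches the "warning only" compromise in point 3.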

Comment 3 Michal Skrivanek 2013-02-14 12:45:39 UTC
I think we absolutely need to address 1. If it had worked, we would not have seen this issue. You should get a migration failure and see that in the event log....that should be good enough, would you agree?

Comment 4 Dafna Ron 2013-02-14 13:05:39 UTC
Actually I don't agree :)
It's important that we fix issue #1, but it's one bug, and this issue might happen because of other bugs as well.
What this bug is about is making sure that, if we do have a bug that gets a migration stuck, the engine has a way of dealing with it and/or reporting it to the user.

Comment 6 Michal Skrivanek 2013-02-15 14:30:56 UTC
So....the logs are missing. Dafna, it would be great if you could get them.
At the same time, the vdsm code handling the libvirt timeout should be revisited.