Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 910746

Summary:

engine: there is no vm migration timeout in engine

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

Dafna Ron <dron>

Component:

ovirt-engine

Assignee:

Vinzenz Feenstra [evilissimo] <vfeenstr>

Status:

CLOSED WONTFIX

QA Contact:

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

3.1.3

CC:

acathrow, dyasny, iheim, lpeer, michal.skrivanek, Rhev-m-bugs, scohen, yeylon, ykaul

Target Milestone:

---

Target Release:

3.2.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

virt

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-02-20 16:27:43 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
logs	none

Description Dafna Ron 2013-02-13 13:21:27 UTC

Created attachment 696800 [details]
logs

Description of problem:

vdsm and libvirt have a 300-second migration timeout, but if they fail to abort the migration the vm will also be stuck in engine as migrating. 
can we add a timeout in engine so that if the vm is stuck the user will know that there is an issue? 

Version-Release number of selected component (if applicable):

si27 (3.1.3) 

How reproducible:

100%

Steps to Reproduce:
1. create an iscsi pool on a two-host cluster
2. run a vm -> suspend it -> resume it
3. try to migrate the vm
  
Actual results:

vm migration is stuck in libvirt/qemu and they also fail to abort the migration. 
the engine will just show the vm as migrating forever until the user powers it off (cancel migration does not help either)

Expected results:

we should alert the user in engine that the migration should have ended and suggest checking the issue. 
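
A minimal sketch of the kind of check requested here, not the engine's actual code. It assumes the engine records when each migration started; the names MigratingVm, find_overdue_migrations, warning_for and the 60-second grace period are hypothetical, and the 300-second value is taken from the vdsm/libvirt timeout mentioned in the description.

```python
from dataclasses import dataclass

# Taken from the description: vdsm/libvirt migration timeout in seconds.
MIGRATION_TIMEOUT_S = 300
# Hypothetical grace period before the engine warns the user.
GRACE_S = 60


@dataclass
class MigratingVm:
    """Hypothetical record of a VM that the engine shows as migrating."""
    name: str
    started_at: float  # seconds, on the same clock as `now`


def find_overdue_migrations(vms, now,
                            timeout_s=MIGRATION_TIMEOUT_S,
                            grace_s=GRACE_S):
    """Return names of VMs migrating for longer than timeout + grace."""
    deadline = timeout_s + grace_s
    return [vm.name for vm in vms if now - vm.started_at > deadline]


def warning_for(vm_name):
    """Build the user-facing warning text (wording is illustrative)."""
    return (f"Migration of VM {vm_name} should have ended by now "
            f"and may be stuck. Please check the source and "
            f"destination hosts.")
```

Such a check would only warn the user. It would not change the vm's state or try to abort the migration itself.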

Additional info: all logs and audit log from db

Comment 1 Michal Skrivanek 2013-02-14 07:22:09 UTC

I would not be in favor of adding yet another layer of timeout. But IIRC wasn't this an issue when engine didn't really reflect the actual libvirt state? Is this the one we were looking at together recently?

Comment 2 Dafna Ron 2013-02-14 09:02:34 UTC

This is the bug we were looking at together. 

1. there is a bug for vdsm for the vm state issue. 
in this case, the engine is looking at the vm's state in the dst (which is paused) and will wait forever for the state to change. 

2. since the timeout for vm migration is 300 seconds in vdsm/libvirt, I am not sure why we should keep a vm in migrating state for over an hour. 
perhaps sampling dst and src to find out what the vm's state actually is? 

3. having said that, a warning to the user that the vm migration should have ended and that there is a possibility that the vm migration is stuck will be a nice compromise.
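
The idea in point 2 of sampling both dst and src could look roughly like the sketch below. It is an illustration under assumptions, not vdsm or engine code: the state strings ("Up", "Down", "Paused", "MigratingFrom") and the function classify_migration are hypothetical and do not claim to match the real status values.

```python
def classify_migration(src_state, dst_state, elapsed_s, timeout_s=300):
    """Classify a migration from the vm's state on both hosts.

    src_state / dst_state are hypothetical state strings; dst_state is
    None when the destination host does not report the vm at all.
    Returns 'completed', 'failed', 'suspect_stuck' or 'in_progress'.
    """
    # The vm is gone from the source and running on the destination.
    if src_state == "Down" and dst_state == "Up":
        return "completed"
    # The vm is running normally on the source and the destination
    # has nothing running: the migration was rolled back.
    if src_state == "Up" and dst_state in ("Down", None):
        return "failed"
    # Neither side has settled and the timeout has already passed.
    if elapsed_s > timeout_s:
        return "suspect_stuck"
    return "in_progress"
```

With the situation described in point 1 (dst paused for over an hour), classify_migration("MigratingFrom", "Paused", 3600) returns "suspect_stuck", which is the case where a warning to the user would be raised.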

Comment 3 Michal Skrivanek 2013-02-14 12:45:39 UTC

I think we absolutely need to address 1. If it had worked we would not have seen this issue. You should get a migration failure and see that in event log....should be good enough, would you agree?

Comment 4 Dafna Ron 2013-02-14 13:05:39 UTC

actually I don't agree :) 
it's important that we fix issue #1, but it's one bug and this issue might happen for other bugs as well.
what this bug is about is making sure that if we do have a bug which gets migration stuck, engine has a way of dealing with it and/or reporting it to the user.

Comment 6 Michal Skrivanek 2013-02-15 14:30:56 UTC

so....missing logs. Dafna, it would be great if you could get them.
and at the same time the vdsm code handling the libvirt timeout should be revisited