Bug 910746
| Summary: | engine: there is no vm migration timeout in engine | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> |
| Component: | ovirt-engine | Assignee: | Vinzenz Feenstra [evilissimo] <vfeenstr> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.1.3 | CC: | acathrow, dyasny, iheim, lpeer, michal.skrivanek, Rhev-m-bugs, scohen, yeylon, ykaul |
| Target Milestone: | --- | | |
| Target Release: | 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | virt | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-02-20 16:27:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
I would not be in favor of adding yet another layer of timeout. But IIRC, wasn't this an issue where the engine didn't really reflect the actual libvirt state? Is this the one we were looking at together recently?

This is the bug we were looking at together.
1. There is a bug open against vdsm for the VM state issue. In this case, the engine is looking at the VM's state on the destination (which is paused) and will wait forever for the state to change.
2. Since the timeout for VM migration is 300 seconds in vdsm/libvirt, I am not sure why we should keep a VM in migrating state for over an hour. Perhaps sample both the destination and the source to find out what the VM's state actually is?
3. Having said that, a warning to the user that the VM migration should have ended, and that the migration may be stuck, would be a nice compromise.

I think we absolutely need to address 1. If it had worked, we would not have seen this issue. You should get a migration failure and see that in the event log. That should be good enough, would you agree?

Actually, I don't agree :) It's important that we fix issue #1, but it is one bug, and this issue might happen because of other bugs as well. What this bug is about is making sure that, if we do have a bug that gets a migration stuck, the engine has a way of dealing with it and/or reporting it to the user.

So... missing logs. Dafna, it would be great if you can get them. At the same time, the vdsm code handling the libvirt timeout should be revisited.
Created attachment 696800 [details] logs

Description of problem:
vdsm and libvirt have a 300-second migration timeout, but if they fail to abort the migration, the VM also stays stuck in the migrating state in the engine. Can we add a timeout in the engine so that, if the VM is stuck, the user will know that there is an issue?

Version-Release number of selected component (if applicable):
si27 (3.1.3)

How reproducible:
100%

Steps to Reproduce:
1. Create an iSCSI pool on a two-host cluster.
2. Run a VM -> suspend it -> resume it.
3. Try to migrate the VM.

Actual results:
The VM migration is stuck in libvirt/qemu, and they also fail to abort the migration. The engine just shows the VM as migrating forever, until the user powers it off (cancelling the migration does not help either).

Expected results:
We should alert the user in the engine that the migration should have ended, and suggest checking the issue.

Additional info:
All logs and the audit log from the db.
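
The behaviour requested under "Expected results" could be sketched roughly as follows. This is only an illustration, not engine or vdsm code: `MigrationSample`, `check_migration` and `GRACE_SECONDS` are hypothetical names, and the only figure taken from this report is the 300-second vdsm/libvirt migration timeout. The idea is that the engine does not abort anything itself; it only warns once a migration has clearly outlived the lower-layer timeout, and includes the states sampled from both hosts in the message.

```python
from dataclasses import dataclass
from typing import Optional

# Taken from this report: vdsm/libvirt give up on a migration after 300 seconds.
MIGRATION_TIMEOUT_SECONDS = 300
# Assumption: extra slack before warning, so a migration that is still being
# aborted by the lower layers is not reported prematurely.
GRACE_SECONDS = 60


@dataclass
class MigrationSample:
    """Hypothetical snapshot of one migrating VM as a monitor might see it."""
    vm_name: str
    started_at: float  # time the migration started, in seconds
    src_state: str     # state reported by the source host, e.g. "migrating"
    dst_state: str     # state reported by the destination host, e.g. "paused"


def check_migration(sample: MigrationSample, now: float) -> Optional[str]:
    """Return a warning for the user if the migration looks stuck, else None."""
    elapsed = now - sample.started_at
    if elapsed <= MIGRATION_TIMEOUT_SECONDS + GRACE_SECONDS:
        return None
    return (
        f"VM {sample.vm_name} has been migrating for {int(elapsed)}s, longer "
        f"than the {MIGRATION_TIMEOUT_SECONDS}s migration timeout "
        f"(source state: {sample.src_state}, "
        f"destination state: {sample.dst_state}). "
        "The migration may be stuck; please check the hosts."
    )
```

A monitoring loop would call `check_migration` for every VM in the migrating state on each polling cycle and write any returned message to the event log. Warning only, rather than changing the VM's status, keeps this from becoming a competing timeout layer, which was the objection raised in the comments.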