Bug 499835
| Summary: | detect failed virtual machine migrations in a cluster | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Lon Hohberger <lhh> | ||||
| Component: | rgmanager | Assignee: | Lon Hohberger <lhh> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 5.3 | CC: | cluster-maint, cward, djansa, edamato, federico.simoncelli, henry.robertson, hklein, jcastillo, jruemker, mrappa, pep, rbinkhor, samuel.kielek, tao, ywong | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | rgmanager-2.0.52-1.27.el5 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2010-03-30 08:49:03 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 412911 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Description
Lon Hohberger
2009-05-08 14:04:41 UTC
Talking with the virt developers, the 'virsh migrate' operation is supposed to be synchronous, despite the man page being wrong: https://bugzilla.redhat.com/show_bug.cgi?id=514532

Ergo, if 'virsh migrate' succeeds but the VM is still on the source host, the migration failed and action can be taken immediately to remedy the situation.

Pursuant to comment #1: when using virsh, we can detect the failed migrations easily as 'virsh migrate' is a synchronous operation. Ergo, if 'virsh migrate' fails, we can simply do a 'status check' followed by a destroy if required on src/target.

If 'xm migrate' is expected to be synchronous, we can simplify this operation immensely:
- migrate vm
- if successful, flip state

This works with 'virsh'.

Created attachment 364985
Fix rgmanager behavior based on expectation that migration is a synchronous operation
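
The decision described above can be sketched as a small pure function. This is only an illustration and not rgmanager's actual code: the names `Action` and `after_migrate` are hypothetical, and treating a failed local status check as the trigger for a destroy is an assumption about what "if required" means here.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical follow-up actions once the migrate command has returned."""
    FLIP_STATE = "flip_state"          # migration worked: the target now owns the VM
    KEEP_ON_SOURCE = "keep_on_source"  # migration failed, VM still healthy on the source
    DESTROY = "destroy"                # migration failed and the VM is not healthy


def after_migrate(migrate_rc: int, healthy_on_source: bool) -> Action:
    """Pick an action after a synchronous migrate command has finished.

    migrate_rc is the exit status of the migrate command (0 means success).
    healthy_on_source is the result of a local status check run afterwards.
    """
    # Success, and the VM has left the source host: flip state.
    if migrate_rc == 0 and not healthy_on_source:
        return Action.FLIP_STATE
    # The VM is still running locally, so the migration did not happen,
    # whatever the exit status claimed. Nothing needs to be torn down.
    if healthy_on_source:
        return Action.KEEP_ON_SOURCE
    # The command failed and the local status check failed too (assumption:
    # this is the case in which a destroy is "required").
    return Action.DESTROY
```

Because the migrate command is synchronous, both inputs are known as soon as it returns, so no polling or timeout is needed to tell these cases apart.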
After muddling through the Xen xend and xm code, it appears that 'xm' is expected to be sync. Effectively, 'xm suspend' and 'xm migrate' call the same backend utility: xc_save. They pass in a different file descriptor. Naturally, one is over the network and the other is a file on disk where the memory is dumped.

Talking with engineers here who work on Xen, xm migrate is also synchronous.

So, at a minimum, we need:
- sync migrate patch (provided)
- test-after-migrate patch: if 'virsh migrate' or 'xm migrate' fails, recheck status of the VM locally. If it's still in a good state, then we need to NOT return a failure and/or return a non-fatal error so rgmanager does not mark the VM as 'failed' (which would require a restart)

*** Bug 315131 has been marked as a duplicate of this bug. ***

Hi Lon, do we have an estimate when a fix will be available? best regards, Hari

Within the above outlined constraints for what needs to be done, I should have a package today.

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=94ce529d73ea7113f31fb9a369d7780e44fb7f5a
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=3bcd3e1017cf489c9baa6d6fb7d16a43862f22df

~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0280.html