Bug 315131
| Summary: | clustat reports wrong status when live migrate failed | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Fai Wong <ywong> |
| Component: | xen | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.0 | CC: | casmith, clalance, cluster-maint, jwilleford, kenchan, lhh, tao, xen-maint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-11-23 15:41:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 466197 | | |
Description
Fai Wong
2007-10-02 08:57:11 UTC
Created attachment 213211 (cluster configuration)
All cluster version 5 defects should be reported under the Red Hat Enterprise Linux 5 product name, not Cluster Suite.

This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.

I'm not entirely sure what the exact problem is here. The original bug report seems to be about a situation where a migration attempt is not accepted by the target host. However, xm migrate reports errors in such cases, depending on why the migration couldn't be started:

    [root@virval ~]# xm migrate --live rhel5-64 mig
    Error: can't connect: Connection refused

or

    [root@virval ~]# xm migrate --live rhel5-64 mig
    Error: (104, 'Connection reset by peer')

Another option is that a guest is migrated to a target machine but then fails to start there. In that case, xm migrate returns success because the source xend just connects to the target, transfers the guest's memory image and closes the connection. Xend doesn't support any kind of error reporting back to the original host once the image is transferred.

Or is there another problem I don't see?

(In reply to comment #13)
> Another option is that a guest is migrated to a target machine but then it
> fails to start there. In that case, xm migrate returns success because source
> xend just connects to the target, transfers guest's memory image and closes the
> connection. Xend doesn't support any kind of error reporting back to the
> original host once the image is transferred.
>
> Or is there another problem I don't see?

Lon can correct me if I'm wrong, but this is the most likely culprit. If the machine you are migrating to does not have enough RAM or other system resources, you'll end up with a situation where the VM is migrated from one host to another but cannot be started. In this case the 'migration' was successful, but the 'start' was not.
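The two failure modes described in comment #13 can, at least roughly, be told apart from what a single `xm migrate --live` run reports. The sketch below is hypothetical and is not code from xend, xm or rgmanager: it assumes xm exits with status 0 on success and nonzero on the errors shown above (not verified here), and it matches only the two error strings quoted in comment #13. Note that a `TRANSFERRED` result cannot tell you whether the guest actually started on the target, since xend sends no such feedback.

```python
from enum import Enum


class MigrationOutcome(Enum):
    # The migration never started; the guest should still be running on the source.
    ABORTED_EARLY = "aborted_early"
    # xm reported success: the memory image was transferred, but xend gives no
    # feedback on whether the guest actually started on the target.
    TRANSFERRED = "transferred"
    # Any other error: the guest's state cannot be inferred from the output.
    UNKNOWN_ERROR = "unknown_error"


# Error texts copied from the xm migrate transcripts in comment #13.
EARLY_ABORT_MARKERS = (
    "can't connect: Connection refused",
    "Connection reset by peer",
)


def classify_migration(exit_status: int, output: str) -> MigrationOutcome:
    """Classify one `xm migrate --live` run from its (assumed) exit status and printed text."""
    if exit_status == 0:
        return MigrationOutcome.TRANSFERRED
    if any(marker in output for marker in EARLY_ABORT_MARKERS):
        return MigrationOutcome.ABORTED_EARLY
    return MigrationOutcome.UNKNOWN_ERROR


if __name__ == "__main__":
    print(classify_migration(1, "Error: (104, 'Connection reset by peer')"))
```

Matching on message text is fragile (the wording may change between Xen versions), which is one reason the thread treats tighter source/target xend communication as the real fix.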
I would contend that a 'live migration' actually has two parts: first migrating the memory, second starting the VM. If starting the VM isn't required to complete the migration, then it seems to me that it's no longer a 'live' migration but a dead one. The migration should only be considered a success iff the VM could be started on the destination machine. Otherwise the whole operation should fail, and the VM should continue running on the original host. Anything else leaves you with _no_ VM running on either host.

Hmm, but according to Lon, the cluster already has some code for detecting and restarting guests when they don't actually start after being migrated. Anyway, if this is the case, we have a better bug report for it: bug 513431. Unfortunately, fixing it would require significantly changing the way the source and target xend communicate with each other during migration.

Yeah, Lon corrected me after I made that comment, so you can ignore comment #14 :)

So, as far as I know, there's no way to "hook" rgmanager into the migration after 'virsh|xm migrate' completes but before the start, at least not without coupling rgmanager to libvirtd.

The original bug deals with the fact that if migration aborts early (e.g. no connection), the service goes to the 'failed' state instead of staying 'started'. I thought most of this was actually fixed later. In 5.4, rgmanager shouldn't mark the migration as failed if obvious errors (e.g. failure to connect to the remote hypervisor) occur.

So I guess we can close this as current version, right?

*** This bug has been marked as a duplicate of bug 499835 ***

Resolving as a duplicate of the later bugzilla, since the problem space is more narrowly defined in that bugzilla.

This bug was closed during 5.5 development and is being removed from the internal tracking bugs (which are now for 5.6).
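The behavior Lon describes for 5.4 (an early, obvious migration error leaves the service 'started' on the original host instead of 'failed') can be summarized as a small decision table. This is a hypothetical sketch, not rgmanager's actual code: the function and field names are invented for illustration, the "recovering" state for a guest that was transferred but never came up on the target is an assumption modeled on clustat state names, and how `guest_running_on_target` is determined (e.g. by querying the target host) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    state: str             # service state name, modeled on clustat output ("started", "failed", ...)
    owner: Optional[str]   # host the service should be recorded on; None if unknown
    restart: bool = False  # True if the guest needs to be started again


def decide_after_migrate(
    source: str,
    target: str,
    migrate_succeeded: bool,
    aborted_early: bool,
    guest_running_on_target: bool,
) -> Decision:
    """Hypothetical policy for recording a VM service after `xm migrate` returns."""
    if not migrate_succeeded:
        if aborted_early:
            # e.g. failed to connect to the remote hypervisor: nothing was
            # transferred, so the guest is still running on the source host.
            return Decision(state="started", owner=source)
        # Unrecognized failure: where the guest is running is unknown.
        return Decision(state="failed", owner=None)
    if guest_running_on_target:
        # Memory transferred and the guest came up on the target.
        return Decision(state="started", owner=target)
    # xm reported success, but the guest did not start on the target
    # (e.g. not enough RAM there): it is now running on neither host.
    return Decision(state="recovering", owner=None, restart=True)
```

The last branch is the case bug 513431 covers: because xend reports success as soon as the image is transferred, the cluster can only detect it after the fact and restart the guest, rather than keep it running on the source.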