Red Hat Bugzilla – Bug 303111
Ensure success of a failover or migration of Virtual Machine prior to the attempt
Last modified: 2016-04-26 11:29:24 EDT
Prior to migrating or starting a guest VM on another node in the cluster, ensure
that the target node has adequate memory and other required resources to run it.
If it does not, either choose a different node or fail without making the attempt.
This will hopefully rely on the libvirt and VM infrastructure, as opposed to
being implemented solely in rgmanager and the cluster infrastructure.
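A minimal sketch of the "choose a different node, or fail" decision described above. The function name, the node names, and the idea of a per-node free-memory table are illustrative assumptions; how those figures would be gathered (libvirt, rgmanager, or something else) is left open.

```python
def pick_target_node(required_mib, free_mib_by_node, exclude=()):
    """Return the eligible node with the most free memory, or None.

    required_mib     -- memory footprint of the guest (hypothetical input)
    free_mib_by_node -- mapping of node name to free memory in MiB
    exclude          -- nodes to skip, e.g. the node the guest runs on now
    """
    candidates = {
        node: free
        for node, free in free_mib_by_node.items()
        if node not in exclude and free >= required_mib
    }
    if not candidates:
        # No node can hold the guest: fail up front instead of attempting
        # a start or migration that cannot succeed.
        return None
    return max(candidates, key=candidates.get)


free = {"node1": 512, "node2": 4096, "node3": 2048}
print(pick_target_node(2048, free, exclude={"node1"}))  # node2
print(pick_target_node(8192, free))                     # None
```

Returning None rather than raising keeps the "fail without attempting" case explicit for the caller.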
So, at a minimum...
(a) Look for memory footprint of VM and make sure enough is available on the
(b) Look for availability of that VM on the other node (e.g. xen domain config, disk
(c) Check to ensure that both local migration server and remote migration server
Pushing to 5.4; it would be easiest if libvirt just provided this capability.
*** Bug 456556 has been marked as a duplicate of this bug. ***
(In reply to comment #3)
> Pushing to 5.4; it would be easiest if libvirt just provided this capability.
It is unclear that libvirt should provide this functionality, since it depends on the SLA requirements for the VM. For example, let's say a physical host has 4 processors, there are 3 VMs presently running, each taking up 1 vCPU, and the guest you're attempting to migrate requires 2 vCPUs. Is there enough room to move this guest? Three possibilities:
1. SLA indicates that there must be a physical CPU for every vCPU, in which case no, you can't start the new VM
2. SLA indicates that vCPU overcommit can't exceed 2 times the physical CPUs, in which case you can start the new VM
3. SLA allows arbitrary overcommit but uses other metrics like system load to determine if a new guest can start (needs more info from other areas of the system)
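To make the arithmetic in the first two cases concrete, here is a small sketch. The function names are made up for illustration, and it assumes "overcommit can't exceed 2 times" means the total vCPU count may be at most twice the physical CPU count. The third case is not modelled because it needs load data from elsewhere in the system.

```python
def fits_one_to_one(physical_cpus, running_vcpus, new_vcpus):
    """Policy 1: every vCPU must be backed by its own physical CPU."""
    return running_vcpus + new_vcpus <= physical_cpus


def fits_overcommit(physical_cpus, running_vcpus, new_vcpus, ratio=2.0):
    """Policy 2: total vCPUs may not exceed ratio * physical CPUs."""
    return running_vcpus + new_vcpus <= ratio * physical_cpus


# Numbers from the example: 4 physical CPUs, 3 running guests with
# 1 vCPU each, and an incoming guest that needs 2 vCPUs.
physical, running, incoming = 4, 3, 2
print(fits_one_to_one(physical, running, incoming))  # False: 5 vCPUs > 4 CPUs
print(fits_overcommit(physical, running, incoming))  # True: 5 vCPUs <= 8
```

The same host state gives opposite answers under the two policies, which is why the decision depends on the SLA and not only on what the hypervisor reports.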
libvirt can't really track this type of business logic. It belongs further up in the stack. For example, in oVirt we'll implement this SLA logic as part of the oVirt Server, not as part of libvirt. If this functionality needs to be implemented for the cluster, it would really need to be done as part of the core cluster stack.
OK, I talked to Lon about this a little. The question isn't about SLA or performance of the VM; it's more about 'Can this guest even possibly start on this node at all?', which would depend on things like:
1. Is the architecture the same between the two migrating hosts (Intel vs. AMD)
2. Is the VM image accessible on shared storage from both hosts
3. Are the right network interfaces/bridges present
My understanding is that libvirt handles all of this as part of the migration process. The logic would be that rgmanager should try to invoke virsh migrate. If virsh migrate fails, it could be because of one of the above reasons.
So the changes for this BZ are to make rgmanager use libvirt migration instead of xm, and to interpret the results of the migrate call to determine success or failure.
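As a rough illustration of "invoke virsh migrate and interpret the result", the sketch below wraps the command and maps its exit status to success or failure. This is not rgmanager's actual implementation. The function name, the guest name, and the destination URI in the usage note are placeholders, and the `run` parameter exists only so the command runner can be swapped out for testing.

```python
import subprocess


def migrate_guest(domain, dest_uri, run=subprocess.run):
    """Attempt a live migration and return (succeeded, error_text).

    Runs `virsh migrate --live <domain> <dest_uri>`. A non-zero exit
    status is treated as failure, and whatever virsh wrote to stderr is
    returned so the caller can log the reason.
    """
    cmd = ["virsh", "migrate", "--live", domain, dest_uri]
    try:
        result = run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # virsh is missing or not executable on this node.
        return False, str(exc)
    if result.returncode == 0:
        return True, ""
    return False, (result.stderr or "").strip()
```

A caller might use it as `ok, err = migrate_guest("guest1", "xen+ssh://node2/")` and, when `ok` is false, leave the guest running where it is and record `err`.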
Since this is a function of porting to virsh instead of xm, I am closing it as a duplicate of the relevant bugzilla.
*** This bug has been marked as a duplicate of bug 412911 ***