Cloned from launchpad blueprint https://blueprints.launchpad.net/nova/+spec/host-maintenance.
Description: Sometimes a sysadmin needs to reboot a Nova compute node for maintenance, for reasons such as a hardware upgrade or patch installation. Unlike an unexpected node failure, which may take the host down and cause VM downtime, planned maintenance can be prepared carefully in advance (i.e. by prohibiting new VM deployments on the node and evacuating the existing VMs to other compute nodes) in order to minimize the effect on users.
Nova exposes a set_host_maintenance API (host_update --maintenance enable/disable). The current implementation sends the request directly to the compute node that is to be put in maintenance, so the compute node itself is responsible for orchestrating a possibly complicated process that requires finding new targets for the existing VMs.
Finding an appropriate target for a VM is not a foreign task to Nova: similar operations (run_instance and resize) already exist. Neither of those operations is sent to the compute node first. Instead, they are orchestrated by nova-scheduler, which in turn directs the request to the relevant compute nodes to perform a single step, such as provisioning on the actual host.
With the current implementation, the host_maintenance feature can only support hypervisors that are themselves capable of re-scheduling a VM, e.g. XEN with VM.pool_migrate. Other hypervisors, such as KVM, are not supported, and invoking the host_maintenance API on them results in an error. In addition, it may be desirable for all policies and constraints that were enforced during the first placement of a VM (e.g. run_instance) to be enforced again when the VM is re-scheduled, as is the case when evacuating VMs from a host that is being put in maintenance.
As a first step, we propose to direct maintenance requests to nova-scheduler. For backwards compatibility, the default implementation will for now remain almost as-is and will simply forward the requests to nova-compute. Next, a second patch with a more sophisticated scheduler driver implementation will process the maintenance request, create a host evacuation plan, and migrate the affected VMs to other compute nodes. In this way, various hypervisors, KVM among them, can be put in maintenance as well.
Specification URL (additional information): None
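The scheduler-side evacuation plan proposed above could look roughly like the following. This is a minimal, self-contained sketch, not Nova code: the Host and Instance classes, the filter functions, NoValidTarget and plan_host_evacuation are all hypothetical names introduced for illustration. It assumes RAM is the only placement constraint and that the host with the most free RAM is preferred. The point it illustrates is that the same filters used for a first placement are applied again when choosing evacuation targets.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Sequence


@dataclass
class Instance:
    """Hypothetical stand-in for a VM; only the fields the sketch needs."""
    name: str
    ram_mb: int


@dataclass
class Host:
    """Hypothetical stand-in for a compute node."""
    name: str
    free_ram_mb: int
    enabled: bool = True
    instances: List[Instance] = field(default_factory=list)


# A filter answers: may this host receive this instance?
HostFilter = Callable[[Host, Instance], bool]


def enabled_filter(host: Host, instance: Instance) -> bool:
    # Hosts in maintenance must not receive new VMs.
    return host.enabled


def ram_filter(host: Host, instance: Instance) -> bool:
    return host.free_ram_mb >= instance.ram_mb


class NoValidTarget(Exception):
    """Raised when some VM cannot be placed on any remaining host."""


def plan_host_evacuation(
    hosts: Dict[str, Host],
    source_name: str,
    filters: Sequence[HostFilter] = (enabled_filter, ram_filter),
) -> Dict[str, str]:
    """Disable the source host and return a {vm name: target host name} plan."""
    source = hosts[source_name]
    source.enabled = False  # stop new placements on the node first

    # Plan against copies so tentative reservations do not alter real state.
    candidates = [replace(h) for h in hosts.values() if h.name != source_name]

    plan: Dict[str, str] = {}
    # Place the largest VMs first; they are the hardest to fit.
    for vm in sorted(source.instances, key=lambda i: i.ram_mb, reverse=True):
        valid = [h for h in candidates if all(f(h, vm) for f in filters)]
        if not valid:
            raise NoValidTarget(f"no host can accept {vm.name}")
        target = max(valid, key=lambda h: h.free_ram_mb)
        target.free_ram_mb -= vm.ram_mb  # reserve capacity in the plan only
        plan[vm.name] = target.name
    return plan
```

Because the plan is computed before anything is migrated, a request that cannot be satisfied fails up front instead of leaving the host half evacuated. In this sketch the source host stays disabled when planning fails; whether to re-enable it in that case is a policy choice the description does not address.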