Bug 1041080 - [RFE][nova]: Baremetal nodes can be migrated among compute hosts
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: RFEs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: RHOS Maint
QA Contact:
URL: https://blueprints.launchpad.net/nova...
Whiteboard: upstream_milestone_none upstream_stat...
Depends On:
Blocks:
Reported: 2013-12-12 13:35 UTC by RHOS Integration
Modified: 2015-03-19 16:49 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-19 16:49:46 UTC
Target Upstream Version:
Embargoed:



Description RHOS Integration 2013-12-12 13:35:06 UTC
Cloned from Launchpad blueprint https://blueprints.launchpad.net/nova/+spec/baremetal-compute-takeover

Description:

In a baremetal cloud with multiple nova-compute hosts, each nova-compute host is a single point of failure (SPoF) for the baremetal nodes it manages. There is currently no mechanism, manual or automatic, to move a node from one compute host to another; doing so requires deleting the node and adding it again, which invalidates any instance currently deployed to that node.

Note also that if a nova-compute host goes offline, Nova cannot control the baremetal nodes managed by that host, though any existing instances should continue to function as long as they do not restart.

Moving a node to another compute host could be accomplished by:
- adding a new baremetal (bm) state, "migrating"
- adding a method to rebuild the TFTP environment for a deployed instance on a new compute host
- finding a means to update the Nova scheduler so that the (host, hypervisor_hostname) pair can change. This would need to be possible regardless of whether an instance is active on that compute node.
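
The three steps above can be sketched roughly as follows. This is only an illustration, not Nova code: BmNode, rebuild_tftp_env, migrate_node, the scheduler_map dictionary, and the /tftpboot path are all hypothetical names invented for this example, and the TFTP rebuild is a placeholder.

```python
from dataclasses import dataclass

MIGRATING = "migrating"  # the new bm state proposed above


@dataclass
class BmNode:
    """Hypothetical stand-in for a row in the bm_nodes table."""
    hypervisor_hostname: str
    service_host: str   # compute host that currently manages the node
    state: str          # e.g. "active" when an instance is deployed


def rebuild_tftp_env(node, new_host):
    """Placeholder: a real driver would recreate the TFTP files on new_host."""
    # The path layout here is an assumption made for the example.
    return f"{new_host}:/tftpboot/{node.hypervisor_hostname}"


def migrate_node(node, new_host, scheduler_map):
    """Move node to new_host, keeping whatever state it had before."""
    if node.state == MIGRATING:
        raise RuntimeError("node is already migrating")
    previous_state = node.state
    node.state = MIGRATING
    try:
        tftp_path = rebuild_tftp_env(node, new_host)
        # Update the scheduler's view of (host, hypervisor_hostname).
        scheduler_map.pop((node.service_host, node.hypervisor_hostname), None)
        scheduler_map[(new_host, node.hypervisor_hostname)] = previous_state
        node.service_host = new_host
    finally:
        # Restore the pre-migration state whether or not the move succeeded.
        node.state = previous_state
    return tftp_path
```

Because the previous state is restored rather than reset, the same path would apply whether or not an instance is active on the node.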

Additionally, if the nova_bm database tracked the status of the compute host that owns each node, other compute hosts could "take over" for a dead host. This would require the following changes:
- add a timestamp column to the bm_nodes table
- add a compute host periodic task that updates the timestamp
- add a compute host periodic task that looks for bm_nodes whose compute host has not checked in and initiates take-over, holding a distributed (i.e., db-managed) lock on that node, compute_host, and instance
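
A minimal sketch of the two periodic tasks, using an in-memory SQLite database in place of nova_bm. The column names service_host and updated_at, the STALE_AFTER threshold, and the function names are assumptions made for the example. The db-managed lock is approximated with a compare-and-swap UPDATE on the node row only; locking the compute_host and instance as well is not shown.

```python
import sqlite3

STALE_AFTER = 60  # assumed: seconds without a heartbeat before a host counts as dead


def make_db():
    """Create a toy bm_nodes table with the proposed timestamp column."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE bm_nodes ("
        " id INTEGER PRIMARY KEY,"
        " service_host TEXT NOT NULL,"
        " updated_at REAL NOT NULL)"
    )
    return db


def heartbeat(db, host, now):
    """Periodic task 1: refresh the timestamp on every node this host owns."""
    db.execute(
        "UPDATE bm_nodes SET updated_at = ? WHERE service_host = ?", (now, host)
    )
    db.commit()


def take_over_stale(db, host, now):
    """Periodic task 2: claim nodes whose owner has stopped checking in."""
    cutoff = now - STALE_AFTER
    stale = db.execute(
        "SELECT id, service_host, updated_at FROM bm_nodes"
        " WHERE service_host != ? AND updated_at < ?",
        (host, cutoff),
    ).fetchall()
    claimed = []
    for node_id, old_host, seen_at in stale:
        # Compare-and-swap: the UPDATE only matches if the row is unchanged
        # since it was read, so at most one competing host wins the node.
        cur = db.execute(
            "UPDATE bm_nodes SET service_host = ?, updated_at = ?"
            " WHERE id = ? AND service_host = ? AND updated_at = ?",
            (host, now, node_id, old_host, seen_at),
        )
        if cur.rowcount == 1:
            claimed.append(node_id)
    db.commit()
    return claimed
```

A host that loses the race sees a rowcount of 0 for that node and simply skips it, which is what makes the row itself act as the lock.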


This was discussed at the Havana summit:
  https://etherpad.openstack.org/HavanaBaremetalNextSteps

Specification URL (additional information):

None

