Bug 978500 (find-host-and-evacuate)

Summary: [RFE] Evacuate instance to a scheduled host
Product: Red Hat OpenStack
Reporter: Russell Bryant <rbryant>
Component: openstack-nova
Assignee: Sylvain Bauza <sbauza>
Status: CLOSED ERRATA
QA Contact: Sean Toner <stoner>
Severity: medium
Priority: high
Version: 4.0
CC: ajeain, ndipanov, pbrady, racedoro, sbauza, sgordon, slong, srevivo, stoner
Target Milestone: Upstream M2
Keywords: FutureFeature, MoveUpstream, Triaged
Target Release: 6.0 (Juno)
Flags: xqueralt: needinfo-
Hardware: Unspecified
OS: Unspecified
URL: https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance
Whiteboard: upstream_milestone_juno-2 upstream_status_implemented upstream_definition_approved
Fixed In Version: openstack-nova-2014.2.1-5.el7ost
Doc Type: Enhancement
Doc Text:
The host argument for the 'nova evacuate' command has been made optional; when it is omitted, a suitable destination host is selected by the scheduler. This means that the user no longer has to know the destination host, simplifying evacuation in the case of an unplanned failure.
Last Closed: 2015-02-09 14:57:31 UTC
Bug Blocks: 743661, 988992, 1003878, 1038706, 1077198    

Comment 1 Omri Hochman 2013-07-16 14:18:35 UTC
I would like to get some more information to write the test plan for 'Evacuate Instance to a scheduled host'.

From the blueprint, I have this info:
----------------------------------------
1) In the event of an unrecoverable hardware failure, support needs to relocate an instance to another compute node so it can be rebuilt.

2) The API call should locate a suitable host within the same cell (using nova-scheduler), and perform an update of the instance's location to the new host (similar to a rebuild).

3) This call should only be available to users with the Admin role. 
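As a rough illustration of point 2, choosing a destination can be thought of as a filter-then-weigh pass over candidate hosts. This is a minimal sketch, not nova-scheduler's actual code: the `Host` fields, the RAM-only capacity check and the `pick_destination` helper are hypothetical simplifications (the real scheduler applies configurable filters and weighers).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Host:
    # Hypothetical, simplified view of a compute node (not a nova object).
    name: str
    cell: str
    is_up: bool
    free_ram_mb: int


def pick_destination(hosts: Iterable[Host], source: Host,
                     ram_needed_mb: int) -> Optional[Host]:
    """Return a host to evacuate to, or None if no host qualifies.

    Filter: same cell as the failed source, currently up, not the source
    itself, and enough free RAM. Weigh: prefer the most free RAM.
    """
    candidates = [
        h for h in hosts
        if h.cell == source.cell
        and h.is_up
        and h.name != source.name
        and h.free_ram_mb >= ram_needed_mb
    ]
    return max(candidates, key=lambda h: h.free_ram_mb, default=None)


if __name__ == "__main__":
    failed = Host("compute-1", "cell-a", is_up=False, free_ram_mb=0)
    hosts = [
        failed,
        Host("compute-2", "cell-a", is_up=True, free_ram_mb=2048),
        Host("compute-3", "cell-a", is_up=True, free_ram_mb=8192),
        Host("compute-4", "cell-b", is_up=True, free_ram_mb=16384),
    ]
    print(pick_destination(hosts, failed, ram_needed_mb=4096).name)
```

In this example compute-2 lacks capacity and compute-4 is in another cell, so compute-3 is chosen.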


Questions:
-----------
First, I would like to know: what is considered an 'unrecoverable hardware failure' event, and how should I reproduce it?
Options I came up with: stop the nova-compute service? Stop the network service? Block communication with iptables? Shut down or reboot the compute machine? What would be the best way to simulate a hardware failure?

Also, under what conditions should 'Evacuate' work, or not work? For example, should it work when everything is fine? Is there a way for the user to specify a target host for the instance to be evacuated to (using the CLI/API or a configuration file)?

Comment 2 Russell Bryant 2013-07-23 19:28:03 UTC
(In reply to Omri Hochman from comment #1)
> Questions:
> -----------
> First, I would like to know: what is considered an 'unrecoverable
> hardware failure' event, and how should I reproduce it?
> Options I came up with: stop the nova-compute service? Stop the network
> service? Block communication with iptables? Shut down or reboot the compute
> machine? What would be the best way to simulate a hardware failure?

The primary use case for evacuate is to be able to restart an existing VM on a new compute node after the original compute node dies.  You could simulate this just by removing power from a node.

> Also, under what conditions should 'Evacuate' work, or not work? For
> example, should it work when everything is fine? Is there a way for the
> user to specify a target host for the instance to be evacuated to
> (using the CLI/API or a configuration file)?

You should not be able to evacuate an instance that is running on a compute node that is still up.  You should only be able to evacuate an instance if its host has gone down.
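A minimal sketch of that precondition, assuming a hypothetical `source_host_is_up` flag that stands in for nova's own host liveness check; the exception and helper names are illustrative only, not nova's API.

```python
class EvacuationNotAllowed(Exception):
    """Raised when an evacuation request should be rejected (hypothetical)."""


def check_can_evacuate(instance_id: str, source_host: str,
                       source_host_is_up: bool) -> None:
    # Evacuation only makes sense once the instance's host has gone down:
    # a host that is still up still owns, and may still be running, the VM.
    if source_host_is_up:
        raise EvacuationNotAllowed(
            f"instance {instance_id}: host {source_host} is still up"
        )
```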

Yes, there is a way to specify a host.  That existed before.  This feature was about making the destination optional.  See "nova help evacuate".
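For illustration, a sketch of the request body for the compute API's `evacuate` server action with the destination host left optional, as this RFE intends. The key names (`host`, `onSharedStorage`, `adminPass`) are assumed to follow the Juno-era v2 API samples and should be checked against the API reference for the release in use; `build_evacuate_body` is a hypothetical helper.

```python
import json
from typing import Optional


def build_evacuate_body(on_shared_storage: bool,
                        host: Optional[str] = None,
                        admin_pass: Optional[str] = None) -> dict:
    """Build the JSON body for an 'evacuate' server action (sketch).

    Leaving `host` out delegates the choice of destination to the scheduler.
    """
    body = {"onSharedStorage": on_shared_storage}
    if host is not None:
        body["host"] = host  # explicit destination, as was already possible
    if admin_pass is not None:
        body["adminPass"] = admin_pass
    return {"evacuate": body}


if __name__ == "__main__":
    # Let the scheduler pick the destination:
    print(json.dumps(build_evacuate_body(on_shared_storage=True)))
    # Pin the destination explicitly:
    print(json.dumps(build_evacuate_body(True, host="compute-3")))
```

The body would be POSTed to the server's action endpoint; only the presence or absence of `host` differs between the two cases.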

On a related note, this feature makes the most sense when using shared storage for the instance store.  You can use evacuate to bring up a new VM that uses the same disk as the one that was on the failed compute node.  That's what makes evacuate useful over just deleting the instance and creating a new one.

Comment 3 Omri Hochman 2013-08-12 08:39:52 UTC
(In reply to Russell Bryant from comment #2)
> (In reply to Omri Hochman from comment #1)
> > Questions:

> The primary use case for evacuate is to be able to restart an existing VM on
> a new compute node after the original compute node dies.  You could simulate
> this just by removing power from a node.

> You should not be able to evacuate an instance that is running on a compute
> node that is still up.  You should only be able to evacuate an instance if
> its host has gone down.


Will it be possible to simulate a powered-down host by stopping the openstack-nova-compute service?

> 
> Yes, there is a way to specify a host.  That existed before.  This feature
> was about making the destination optional.  See "nova help evacuate".
> 
> On a related note, this feature makes the most sense when using shared
> storage for the instance store.  You can use evacuate to bring up a new VM
> that uses the same disk as the one that was on the failed compute node. 
> That's what makes evacuate useful over just deleting the instance and
> creating a new one.

How should it behave when evacuating an instance that uses local storage?

On the same subject, what if the instance has attached Cinder volumes (located on remote or local storage)?

Comment 4 Xavier Queralt 2013-08-13 09:58:38 UTC
(In reply to Omri Hochman from comment #3)
> Will it be possible to simulate a powered-down host by stopping
> the openstack-nova-compute service?

I would recommend actually powering the host down. If you only stop the compute service, the instance will still be running and using its disks and/or attached volumes.

Another option would be to kill the qemu process in addition to stopping the compute service.

> How should it behave when evacuating an instance that uses local
> storage?

When not using shared storage, the local disks have to be recreated from scratch using the base image and the parameters used to create the original instance. This means that all data stored on the old instance's local disks will be lost.

OTOH, when using shared storage, the disks from the original instance can be reused by the new instance and no data is lost.

> 
> On the same subject, what if the instance has attached Cinder volumes
> (located on remote or local storage)?

AFAIK, Cinder volumes are always remote and will be reattached to the new instance. In this case, whether or not shared storage is used makes no difference.
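The storage behaviour described in this reply can be summarised as a small decision function. This is a hypothetical model of the answer, not nova code (the `EvacuationOutcome` type and its fields are illustrative): the instance's own disks are reused only on shared storage, otherwise they are rebuilt from the base image and local data is lost, and attached Cinder volumes are reattached either way.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class EvacuationOutcome:
    reuse_existing_disks: bool          # same disk files, via shared storage
    rebuild_from_image: bool            # local disks recreated from base image
    local_data_lost: bool               # data on the old local disks is gone
    volumes_to_reattach: Tuple[str, ...]


def evacuation_outcome(on_shared_storage: bool,
                       attached_volumes: Iterable[str]) -> EvacuationOutcome:
    # Cinder volumes are remote, so they are reattached regardless of
    # whether the instance's own disks live on shared storage.
    volumes = tuple(attached_volumes)
    if on_shared_storage:
        return EvacuationOutcome(True, False, False, volumes)
    return EvacuationOutcome(False, True, True, volumes)
```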

Comment 7 Xavier Queralt 2013-09-05 14:21:39 UTC
This feature didn't make it into havana-3; moving to Icehouse.

Comment 17 errata-xmlrpc 2015-02-09 14:57:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0152.html