Bug 1196364

Summary: Live Migration doesn't handle RBD Storage failures
Product: Red Hat OpenStack
Reporter: Pádraig Brady <pbrady>
Component: openstack-nova
Assignee: Pádraig Brady <pbrady>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact:
Priority: high
Version: 6.0 (Juno)
CC: berrange, dasmith, kchamart, ndipanov, nlevinki, pbrady, sbauza, sferdjao, sgordon, slong, stoner, vromanso, yeylon
Target Milestone: z3
Keywords: ZStream
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-2014.2.3-3.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the libvirt driver handled shared file systems (for example, NFS or Ceph) incorrectly, and live migration would fail with exceptions or hang. With this update, raw or qcow images are treated appropriately on shared file systems, and live migration works reliably and efficiently.
Story Points: ---
Clone Of: 1196350
Environment:
Last Closed: 2015-05-05 13:30:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1111295    

Comment 2 Sean Toner 2015-04-20 15:23:44 UTC
I'm not sure what needs to be done to reproduce and test the fix.

For example, is it sufficient to create an NFS setup, perform some live migrations to verify that the base scenario works, and then disable NFS (or maybe some of its dependency services, like rpcbind) and see how Nova handles a live migration?

I'm assuming that if NFS (or Ceph or Gluster) fails, Nova will fail gracefully?

Comment 3 Sean Toner 2015-04-23 16:26:23 UTC
From the doc text, it appears that either NFS or Ceph can be used for verification. If this is true, is NFS as shared storage sufficient to test this code path?

I can set up live migration with NFS pretty easily, but Ceph will require me to work with Yogev.

Comment 4 Pádraig Brady 2015-04-23 16:28:42 UTC
NFS is fine.

Comment 5 Yogev Rabl 2015-04-30 11:42:44 UTC
Verified with Ceph as the back end of Nova.
Versions:
openstack-nova-common-2014.2.3-9.el7ost.noarch
openstack-nova-cert-2014.2.3-9.el7ost.noarch
openstack-nova-compute-2014.2.3-9.el7ost.noarch
python-novaclient-2.20.0-1.el7ost.noarch
python-nova-2014.2.3-9.el7ost.noarch
openstack-nova-novncproxy-2014.2.3-9.el7ost.noarch
openstack-nova-console-2014.2.3-9.el7ost.noarch
openstack-nova-scheduler-2014.2.3-9.el7ost.noarch
openstack-nova-conductor-2014.2.3-9.el7ost.noarch
openstack-nova-api-2014.2.3-9.el7ost.noarch

Setup topology:
host01 - controller + compute node
host02 - compute node

From /etc/nova/nova.conf (on both compute nodes):

[libvirt]
virt_type = kvm
inject_password = false
inject_key = false
inject_partition = -2
rbd_user = <ceph user>
rbd_secret_uuid = <uuid>
images_type = rbd
images_rbd_pool = <pool name>
images_rbd_ceph_conf = /etc/ceph/ceph.conf
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
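
A minimal sketch for sanity-checking that a compute node's [libvirt] section matches the settings quoted above. The function name check_libvirt_rbd and the choice of which options to treat as expected are assumptions made for illustration, mirroring this particular verification setup; they are not requirements stated by Nova or by this bug.

```python
import configparser

# Options that the configuration quoted above sets for the RBD image backend.
# Treating them all as expected is an assumption based on this verification setup.
EXPECTED_RBD_OPTIONS = (
    "rbd_user",
    "rbd_secret_uuid",
    "images_rbd_pool",
    "images_rbd_ceph_conf",
)

# Flags listed in live_migration_flag in the quoted configuration.
EXPECTED_MIGRATION_FLAGS = (
    "VIR_MIGRATE_UNDEFINE_SOURCE",
    "VIR_MIGRATE_PEER2PEER",
    "VIR_MIGRATE_LIVE",
)


def check_libvirt_rbd(conf_text):
    """Return a list of differences from the quoted setup (empty if none)."""
    # interpolation=None keeps '%' characters literal; strict=False tolerates
    # repeated keys, which a real nova.conf may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return ["missing [libvirt] section"]

    section = parser["libvirt"]
    problems = []
    if section.get("images_type", "").strip() != "rbd":
        problems.append("images_type is not rbd")
    for name in EXPECTED_RBD_OPTIONS:
        if not section.get(name, "").strip():
            problems.append(name + " is not set")

    flags = [f.strip() for f in section.get("live_migration_flag", "").split(",")]
    for flag in EXPECTED_MIGRATION_FLAGS:
        if flag not in flags:
            problems.append("live_migration_flag lacks " + flag)
    return problems
```

On a compute node this could be called as check_libvirt_rbd(open("/etc/nova/nova.conf").read()); an empty list means the section matches the setup described in this comment.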

Steps:
1. Launch an instance.
2. SSH to the instance.
3. Live migrate the instance.
4. Verify the migration on the second host.
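
Steps 3 and 4 could be scripted roughly as follows. This is a sketch, not the procedure actually used for verification: it assumes the nova command-line client from python-novaclient is used (nova live-migration and nova show), and the helper names and the instance name vm1 are hypothetical.

```python
def live_migration_command(server, target_host=None):
    """Build the argv for `nova live-migration <server> [<host>]`."""
    cmd = ["nova", "live-migration", server]
    if target_host:
        cmd.append(target_host)
    return cmd


def host_from_show_output(show_output):
    """Extract the hosting compute node from `nova show` table output.

    Looks for the OS-EXT-SRV-ATTR:host row, which is normally only visible
    to admin users. Returns None if the row is absent.
    """
    for line in show_output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[0] == "OS-EXT-SRV-ATTR:host":
            return cells[1]
    return None
```

The command list can be passed to subprocess.check_call, and the output of nova show vm1 fed to host_from_show_output to confirm that the instance now reports host02 while the SSH session from step 2 stays open.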

Comment 7 errata-xmlrpc 2015-05-05 13:30:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0931.html