Bug 1220849

Summary: [RFE] Support offline migration of attached volumes when VM is inactive
Product: Red Hat OpenStack
Reporter: Eoghan Glynn <eglynn>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium
Priority: medium
Version: 11.0 (Ocata)
CC: dasmith, dmaley, egallen, eglynn, kchamart, lyarwood, mbooth, sbauza, sclewis, sgordon, sgotliv, srevivo, stephenfin, vromanso
Target Milestone: ---
Keywords: FutureFeature, Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Enhancement
Last Closed: 2020-09-29 11:52:26 UTC
Type: Bug
Bug Blocks: 1442136

Description Eoghan Glynn 2015-05-12 15:32:50 UTC
Description of problem:

An attached volume cannot be migrated while the VM is inactive.

If it is the root volume of a boot-from-volume instance, it also cannot be detached to facilitate a volume migration.

Only online migration is supported for such volumes, since BZ 1218342.


Version-Release number of selected component (if applicable):

openstack-cinder-2014.1.4-*
openstack-nova-2014.1.4-*


How reproducible:

100%


Steps to Reproduce:
1. Boot a VM from a volume.
2. Shut off the VM.
3. Try to migrate the volume between different storage backends of the same type (cinder retype with --migration-policy on-demand).
4. The process fails in Nova with a libvirt error in blockRebase, because libvirt can't find a running VM instance.
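The failure mode can be modelled with a small, self-contained sketch. Every class and function name here is an illustrative stand-in, not Nova or libvirt code: the point is only that blockRebase operates on a live QEMU process, so a shut-off guest raises an error.

```python
class LibvirtError(Exception):
    """Stand-in for libvirt.libvirtError."""


class Domain:
    """Toy stand-in for a libvirt domain handle."""

    def __init__(self, running):
        self.running = running

    def blockRebase(self, disk, base, flags=0):
        # blockRebase only operates on a live QEMU process, so a
        # shut-off guest fails ("can't find a VM instance").
        if not self.running:
            raise LibvirtError("Requested operation is not valid: "
                               "domain is not running")
        return 0


def swap_volume(dom, disk, new_path):
    """Sketch of the blockRebase-based swap used for attached volumes."""
    dom.blockRebase(disk, new_path, flags=0)


# A running guest succeeds; a shut-off guest reproduces the bug.
swap_volume(Domain(running=True), "vda", "/dev/disk/by-path/new")
try:
    swap_volume(Domain(running=False), "vda", "/dev/disk/by-path/new")
except LibvirtError as e:
    print("offline swap failed:", e)
```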


Expected result:

The volume should move to another storage.


Actual results:

Nova fails with a libvirt error in blockRebase, because libvirt can't find a running VM instance.

Comment 3 Eoghan Glynn 2015-05-12 15:35:36 UTC
From Kashyap Chamarthy ...

Some notes on libvirt blockRebase API and offline migration:

As correctly identified, libvirt's 'blockRebase' API works only when the
guest is online (and that's not a bug) -- the main functionality of
blockRebase is that it allows the guest to concurrently read/write while
the copy is taking place. So the behavior of libvirt (the blockRebase
API, to be precise) throwing an error when it doesn't see a running
domain (guest) is expected.

Some notes from an IRC discussion with Matthew Booth, Dan Berrange and
Peter Krempa:

  - It is pointless to allow blockRebase to work when a guest is
    offline, because one can use more efficient methods like `dd` (or 
    similar) to perform a copy.

  - Matthew Booth explored an idea where the libvirt driver could
    create an ephemeral (transient) domain (a guest, in libvirt's
    parlance) to do a block migration. However, that might not be
    acceptable in a cloud environment; the rationale from the libvirt
    developers:

      - Creating a guest consumes resources from the host, which is not
        acceptable if the guest is supposed to be shut off. We've
        previously rejected this as an approach for migrating offline
        guests for this reason.

      - Allowing blockRebase to work offline would probably require
        spawning an instance of QEMU anyway, so it would be rather ugly
        to implement.
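The `dd`-style copy mentioned above amounts to a plain chunked byte copy between the two block devices. A minimal sketch (the function name and block size are illustrative):

```python
def dd_copy(src_path, dst_path, block_size=4 * 1024 * 1024):
    """Chunked copy of a raw file/device, the offline equivalent of
    `dd if=src of=dst bs=4M`, usable instead of blockRebase when the
    guest is shut off. Returns the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

On real block devices a production version would also need to handle sparse regions and flushing to stable storage (`os.fsync`), which `dd`'s `conv=` flags cover.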

Comment 4 Dave Maley 2015-06-12 17:36:12 UTC
*** Bug 1220526 has been marked as a duplicate of this bug. ***

Comment 5 Eoghan Glynn 2015-06-22 18:40:56 UTC
From https://bugs.launchpad.net/nova/+bug/1454252/comments/2:

> Today when Cinder understands that the volume is attached to VM it calls Nova's
> swap_volume feature to migrate a volume. The problem is that swap_volume uses
> libvirt's blockRebase which fails if libvirt doesn't find a VM and that's
> exactly what happens in this case because VM is shutoff so the qemu process for
> this VM doesn't exist.
>
> In order to fix it we probably have to change the flow. Nova should detect
> that the VM is off and, instead of calling blockRebase, attach the new
> volume to the compute node and perform a simple copy (dd) as Cinder does in
> the generic case. Once that is done, Nova must update the path to the new
> volume in all relevant places. It is important to note that the VM must be
> locked during the process to prevent the user from starting it.
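The proposed flow above can be sketched as follows. Every name here is hypothetical (the real change would live in Nova's swap_volume path); the host-side copy and the block-device-mapping update are injected as stand-in callables:

```python
def swap_volume_offline(instance, old_path, new_path, copy_fn, update_bdm_fn):
    """Sketch of the proposed offline flow: lock the instance, copy the
    bytes host-side (dd-style) instead of calling blockRebase, point the
    block device mapping at the new volume, then unlock.
    copy_fn / update_bdm_fn are hypothetical stand-ins for the host copy
    and the Nova DB update."""
    if instance["power_state"] == "RUNNING":
        raise ValueError("use the online blockRebase path for running guests")
    instance["locked"] = True              # prevent the user starting it
    try:
        copy_fn(old_path, new_path)        # dd-style host-side copy
        update_bdm_fn(instance, new_path)  # record the new volume path
    finally:
        instance["locked"] = False


# Toy usage with in-memory stand-ins:
vols = {"/old": b"data", "/new": b""}
inst = {"power_state": "SHUTOFF", "locked": False, "volume": "/old"}
swap_volume_offline(
    inst, "/old", "/new",
    copy_fn=lambda o, n: vols.__setitem__(n, vols[o]),
    update_bdm_fn=lambda i, n: i.__setitem__("volume", n),
)
```

Note the `finally` block: even if the copy fails, the instance is unlocked again, which matters because a user left permanently locked out would be worse than a failed migration.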

@sgotliv: would performing a simple copy (in the sense of dd) require cinder-driver-specific knowledge on the nova side?

Or would this generic approach be completely agnostic to the underlying cinder-driver in use?

Comment 6 Sergey Gotliv 2015-06-25 06:55:55 UTC
(In reply to Eoghan Glynn from comment #5)
> From https://bugs.launchpad.net/nova/+bug/1454252/comments/2:
> 
> > Today when Cinder understands that the volume is attached to VM it calls Nova's
> > swap_volume feature to migrate a volume. The problem is that swap_volume uses
> > libvirt's blockRebase which fails if libvirt doesn't find a VM and that's
> > exactly what happens in this case because VM is shutoff so the qemu process for
> > this VM doesn't exist.
> >
> > In order to fix it we probably have to change the flow. Nova should detect
> > that the VM is off and, instead of calling blockRebase, attach the new
> > volume to the compute node and perform a simple copy (dd) as Cinder does in
> > the generic case. Once that is done, Nova must update the path to the new
> > volume in all relevant places. It is important to note that the VM must be
> > locked during the process to prevent the user from starting it.
> 
> @sgotliv: would performing a simple copy (in the sense of dd) require
> cinder-driver-specific knowledge on the nova side?

"dd" doesn't cover the case where the volume is not locally attachable [1] for example RBD volume migration scenario. To be honest I am not a huge fan of the code duplication and this Cinder patch [1] introduce changes to the generic migration behavior anyway so I guess we need to get it merged first and then see how to correctly build the flow.

[1] https://review.openstack.org/#/c/187270/

> 
> Or would this generic approach be completely agnostic to the underlying
> cinder-driver in use?

Comment 7 Stephen Gordon 2016-03-09 22:56:11 UTC
This does not appear to have progressed in Mitaka, moving to 10.

Comment 8 Stephen Gordon 2016-07-07 16:20:06 UTC
Does not appear to have progressed in Newton, moving to Ocata.