Bug 1220849 - [RFE] Support offline migration of attached volumes when VM is inactive
Summary: [RFE] Support offline migration of attached volumes when VM is inactive
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 11.0 (Ocata)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Duplicates: 1220526
Depends On:
Blocks: 1442136
 
Reported: 2015-05-12 15:32 UTC by Eoghan Glynn
Modified: 2023-03-21 18:39 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-29 11:52:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1454252 0 None None None Never
Red Hat Bugzilla 1229843 0 high CLOSED sparseness is not preserved across online volume migration of a volume attached to an active VM 2023-03-21 18:40:29 UTC
Red Hat Issue Tracker OSP-23295 0 None None None 2023-03-21 18:39:56 UTC

Internal Links: 1229843

Description Eoghan Glynn 2015-05-12 15:32:50 UTC
Description of problem:

An attached volume cannot be migrated while the VM it is attached to is inactive (shut off).

If the volume is the root volume of a boot-from-volume instance, it also cannot be detached to facilitate a volume migration.

Only online migration is supported for such volumes, since BZ 1218342.


Version-Release number of selected component (if applicable):

openstack-cinder-2014.1.4-*
openstack-nova-2014.1.4-*


How reproducible:

100%


Steps to Reproduce:
1. Boot a VM from a volume.
2. Shut off the VM.
3. Try to migrate the volume between different storage backends of the same volume type (cinder retype with --migration-policy on-demand); a client-side sketch follows these steps.
4. The process fails in Nova with a libvirt error in blockRebase, because libvirt cannot find a running instance of the VM.
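
For reference, a minimal reproduction sketch using the Python client libraries. The flavor, volume and volume-type IDs are placeholders, and get_nova_client()/get_cinder_client() are hypothetical helpers standing in for client authentication (not real client APIs):

  # Reproduction sketch (placeholders and hypothetical auth helpers; not a
  # verbatim recipe).
  nova = get_nova_client()      # hypothetical: an authenticated python-novaclient Client
  cinder = get_cinder_client()  # hypothetical: an authenticated python-cinderclient Client

  # 1. Boot a VM from an existing bootable volume (no image; vda comes from the volume).
  server = nova.servers.create(
      name='bfv-test',
      image=None,
      flavor='<flavor-id>',
      block_device_mapping={'vda': '<volume-id>:::0'},
  )

  # 2. Shut the VM off.
  nova.servers.stop(server)

  # 3. Retype the attached volume with on-demand migration; with the VM shut off,
  #    this is where Nova's swap_volume fails inside libvirt's blockRebase.
  cinder.volumes.retype('<volume-id>', '<new-volume-type>', 'on-demand')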


Expected result:

The volume should be migrated to the other storage backend.


Actual results:

Nova fails with a libvirt error in blockRebase, because libvirt cannot find a running instance of the VM.

Comment 3 Eoghan Glynn 2015-05-12 15:35:36 UTC
From Kashyap Chamarthy ...

Some notes on libvirt blockRebase API and offline migration:

As correctly identified, libvirt's 'blockRebase' API works only when
the guest is online, and that's not a bug -- the main point of
blockRebase is that it allows the guest to keep reading and writing
concurrently while the copy is taking place. So libvirt (the
blockRebase API, to be precise) throwing an error when it doesn't see
a running domain (guest) is the expected behavior.
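
To make that expected failure concrete, a minimal sketch using
libvirt-python directly; the domain name and target path are
placeholders, and this is roughly the call Nova's swap_volume makes:

  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('instance-00000001')   # placeholder domain name

  # Copy mode, reusing a pre-created destination -- roughly the flags Nova uses.
  flags = (libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
           libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)
  try:
      # Start copying 'vda' onto the new volume while the guest keeps running.
      dom.blockRebase('vda', '/dev/disk/by-path/<new-volume>', 0, flags)
  except libvirt.libvirtError as err:
      # On a shut-off domain libvirt rejects the call (an error along the
      # lines of "domain is not running"), which is the failure Nova surfaces.
      print(err)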

Some notes from an IRC discussion with Matthew Booth, Dan Berrange and
Peter Krempa:

  - It is pointless to allow blockRebase to work when a guest is
    offline, because one can use more efficient methods like `dd` (or 
    similar) to perform a copy.

  - Matthew Booth explored an idea where the libvirt driver could
    create an ephemeral (transient) domain (guest, in libvirt's
    parlance) to do the block migration. However, that might not be
    acceptable in a 'cloud environment'; the rationale from the libvirt
    developers:

      - Creating a guest consumes resources on the host, which is not
        acceptable if the guest is supposed to be shut off. We've
        previously rejected this as an approach for migrating offline
        guests for this reason.

      - Allowing blockRebase to work offline would probably require
        spawning an instance of QEMU anyway, so it would be rather ugly
        to implement.

Comment 4 Dave Maley 2015-06-12 17:36:12 UTC
*** Bug 1220526 has been marked as a duplicate of this bug. ***

Comment 5 Eoghan Glynn 2015-06-22 18:40:56 UTC
From https://bugs.launchpad.net/nova/+bug/1454252/comments/2:

> Today, when Cinder understands that the volume is attached to a VM, it calls Nova's
> swap_volume feature to migrate the volume. The problem is that swap_volume uses
> libvirt's blockRebase, which fails if libvirt doesn't find a running VM -- and that's
> exactly what happens in this case, because the VM is shut off, so the qemu process
> for this VM doesn't exist.
>
> In order to fix it, we probably have to change the flow. Nova should detect that
> the VM is off and, instead of calling blockRebase, attach a new volume to the
> compute node and perform a simple copy (dd), as Cinder does in the generic case.
> Once that is done, Nova must update the path to the new volume in all relevant
> places. It is important to note that the VM must be locked during the process to
> prevent the user from starting it.
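
A rough pseudocode sketch of that proposed flow; every helper named here is
hypothetical (none are existing Nova APIs), and error handling, rollback and
the Cinder-side callbacks are omitted:

  def offline_swap_volume(instance, old_volume, new_volume):
      # Lock the instance so the user cannot start it mid-copy.
      lock_instance(instance)
      try:
          # Attach both volumes to the compute host (not to the guest).
          old_dev = attach_volume_to_compute_host(old_volume)
          new_dev = attach_volume_to_compute_host(new_volume)

          # Simple raw copy, dd-style, instead of libvirt's blockRebase.
          copy_block_device(old_dev, new_dev)

          # Point the block device mapping and the shut-off domain's
          # definition at the new volume.
          update_block_device_mapping(instance, old_volume, new_volume)
          update_guest_definition(instance, new_dev)
      finally:
          detach_volume_from_compute_host(old_volume)
          detach_volume_from_compute_host(new_volume)
          unlock_instance(instance)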

@sgotliv: would performing a simple copy (in the sense of dd) require cinder-driver-specific knowledge on the nova side?

Or would this generic approach be completely agnostic to the underlying cinder-driver in use?

Comment 6 Sergey Gotliv 2015-06-25 06:55:55 UTC
(In reply to Eoghan Glynn from comment #5)
> From https://bugs.launchpad.net/nova/+bug/1454252/comments/2:
> 
> > Today, when Cinder understands that the volume is attached to a VM, it calls Nova's
> > swap_volume feature to migrate the volume. The problem is that swap_volume uses
> > libvirt's blockRebase, which fails if libvirt doesn't find a running VM -- and that's
> > exactly what happens in this case, because the VM is shut off, so the qemu process
> > for this VM doesn't exist.
> >
> > In order to fix it, we probably have to change the flow. Nova should detect that
> > the VM is off and, instead of calling blockRebase, attach a new volume to the
> > compute node and perform a simple copy (dd), as Cinder does in the generic case.
> > Once that is done, Nova must update the path to the new volume in all relevant
> > places. It is important to note that the VM must be locked during the process to
> > prevent the user from starting it.
> 
> @sgotliv: would performing a simple copy (in the sense of dd) require
> cinder-driver-specific knowledge on the nova side?

"dd" doesn't cover the case where the volume is not locally attachable [1] for example RBD volume migration scenario. To be honest I am not a huge fan of the code duplication and this Cinder patch [1] introduce changes to the generic migration behavior anyway so I guess we need to get it merged first and then see how to correctly build the flow.

[1] https://review.openstack.org/#/c/187270/

> 
> Or would this generic approach be completely agnostic to the underlying
> cinder-driver in use?

Comment 7 Stephen Gordon 2016-03-09 22:56:11 UTC
This does not appear to have progressed in Mitaka, moving to 10.

Comment 8 Stephen Gordon 2016-07-07 16:20:06 UTC
Does not appear to have progressed in Newton, moving to Ocata.

