1229843 – sparseness is not preserved across online volume migration of a volume attached to an active VM

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	15.0 (Stein)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Kashyap Chamarthy
QA Contact:	OSP DFG:Compute
Docs Contact:
URL:
Whiteboard:
Depends On:	1232914 1277471 1297255 1533975
Blocks:

Reported:	2015-06-09 18:48 UTC by Eoghan Glynn
Modified:	2023-03-21 18:37 UTC
CC List:	17 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-15 09:17:58 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1398999	None	None	None	Never
Red Hat Bugzilla	1220849	medium	CLOSED	[RFE] Support offline migration of attached volumes when VM is inactive	2023-03-21 18:43:45 UTC
Red Hat Issue Tracker	OSP-13542	None	None	None	2022-03-13 14:21:12 UTC
Red Hat Knowledge Base (Solution)	2022583	None	None	None	2016-03-08 19:00:13 UTC

Internal Links: 1220849 1658833

Description Eoghan Glynn 2015-06-09 18:48:47 UTC

Description of problem:

When a volume attached to an active instance is migrated between NFS shares, sparseness is lost.

(This occurs on the customer site with NetApp, but the same principle should apply to any cinder driver that generally respects sparseness, e.g. NFS)

The volume migration is orchestrated by cinder, but is actually achieved by delegating to nova via the swap_volume operation. Where libvirt is the hypervisor, this effectively boils down to the blockRebase operation. Maintaining sparseness in this context will also require changes to libvirt & qemu: https://bugzilla.redhat.com/1221468#c9
 

Additional info:
How reproducible:
100%


Steps to Reproduce:
1. Create a 1GB volume on NFS from the stock cirros image
2. Boot an instance from that volume, so that it's attached as the root volume
3. Check that the volume size on the disk is around 20MB
4. Migrate that volume to another NFS share
5. Check size again
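The size checks in steps 3 and 5 compare the space actually allocated on disk with the apparent file size. A minimal sketch of that check follows, assuming the volume is a plain file on a Linux NFS mount; the helper names `disk_usage` and `is_sparse` are hypothetical and not part of cinder or nova.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for a file.

    On Linux, st_blocks is counted in 512-byte units regardless of the
    filesystem block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """True if the file occupies less space on disk than its apparent size."""
    apparent, allocated = disk_usage(path)
    return allocated < apparent


if __name__ == "__main__":
    # Demo on a throwaway file: extend it to 1 MiB without writing any data.
    with tempfile.NamedTemporaryFile() as f:
        f.truncate(1 << 20)
        apparent, allocated = disk_usage(f.name)
        print(f"apparent={apparent} allocated={allocated} "
              f"sparse={is_sparse(f.name)}")
```

Run against the volume file before and after the migration, a sparse 1GB cirros volume would be expected to report an allocated size far below its apparent size, and a volume that lost its sparseness would report the two as roughly equal.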


Expected result:

The volume should remain the same size


Actual results:

The size grows to ~1GB, which means the volume lost its sparseness.


Additional information:

Cleaved off from BZ 1221468 to cover the active VM case (leaving the original bug to solely represent the case where the VM is inactive).

Comment 3 Pádraig Brady 2015-06-11 06:53:57 UTC

Note bug #1219541 is tracking at least a very similar issue for libvirt and qemu.

That bug is currently against the qemu component, though there is an upstream patch proposed against libvirt (though not yet merged):
https://www.redhat.com/archives/libvir-list/2015-April/msg00130.html

Comment 4 Daniel Berrangé 2015-06-11 08:19:49 UTC

(In reply to Pádraig Brady from comment #3)
> Note bug #1219541 is tracking at least a very similar issue for libvirt and
> qemu.
> 
> That bug is currently against the qemu component, though there is an
> upstream patch proposed against libvirt (though not yet merged):
> https://www.redhat.com/archives/libvir-list/2015-April/msg00130.html

NB volume migration != guest migration.

Volume migration is the code using the driveMirror libvirt API, but that quoted patch is for guest migration.

Comment 5 Eoghan Glynn 2015-06-11 08:33:48 UTC

Yes, this bug is purely for the online *volume* migration case.

The guest stays put, but the volume needed to be moved between cinder backends in order to balance across multiple NetApps (in the customer use case).

Comment 6 Pádraig Brady 2015-06-17 20:00:06 UTC

live block volume migration has been disabled as of Kilo/RHOS 7 as per http://pad.lv/1398999
(and also in Juno if https://review.openstack.org/176768 is merged)

Enablement will require:

1. Changes to qemu to detect zeros on NFS and propagate as holes
(bug #1232914)

2. API changes to libvirt to make the operation safe
(bug #1232919)

3. Nova changes to re-enable the feature and use the newer libvirt APIs
(tracked in this bug)
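
For illustration only, the idea behind item 1 (detect zeros and propagate them as holes) can be sketched as a file copy that seeks over all-zero blocks instead of writing them. This is not how qemu implements it; `sparse_copy` and `block_size` are hypothetical names, and the sketch assumes a destination filesystem that supports sparse files.

```python
import os


def sparse_copy(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst, leaving all-zero blocks unwritten so they stay holes."""
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Skip over the zeros rather than writing them out.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
        # If the file ends in a hole, nothing was written at the end,
        # so set the final size explicitly.
        dst.truncate(dst.tell())
```

The destination reads back byte-for-byte identical to the source, but regions that were entirely zero consume no disk blocks where the filesystem supports holes.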

Comment 7 Daniel Berrangé 2015-06-18 09:13:33 UTC

@pbrady, once again:

(In reply to Pádraig Brady from comment #6)
> live block volume migration has been disabled as of Kilo/RHOS 7 as per
> http://pad.lv/1398999
> (and also in Juno if https://review.openstack.org/176768 is merged)

These are again about *live migration* with block storage copy, i.e. the VM is relocated from one host to another host, and the storage is copied.

This bug is about volume migration. The VM stays running on the current host, and the volume is swapped out from beneath it.

Comment 8 Eoghan Glynn 2015-06-26 16:29:13 UTC

Status update:

Development is continuing in qemu, with another solution proposed, which is however a bit more invasive:

http://lists.nongnu.org/archive/html/qemu-block/2015-06/msg00292.html

Comment 13 Pádraig Brady 2015-07-10 14:17:00 UTC

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=v2.3.0-1612-g0fc9f8e is scheduled to be backported next week to address this

Comment 27 Kashyap Chamarthy 2018-02-23 14:47:08 UTC

So this is still predicated on work in two lower layer components:

(1) [libvirt] https://bugzilla.redhat.com/show_bug.cgi?id=1297255 — 
    add possibility to sparsify image during block copy

(2) [QEMU] https://bugzilla.redhat.com/show_bug.cgi?id=1533975 — 
    detect-zeroes=unmap/on does not produce a sparse file on NFS v4.1 
    when attempting blockdev/drive-mirror

Comment 30 Matthew Booth 2019-10-15 09:17:58 UTC

I am closing this bug as it has not been addressed for a very long time. Please feel free to reopen if it is still relevant.
