Bug 1229843

Summary: sparseness is not preserved across online volume migration of a volume attached to an active VM
Product: Red Hat OpenStack
Reporter: Eoghan Glynn <eglynn>
Component: openstack-nova
Assignee: Kashyap Chamarthy <kchamart>
Status: CLOSED WONTFIX
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Priority: high
Version: 15.0 (Stein)
CC: asoni, byount, dasmith, dmaley, dsafford, eblake, eglynn, jraju, jwaterwo, kchamart, lyarwood, mbooth, sbauza, sgordon, spurrier, srevivo, vromanso
Target Milestone: ---
Target Release: ---
Keywords: Triaged, ZStream
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Type: Bug
Last Closed: 2019-10-15 09:17:58 UTC
Bug Depends On: 1232914, 1277471, 1297255, 1533975

Description Eoghan Glynn 2015-06-09 18:48:47 UTC
Description of problem:

When a volume attached to an active instance is migrated between NFS shares, sparseness is lost.

(This occurs on the customer site with NetApp, but the same principle should apply to any cinder driver that generally preserves sparseness, e.g. the generic NFS driver.)

The volume migration is orchestrated by cinder, but is actually achieved by delegating to nova via the swap_volume operation. Where libvirt is the hypervisor driver, this effectively boils down to the blockRebase operation. Maintaining sparseness in this context will also require changes to libvirt & qemu: https://bugzilla.redhat.com/1221468#c9
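For orientation, a minimal sketch of the libvirt-side sequence behind swap_volume: start a copy-mode blockRebase onto a pre-created target, wait for the mirror to converge, then pivot. The function name, polling loop, domain name and paths below are illustrative assumptions, not nova's actual code:

    # Sketch only: roughly the blockRebase/pivot dance behind swap_volume.
    # Assumes libvirt-python and a pre-created destination file (as cinder
    # provides); all names here are hypothetical.
    import time
    import libvirt

    def swap_volume_sketch(dom, disk_dev, new_path):
        flags = (libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
                 libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)
        # Start mirroring the disk's contents onto the existing target file.
        dom.blockRebase(disk_dev, new_path, 0, flags)

        # Wait until the block job has copied everything (cur == end).
        while True:
            info = dom.blockJobInfo(disk_dev, 0)
            if info and info['cur'] == info['end']:
                break
            time.sleep(0.5)

        # Pivot the guest onto the new volume and finish the job.
        dom.blockJobAbort(disk_dev, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')          # hypothetical domain
    swap_volume_sketch(dom, 'vda', '/mnt/nfs2/volume-1234')  # hypothetical path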
 

How reproducible:
100%


Steps to Reproduce:
1. Create a 1GB volume on an NFS backend from the stock cirros image
2. Boot an instance from that volume, so that it is attached as the root volume
3. Check that the space the volume actually consumes on disk is around 20MB (see the sketch after this list for one way to check)
4. Migrate that volume to another NFS share while the instance is running
5. Check the on-disk consumption again
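A sketch for steps 3 and 5, comparing the file's apparent size with the blocks it actually allocates (equivalent to comparing ls -l with du); the cinder mount path below is a hypothetical placeholder:

    # Sketch: measure sparseness of the volume's backing file on the NFS mount.
    import os

    def on_disk_usage(path):
        st = os.stat(path)
        apparent = st.st_size            # logical size, ~1GB for this volume
        allocated = st.st_blocks * 512   # bytes actually allocated on disk
        return apparent, allocated

    # Hypothetical path; use the real file under the cinder NFS mount.
    apparent, allocated = on_disk_usage('/var/lib/cinder/mnt/.../volume-...')
    print('apparent=%d allocated=%d' % (apparent, allocated))
    # Sparse before migration: allocated ~20MB. After a migration that
    # loses sparseness: allocated ~1GB.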


Expected result:

The volume's on-disk consumption should remain roughly the same (~20MB), i.e. sparseness should be preserved.


Actual results:

The on-disk consumption grows to ~1GB, i.e. the volume has lost its sparseness.


Additional information:

Cleaved off from BZ 1221468 to cover the active VM case (leaving the original bug to solely represent the case where the VM is inactive).

Comment 3 Pádraig Brady 2015-06-11 06:53:57 UTC
Note bug #1219541 is tracking at least a very similar issue for libvirt and qemu.

That bug is currently filed against the qemu component, although there is an upstream patch proposed against libvirt (not yet merged):
https://www.redhat.com/archives/libvir-list/2015-April/msg00130.html

Comment 4 Daniel Berrangé 2015-06-11 08:19:49 UTC
(In reply to Pádraig Brady from comment #3)
> Note bug #1219541 is tracking at least a very similar issue for libvirt and
> qemu.
> 
> That bug is currently filed against the qemu component, although there is
> an upstream patch proposed against libvirt (not yet merged):
> https://www.redhat.com/archives/libvir-list/2015-April/msg00130.html

NB volume migration != guest migration.

Volume migration is the code path that uses libvirt's drive-mirror-based API; the quoted patch is for guest migration.

Comment 5 Eoghan Glynn 2015-06-11 08:33:48 UTC
Yes, this bug is purely for the online *volume* migration case.

The guest stays put, but the volume needed to be moved between cinder backends in order to balance load across multiple NetApps (in the customer use case).

Comment 6 Pádraig Brady 2015-06-17 20:00:06 UTC
Live block volume migration has been disabled as of Kilo/RHOS 7, as per http://pad.lv/1398999
(and also in Juno, if https://review.openstack.org/176768 is merged).

Enablement will require:

1. Changes to qemu to detect zeros on NFS and propagate as holes
(bug #1232914)

2. API changes to libvirt to make the operation safe
(bug #1232919)

3. Nova changes to re-enable the feature and use the newer libvirt APIs
(tracked in this bug)
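To make item 3 concrete, a hypothetical sketch of what "use the newer libvirt APIs" could look like: switching from blockRebase to virDomainBlockCopy, whose destination XML can describe the target disk, including a driver element asking qemu to detect zeroes and punch holes. The domain name, paths, and XML are assumptions, and whether the destination actually ends up sparse is exactly what the dependent qemu/libvirt bugs track:

    # Sketch only: blockCopy with a destination XML requesting zero detection.
    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')   # hypothetical domain

    # Hypothetical destination description; detect_zeroes='unmap' asks qemu
    # to turn runs of zeroes into holes in the target file.
    dest_xml = """
    <disk type='file'>
      <source file='/mnt/nfs2/volume-1234'/>
      <driver name='qemu' type='raw' detect_zeroes='unmap'/>
    </disk>
    """

    # Reuse the pre-created target file (as cinder provides it).
    dom.blockCopy('vda', dest_xml,
                  flags=libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)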

Comment 7 Daniel Berrangé 2015-06-18 09:13:33 UTC
(In reply to Pádraig Brady from comment #6)
> live block volume migration has been disabled as of Kilo/RHOS 7 as per
> http://pad.lv/1398999
> (and also in Juno if https://review.openstack.org/176768 is merged)

These are again about *live migration* with block storage copy, i.e. the VM is relocated from one host to another host, and the storage is copied along with it.

This bug is about volume migration. The VM stays running on the current host, and the volume is swapped out from beneath it.

Comment 8 Eoghan Glynn 2015-06-26 16:29:13 UTC
Status update:

Development is continuing in qemu, with another, somewhat more invasive solution proposed:

http://lists.nongnu.org/archive/html/qemu-block/2015-06/msg00292.html

Comment 13 Pádraig Brady 2015-07-10 14:17:00 UTC
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=v2.3.0-1612-g0fc9f8e is scheduled to be backported next week to address this

Comment 27 Kashyap Chamarthy 2018-02-23 14:47:08 UTC
So this is still predicated on work in two lower-layer components:

(1) [libvirt] https://bugzilla.redhat.com/show_bug.cgi?id=1297255 — 
    add possibility to sparsify image during block copy

(2) [QEMU] https://bugzilla.redhat.com/show_bug.cgi?id=1533975 — 
    detect-zeroes=unmap/on does not produce a sparse file on NFS v4.1 
    when attempting blockdev/drive-mirror

Comment 30 Matthew Booth 2019-10-15 09:17:58 UTC
I am closing this bug as it has not been addressed for a very long time. Please feel free to reopen if it is still relevant.