Bug 1532133 - Preallocated volume converted to sparse volume after live storage migration to file-based storage domain
Summary: Preallocated volume converted to sparse volume after live storage migration to ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.2.2
Assignee: Nir Soffer
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On: 1550117
Blocks:
 
Reported: 2018-01-08 05:57 UTC by Ribu Tho
Modified: 2021-06-10 14:20 UTC
14 users

Fixed In Version: v4.20.15
Doc Type: Bug Fix
Doc Text:
Red Hat Virtualization uses the qemu-img tool, instead of dd, to copy disks during live storage migration. This tool converts unused space in the image to holes, making the destination disk sparse, so raw preallocated disks copied during live storage migration were converted to raw sparse disks. Now the qemu-img preallocation option is used when copying raw preallocated disks to file-based storage domains, so the disks remain preallocated after the migration.
Clone Of:
Environment:
Last Closed: 2018-05-15 17:54:02 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments
Engine_log (6.23 MB, text/plain), 2018-02-27 19:17 UTC, Yosi Ben Shimon
VDSM_log (8.00 MB, text/plain), 2018-02-27 19:19 UTC, Yosi Ben Shimon


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3312611 0 None None None 2018-01-08 06:13:28 UTC
Red Hat Product Errata RHEA-2018:1489 0 None None None 2018-05-15 17:55:30 UTC
oVirt gerrit 69848 0 master MERGED qemuimg: Add preallocation support to qemuimg.create() 2020-11-01 21:59:58 UTC
oVirt gerrit 74243 0 master MERGED qemuimg: Wrap create command into operation object 2020-11-01 21:59:59 UTC
oVirt gerrit 86278 0 master MERGED qemuimg: Add preallocation option to convert() 2020-11-01 21:59:59 UTC
oVirt gerrit 86279 0 master MERGED image: Keep volume preallocation during copy 2020-11-01 21:59:59 UTC

Description Ribu Tho 2018-01-08 05:57:47 UTC
Description of problem:

On Gluster storage, preallocated volumes are created as RAW-sparse, and the GUI reports the actual size rather than the virtual size.

Version-Release number of selected component (if applicable):

ovirt-engine-4.1.4.2-0.1.el7.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
glusterfs-3.8.4-18.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:

1. Create a disk for a VM on a Gluster SD as preallocated from the GUI.

2. The disk image ends up with format RAW; its virtual size matches the size specified in the GUI, but its actual size is less than the virtual size.

=----------------------------------------------------------------
# qemu-img info db5b9b50-98ec-43e7-8de3-4b40c1a54502
image: db5b9b50-98ec-43e7-8de3-4b40c1a54502
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: 1.5G
=----------------------------------------------------------------

 
Actual results:

- The disk is being created as RAW-Sparse
- The GUI disk usage shows actual size used. 

Expected results:

- The GUI should report the original virtual size, to prevent users from allocating the free space to other VMs and new devices.
- The volume should be created as a fully preallocated RAW image, allocating the entire disk space.

Additional info:

Comment 3 Nir Soffer 2018-01-09 21:51:31 UTC
In vdsm log we see:

5.0g volume created:

2018-01-08 04:53:32,896+1100 INFO  (jsonrpc/7) [dispatcher] Run and protect: createVolume(sdUUID=u'0694def1-b588-4a43-b71f-bd66df4fef24', spUUID=u'598c8196-032c-02fa-00f6-000000000230', imgU
UID=u'4e849764-7876-4a46-bfaa-0b82dc283475', size=u'5368709120', volFormat=5, preallocate=1, diskType=2, volUUID=u'db5b9b50-98ec-43e7-8de3-4b40c1a54502', desc=u'{"DiskAlias":"test-iscsi_Disk
3","DiskDescription":"gluster-test"}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None) (logUtils:51)

Vdsm preallocates 5.0G as asked:

2018-01-08 04:54:01,232+1100 DEBUG (tasks/3) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/dd if=/dev/zero of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-43e7-8de3-4b40c1a54502 bs=1048576 seek=0 skip=0 conv=notrunc count=5120 oflag=direct (cwd None) (commands:69)

The conv=fsync flag is missing from this dd command.

Can you try to run the same dd command on this storage - does it create a sparse file?

Does it change if we add conv=notrunc,fsync?
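
For reference, a minimal sketch of that test - the same dd invocation as in the log
above, with fsync added to the conv flags:

    # Same command as in the vdsm log, with fsync added so the data is
    # flushed to the gluster mount before dd exits.
    /usr/bin/dd if=/dev/zero \
        of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-43e7-8de3-4b40c1a54502 \
        bs=1048576 seek=0 skip=0 conv=notrunc,fsync count=5120 oflag=direct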

Comment 4 Ribu Tho 2018-01-11 21:24:19 UTC
(In reply to Nir Soffer from comment #3)
> In vdsm log we see:
> 
> 5.0g volume created:
> 
> 2018-01-08 04:53:32,896+1100 INFO  (jsonrpc/7) [dispatcher] Run and protect:
> createVolume(sdUUID=u'0694def1-b588-4a43-b71f-bd66df4fef24',
> spUUID=u'598c8196-032c-02fa-00f6-000000000230', imgU
> UID=u'4e849764-7876-4a46-bfaa-0b82dc283475', size=u'5368709120',
> volFormat=5, preallocate=1, diskType=2,
> volUUID=u'db5b9b50-98ec-43e7-8de3-4b40c1a54502',
> desc=u'{"DiskAlias":"test-iscsi_Disk
> 3","DiskDescription":"gluster-test"}',
> srcImgUUID=u'00000000-0000-0000-0000-000000000000',
> srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None)
> (logUtils:51)
> 
> Vdsm preallocates 5.0G as asked:
> 
> 2018-01-08 04:54:01,232+1100 DEBUG (tasks/3) [storage.Misc.excCmd]
> /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3
> /usr/bin/dd if=/dev/zero
> of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-
> b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-
> 43e7-8de3-4b40c1a54502 bs=1048576 seek=0 skip=0 conv=notrunc count=5120
> oflag=direct (cwd None) (commands:69)
> 
> conv=fsync flag is missing in this dd command.
> 
> Can try to run the same dd command with this storage - does it create sparse
> file?
> 
> Does it change is we add conv=notrunc,fsync?

Nir,

I have checked the issue by creating a file with the dd command below.

# /usr/bin/dd if=/dev/zero of=file  bs=1048576 seek=0 skip=0 conv=notrunc count=5120 oflag=direct 

The resulting file was a raw image with virtual size and actual size both equal to 5 GB. It was a RAW-RAW (fully preallocated) image, so there was no issue like the one highlighted in the original bug description above.

Ribu

Comment 5 Nir Soffer 2018-01-11 21:31:55 UTC
(In reply to Ribu Tho from comment #4)
> I have checked the issue by creating with dd command below .
> 
> # /usr/bin/dd if=/dev/zero of=file  bs=1048576 seek=0 skip=0 conv=notrunc
> count=5120 oflag=direct 

Ribu, where is the file located?

We want to test writing to the gluster volume. Please try:

    /usr/bin/dd if=/dev/zero \
        of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-43e7-8de3-4b40c1a54502 \
        bs=1048576 conv=notrunc count=5120 oflag=direct

If this creates a sparse file, we need to move this bug to Gluster.
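
A quick way to check whether the result came out sparse (a sketch; VOL is the volume
path from the dd command above):

    # Compare the apparent size with the space actually allocated on disk;
    # a sparse file allocates noticeably less than its apparent size.
    VOL=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-43e7-8de3-4b40c1a54502
    du -h --apparent-size "$VOL"
    du -h "$VOL"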

Comment 6 Ribu Tho 2018-01-11 22:56:24 UTC
Nir,

Yes, this was tested by writing to a gluster volume on my lab machine. It completed successfully, creating a 5 GB raw device.

##################################################
sh-4.2$ whoami
vdsm

sh-4.2$ df -T 
gsslab-24-218.rhev.gsslab.bne.redhat.com:/labvol fuse.glusterfs  31434752   5315072  26119680  17% /rhev/data-center/mnt/glusterSD/gsslab-24-218.rhev.gsslab.bne.redhat.com:_labvol

sh-4.2$ qemu-img info file 
image: file
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: 5.0G

sh-4.2$ pwd
/rhev/data-center/mnt/glusterSD/gsslab-24-218.rhev.gsslab.bne.redhat.com:_labvol/b4e62649-95ac-4006-8584-2d26bb3c6712/images

##################################################

Ribu

Comment 10 Nir Soffer 2018-01-12 00:35:49 UTC
Ribu, the customer case attached to this bug says:

    After live migration of storage on a VM successfully completed, the new
    storage does not accurately display the used space.

This is a very different flow from what you describe in the bug. I don't think that
vdsm writing zeros to gluster can create a sparse file, but live storage migration
may create one.

Please check the customer case and add here a detailed description of:
- The source storage
- The source image chain size and format before the migration
- The target storage
- The target image chain size and format after the migration

We had a bug in 4.0 or 4.1 about creating sparse files instead of preallocated files
when copying images. I think this was fixed in 4.2.

Comment 11 Nir Soffer 2018-01-12 01:34:24 UTC
I checked the code used during live storage migration, and I can confirm that
we create a sparse file if the target is on a file-based domain (NFS, Gluster).

Here is a comment from the code:

    # To avoid prezeroing preallocated volumes on NFS domains
    # we create the target as a sparse volume (since it will be
    # soon filled with the data coming from the copy) and then
    # we change its metadata back to the original value.

This optimization was added for bug 910445 in 2013.

In the past, this optimization was harmless for raw files, because we copied raw
files using dd, so the unused space was filled with zeros and the result was a
preallocated raw file.

However, a year later we switched to copying raw images using qemu-img convert for
bug 1156115. With qemu-img convert, zeroed areas are not written to the destination,
creating sparse files.

This can be fixed by using the qemu-img convert preallocation=falloc option.

If the file system does not support fallocate(), the copy can be much slower with
preallocation, but if the user wants a preallocated image, this is the price.
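
For illustration, a hedged sketch of the kind of copy this maps to (the paths are
placeholders, not the actual vdsm command line):

    # Hypothetical example only - not the exact vdsm invocation.
    # Copy a raw volume while preallocating the destination, so the target
    # stays fully allocated instead of becoming sparse.
    qemu-img convert -f raw -O raw -o preallocation=falloc \
        /path/to/source/volume /path/to/destination/volume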

We have patches for adding this option to qemu-img create:
https://gerrit.ovirt.org/69848.

Adding the option to qemu-img convert should be easy after these patches are 
merged.

Maor, don't you work on a similar bug? Is this a duplicate?

Comment 12 Maor 2018-01-14 14:26:43 UTC
(In reply to Nir Soffer from comment #11)
> I checked the code used during live storage migration, and I can confirm
> that 
> we create sparse file if the target file is on file based domain (NFS,
> Gluster).
> 
> Here is a comment from the code:
> 
>     # To avoid prezeroing preallocated volumes on NFS domains
>     # we create the target as a sparse volume (since it will be
>     # soon filled with the data coming from the copy) and then
>     # we change its metadata back to the original value.
> 
> This optimization was added for bug 910445 in 2013.
> 
> In the past, this optimization was harmless for raw files, because we copied
> raw
> files using dd - so the unused space was filled by zeros, and the result was 
> preallocated raw file.
> 
> However year later we switch to copy raw images using qemu-img convert for
> bug 1156115. With qemu-img convert, allocated data is not copied, creating
> sparse
> files.
> 
> Can be fixed by using qemu-img convert preallocation=falloc option.
> 
> If the file system does not support fallocate(), the copy can be much slower
> with
> preallocation, but if the user wants a preallocated image, this is the price.
> 
> We have patches for adding this option to qemu-img create:
> https://gerrit.ovirt.org/69848.
> 
> Adding the option to qemu-img convert should be easy after these patches are 
> merged.
> 
> Maor don't you work on a similar bug? is this a duplicate?

This is the bug you mentioned; it does look like the same issue:
   https://bugzilla.redhat.com/1429286 - RAW-Preallocated disk is converted to RAW-sparse while cloning a VM in file based storage domain

Comment 13 Yosi Ben Shimon 2018-02-27 19:17:48 UTC
Created attachment 1401496 [details]
Engine_log

Comment 14 Yosi Ben Shimon 2018-02-27 19:19:37 UTC
Created attachment 1401497 [details]
VDSM_log

Comment 15 Yosi Ben Shimon 2018-02-27 19:20:30 UTC
Verification failed.
Tested using:
ovirt-engine-4.2.2.1-0.1.el7.noarch

Environment status at the time of failure:
- 1 VM
- 2 disks:
a) 20 GB NFS thin provision (bootable)
b) 5 GB glusterFS preallocated
- OS installed.
- VM running.

Started live storage migration of disk b (5 GB)

After the failure, the disk is:
- thin provision
- actual size of 10 GB
- virtual size of 5 GB
- stays on the same SD (gluster)

The errors indicate a failure in snapshot creation and VM disk replication.

Attached engine and VDSM logs.

Moving to ASSIGNED

Comment 16 Allon Mureinik 2018-02-28 09:57:36 UTC
I sincerely doubt this has anything to do with these patches, but it's troublesome:

2018-02-27 20:27:11,484+0200 ERROR (jsonrpc/5) [virt.vm] (vmId='72e757c1-b6d5-4872-b97e-67e24d75a926') Unable to take snapshot (vm:4484)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4481, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2585, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command 'transaction': Could not read L1 table: Input/output error

Yosi - can we please have a bug on this issue specifically (taking a snapshot on gluster fails), with all the relevant details?

Comment 17 Yosi Ben Shimon 2018-02-28 14:32:40 UTC
Done.
bug: https://bugzilla.redhat.com/show_bug.cgi?id=1550117

Comment 18 Yosi Ben Shimon 2018-03-11 09:32:38 UTC
Tested using:
ovirt-engine-4.2.2.2-0.1.el7.noarch
vdsm-4.20.20-1.el7ev.x86_64
qemu-img-rhev-2.10.0-21.el7.x86_64

Actual result:
The disk was successfully moved to another glusterFS SD.
The disk allocation policy remained preallocated, as it was before the live storage migration.
In the GUI, actual size = virtual size = 5 GiB, as at the start.
No snapshot remained as a result of failure or timeout.

Moving to VERIFIED

Comment 19 Nir Soffer 2018-04-11 20:43:06 UTC
The bug is not blocked on anything; fixing the title.

Comment 24 errata-xmlrpc 2018-05-15 17:54:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489

Comment 25 Franta Kust 2019-05-16 13:03:56 UTC
BZ<2>Jira Resync

Comment 26 Daniel Gur 2019-08-28 13:11:59 UTC
sync2jira

Comment 27 Daniel Gur 2019-08-28 13:16:11 UTC
sync2jira

