Bug 1532133

Summary: Preallocated volume convert to sparse volume after live storage migration to file based storage domain
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ribu Tho <rabraham>
Component: vdsm
Assignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA
QA Contact: Yosi Ben Shimon <ybenshim>
Severity: high
Docs Contact:
Priority: medium
Version: 4.1.4
CC: appraprv, fgarciad, lsurette, mlipchuk, nsoffer, rabraham, ratamir, srevivo, tnisan, trichard, ybenshim, ycui, ykaul, ylavi
Target Milestone: ovirt-4.2.2
Flags: lsvaty: testing_plan_complete-
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: v4.20.15
Doc Type: Bug Fix
Doc Text:
Red Hat Virtualization uses the qemu-img tool to copy disks during live storage migration, instead of dd. This tool converts unused space in the image to holes, making the destination disk sparse. Raw preallocated disks copied during live storage migration were converted to raw sparse disks. Now, you can use the qemu-img preallocation option when copying raw preallocated disks to file-based storage domains, so that the disks are kept preallocated after the migration.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-15 17:54:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1550117    
Bug Blocks:    
Attachments:
Engine_log (flags: none)
VDSM_log (flags: none)

Description Ribu Tho 2018-01-08 05:57:47 UTC
Description of problem:

On a Gluster storage domain, preallocated volumes are created as RAW-sparse, and the GUI reports the actual size rather than the virtual size.

Version-Release number of selected component (if applicable):

ovirt-engine-4.1.4.2-0.1.el7.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
glusterfs-3.8.4-18.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:

1. Create a disk for a VM on the Gluster SD as preallocated from the GUI.

2. The disk image ends up in RAW format, with the virtual size equal to the size specified in the GUI and the actual size less than that virtual size.

=----------------------------------------------------------------
# qemu-img info db5b9b50-98ec-43e7-8de3-4b40c1a54502
image: db5b9b50-98ec-43e7-8de3-4b40c1a54502
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: 1.5G
=----------------------------------------------------------------
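As an aside for anyone reproducing this: sparseness can also be checked without qemu-img by comparing a file's allocated blocks against its apparent size. A minimal Python sketch (the is_sparse helper and file names are illustrative, not part of vdsm):

```python
import os

def is_sparse(path):
    """True if the file occupies fewer blocks on disk than its
    apparent size suggests, i.e. it contains holes."""
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size  # st_blocks is in 512-byte units

# A file extended with truncate() has a hole; one filled with
# real zero bytes is fully allocated.
with open("sparse.img", "wb") as f:
    f.truncate(5 * 1024 * 1024)             # 5 MiB hole, nothing written
with open("prealloc.img", "wb") as f:
    f.write(b"\0" * (5 * 1024 * 1024))      # 5 MiB of real data blocks

print(is_sparse("sparse.img"))      # True
print(is_sparse("prealloc.img"))    # False

os.remove("sparse.img")
os.remove("prealloc.img")
```

This mirrors what qemu-img info reports above: "disk size" (allocated) smaller than "virtual size" means the image is sparse.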

 
Actual results:

- The disk is created as RAW-sparse.
- The GUI disk usage shows the actual size used.

Expected results:

- The GUI should report the original virtual size, to prevent users from allocating the free space to other VMs and new devices.
- The volume should be created as a fully allocated RAW image, consuming the entire disk space.

Additional info:

Comment 3 Nir Soffer 2018-01-09 21:51:31 UTC
In vdsm log we see:

5.0g volume created:

2018-01-08 04:53:32,896+1100 INFO  (jsonrpc/7) [dispatcher] Run and protect: createVolume(sdUUID=u'0694def1-b588-4a43-b71f-bd66df4fef24', spUUID=u'598c8196-032c-02fa-00f6-000000000230', imgU
UID=u'4e849764-7876-4a46-bfaa-0b82dc283475', size=u'5368709120', volFormat=5, preallocate=1, diskType=2, volUUID=u'db5b9b50-98ec-43e7-8de3-4b40c1a54502', desc=u'{"DiskAlias":"test-iscsi_Disk
3","DiskDescription":"gluster-test"}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None) (logUtils:51)

Vdsm preallocates 5.0G as asked:

2018-01-08 04:54:01,232+1100 DEBUG (tasks/3) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/dd if=/dev/zero of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-43e7-8de3-4b40c1a54502 bs=1048576 seek=0 skip=0 conv=notrunc count=5120 oflag=direct (cwd None) (commands:69)

The conv=fsync flag is missing from this dd command.

Can you try to run the same dd command on this storage - does it create a sparse file?

Does it change if we add conv=notrunc,fsync?

Comment 4 Ribu Tho 2018-01-11 21:24:19 UTC
(In reply to Nir Soffer from comment #3)
> In vdsm log we see:
> 
> 5.0g volume created:
> 
> 2018-01-08 04:53:32,896+1100 INFO  (jsonrpc/7) [dispatcher] Run and protect:
> createVolume(sdUUID=u'0694def1-b588-4a43-b71f-bd66df4fef24',
> spUUID=u'598c8196-032c-02fa-00f6-000000000230', imgU
> UID=u'4e849764-7876-4a46-bfaa-0b82dc283475', size=u'5368709120',
> volFormat=5, preallocate=1, diskType=2,
> volUUID=u'db5b9b50-98ec-43e7-8de3-4b40c1a54502',
> desc=u'{"DiskAlias":"test-iscsi_Disk
> 3","DiskDescription":"gluster-test"}',
> srcImgUUID=u'00000000-0000-0000-0000-000000000000',
> srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None)
> (logUtils:51)
> 
> Vdsm preallocates 5.0G as asked:
> 
> 2018-01-08 04:54:01,232+1100 DEBUG (tasks/3) [storage.Misc.excCmd]
> /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3
> /usr/bin/dd if=/dev/zero
> of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-
> b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-
> 43e7-8de3-4b40c1a54502 bs=1048576 seek=0 skip=0 conv=notrunc count=5120
> oflag=direct (cwd None) (commands:69)
> 
> The conv=fsync flag is missing from this dd command.
> 
> Can you try to run the same dd command on this storage - does it create a
> sparse file?
> 
> Does it change if we add conv=notrunc,fsync?

Nir,

I have checked the issue by creating with dd command below .

# /usr/bin/dd if=/dev/zero of=file  bs=1048576 seek=0 skip=0 conv=notrunc count=5120 oflag=direct 

The resulting file was a raw image with both virtual size and actual size equal to 5 GB. It was a RAW-RAW image, and the issue highlighted above in my comments did not occur.

Ribu

Comment 5 Nir Soffer 2018-01-11 21:31:55 UTC
(In reply to Ribu Tho from comment #4)
> I have checked the issue by creating with dd command below .
> 
> # /usr/bin/dd if=/dev/zero of=file  bs=1048576 seek=0 skip=0 conv=notrunc
> count=5120 oflag=direct 

Ribu, where is the file located?

We want to test writing to gluster volume. Please try:

    /usr/bin/dd if=/dev/zero \
        of=/rhev/data-center/598c8196-032c-02fa-00f6-000000000230/0694def1-b588-4a43-b71f-bd66df4fef24/images/4e849764-7876-4a46-bfaa-0b82dc283475/db5b9b50-98ec-43e7-8de3-4b40c1a54502 \
        bs=1048576 conv=notrunc count=5120 oflag=direct

If this creates a sparse file, we need to move this bug to Gluster.

Comment 6 Ribu Tho 2018-01-11 22:56:24 UTC
Nir,

Yes, this was tested by writing to a gluster volume on my lab machine. It completed successfully, creating a 5 GB raw device.

##################################################
sh-4.2$ whoami
vdsm

sh-4.2$ df -T 
gsslab-24-218.rhev.gsslab.bne.redhat.com:/labvol fuse.glusterfs  31434752   5315072  26119680  17% /rhev/data-center/mnt/glusterSD/gsslab-24-218.rhev.gsslab.bne.redhat.com:_labvol

sh-4.2$ qemu-img info file 
image: file
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: 5.0G

sh-4.2$ pwd
/rhev/data-center/mnt/glusterSD/gsslab-24-218.rhev.gsslab.bne.redhat.com:_labvol/b4e62649-95ac-4006-8584-2d26bb3c6712/images

##################################################

Ribu

Comment 10 Nir Soffer 2018-01-12 00:35:49 UTC
Ribu, the customer case attached to this bug says:

    After live migration of storage on a VM successfully completed, the new
    storage does not accurately display the used space.

This is a very different flow from what you describe in the bug. I don't think that
vdsm writing zeros to gluster can create a sparse file, but live storage migration
may create one.

Please check the customer case and add here detailed description of:
- The source storage
- The source image chain size and format before the migration
- The target storage
- The target image chain size and format after the migration

We had a bug in 4.0 or 4.1 about creating sparse files instead of preallocated
files when copying images. I think this was fixed in 4.2.

Comment 11 Nir Soffer 2018-01-12 01:34:24 UTC
I checked the code used during live storage migration, and I can confirm that
we create a sparse file if the target is on a file-based domain (NFS, Gluster).

Here is a comment from the code:

    # To avoid prezeroing preallocated volumes on NFS domains
    # we create the target as a sparse volume (since it will be
    # soon filled with the data coming from the copy) and then
    # we change its metadata back to the original value.

This optimization was added for bug 910445 in 2013.

In the past, this optimization was harmless for raw files, because we copied raw
files using dd - so the unused space was filled with zeros, and the result was a
preallocated raw file.

However, a year later we switched to copying raw images using qemu-img convert
for bug 1156115. With qemu-img convert, unallocated areas are not copied,
creating sparse files.

This can be fixed by using the qemu-img convert preallocation=falloc option.

If the file system does not support fallocate(), the copy can be much slower with
preallocation, but if the user wants a preallocated image, this is the price.
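For illustration only, this is roughly what fallocate()-based preallocation does - reserve all blocks up front without writing any data - sketched in Python (sizes scaled down from the 5 GiB disk; this is not vdsm code):

```python
import os
import tempfile

size = 5 * 1024 * 1024  # 5 MiB stands in for the 5 GiB disk

fd, path = tempfile.mkstemp()
try:
    # Reserve blocks for the whole file up front; no zero-writing is
    # needed when the filesystem supports fallocate(). (glibc falls
    # back to writing zeros when it does not - the "much slower" case.)
    os.posix_fallocate(fd, 0, size)
    st = os.fstat(fd)
    allocated = st.st_blocks * 512  # st_blocks is in 512-byte units
    print(allocated >= size)        # fully allocated before any write
finally:
    os.close(fd)
    os.remove(path)
```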

We have patches for adding this option to qemu-img create:
https://gerrit.ovirt.org/69848.

Adding the option to qemu-img convert should be easy after these patches are 
merged.

Maor, don't you work on a similar bug? Is this a duplicate?

Comment 12 Maor 2018-01-14 14:26:43 UTC
(In reply to Nir Soffer from comment #11)
> I checked the code used during live storage migration, and I can confirm
> that 
> we create sparse file if the target file is on file based domain (NFS,
> Gluster).
> 
> Here is a comment from the code:
> 
>     # To avoid prezeroing preallocated volumes on NFS domains
>     # we create the target as a sparse volume (since it will be
>     # soon filled with the data coming from the copy) and then
>     # we change its metadata back to the original value.
> 
> This optimization was added for bug 910445 in 2013.
> 
> In the past, this optimization was harmless for raw files, because we copied
> raw
> files using dd - so the unused space was filled by zeros, and the result was 
> preallocated raw file.
> 
> However year later we switch to copy raw images using qemu-img convert for
> bug 1156115. With qemu-img convert, allocated data is not copied, creating
> sparse
> files.
> 
> Can be fixed by using qemu-img convert preallocation=falloc option.
> 
> If the file system does not support fallocate(), the copy can be much slower
> with
> preallocation, but if the user wants a preallocated image, this is the price.
> 
> We have patches for adding this option to qemu-img create:
> https://gerrit.ovirt.org/69848.
> 
> Adding the option to qemu-img convert should be easy after these patches are 
> merged.
> 
> Maor, don't you work on a similar bug? Is this a duplicate?

This is the bug you mentioned; it does look like the same issue:
   https://bugzilla.redhat.com/1429286 - RAW-Preallocated disk is converted to RAW-sparse while cloning a VM in file based storage domain

Comment 13 Yosi Ben Shimon 2018-02-27 19:17:48 UTC
Created attachment 1401496 [details]
Engine_log

Comment 14 Yosi Ben Shimon 2018-02-27 19:19:37 UTC
Created attachment 1401497 [details]
VDSM_log

Comment 15 Yosi Ben Shimon 2018-02-27 19:20:30 UTC
Verification failed.
Tested using:
ovirt-engine-4.2.2.1-0.1.el7.noarch

Environment status at the time of failure:
- 1 VM
- 2 disks:
a) 20 GB NFS thin provision (bootable)
b) 5 GB glusterFS preallocated
- OS installed.
- VM running.

Started live storage migration of disk b (5 GB)

After the failure, the disk is:
- thin provision
- actual size of 10 GB
- virtual size of 5 GB
- stays on the same SD (gluster)

The errors indicate a failure in snapshot creation and VM disk replication.

Attached engine and VDSM logs.

Moving to ASSIGNED

Comment 16 Allon Mureinik 2018-02-28 09:57:36 UTC
I sincerely doubt this has anything to do with these patches, but it's troublesome:

2018-02-27 20:27:11,484+0200 ERROR (jsonrpc/5) [virt.vm] (vmId='72e757c1-b6d5-4872-b97e-67e24d75a926') Unable to take snapshot (vm:4484)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4481, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2585, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command 'transaction': Could not read L1 table: Input/output error

Yosi - can we please have a bug on this issue specifically (taking a snapshot on gluster fails), with all the relevant details?

Comment 17 Yosi Ben Shimon 2018-02-28 14:32:40 UTC
Done.
bug: https://bugzilla.redhat.com/show_bug.cgi?id=1550117

Comment 18 Yosi Ben Shimon 2018-03-11 09:32:38 UTC
Tested using:
ovirt-engine-4.2.2.2-0.1.el7.noarch
vdsm-4.20.20-1.el7ev.x86_64
qemu-img-rhev-2.10.0-21.el7.x86_64

Actual result:
The disk was successfully moved to another glusterFS SD.
The disk allocation policy remained preallocated, as it was before the live storage migration.
In the GUI, actual size = virtual size = 5 GiB, as at the start.
No snapshot remained as a result of failure or timeout.

Moving to VERIFIED

Comment 19 Nir Soffer 2018-04-11 20:43:06 UTC
The bug is not blocked on anything, fixing title.

Comment 24 errata-xmlrpc 2018-05-15 17:54:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489

Comment 25 Franta Kust 2019-05-16 13:03:56 UTC
BZ<2>Jira Resync

Comment 26 Daniel Gur 2019-08-28 13:11:59 UTC
sync2jira

Comment 27 Daniel Gur 2019-08-28 13:16:11 UTC
sync2jira