Bug 1306121

Summary: Clone of sparse disk on nfs storage domain is extremely slow
Product: Red Hat Enterprise Linux 7 Reporter: Marina Kalinin <mkalinin>
Component: qemu-kvm-rhev Assignee: Ademar Reis <areis>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.2 CC: amureini, bazulay, chayang, gklein, huding, juzhang, jwoods, knoel, kwolf, lsurette, mkalinin, pzhukov, tnisan, virt-maint, ycui, yeylon, ykaul
Target Milestone: pre-dev-freeze   
Target Release: 7.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-14 15:35:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marina Kalinin 2016-02-10 04:45:53 UTC
Cloning a sparse disk on an NFS storage domain in RHEV is extremely slow, even when the actual used file size is tiny.

Version-Release number of selected component (if applicable):
3.5.7
vdsm-4.16.32-1
qemu-kvm-rhev-2.1.2-23.el7_1.8.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Create a VM with a big (1TB) disk on an NFS storage domain. Disk type: thin provisioned.
2. Clone the VM (I do not see an option to choose the destination, but if there is one, it should clone to the same NFS storage domain).
3. Wait...

Actual results:
It takes about 3 hours to complete cloning a 1TB disk. At the customer's site it was a 30TB disk and it took more than 24 hours.

Expected results:
Since the thin-provisioned disk contains little or no data, the clone operation should be very quick.


Additional info:
VDSM is using qemu-img convert to perform the clone.
There is a very similar bug, only about copying to an export domain, that was closed because it could not be reproduced:
https://bugzilla.redhat.com/show_bug.cgi?id=1132219

If I create a sparse file of the same size myself and copy it using qemu-img convert, it takes less than a second; this is also how we tried to reproduce bug #1132219. However, when I run the exact command RHEV uses, it is painfully slow. So it probably has something to do with how RHEV copies the disk, perhaps because it copies over NFS rather than within the same local directory. Note how the paths differ in the RHEV command.

Comment 1 Marina Kalinin 2016-02-10 04:53:04 UTC
The command RHEV is using to create a disk clone:
~~~
# /usr/bin/nice -n 19 /usr/bin/ionice -c 3 
/usr/bin/qemu-img convert -t none -T none -f raw
/rhev/data-center/00000002-0002-0002-0002-0000000001df/b41747f4-90de-4bce-ab33-4966109edc1c/images/8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a 
-O raw 
/rhev/data-center/mnt/10.12.215.221:_mku-nfs-rhev9m/b41747f4-90de-4bce-ab33-4966109edc1c/images/49eab878-6d26-4d79-8ac0-9ac5564447cf/91e13148-a0a6-4516-a9b3-a5cf5ee57d5f
~~~
If I run this exact command directly on the host, it takes the same 3+ hours.


We can see that both the original file and the cloned one are sparse files:
~~~
-bash-4.2$ ls -lsh 49eab878-6d26-4d79-8ac0-9ac5564447cf/91e13148-a0a6-4516-a9b3-a5cf5ee57d5f
0 -rw-rw----. 1 vdsm kvm 1.0T Feb  8 15:18 49eab878-6d26-4d79-8ac0-9ac5564447cf/91e13148-a0a6-4516-a9b3-a5cf5ee57d5f
-bash-4.2$ ls -lsh 8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
0 -rw-rw----. 1 vdsm kvm 1.0T Feb  8 15:16 8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
~~~

Another way to see that the file is sparse is to use du:
~~~
-bash-4.2$ du -sh 8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
0	8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
-bash-4.2$ du -sh --apparent-size 8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
1.0T	8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
~~~
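
A third way to check this, for reference, is qemu-img info, which reports both the virtual size and the actual on-disk allocation ("disk size"). A minimal sketch, assuming qemu-img is installed on the host (output omitted):
~~~
# "virtual size" should show 1.0T, while "disk size" should be ~0 for a
# fully sparse raw image:
-bash-4.2$ qemu-img info 8a286733-fa86-4cc4-9ed1-19f398fecac2/e4b56180-1b73-414f-aa9b-bea3b188686a
~~~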


However, when I do this manually, i.e. just creating a sparse file and copying it within the same directory, it is very quick!

~~~
-bash-4.2$ dd if=/dev/zero of=sparsefile bs=1 count=0 seek=1T
0+0 records in
0+0 records out
0 bytes (0 B) copied, 7.9418e-05 s, 0.0 kB/s
-bash-4.2$ du -h sparsefile 
0	sparsefile
-bash-4.2$ ls -lsh sparsefile 
0 -rw-r--r--. 1 vdsm kvm 1.0T Feb  9 22:37 sparsefile
-bash-4.2$ time /usr/bin/nice -n 19 /usr/bin/ionice -c 3  /usr/bin/qemu-img convert -t none -T none -f raw sparsefile -O raw copysparsefile

real	0m0.031s
user	0m0.007s
sys	0m0.005s
~~~

Comment 2 Yaniv Kaul 2016-02-10 11:49:10 UTC
Marina,

1. Why aren't there any logs attached?
2. Are you comparing 'dd' to 'qemu-img' command performance? That's apples to oranges.
3. If we indeed see qemu-img being slow, please open a bug on qemu-img.

Comment 3 Marina Kalinin 2016-02-11 22:08:46 UTC
(In reply to Yaniv Kaul from comment #2)
> Marina,
> 
> 1. Why aren't there any logs attached?
I decided they are not relevant, since there are no errors or exceptions; it is just waiting on the command to complete. And when you run the exact command from the logs manually, it takes the same time. So I believe it comes down to how qemu-img convert is implemented.
Please tell me which logs you think are relevant and I will provide them.

> 2. Are you comparing 'dd' to 'qemu-img' command performance? That's apples
> to oranges.
I am not comparing them. I am just saying that copying a file with zero allocated bytes should not take 3 hours. Something is wrong.

> 3. If we indeed see qemu-img being slow, please open a bug on qemu-img.
You are right, I should probably open it against qemu-kvm and not vdsm. I just assumed the way vdsm performs the copy/clone could perhaps be optimized. I was out yesterday and didn't have time to run additional tests to suggest a different command as a possible workaround for vdsm. But by opening this bug against vdsm I wanted to hear from vdsm engineering why the decision was made to use qemu-img convert in this case.

Comment 4 Yaniv Kaul 2016-02-12 19:51:57 UTC
(In reply to Marina from comment #3)
> (In reply to Yaniv Kaul from comment #2)
> > Marina,
> > 
> > 1. Why aren't there any logs attached?
> I decided they are not relevant, since there are no errors or exceptions;
> it is just waiting on the command to complete. And when you run the exact
> command from the logs manually, it takes the same time. So I believe it
> comes down to how qemu-img convert is implemented.
> Please tell me which logs you think are relevant and I will provide them.

Usually, it's best to have all logs available (specifically in this case vdsm.log) and let the developers decide if they are needed or not.

> 
> > 2. Are you comparing 'dd' to 'qemu-img' command performance? That's apples
> > to oranges.
> I am not comparing them. I am just saying that copying a file with zero
> allocated bytes should not take 3 hours. Something is wrong.

Clearly.

> 
> > 3. If we indeed see qemu-img being slow, please open a bug on qemu-img.
> You are right, I should probably open it against qemu-kvm and not vdsm. I
> just assumed the way vdsm performs the copy/clone could perhaps be
> optimized. I was out yesterday and didn't have time to run additional tests
> to suggest a different command as a possible workaround for vdsm. But by
> opening this bug against vdsm I wanted to hear from vdsm engineering why
> the decision was made to use qemu-img convert in this case.

Please close or move to qemu-kvm. (setting NEEDINFO).

The reason we switched to qemu-img (AFAIR!) was to be consistent with what we do everywhere else (and also in the hope that performance would be better). I think another reason was that we get a progress report for that action, which was a bit more difficult to get from 'dd'.
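
For illustration, a rough sketch of the progress-reporting difference (the -p flag is plain qemu-img; the dd variant relies on a newer GNU coreutils; file names are placeholders):
~~~
# qemu-img prints a percentage progress indicator with -p:
qemu-img convert -p -t none -T none -f raw src.img -O raw dst.img

# dd needs status=progress (or a SIGUSR1 signal) to report progress:
dd if=src.img of=dst.img bs=1M status=progress
~~~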

Comment 5 Marina Kalinin 2016-02-12 21:58:23 UTC
Hey, moving to qemu-kvm.
The test I performed originally in the bug description was done on the host's local storage; when I reproduce the same on remote storage, I get the same slow result. So it is not related to how vdsm is forming the command.

Comment 7 Marina Kalinin 2016-02-12 22:04:39 UTC
qemu team - copying a sparse disk with zero allocated data should not take the same amount of time as copying a preallocated disk.
Let me know if you need help reproducing this, or if you need access to my environment for tests.

I believe it is some variation of this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1132219

And I do not want to see it closed as not reproducible.

Thank you!

Comment 9 Marina Kalinin 2016-02-13 02:59:01 UTC
# create sparse file on nfs mount:
-bash-4.2$ dd if=/dev/zero of=sparsefile bs=1 count=0 seek=1T
-bash-4.2$ du -h sparsefile 
0	sparsefile
-bash-4.2$ du -h --apparent-size sparsefile 
1.0T	sparsefile

# create a copy of this sparse file on the same nfs mount:
-bash-4.2$ time /usr/bin/nice -n 19 /usr/bin/ionice -c 3  /usr/bin/qemu-img convert -t none -T none -f raw sparsefile -O raw copysparsefile

real	192m46.286s
user	1m3.830s
sys	1m25.158s

^^^^ 192 minutes is way too long to copy 0 bytes.

Comment 10 Marina Kalinin 2016-02-13 04:26:37 UTC
Does qemu-img convert know how to handle raw sparse files correctly at all?

Comment 11 Kevin Wolf 2016-02-15 13:53:38 UTC
(In reply to Marina from comment #10)
> Does qemu-img convert know how to handle raw sparse files correctly at all?

Yes, as you saw yourself, doing the operation on a local raw file is very quick.
The problem seems to be with NFS, specifically SEEK_DATA/SEEK_HOLE support.
Unfortunately, you didn't specify your kernel version number.

Bug 1079385 is relevant for this; however, it claims to be fixed in 7.2, and I'm
still seeing failing SEEK_DATA, which causes the slow behaviour:

32219 lseek(7, 0, SEEK_DATA)            = -1 EOPNOTSUPP (Operation not supported)

The other BZ hints at the NFS 4.2 protocol being needed, but I can't get this
working on my RHEL 7.2 laptop: I can't seem to mount NFS with vers=4.2. I'm
sure that if you know how to configure it, you can make it work.

Anyway, the assumption is that with a new enough NFS server and client, and
using the new protocol, things should just work in qemu-img.
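
For reference, a rough sketch of the kind of check this implies (the server address, export and paths below are placeholders, and the NFS server itself must support 4.2):
~~~
# Mount with the 4.2 protocol explicitly requested:
mount -t nfs -o vers=4.2 nfs-server:/export /mnt/nfs42

# Confirm the negotiated version:
nfsstat -m

# Re-run the convert under strace and check whether lseek(..., SEEK_DATA)
# now succeeds instead of failing with EOPNOTSUPP:
strace -f -e trace=lseek qemu-img convert -t none -T none -f raw \
    /mnt/nfs42/sparsefile -O raw /mnt/nfs42/copysparsefile 2>&1 | grep SEEK
~~~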

Comment 12 Marina Kalinin 2016-02-26 04:07:39 UTC
Kevin,
Thank you for your update.
I believe the kernel on the host was:
3.10.0-229.14.1.el7.x86_64

Indeed, it was RHEL 7.1, I was using my slow storage, and RHEV defaults to NFS v3.

Let me ask the customer whether they were perhaps using a different NFS version, and what SAN storage they are using.
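
On the host side, a quick sketch of how that information can be gathered (the grep pattern is the NFS server address from comment 1; adjust for the actual mount):
~~~
# Kernel version:
uname -r

# NFS protocol version actually negotiated for the storage domain mount
# (look for the vers= option):
grep 10.12.215.221 /proc/mounts
~~~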

Jason, can you help here?

Comment 13 Jason Woods 2016-02-26 04:16:18 UTC
(In reply to Marina from comment #12)
> Kevin,
> Thank you for your update.
> I believe the kernel on the host was:
> 3.10.0-229.14.1.el7.x86_64
> 
> Indeed, it was RHEL 7.1, I was using my slow storage, and RHEV defaults to
> NFS v3.
> 
> Let me ask the customer whether they were perhaps using a different NFS
> version, and what SAN storage they are using.
> 
> Jason, can you help here?

I saw the slow copy of a thin-provisioned disk on RHEV-H built from the 7.2 ISO, build 20160105.

The storage was NFS using the RHEV data storage domain defaults. I did not verify the NFS settings/version used in RHEV; I assume the defaults were used. NFS was provided by OpenIndiana running ZFS.

I no longer have access to this customer or their environment.

Comment 22 Ademar Reis 2016-04-14 15:35:27 UTC

*** This bug has been marked as a duplicate of bug 1079385 ***