Bug 1026137

Summary: Volume download speed is slow
Product: Red Hat Enterprise Linux 6
Reporter: Luwen Su <lsu>
Component: libvirt
Assignee: Martin Kletzander <mkletzan>
Status: CLOSED NEXTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.6
CC: acathrow, berrange, chhu, cross, cwei, dallan, david.pravec, dyuan, gsun, jsuchane, jyang, mjenner, mzhan, vbudikov, yanyang, ydu, zhwang
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Clone Of: 1026136
Last Closed: 2015-06-10 08:47:03 UTC
Type: Bug
Regression: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1026136
Attachments: Vector I/O version

Description Luwen Su 2013-11-04 02:43:53 UTC
+++ This bug was initially created as a clone of Bug #1026136 +++

Removed the comments not related to this bug and carried over the needinfo.

--- Additional comment from time.su on 2013-10-25 03:46:34 EDT ---

Test with:
libvirt-1.1.1-10.el7.x86_64
qemu-kvm-1.5.3-10.el7.x86_64


The steps from comment 10 work for me; I also tested with a raw image.
However, vol-download becomes very slow once the image is large enough (1G in my environment).

#time virsh vol-download --vol vol-test.img --pool default --file /home/test1/a 

real	1m56.311s
user	1m24.267s
sys	0m2.743s

For a 500M volume, the time is:
real  0m19.033s

virsh # vol-dumpxml vol-test.img --pool default
<volume>
  <name>vol-test.img</name>
  <key>/var/lib/libvirt/images/vol-test.img</key>
  <source>
  </source>
  <capacity unit='bytes'>1048576000</capacity>
  <allocation unit='bytes'>1048580096</allocation>
  <target>
    <path>/var/lib/libvirt/images/vol-test.img</path>
    <format type='raw'/>
    <permissions>
      <mode>0600</mode>
      <owner>0</owner>
      <group>0</group>
      <label>system_u:object_r:virt_image_t:s0</label>
    </permissions>
    <timestamps>
      <atime>1382683904.221608140</atime>
      <mtime>1382683904.144609566</mtime>
      <ctime>1382683904.145609548</ctime>
    </timestamps>
  </target>
</volume>


ll /home/test1/
total 63484
-rw-r--r--. 1 root root 65005760 Oct 25 14:53 a
# ll /home/test1/a 
-rw-r--r--. 1 root root 133943320 Oct 25 14:53 /home/test1/a
# ll /home/test1/a 
-rw-r--r--. 1 root root 152029600 Oct 25 14:53 /home/test1/a
.........
-rw-r--r--. 1 root root 853136008 Oct 25 14:54 /home/test1/a
# ll /home/test1/a 
-rw-r--r--. 1 root root 853987976 Oct 25 14:54 /home/test1/a
# ll /home/test1/a 
-rw-r--r--. 1 root root 854643336 Oct 25 14:54 /home/test1/a



Is this related to libvirt?

--- Additional comment from time.su on 2013-10-30 04:25:22 EDT ---

BTW, I also reproduced this on RHEL 6.
1.
On the second run, the speed is acceptable if the target file is the same:

# time virsh vol-download --vol test.img --pool default --file /home/a 
real	3m38.059s
user	3m32.044s
sys	0m2.489s

# time virsh vol-download --vol test.img --pool default --file /home/a 
real	0m10.085s
user	0m1.719s
sys	0m2.323s

# time virsh vol-download --vol test.img --pool default --file /home/b    # cancelled partway with ^C
^C
real	1m49.476s
user	1m45.474s
sys	0m1.921s



2.
During the transfer, virsh uses almost 100% CPU, as observed via top:

KiB Mem:   7364840 total,  4982532 used,  2382308 free,   126120 buffers

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28663 root      20   0  628020 312468   4784 R  98.3  4.2   0:48.99 virsh     <-- here

--- Additional comment from Osier Yang on 2013-10-30 09:58:56 EDT ---

(In reply to time.su from comment #19)
> Test with:
> libvirt-1.1.1-10.el7.x86_64
> qemu-kvm-1.5.3-10.el7.x86_64
> [timings and volume XML snipped; quoted in full in the description above]
> Is this related to libvirt?

I don't think it is related to libvirt, but I also have no idea why it becomes slow toward the end of the transfer (when we talked face to face, time said that for a 1G volume it slows down at around the 800M point; before that it looks fine). Dan, I think you are the more appropriate person for this question?

Comment 1 Luwen Su 2013-11-04 02:48:28 UTC
The speed is also slow on RHEL 6,
with 
libvirt-0.10.2-29.el6
qemu-kvm-rhev-0.12.1.2-2.415.el6
kernel-2.6.32-429.el6

So cloned here.

Comment 3 Yang Yang 2014-02-26 08:38:27 UTC
@mkletzan:

The original bug 1026136 was closed as NOTABUG, so can we close this bug as well?

Comment 4 Martin Kletzander 2014-02-26 08:44:09 UTC
(In reply to yangyang from comment #3)
That's true, thanks.  Closing as such.

Comment 5 cross 2015-06-03 16:53:24 UTC
I think we have exactly the same problem described in this issue and have looked into the libvirt source. We can reproduce it with the CentOS 6 libvirt as well as with git HEAD.

The vol-download uses two functions from src/rpc/virnetclientstream.c: virNetClientStreamQueuePacket() and virNetClientStreamRecvPacket().

QueuePacket() (shortening the names a bit) is the source of the data (the VM image, in the case of vol-download) and RecvPacket() is the sink. On a fast system, QueuePacket() can produce data faster than RecvPacket() can consume it, and RecvPacket() starts doing huge amounts of memmove().

This makes vol-download really slow. For example, in our case vol-download starts at ~80MB/s (as seen in iotop) and after a while drops below 1MB/s as the buffer fills up. Once this happens, the CPU utilisation of virsh sits at 100%.

The source from RecvPacket():

 358 int virNetClientStreamRecvPacket(virNetClientStreamPtr st,
 359                                  virNetClientPtr client,
 360                                  char *data,
 361                                  size_t nbytes,
 362                                  bool nonblock)
 363 {
...
 399     if (st->incomingOffset) {
 400         int want = st->incomingOffset;
 401         if (want > nbytes)
 402             want = nbytes;
 403         memcpy(data, st->incoming, want);
 404         if (want < st->incomingOffset) {
 405             memmove(st->incoming, st->incoming + want, st->incomingOffset - want);
 406             st->incomingOffset -= want;
 407         } else {
 408             VIR_FREE(st->incoming);
 409             st->incomingOffset = st->incomingLength = 0;
 410         }
 411         rv = want;
 412     } else {
 413         rv = 0;
 414     }


st->incomingOffset is usually (always?) bigger than nbytes (the comparison on line 401). nbytes is defined as 64kB (src/libvirt-stream.c: virStreamRecvAll()), so want equals nbytes (64kB) and is thus smaller than st->incomingOffset (the comparison on line 404), and execution falls into lines 405-406.

On line 405, memmove() shifts the not-yet-consumed data to the front of st->incoming after the consumed part has been memcpy()ed out. Consider a case where st->incoming holds, say, 512MB of data: every 64kB handled by RecvPacket() then triggers a memmove() of nearly 512MB. Draining half a gigabyte therefore takes 8192 such memmove()s, hence the poor performance. Meanwhile, QueuePacket() just keeps pushing more data into st->incoming.
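The quadratic behaviour described above is easy to see in isolation. Below is a minimal, standalone C sketch (the function `bytes_shifted` is a hypothetical helper for illustration, not libvirt code) that counts how many bytes the drain loop shifts with memmove() for a given backlog size and read-chunk size:

```c
#include <stddef.h>

/* Count the total number of bytes a drain loop shifts with memmove()
 * when a backlog of `total` bytes is consumed in `chunk`-sized reads
 * and each read moves the remaining tail back to the buffer start
 * (the pattern on lines 405-406 above).  Hypothetical helper, not
 * libvirt code. */
size_t bytes_shifted(size_t total, size_t chunk)
{
    size_t moved = 0;
    size_t pending = total;

    while (pending > 0) {
        size_t want = chunk < pending ? chunk : pending;
        pending -= want;
        moved += pending;   /* the tail shifted to the front by memmove() */
    }
    return moved;
}
```

For a 512MB backlog drained in 64kB reads this works out to roughly 2TB of memory traffic (8192 reads, each shifting close to 512MB); with 1MB reads it drops to roughly 128GB. Larger reads help, but the cost stays quadratic in the backlog size.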

Increasing nbytes from 64kB to 1MB made vol-download perform better in our environment, but it doesn't remove the possibility of running into the same issue in the future.

I hope this helps :)

Yours,
Ossi Herrala
Codenomicon Oy

Comment 6 cross 2015-06-03 17:02:22 UTC
Forgot to say: Please, consider reopening this issue.

Comment 7 Daniel Berrangé 2015-06-03 17:04:00 UTC
Thanks for that; I think your analysis is sound. I'm re-opening this bug so we can consider what, if anything, we can do to improve this situation in general. For example, rather than storing one giant st->incoming array, it might be better if we used an iovec, so that when reading more data off the wire we just add entries to the iovec. virNetClientStreamRecvPacket() could then read data from the iovecs, and if it did need to memmove(), it would only move a small amount of data within one iovec entry; most of the others would be unchanged.
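As a rough illustration of this idea, the backlog can be kept as a queue of per-packet buffers, so the reader only advances an offset in the head buffer instead of memmove()ing the whole backlog. The sketch below is hypothetical (queue_push/queue_recv and the structs are made-up names; the actual patch that followed uses struct iovec), but it shows a consume path whose cost depends only on the bytes copied, not on the queued backlog:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a queue-of-buffers backlog: incoming packets are appended
 * as separate nodes, and the reader consumes from the head by advancing
 * an offset rather than memmove()ing everything behind it. */
struct chunk {
    struct chunk *next;
    size_t len;      /* bytes stored in data */
    size_t off;      /* bytes already consumed from data */
    char data[];     /* C99 flexible array member */
};

struct stream_queue {
    struct chunk *head, *tail;
};

/* Producer side: queue one received packet (cf. QueuePacket()). */
int queue_push(struct stream_queue *q, const char *buf, size_t len)
{
    struct chunk *c = malloc(sizeof(*c) + len);
    if (!c)
        return -1;
    c->next = NULL;
    c->len = len;
    c->off = 0;
    memcpy(c->data, buf, len);
    if (q->tail)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
    return 0;
}

/* Consumer side: copy up to nbytes out (cf. RecvPacket()).  Cost is
 * proportional to the bytes copied, independent of the backlog size. */
size_t queue_recv(struct stream_queue *q, char *out, size_t nbytes)
{
    size_t got = 0;

    while (got < nbytes && q->head) {
        struct chunk *c = q->head;
        size_t avail = c->len - c->off;
        size_t want = nbytes - got < avail ? nbytes - got : avail;

        memcpy(out + got, c->data + c->off, want);
        c->off += want;
        got += want;
        if (c->off == c->len) {        /* chunk fully drained: free it */
            q->head = c->next;
            if (!q->head)
                q->tail = NULL;
            free(c);
        }
    }
    return got;
}
```

A slow reader then leaves the queued chunks untouched instead of repeatedly shifting them, which is the effect the iovec approach is after.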

Comment 8 cross 2015-06-05 11:56:47 UTC
Created attachment 1035176 [details]
Vector I/O version

Use an I/O vector (iovec) instead of one huge memory buffer, as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1026137#c7. This avoids memmove() on big buffers, so performance does not degrade when the source (virNetClientStreamQueuePacket()) is faster than the sink (virNetClientStreamRecvPacket()).

Comment 9 Martin Kletzander 2015-06-05 13:04:59 UTC
Thank you for posting the patch.  Would you mind sending it to the upstream list in order to speed up the inclusion in libvirt?

Comment 10 cross 2015-06-08 07:26:09 UTC
Patch sent to list: https://www.redhat.com/archives/libvir-list/2015-June/msg00284.html

Comment 11 Jaroslav Suchanek 2015-06-10 08:47:03 UTC
This will be handled in the RHEL 7 releases; see bug 1026136. As RHEL 6 is at the end of its Production 1 phase, we would need a valid business justification to fix it there. Thank you.