690521 – Enlarging migrate_set_speed does not raise migration network transfer rates to the real network bandwidth

Bug 690521 - Enlarging migrate_set_speed does not raise migration network transfer rates to the real network bandwidth

Summary: Enlarging migrate_set_speed does not raise migration network transfer rates t...

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kvm
Sub Component:
Version:	5.6
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Glauber Costa
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	Rhel5KvmTier1 696155 707606 713392
TreeView+	depends on / blocked

Reported:	2011-03-24 15:01 UTC by Dan Yasny
Modified:	2013-07-02 07:15 UTC (History)
CC List:	19 users (show)
Fixed In Version:	kvm-83-230.el5
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	713392 (view as bug list)
Environment:
Last Closed:	2011-07-11 06:36:07 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Dan Yasny 2011-03-24 15:01:06 UTC

Description of problem:
Migration speed is at it's default ~300Mbps. When updating the migrate_set_speed to 900000000 for example, we expect to see 900Mbps, but we only see a slight increase

Version-Release number of selected component (if applicable):
kmod-kvm-83-224.el5
kvm-tools-83-224.el5
kvm-qemu-img-83-224.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-224.el5
kvm-83-224.el5


2.6.18-238.5.1.el5

How reproducible:
Always

Steps to Reproduce:
1. Start a VM migration, make sure the migration never ends, by running the following code inside the VM (more than one instance is preferred for an SMP VM)
//example to generate dirty pages and delay migration for demo purposes:
{
    unsigned char *array;
    long int i,j,k;
    unsigned char c;
    long int loop=0;
    array=malloc(1024*1024*1024);
    while(1)
    {
        for(i=0;i<1024;i++)
 {
            c=0;
            for(j=0;j<1024;j++)
            {
                c++;
                for(k=0;k<1024;k++)
                {
                    array[i*1024*1024+j*1024+k]=c;
                }
            }
        }
        loop++;
    }
}


2. watch the info migrate output, and count the transferred ram per second for an transfer speed approximation

3.
  
Actual results:
Max rates on a 1G network reach ~40MBps

Expected results:
be able to saturate the entire network if migrate_set_speed is set to the network speed or higher

Additional info:

Comment 1 Dor Laor 2011-03-28 10:11:20 UTC

Interesting issue. I wonder how rhel6 behaves since we have RCU for the dirty bits. Maybe additional tweaking is needed for the tcp connection, like setting Nigel on or maybe other means. 

Dan, do they have nested pages for the hardware?

Comment 2 Dan Yasny 2011-03-28 10:18:29 UTC

This is the hardware I reproduced this on:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping	: 6
cpu MHz		: 2493.751
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips	: 4987.50
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

Comment 33 Eduardo Habkost 2011-04-04 14:47:55 UTC

Bugzilla changed the bug component by itself. Reverting to 'kvm'.

Bugzilla bug reported: https://bugzilla.redhat.com/show_bug.cgi?id=693396

Comment 43 Mike Cao 2011-05-11 03:32:02 UTC

Verified this issue on 
# uname -r
2.6.18-260.el5
# rpm -q kvm
kvm-83-232.el5

steps:
Steps to Reproduce:
1. Start a VM migration, make sure the migration never ends,
eg :in the guest running #stress iozone ,dd at the same time
2.migrate_set_speed 1G
3. watch the info migrate output, and count the transferred ram per second for
an transfer speed approximation

Actual Results:
max speed could reach 500MB/s
on kvm-83-224.el5 ,max speed could reach reach 
~137MBps

Based on above ,migration speed improved alot ,this issue has been fixed.

Comment 44 Mike Cao 2011-05-11 03:35:04 UTC

(In reply to comment #41)
Hi , Dan

What you described should be a regression bug .Could you tell me which application is running in the guest so I can reproduce it ?

Best Regards,
Mike

Comment 46 Mike Cao 2011-05-12 02:46:37 UTC

Hi, Glommer

Could you review comment #41 and comment #43 ?
According to comment #43 ,the speed improved alot .
According to comment #41 ,the patch cause regression .I tried but I have not reproduce yet.

Based on above ,how to deal with this issue ?

Best Regards,
Mike

Comment 60 Mike Cao 2011-06-06 13:39:13 UTC

(In reply to comment #59)
> From the above dmesg I can see that kvmclock is being used by the guest

In RHEL5 so that we can not get the correct clocksource in #cat /sys/devices/system/clocksource/clocksource0/available_clocksource

btw ,Does this bug mean enlarge migration speed cause network does not work  ?
Whether Bug 703112 is dup of this one  ?

Comment 67 Mike Cao 2011-06-15 09:56:46 UTC

According to Open new Bug 713392 -Increase migration max_downtime/or speed cause guest stalls for more investigation.

According to comment #64 & comment #43 Change status to VERIFIED.

Comment 68 Mike Cao 2011-07-07 06:36:41 UTC

Tried on kvm-83-239.el5 , found this issue come back ,re-assign this issue ,
BTW ,this issue blocks me to verify Bug 713392

Steps:
1.start guest with -m 1G -cpu 4
eg:/usr/libexec/qemu-kvm -m 1G -smp 4,sockets=4,cores=1,threads=1 -name RHEL5u7
-uuid 13bd47ff-7458-a214-9c43-d311ed5ca5a3 -monitor stdio
-no-kvm-pit-reinjection -boot c -drive
file=/mnt/RHEL5.7-virtio.qcow2,if=virtio,format=qcow2,cache=none,boot=on -net
nic,macaddr=54:52:00:52:ed:61,vlan=0,model=virtio -net
tap,script=/etc/qemu-ifup,downscript=no,vlan=0 -serial pty -parallel none -usb
-vnc :1 -k en-us -vga cirrus -balloon none -M rhel5.6.0 -usbdevice tablet
2.in the guest 
#ping 8.8.8.8 -i 0.1
#stress -c 1 -m 1
3.(qemu)migrate_set_speed 1G
4.(qemu) migrate -d tcp:<hostB>:5888

Actual Results:
wait for more than 30 mins ,migration never finished

Additional info:
on kvm-83-238.el5 ,migration can be finished with the following steps.

Comment 69 FuXiangChun 2011-07-07 09:16:47 UTC

when do local migration, migration default transfer speed is about 35M/sec
after changed migrate_set_speed value to 1G, migration transfer speed is about 160M.

Comment 70 Dor Laor 2011-07-07 09:38:36 UTC

I think we need another bug here - it is visible that enlarging the migration speed does help but the migration convergence rate is too low.
But 713392 fixed an issue that seems to make migration convergence fast but eventually it left lots of pages that need to be transferred to the destination on the last stage of the migration. So in fact, the result is the same - you can increase the max downtime to 1-2 seconds (from 100ms) and see the migration converges fast enough.

Comment 71 Mike Cao 2011-07-07 14:12:45 UTC

(In reply to comment #70)
> I think we need another bug here - it is visible that enlarging the migration
> speed does help but the migration convergence rate is too low.

Referring to comment #0,seems this is the reason for reporting the bug 

"Migration speed is at it's default ~300Mbps. When updating the
migrate_set_speed to 900000000 for example, we expect to see 900Mbps, but we
only see a *slight* increase"


> But 713392 fixed an issue that seems to make migration convergence fast but
> eventually it left lots of pages that need to be transferred to the destination
> on the last stage of the migration. So in fact, the result is the same - you
> can increase the max downtime to 1-2 seconds (from 100ms) and see the migration
> converges fast enough.

As Bug 713392 reverted the patch of this Bug ,So no patch added for this bug's fix .

Referring to comment #27 &comment #69 ,From the results I can see the speed does not highly increased after (qemu)migrate_set_speed 1G between on kvm-83-224.el5(unfixed) and  on kvm-83-239.el5 (fixed).
It is clear that the bug does not fixed.

Based on above ,May I re-assign this issue ?

Mike

Comment 73 Mike Cao 2011-07-11 06:36:07 UTC

Referring to comment #71,comment #72 .
Since no patch added for this bug's fix ,I will close it as won't fix.

Note You need to log in before you can comment on or make changes to this bug.