Red Hat Bugzilla – Bug 690521
Enlarging migrate_set_speed does not raise migration network transfer rates to the real network bandwidth
Last modified: 2013-07-02 03:15:27 EDT
Description of problem:
Migration speed is at its default of ~300 Mbps. After raising migrate_set_speed to 900000000, for example, we expect to see ~900 Mbps, but we only see a slight increase.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a VM migration and make sure it never ends by running the following code inside the VM (more than one instance is preferred for an SMP VM):
//example to generate dirty pages and delay migration for demo purposes:
unsigned char *array;
long int i,j,k;
unsigned char c;
long int loop=0;
2. Watch the info migrate output and count the transferred ram per second to approximate the transfer speed.
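The listing in step 1 is cut off in this report. As a rough sketch only, and not the reporter's original program, a dirty-page generator along those lines could look like the following. The function names dirty_pages and dirty_memory, the page size and the buffer size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: write one byte into every page of buf so that each
 * page is marked dirty again. The volatile access keeps the compiler from
 * discarding the stores. */
void dirty_pages(unsigned char *buf, size_t len, size_t page_size,
                 unsigned char value)
{
    assert(buf != NULL && page_size > 0);
    volatile unsigned char *p = buf;
    for (size_t off = 0; off < len; off += page_size)
        p[off] = value;
}

/* Allocate len bytes and re-dirty them for `passes` rounds.
 * A negative pass count loops forever, which is what the reproduction
 * needs. Returns 0 on success, -1 if the allocation fails. */
int dirty_memory(size_t len, size_t page_size, long passes)
{
    unsigned char *array = malloc(len);
    if (array == NULL)
        return -1;

    unsigned char c = 0;
    for (long loop = 0; passes < 0 || loop < passes; loop++)
        dirty_pages(array, len, page_size, ++c);

    free(array);
    return 0;
}
```

Calling dirty_memory(512UL << 20, 4096, -1) from main() in each instance would keep 512 MiB of guest memory permanently dirty. Both numbers are assumed values and should be sized to the guest.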
Max rates on a 1G network reach ~40MBps
Migration should be able to saturate the entire network if migrate_set_speed is set to the network speed or higher.
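To make the numbers above comparable, here is a small sketch of the arithmetic behind "count the transferred ram per second". It assumes the "transferred ram" counter is reported in kbytes of 1024 bytes and that MBps means 10^6 bytes per second. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average transfer rate in bytes per second between two samples of the
 * "transferred ram" counter, taken `seconds` apart.
 * Assumption: the counter is in kbytes, 1 kbyte = 1024 bytes. */
double rate_bytes_per_sec(uint64_t kbytes_before, uint64_t kbytes_after,
                          double seconds)
{
    assert(seconds > 0.0 && kbytes_after >= kbytes_before);
    return (double)(kbytes_after - kbytes_before) * 1024.0 / seconds;
}

/* Convert bytes per second to megabits per second (1 Mbit = 10^6 bits). */
double bytes_to_mbit(double bytes_per_sec)
{
    return bytes_per_sec * 8.0 / 1e6;
}

/* Fraction of the link's nominal capacity in use; link_mbit is 1000 for
 * a 1 Gbit network. */
double link_utilization(double bytes_per_sec, double link_mbit)
{
    assert(link_mbit > 0.0);
    return bytes_to_mbit(bytes_per_sec) / link_mbit;
}
```

Under these assumptions, ~40 MBps corresponds to about 320 Mbit/s, or roughly a third of a 1 Gbit link, which is in line with the ~300 Mbps default mentioned in the description.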
Interesting issue. I wonder how RHEL6 behaves, since there we have RCU for the dirty bits. Maybe the TCP connection needs additional tweaking, such as turning Nagle on, or some other measure.
Dan, do they have nested pages for the hardware?
This is the hardware I reproduced this on:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 6
cpu MHz : 2493.751
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 4987.50
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
Bugzilla changed the bug component by itself. Reverting to 'kvm'.
Bugzilla bug reported: https://bugzilla.redhat.com/show_bug.cgi?id=693396
Verified this issue on
# uname -r
# rpm -q kvm
Steps to Reproduce:
1. Start a VM migration and make sure it never ends,
e.g. by running stress, iozone and dd in the guest at the same time.
3. Watch the info migrate output and count the transferred ram per second to
approximate the transfer speed.
Max speed could reach 500MB/s.
On kvm-83-224.el5, max speed could reach
Based on the above, migration speed improved a lot; this issue has been fixed.
(In reply to comment #41)
Hi Dan,
What you described should be a regression bug. Could you tell me which application is running in the guest so I can reproduce it?
Could you review comment #41 and comment #43?
According to comment #43, the speed improved a lot.
According to comment #41, the patch causes a regression. I tried but have not been able to reproduce it yet.
Based on the above, how should we deal with this issue?
(In reply to comment #59)
> From the above dmesg I can see that kvmclock is being used by the guest
On RHEL5 we cannot get the correct clocksource from #cat /sys/devices/system/clocksource/clocksource0/available_clocksource
BTW, does this bug mean that enlarging the migration speed causes the network to stop working?
Is Bug 703112 a duplicate of this one?
Opened new Bug 713392 ("Increase migration max_downtime/or speed cause guest stalls") for further investigation.
According to comment #64 and comment #43, changing status to VERIFIED.
Tried on kvm-83-239.el5 and found that this issue has come back; re-assigning this issue.
BTW, this issue blocks me from verifying Bug 713392.
1. Start the guest with -m 1G -smp 4
eg:/usr/libexec/qemu-kvm -m 1G -smp 4,sockets=4,cores=1,threads=1 -name RHEL5u7
-uuid 13bd47ff-7458-a214-9c43-d311ed5ca5a3 -monitor stdio
-no-kvm-pit-reinjection -boot c -drive
tap,script=/etc/qemu-ifup,downscript=no,vlan=0 -serial pty -parallel none -usb
-vnc :1 -k en-us -vga cirrus -balloon none -M rhel5.6.0 -usbdevice tablet
2.in the guest
#ping 126.96.36.199 -i 0.1
#stress -c 1 -m 1
4.(qemu) migrate -d tcp:<hostB>:5888
Waited for more than 30 minutes; the migration never finished.
On kvm-83-238.el5, migration can be finished with the following steps.
When doing a local migration, the default transfer speed is about 35M/sec.
After changing the migrate_set_speed value to 1G, the transfer speed is about 160M/sec.
I think we need another bug here - it is visible that enlarging the migration speed does help but the migration convergence rate is too low.
But 713392 fixed an issue that seems to make migration convergence fast but eventually it left lots of pages that need to be transferred to the destination on the last stage of the migration. So in fact, the result is the same - you can increase the max downtime to 1-2 seconds (from 100ms) and see the migration converges fast enough.
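To illustrate why raising the max downtime lets the migration complete, here is a simplified model of the final-stage check: the migration can finish once the data still left to send fits into the allowed downtime at the current bandwidth. This is a sketch under that assumption, not the actual qemu-kvm code. The function names and the figures in the note below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Estimated time, in seconds, needed to send the memory that is still
 * dirty while the guest is paused. */
double expected_downtime(uint64_t remaining_bytes, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return (double)remaining_bytes / bytes_per_sec;
}

/* Simplified convergence test: the last stage may start only when the
 * estimated downtime fits within the configured maximum. */
bool can_finish(uint64_t remaining_bytes, double bytes_per_sec,
                double max_downtime_sec)
{
    return expected_downtime(remaining_bytes, bytes_per_sec)
           <= max_downtime_sec;
}
```

With, say, 40 MB still dirty and 40 MB/s of bandwidth, the estimate is 1 second: the check fails with a 100 ms limit and passes with a 2 second limit. In this model, more bandwidth and more allowed downtime both shrink the amount of dirty memory that blocks completion.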
(In reply to comment #70)
> I think we need another bug here - it is visible that enlarging the migration
> speed does help but the migration convergence rate is too low.
Referring to comment #0, this seems to be the reason the bug was reported:
"Migration speed is at it's default ~300Mbps. When updating the
migrate_set_speed to 900000000 for example, we expect to see 900Mbps, but we
only see a *slight* increase"
> But 713392 fixed an issue that seems to make migration convergence fast but
> eventually it left lots of pages that need to be transferred to the destination
> on the last stage of the migration. So in fact, the result is the same - you
> can increase the max downtime to 1-2 seconds (from 100ms) and see the migration
> converges fast enough.
Since Bug 713392 reverted the patch for this bug, no patch has been added to fix this bug.
Referring to comment #27 and comment #69, the results show that the speed after (qemu) migrate_set_speed 1G does not increase much between kvm-83-224.el5 (unfixed) and kvm-83-239.el5 (fixed).
It is clear that the bug is not fixed.
Based on the above, may I re-assign this issue?
Referring to comment #71 and comment #72:
Since no patch was added to fix this bug, I will close it as won't fix.