Description of problem: Migration speed is at it's default ~300Mbps. When updating the migrate_set_speed to 900000000 for example, we expect to see 900Mbps, but we only see a slight increase Version-Release number of selected component (if applicable): kmod-kvm-83-224.el5 kvm-tools-83-224.el5 kvm-qemu-img-83-224.el5 etherboot-zroms-kvm-5.4.4-13.el5 kvm-debuginfo-83-224.el5 kvm-83-224.el5 2.6.18-238.5.1.el5 How reproducible: Always Steps to Reproduce: 1. Start a VM migration, make sure the migration never ends, by running the following code inside the VM (more than one instance is preferred for an SMP VM) //example to generate dirty pages and delay migration for demo purposes: { unsigned char *array; long int i,j,k; unsigned char c; long int loop=0; array=malloc(1024*1024*1024); while(1) { for(i=0;i<1024;i++) { c=0; for(j=0;j<1024;j++) { c++; for(k=0;k<1024;k++) { array[i*1024*1024+j*1024+k]=c; } } } loop++; } } 2. watch the info migrate output, and count the transferred ram per second for an transfer speed approximation 3. Actual results: Max rates on a 1G network reach ~40MBps Expected results: be able to saturate the entire network if migrate_set_speed is set to the network speed or higher Additional info:
Interesting issue. I wonder how rhel6 behaves since we have RCU for the dirty bits. Maybe additional tweaking is needed for the tcp connection, like setting Nigel on or maybe other means. Dan, do they have nested pages for the hardware?
This is the hardware I reproduced this on: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz stepping : 6 cpu MHz : 2493.751 cache size : 6144 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm bogomips : 4987.50 clflush size : 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management:
Bugzilla changed the bug component by itself. Reverting to 'kvm'. Bugzilla bug reported: https://bugzilla.redhat.com/show_bug.cgi?id=693396
Verified this issue on # uname -r 2.6.18-260.el5 # rpm -q kvm kvm-83-232.el5 steps: Steps to Reproduce: 1. Start a VM migration, make sure the migration never ends, eg :in the guest running #stress iozone ,dd at the same time 2.migrate_set_speed 1G 3. watch the info migrate output, and count the transferred ram per second for an transfer speed approximation Actual Results: max speed could reach 500MB/s on kvm-83-224.el5 ,max speed could reach reach ~137MBps Based on above ,migration speed improved alot ,this issue has been fixed.
(In reply to comment #41) Hi , Dan What you described should be a regression bug .Could you tell me which application is running in the guest so I can reproduce it ? Best Regards, Mike
Hi, Glommer Could you review comment #41 and comment #43 ? According to comment #43 ,the speed improved alot . According to comment #41 ,the patch cause regression .I tried but I have not reproduce yet. Based on above ,how to deal with this issue ? Best Regards, Mike
(In reply to comment #59) > From the above dmesg I can see that kvmclock is being used by the guest In RHEL5 so that we can not get the correct clocksource in #cat /sys/devices/system/clocksource/clocksource0/available_clocksource btw ,Does this bug mean enlarge migration speed cause network does not work ? Whether Bug 703112 is dup of this one ?
According to Open new Bug 713392 -Increase migration max_downtime/or speed cause guest stalls for more investigation. According to comment #64 & comment #43 Change status to VERIFIED.
Tried on kvm-83-239.el5 , found this issue come back ,re-assign this issue , BTW ,this issue blocks me to verify Bug 713392 Steps: 1.start guest with -m 1G -cpu 4 eg:/usr/libexec/qemu-kvm -m 1G -smp 4,sockets=4,cores=1,threads=1 -name RHEL5u7 -uuid 13bd47ff-7458-a214-9c43-d311ed5ca5a3 -monitor stdio -no-kvm-pit-reinjection -boot c -drive file=/mnt/RHEL5.7-virtio.qcow2,if=virtio,format=qcow2,cache=none,boot=on -net nic,macaddr=54:52:00:52:ed:61,vlan=0,model=virtio -net tap,script=/etc/qemu-ifup,downscript=no,vlan=0 -serial pty -parallel none -usb -vnc :1 -k en-us -vga cirrus -balloon none -M rhel5.6.0 -usbdevice tablet 2.in the guest #ping 8.8.8.8 -i 0.1 #stress -c 1 -m 1 3.(qemu)migrate_set_speed 1G 4.(qemu) migrate -d tcp:<hostB>:5888 Actual Results: wait for more than 30 mins ,migration never finished Additional info: on kvm-83-238.el5 ,migration can be finished with the following steps.
when do local migration, migration default transfer speed is about 35M/sec after changed migrate_set_speed value to 1G, migration transfer speed is about 160M.
I think we need another bug here - it is visible that enlarging the migration speed does help but the migration convergence rate is too low. But 713392 fixed an issue that seems to make migration convergence fast but eventually it left lots of pages that need to be transferred to the destination on the last stage of the migration. So in fact, the result is the same - you can increase the max downtime to 1-2 seconds (from 100ms) and see the migration converges fast enough.
(In reply to comment #70) > I think we need another bug here - it is visible that enlarging the migration > speed does help but the migration convergence rate is too low. Referring to comment #0,seems this is the reason for reporting the bug "Migration speed is at it's default ~300Mbps. When updating the migrate_set_speed to 900000000 for example, we expect to see 900Mbps, but we only see a *slight* increase" > But 713392 fixed an issue that seems to make migration convergence fast but > eventually it left lots of pages that need to be transferred to the destination > on the last stage of the migration. So in fact, the result is the same - you > can increase the max downtime to 1-2 seconds (from 100ms) and see the migration > converges fast enough. As Bug 713392 reverted the patch of this Bug ,So no patch added for this bug's fix . Referring to comment #27 &comment #69 ,From the results I can see the speed does not highly increased after (qemu)migrate_set_speed 1G between on kvm-83-224.el5(unfixed) and on kvm-83-239.el5 (fixed). It is clear that the bug does not fixed. Based on above ,May I re-assign this issue ? Mike
Referring to comment #71,comment #72 . Since no patch added for this bug's fix ,I will close it as won't fix.