Created attachment 396485 [details]
info of cpu and mem on host

Description of problem:
Boot up a VM and do ping-pong migration; the time drift is too large (more than 300s).

Version-Release number of selected component (if applicable):
(guest)# rpm -qa | grep virtio
kmod-virtio-0.1-17.el3

(guest)# ntpdate -v
26 Feb 14:35:42 ntpdate[1297]: ntpdate 4.1.2 Thu Feb 15 04:30:51 EST 2007 (1)
26 Feb 14:35:42 ntpdate[1297]: no servers can be used, exiting

(host)# rpm -qa | grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-160.el5
kvm-qemu-img-83-160.el5
kmod-kvm-83-160.el5
kvm-tools-83-160.el5
kvm-debuginfo-83-160.el5

How reproducible:
Reproducible every time

Steps to Reproduce:
1. Boot up a VM and update the guest time with ntp.
(guest)# ntpdate clock.redhat.com
26 Feb 14:09:44 ntpdate[1277]: adjust time server 66.187.233.4 offset -0.027956 sec
2. Do ping-pong migration.
(monitor)# migrate tcp:0:5200
..
(monitor)# migrate tcp:0:5200
3. Check the time drift with ntp.
# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 337.587340, delay 0.31523
26 Feb 14:20:12 ntpdate[1281]: step time server 66.187.233.4 offset 337.587340 sec

Actual results:
The time drift is more than 300s.

Expected results:
The time drift is less than 10s.

Additional info:
(host)# uname -a
Linux amd-9600b-4-2.englab.nay.redhat.com 2.6.18-189.el5 #1 SMP Tue Feb 16 11:10:22 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

(guest)# uname -a
Linux dhcp-66-82-241.nay.redhat.com 2.4.21-63.EL #1 SMP Wed Oct 28 23:13:12 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Command line of src VM1:
# qemu-kvm -drive file=/tmp/kvm_autotest_root/images/RHEL-3.9-64-virtio.raw,if=virtio,cache=writethrough,boot=on -net nic,vlan=0,model=virtio,macaddr=00:19:30:25:46:00 -net tap,vlan=0,ifname=e1000_0_6001,script=/usr/local/staf/test/RHEV/kvm-new/kvm-test/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 4096 -smp 4 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -rtc-td-hack -vnc :0

Command line of dest VM2:
# qemu-kvm -drive file=/tmp/kvm_autotest_root/images/RHEL-3.9-64-virtio.raw,if=virtio,cache=writethrough,boot=on -net nic,vlan=0,model=virtio,macaddr=00:19:30:25:46:00 -net tap,vlan=0,ifname=e1000_0_6002,script=/usr/local/staf/test/RHEV/kvm-new/kvm-test/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 4096 -smp 4 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -rtc-td-hack -vnc :1 -incoming tcp:0:5200
Amos, can you remove the -no-kvm-pit-reinjection flag?
Err, ignore comment #2.

Can you confirm if the problem is present with -smp 1?
(In reply to comment #3)
> Err, ignore comment #2.
>
> Can you confirm if the problem is present with -smp 1?

This problem also exists with -smp 1.

After ping migration:
[root@localhost root]# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 4.507959, delay 0.30522
 2 Mar 21:42:34 ntpdate[1314]: step time server 66.187.233.4 offset 4.507959 sec

After pong migration:
[root@localhost root]# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 233.897184, delay 0.30522
 2 Mar 21:51:05 ntpdate[1321]: step time server 66.187.233.4 offset 233.897184 sec
Just an update: I'm blocked on getting ISO images over a slow link... but I'm 90% there now.
Amos, it might be an issue with the RTC emulation too; what's the time drift when you reboot a rhel3 VM?
It would also be most helpful to see /proc/cpuinfo from the host; can we get this information please?

Also, can you check ntpdate without doing any migration? Just use a reliable clock, measure 30 seconds, check the guest date, wait 30 seconds, check again - and see whether the guest advances by 30 seconds or jumps ahead and then back.
(In reply to comment #6)
> Amos, it might be an issue with the RTC emulation too; what's the time drift
> when you reboot a rhel3 VM?

I have rebooted the rhel3 VM 5 times; the time drift is less than 1 sec each time.

Result:
offset -0.876304 sec
offset -0.526459 sec
offset -0.797713 sec
offset -0.525171 sec
offset -0.869227 sec
(In reply to comment #7)
> It would also be most helpful to see /proc/cpuinfo from the host; can we get
> this information please?

I have uploaded the cpuinfo and meminfo of the host here:
https://bugzilla.redhat.com/attachment.cgi?id=396485

> Also, can you check ntpdate without doing any migration? Just use a reliable
> clock, measure 30 seconds, check the guest date, wait 30 seconds, check again
> - and see whether the guest advances by 30 seconds or jumps ahead and then
> back.

The drift is very small without migration.

Steps:
1. Boot a rhel3 VM.
2. Get the guest IP address.
3. Execute the script below.
-----------------------------------------
#!/bin/bash
time=$1
guest_ip="10.66.83.68"

sync_guest_ntpdate() {
    ssh $guest_ip ntpdate clock.redhat.com
}

get_guest_ntpdate() {
    ssh $guest_ip ntpdate -q clock.redhat.com
}

ssh-copy-id -i ~/.ssh/id_rsa.pub $guest_ip > /dev/null

sync_guest_ntpdate
sync_guest_ntpdate
echo "---------------"
get_guest_ntpdate
sleep $time
echo "---------------"
get_guest_ntpdate
sleep $time
echo "---------------"
get_guest_ntpdate
-----------------------------------------

Result:
1) Sleep time set to 30 sec
# ./timedrift.sh 30
10 Mar 23:59:17 ntpdate[2089]: adjust time server 66.187.233.4 offset -0.040228 sec
10 Mar 23:59:19 ntpdate[2098]: adjust time server 66.187.233.4 offset -0.036673 sec
---------------
server 66.187.233.4, stratum 1, offset -0.034486, delay 0.29512
10 Mar 23:59:20 ntpdate[2107]: adjust time server 66.187.233.4 offset -0.034486 sec
---------------
server 66.187.233.4, stratum 1, offset -0.024299, delay 0.30476
10 Mar 23:59:52 ntpdate[2116]: adjust time server 66.187.233.4 offset -0.024299 sec
---------------
server 66.187.233.4, stratum 1, offset -0.005983, delay 0.30484
11 Mar 00:00:23 ntpdate[2130]: adjust time server 66.187.233.4 offset -0.005983 sec

2) Sleep time set to 300 sec
# ./timedrift.sh 300
11 Mar 00:03:32 ntpdate[2159]: adjust time server 66.187.233.4 offset 0.008697 sec
11 Mar 00:03:33 ntpdate[2168]: adjust time server 66.187.233.4 offset 0.003830 sec
---------------
server 66.187.233.4, stratum 1, offset 0.007391, delay 0.29539
11 Mar 00:03:35 ntpdate[2177]: adjust time server 66.187.233.4 offset 0.007391 sec
---------------
server 66.187.233.4, stratum 1, offset -0.014029, delay 0.30519
11 Mar 00:08:36 ntpdate[2188]: adjust time server 66.187.233.4 offset -0.014029 sec
---------------
server 66.187.233.4, stratum 1, offset -0.027170, delay 0.30522
11 Mar 00:13:37 ntpdate[2202]: adjust time server 66.187.233.4 offset -0.027170 sec
I can't reproduce this on Intel hardware with kvm-83-164.el5. Can you try upgrading your kvm and see if it still happens?

I am not using -rtc-td-hack, however; let me try it and see if it matters.
-rtc-td-hack makes no difference. I don't see any drift after migration.
Does this still happen? I have not been able to reproduce the bug in any of my attempts.
(In reply to comment #12)
> Does this still happen? I have not been able to reproduce the bug in any of
> my attempts.

I'm running the test 50 times with kvm-83-170.el5 (Intel & AMD). The results will be reported later.
(In reply to comment #12)
> Does this still happen? I have not been able to reproduce the bug in any of
> my attempts.

Kernel: 2.6.18-194.el5
# rpm -qa | grep kvm
kvm-83-170.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-170.el5
kvm-debuginfo-83-170.el5
kmod-kvm-83-170.el5
kvm-tools-83-170.el5

Result:
----
Host mem:4G smp:4 "AMD Phenom(tm) 9600B Quad-Core Processor"
Guest mem:4G smp:4 RHEL-39-64: reproduced every time
Guest mem:4G smp:4 RHEL-39-32: reproduced every time
----
Host mem:4G smp:4 "Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz"
Guest mem:4G smp:4 RHEL-39-64: never reproduced
Guest mem:4G smp:4 RHEL-39-32: reproduced every time
----

I also tested on other machines; the bug is easy to reproduce when the guest's memory size is the same as the host's.

cpuinfo and meminfo attached.
# tree
.
|-- amd
|   |-- cpuinfo
|   `-- meminfo
`-- intel
    |-- cpuinfo
    `-- meminfo
Created attachment 407521 [details]
cpuinfo and meminfo of host (AMD & Intel)

This is the cpuinfo & meminfo of the hosts (AMD & Intel).
(In reply to comment #14)
> I also tested on other machines; the bug is easy to reproduce when the
> guest's memory size is the same as the host's.

That's almost certainly why the bug happens; we are likely hitting swap on the host, and one of the qemu processes gets hosed.

It's also probable that this scenario is restricted to self migration. Note that self migration requires:

1) reading all memory from the old VM
2) passing the data over a network socket
3) writing all memory to the new VM

This strides over the entire host memory and easily requires > 2X the host memory, plus resources for the network and processes. You will swap, and performance will be bad.

I suggest this is not a migration bug at all. Solving this problem is exactly equivalent to solving the problem where a user issues SIGSTOP to the qemu process for ~90 seconds.

Does anyone know if RHEL 3.9 has tick compensation or not? (It is 2.4, so I guess it does not.)

Perhaps dropping -no-kvm-pit-reinjection is all that is needed to solve the bug.

Can you try that and see if the bug goes away?
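For reference, a minimal sketch of that equivalent experiment, assuming passwordless ssh to the guest and a single qemu-kvm process on the host (the guest IP below is just a placeholder):
-----------------------------------------
#!/bin/bash
# Sketch: freeze the guest's qemu-kvm process for ~90 seconds, roughly what
# heavy host swapping does to it, then query how far the guest clock drifted.
guest_ip="10.66.83.68"          # placeholder guest address
qemu_pid=$(pgrep -o qemu-kvm)   # assumes only one qemu-kvm process on the host

ssh $guest_ip ntpdate clock.redhat.com      # sync the guest clock first

kill -STOP $qemu_pid                        # stop the guest
sleep 90
kill -CONT $qemu_pid                        # resume the guest

sleep 5                                     # let the guest settle
ssh $guest_ip ntpdate -q clock.redhat.com   # check the resulting offset
-----------------------------------------
If the offset after SIGCONT is comparable to what is seen after the ping-pong migration, that would support the theory that this is a host-overcommit problem rather than a migration bug.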
(In reply to comment #16) > (In reply to comment #14) > > I also test on other machines, it's easy to reproduce when guest's memory size > > is same as host's. > > That's almost certainly why the bug happens; likely that we are hitting swap on > the host and one of the qemu processes gets hosed. > > It's also probable that this scenario is restricted to self migration. Note > that self migration requires > > 1) reading all memory from old VM > 2) passing data over network socket > 3) writing all memory to new VM > > this strides over the entire host memory and requires easily > 2X the host > memory plus resources for network and processes. You will swap, and > performance will be bad. yes, the performance is very bad. > I suggest this is not a migration bug at all. Solving this problem is exactly > equivalent to solving the problem where a user issues SIGSTOP to the qemu > process for ~90 seconds. > > Does anyonne know if RHEL 3.9 has tick compensation or not? (It is 2.4, so I > guess it does not). > > Perhaps dropping -kvm-no-pit-reinjection is all that is needed to solve the > bug. > > Can you try that and see if the bug goes away? Do offline migration, RESULT: with '-kvm-no-pit-reinjection' : bug exists without '-kvm-no-pit-reinjection' : bug exists host kernel: 2.6.18-194.el5 # rpm -qa |grep kvm kvm-83-170.el5 etherboot-zroms-kvm-5.4.4-13.el5 kvm-qemu-img-83-170.el5 kvm-debuginfo-83-170.el5 kmod-kvm-83-170.el5 kvm-tools-83-170.el5
(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > I also tested on other machines; the bug is easy to reproduce when the
> > > guest's memory size is the same as the host's.
> >
> > That's almost certainly why the bug happens; we are likely hitting swap on
> > the host, and one of the qemu processes gets hosed.
> >
> > It's also probable that this scenario is restricted to self migration. Note
> > that self migration requires:
> >
> > 1) reading all memory from the old VM
> > 2) passing the data over a network socket
> > 3) writing all memory to the new VM
> >
> > This strides over the entire host memory and easily requires > 2X the host
> > memory, plus resources for the network and processes. You will swap, and
> > performance will be bad.
>
> Yes, the performance is very bad.

Can you make sure the host does not swap? We know for a fact that without pvclock, timekeeping is not 100% reliable. If the host is loaded or swaps the guests, it will greatly influence timekeeping.
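A simple way to watch for host swapping while the migration runs might be something along these lines (a sketch, run on the host; it only uses /proc/meminfo and vmstat):
-----------------------------------------
#!/bin/bash
# Sketch: print overall swap consumption and per-second swap-in/swap-out
# activity roughly once per second while the migration is in progress
# (Ctrl-C to stop).
while true; do
    date
    grep -E 'SwapTotal|SwapFree' /proc/meminfo              # total swap used so far
    vmstat 1 2 | tail -1 | awk '{print "si="$7" so="$8}'    # pages swapped in/out over the last second
    echo "----"
done
-----------------------------------------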
(In reply to comment #18)
> Can you make sure the host does not swap? We know for a fact that without
> pvclock, timekeeping is not 100% reliable. If the host is loaded or swaps
> the guests, it will greatly influence timekeeping.

This scenario is nearly 100% guaranteed to introduce swap, and not just a little swap, but heavy, heavy swapping. I don't think it's necessarily a migration or timekeeping bug; however, swapping during migration could be particularly bad for timekeeping.

There are possibly some hints we could provide to the mm layer using madvise; the access pattern here is going to run over 4G of memory with MADV_SEQUENTIAL semantics.

However, a 4 gigabyte memory self-migration on a 4 gigabyte host is not going to be a good performance scenario in any case, and it isn't a realistic usage pattern for a real deployment. If the problem doesn't exist on a host-to-host migration and doesn't exist when the memory sizes are within the capacity of the host, I suggest it's not a high priority bug.

I will still try an experiment with inducing swapping on the host to see if the effect is the same; perhaps we can do better catch-up of timekeeping afterwards, but it isn't likely we can address this at all in RHEL 5.
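For that experiment, one rough way to induce comparable memory pressure on the host without migrating at all (a sketch only; the hog size, hold time, and guest IP are placeholders to be adjusted to the host's memory):
-----------------------------------------
#!/bin/bash
# Sketch: push the host into swap while the guest keeps running, then check
# the guest clock drift. tail buffers the whole newline-free stream, so it
# holds roughly 3G of memory until the subshell's sleep ends and the pipe
# closes.
guest_ip="10.66.83.68"      # placeholder guest address

ssh $guest_ip ntpdate clock.redhat.com       # sync the guest clock first

( head -c 3072m /dev/zero; sleep 120 ) | tail > /dev/null &   # hold ~3G for ~2 minutes
wait                                         # let the memory pressure play out

ssh $guest_ip ntpdate -q clock.redhat.com    # check the resulting offset
-----------------------------------------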
(In reply to comment #19)
> (In reply to comment #18)
> > Can you make sure the host does not swap? We know for a fact that without
> > pvclock, timekeeping is not 100% reliable. If the host is loaded or swaps
> > the guests, it will greatly influence timekeeping.
>
> This scenario is nearly 100% guaranteed to introduce swap, and not just a
> little swap, but heavy, heavy swapping. I don't think it's necessarily a
> migration or timekeeping bug; however, swapping during migration could be
> particularly bad for timekeeping.

Yes, this bug can only be reproduced when swapping is heavy. If the guest memory is set to half of the host memory, the drift is very small.

I've also tested migration between two hosts; the drift is very small (0.28 sec, which can be ignored). The memory size of the guest and of each host is 4G.

> There are possibly some hints we could provide to the mm layer using madvise;
> the access pattern here is going to run over 4G of memory with
> MADV_SEQUENTIAL semantics.
>
> However, a 4 gigabyte memory self-migration on a 4 gigabyte host is not going
> to be a good performance scenario in any case, and it isn't a realistic usage
> pattern for a real deployment. If the problem doesn't exist on a host-to-host
> migration and doesn't exist when the memory sizes are within the capacity of
> the host, I suggest it's not a high priority bug.
>
> I will still try an experiment with inducing swapping on the host to see if
> the effect is the same; perhaps we can do better catch-up of timekeeping
> afterwards, but it isn't likely we can address this at all in RHEL 5.
In this case I'll close the bug as won't fix